[ltt-dev] cli/sti vs local_cmpxchg and local_add_return

Mathieu Desnoyers mathieu.desnoyers at polymtl.ca
Mon Mar 23 12:50:09 EDT 2009


* Alan D. Brunelle (Alan.Brunelle at hp.com) wrote:
> Here are the results for:
> 
> processor  : 31
> vendor     : GenuineIntel
> arch       : IA-64
> family     : 32
> model      : 0
> model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9050
> revision   : 7
> archrev    : 0
> features   : branchlong, 16-byte atomic ops
> cpu number : 0
> cpu regs   : 4
> cpu MHz    : 1598.002
> itc MHz    : 400.000000
> BogoMIPS   : 3186.68
> siblings   : 2
> physical id: 196865
> core id    : 1
> thread id  : 0
> 
> test init
> test results: time for baseline
> number of loops: 20000
> total time: 5002
> -> baseline takes 0 cycles
> test end
> test results: time for locked cmpxchg
> number of loops: 20000
> total time: 60083
> -> locked cmpxchg takes 3 cycles
> test end
> test results: time for non locked cmpxchg
> number of loops: 20000
> total time: 60002
> -> non locked cmpxchg takes 3 cycles
> test end
> test results: time for locked add return
> number of loops: 20000
> total time: 155007
> -> locked add return takes 7 cycles
> test end
> test results: time for non locked add return
> number of loops: 20000
> total time: 155004
> -> non locked add return takes 7 cycles
> test end
> test results: time for enabling interrupts (STI)
> number of loops: 20000
> total time: 45003
> -> enabling interrupts (STI) takes 2 cycles
> test end
> test results: time for disabling interrupts (CLI)
> number of loops: 20000
> total time: 59998
> -> disabling interrupts (CLI) takes 2 cycles
> test end
> test results: time for disabling/enabling interrupts (STI/CLI)
> number of loops: 20000
> total time: 107274
> -> enabling/disabling interrupts (STI/CLI) takes 5 cycles
> test end

Hi Alan,

Wow, disabling interrupts is incredibly cheap on ia64, and
local_add_return is especially costly. I think that's because the
architecture does not support it directly, so it has to be implemented
on top of a cmpxchg (except for the fetch add, which is limited to very
specific values).
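
To illustrate what I mean by "done by an underlying cmpxchg", here is a
minimal user-space sketch using C11 atomics. It is not the kernel's ia64
code; add_return_cas is a hypothetical name made up for this example.
The point is only that an add-and-return built this way needs a load
plus at least one compare-and-exchange, which would explain why it costs
more than a single cmpxchg.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (assumption, not the kernel implementation):
 * add `delta` to *v and return the new value, using only a
 * compare-and-exchange loop.
 */
static long add_return_cas(_Atomic long *v, long delta)
{
    long old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * If *v changed since we read it, the compare-exchange fails and
     * stores the current value back into `old`, so we simply retry
     * with the refreshed value.
     */
    while (!atomic_compare_exchange_weak(v, &old, old + delta))
        ;

    return old + delta;
}
```

Calling add_return_cas(&counter, 5) on a counter holding 0 returns 5 and
leaves 5 in the counter.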

Given that some ia64 code refers to NMIs, I guess this architecture
supports them. Disabling interrupts does not protect against NMIs, so
in the end the choice between interrupt disabling and the local atomic
operations is a robustness vs speed tradeoff. But given the time it
takes to write data to memory, I think 5 cycles vs 10 cycles won't make
a big difference overall.
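
A side note on reading the per-operation figures: they appear
consistent with the total time divided by the number of loops, using
integer division, so the fractional part is dropped. That is my
assumption from the numbers, not something I checked in the test
module. The hypothetical helper below just reproduces that arithmetic.

```c
/*
 * Hypothetical helper (assumption about how the test reports its
 * results): per-operation cost as total cycles over loop count.
 * Integer division truncates toward zero.
 */
static unsigned long cycles_per_loop(unsigned long total_cycles,
                                     unsigned long nr_loops)
{
    return total_cycles / nr_loops;
}
```

With the totals quoted above, 59998 / 20000 gives 2 for CLI while
60083 / 20000 gives 3 for the locked cmpxchg, although both are very
close to 3 cycles per loop.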

Thanks for those results!

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



