[ltt-dev] liburcu cache line size

Tue Aug 17 16:55:05 EDT 2010

* David Goulet (david.goulet at polymtl.ca) wrote:
>
>
> On 10-08-17 04:51 PM, Mathieu Desnoyers wrote:
>> * David Goulet (david.goulet at polymtl.ca) wrote:
>>>
>>>
>>> On 10-08-17 04:24 PM, Mathieu Desnoyers wrote:
>>>> * David Goulet (david.goulet at polymtl.ca) wrote:
>>>>> On 10-08-17 03:45 PM, Mathieu Desnoyers wrote:
>>>> [...]
>>>>>> Yes. The performance degradation caused by cache-line bouncing is _way_
>>>>>> worse than extra cache pressure.
>>>>>>
>>>>>
>>>>> There is something I don't understand here. Correct me if (most likely)
>>>>> I am wrong.
>>>>>
>>>>> How is cache line bouncing affected by the cache line size? If I
>>>>> understand correctly, cache line bouncing is the problem where CPUs share
>>>>> data and have to fetch it from CPU0 to CPU7 (between caches). And, I
>>>>> surely agree, this is costly!
>>>>
>>>> That's ok up to here.
>>>>
>>>>>
>>>>> However, if the assumed cache line size is bigger than the real one, you
>>>>> just lose space... For an arch with a 64-byte cache line size, you lose
>>>>> two lines per aligned structure... How would lowering it to 64 bytes
>>>>> cause cache line bouncing?
>>>>
>>>> Let's take the following example:
>>>>
>>>> A multiprocessor machine with a 256-byte cache line size.
>>>> The program is built assuming the cache line size is only 128 bytes.
>>>>
>>>> So we allocate an array of what we hope are per-cpu variables:
>>>>
>>>>    malloc(nr_cpus * sizeof(struct type));
>>>>
>>>> where struct type is declared __attribute__((aligned(128))).
>>>>
>>>> So we end up having two structures sharing a cache-line, and these will
>>>> bounce between CPUs, even though the structures are not shared: only the
>>>> cache-lines are shared, because the structures happen to be on the same
>>>> cache line.
>>>>
>>>> So for allocation of individual objects which are meant to be per-cpu,
>>>> e.g. a structure controlling the per-cpu buffer, the allocator can put
>>>> one structure next to another (belonging to another cpu), thus causing
>>>> cache line bouncing.
>>>>
>>>> This phenomenon is called "false sharing".
>>>>
>>>
>>> Very nice. That clarifies it, yes!
>>>
>>> However, please refer to Intel® 64 and IA-32 Architectures Software
>>> Developer's Manual Volume 3A: System Programming Guide.
>>>
>>> http://www.intel.com/Assets/PDF/manual/253668.pdf
>>>
>>> P. 527, Table 11-1
>>>
>>> • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst
>>>   microarchitecture): 8-KByte, 4-way set associative, 64-byte cache
>>>   line size.
>>> • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst
>>>   microarchitecture): 16-KByte, 8-way set associative, 64-byte cache
>>>   line size.
>>
>> Dunno why the Linux kernel chooses that for P4. But we definitely have to
>> handle NUMA systems.
>>
>
> arch_numa.h ... possible?

See my comment to Alexandre about multiplying the number of targets
needlessly. Which one will be chosen by distros?

We'll do it if you can find a real-world benchmark that is affected by
this. Good luck ;)
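
To make the false-sharing example quoted above concrete, here is a minimal
sketch in C. It is not taken from liburcu: the helper names cache_line_of()
and count_falsely_shared() are hypothetical, and the sketch assumes the
per-cpu array starts on a cache-line boundary and that each slot occupies
exactly `stride` bytes (the aligned structure size). It only does the
address arithmetic and measures nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the cache line holding byte `offset`,
 * counted from a base address assumed to be cache-line aligned. */
static size_t cache_line_of(size_t offset, size_t line_size)
{
    assert(line_size != 0);
    return offset / line_size;
}

/* Hypothetical helper: count how many of the nr_cpus per-cpu slots,
 * laid out back to back with `stride` bytes each, have a cache line in
 * common with a neighbouring slot. Two adjacent slots share a line when
 * the last byte of the first and the first byte of the second fall on
 * the same line. */
static size_t count_falsely_shared(size_t nr_cpus, size_t stride,
                                   size_t line_size)
{
    size_t shared = 0;

    assert(stride != 0);
    for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
        size_t start = cpu * stride;       /* first byte of this slot */
        size_t end = start + stride;       /* first byte of next slot */
        int with_prev = cpu > 0 &&
            cache_line_of(start - 1, line_size) ==
            cache_line_of(start, line_size);
        int with_next = cpu + 1 < nr_cpus &&
            cache_line_of(end - 1, line_size) ==
            cache_line_of(end, line_size);

        if (with_prev || with_next)
            shared++;
    }
    return shared;
}
```

With the numbers from the example, count_falsely_shared(4, 128, 256) reports
that all four slots share a line with a neighbour, whereas a stride of 256
on the same machine, or a stride of 128 on a machine with 64-byte lines,
reports none: over-aligning only costs space.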

Mathieu

>
>> Mathieu
>>
>>>
>>> David
>>>
>>>> Mathieu
>>>>
>>>>>
>>>>> Thanks for your help on that!
>>>>> David
>>>>>
>>>>
>>>
>>> --
>>> David Goulet
>>> LTTng project, DORSAL Lab.
>>>
>>> PGP/GPG : 1024D/16BD8563
>>> BE3C 672B 9331 9796 291A  14C6 4AF7 C14B 16BD 8563
>>>
>>
>
> -- 
> David Goulet
> LTTng project, DORSAL Lab.
>
> PGP/GPG : 1024D/16BD8563
> BE3C 672B 9331 9796 291A  14C6 4AF7 C14B 16BD 8563
>

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com