[ltt-dev] liburcu cache line size

Tue Aug 17 16:53:00 EDT 2010


On 10-08-17 04:51 PM, Mathieu Desnoyers wrote:
> * David Goulet (david.goulet at polymtl.ca) wrote:
>>
>>
>> On 10-08-17 04:24 PM, Mathieu Desnoyers wrote:
>>> * David Goulet (david.goulet at polymtl.ca) wrote:
>>>> On 10-08-17 03:45 PM, Mathieu Desnoyers wrote:
>>> [...]
>>>>> Yes. The performance degradation caused by cache-line bouncing is _way_
>>>>> worse than extra cache pressure.
>>>>>
>>>>
>>>> There is something I don't understand here. Correct me if (most likely)
>>>> I am wrong.
>>>>
>>>> How is cache line bouncing affected by the cache line size? If I
>>>> understand correctly, cache line bouncing is the problem where CPUs share
>>>> data and have to fetch it from CPU0 to CPU7 (between caches). And, I
>>>> surely agree, this is costly!
>>>
>>> That's ok up to here.
>>>
>>>>
>>>> However, if the cache line size we align to is bigger than the real
>>>> one, you just lose space... For an arch with a 64-byte cache line size,
>>>> you lose two lines per aligned structure... How would lowering it to
>>>> 64 bytes cause cache line bouncing?
>>>
>>> Let's take the following example:
>>>
>>> A multiprocessor machine with a 256-byte cache line size.
>>> The program is built thinking the cache line size is only 128 bytes.
>>>
>>> So we allocate an array of what we hope are per-cpu variables:
>>>
>>>    malloc(nr_cpus * sizeof(struct type));
>>>
>>> where struct type is __attribute__((aligned(128))).
>>>
>>> So we end up with two structures sharing a cache line, and that line
>>> will bounce between CPUs even though the structures are not shared:
>>> only the cache line is shared, because both structures happen to sit
>>> on it.
>>>
>>> So when individual objects meant to be per-cpu are allocated, e.g. a
>>> structure controlling a per-cpu buffer, the allocator can place one
>>> structure right next to another one belonging to another cpu, thus
>>> causing cache line bouncing.
>>>
>>> This phenomenon is called "false sharing".
>>>
>>
>> Very nice. That clarifies it, yes!
>>
>> However, please refer to Intel® 64 and IA-32 Architectures Software
>> Developer's Manual Volume 3A: System Programming Guide.
>>
>> http://www.intel.com/Assets/PDF/manual/253668.pdf
>>
>> P. 527, Table 11-1
>>
>> • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst
>>   microarchitecture): 8-KByte, 4-way set associative, 64-byte cache line
>>   size.
>> • Pentium 4 and Intel Xeon processors (Based on Intel NetBurst
>>   microarchitecture): 16-KByte, 8-way set associative, 64-byte cache line
>>   size.
>
> Dunno why the Linux kernel chooses that for P4. But we definitely have to
> handle NUMA systems.
>

arch_numa.h ... possible?

> Mathieu
>
>>
>> David
>>
>>> Mathieu
>>>
>>>>
>>>> Thanks for your help on that!
>>>> David
>>>>
>>>
>>
>> --
>> David Goulet
>> LTTng project, DORSAL Lab.
>>
>> PGP/GPG : 1024D/16BD8563
>> BE3C 672B 9331 9796 291A  14C6 4AF7 C14B 16BD 8563
>>
>