[lttng-dev] lttng snapshots and running traces

Thibault, Daniel Daniel.Thibault at drdc-rddc.gc.ca
Thu Sep 26 15:36:27 EDT 2013


Sent: September 26, 2013 14:33

>>    Is the following scenario possible?  The consumer tries to get_subbuf, and is denied access (because tracers
>> are writing into it).  It then tries to get the next sub-buffer, but the luck of task scheduling is such that by the 
>> time it actually calls the request routine, the tracers have already overwritten the sub-buffer in question and 
>> gone on to the next one.  The get_subbuf succeeds and the consumer keeps doing its thing (the timestamps 
>> remain monotonically increasing, so there is no corruption).  The resulting snapshot on disk has a large jump 
>> in its timestamps, having lost a whole buffer cycle's worth of event records.
>> 
>>    To complicate things, the tracers then stop (for lack of event occurrences) in some sub-buffer ahead of the 
>> consumer's current position, but still short of the consumer's stop goal.  When the consumer skips over the 
>> busy sub-buffer (where the tracers are "stuck"), it would read events that apparently jump back in time (a 
>> whole buffer cycle's worth of time), so I guess it would stop there and close the snapshot?
>
> One important design note to understand this situation: even though we write in a ring-buffer, the positions 
> are free-running counters, they "never" wrap-around (well, we are dealing with 64-bit counters). In order to 
> map a position to an actual subbuffer, we use a sort of modulo. This operation gives us the actual subbuffer 
> to use and also allows us to detect if the tracer has already reused this subbuffer.
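
   To make sure I read that correctly, here is a minimal Python sketch of the mapping as I understand it. It is not LTTng's actual code: the sizes, the function names and the exact overwrite test are assumptions chosen for illustration.

```python
# Hypothetical ring geometry (assumed values, not LTTng defaults).
SUBBUF_SIZE = 4096                  # bytes per sub-buffer
NUM_SUBBUF = 4                      # sub-buffers in the ring
BUF_SIZE = SUBBUF_SIZE * NUM_SUBBUF


def subbuf_index(pos):
    """Map a free-running byte position to a sub-buffer slot (the 'sort of modulo')."""
    return (pos // SUBBUF_SIZE) % NUM_SUBBUF


def pass_number(pos):
    """Number of complete trips around the ring made before reaching pos."""
    return pos // BUF_SIZE


def was_overwritten(consumer_pos, writer_pos):
    """True if the writer has written into the slot again since consumer_pos.

    Because positions never wrap, the writer being more than one full ring
    past the start of the consumer's sub-buffer means that slot was reused.
    """
    subbuf_start = consumer_pos - (consumer_pos % SUBBUF_SIZE)
    return writer_pos - subbuf_start > BUF_SIZE
```

   With these assumed sizes, positions 0 and 16384 map to the same slot but differ in pass_number, which is what lets a reader tell a fresh sub-buffer from a reused one.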

   Good to know, but it doesn't answer the question.   :-)

   If the tracers pass the consumer, and then the consumer passes the tracers, does the consumer stop copying 
the buffer contents to the snapshot trace?

* consumer reads a number of "pass n" sub-buffers (where n is the number of times the tracers have gone around)
* tracers catch up with the consumer, skip ahead, write at least one whole "pass n+1" sub-buffer
* tracers then stop or slow down
* consumer reads the "pass n+1" sub-buffer(s), then overtakes the tracers
* consumer skips over the sub-buffer where the tracers are, finds a "pass n" sub-buffer
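
   The last two steps can be sketched in Python as follows. This models one possible stopping rule (stop as soon as a sub-buffer is older than one already copied); whether the real consumer behaves this way is exactly what I am asking. The function name and the per-slot "pass" labels are hypothetical.

```python
def read_snapshot(slot_pass, busy_slot, start_slot):
    """Walk the ring once from start_slot; return the (slot, pass) pairs copied.

    slot_pass[i] is the pass number of the data currently held in slot i.
    busy_slot is the slot the tracers are writing into; it is skipped.
    The walk stops at the first slot whose pass number is lower than the
    highest pass already copied, since that data would go back in time.
    """
    copied = []
    highest = None
    n = len(slot_pass)
    for step in range(n):
        slot = (start_slot + step) % n
        if slot == busy_slot:
            continue                    # tracers are here: skip over it
        current = slot_pass[slot]
        if highest is not None and current < highest:
            break                       # stale "pass n" data: stop the snapshot
        copied.append((slot, current))
        highest = current
    return copied


# Slots 0-1 hold "pass n+1" data (n = 3), the tracers sit in slot 2,
# and slots 3-4 still hold "pass n" data.
result = read_snapshot([4, 4, 4, 3, 3], busy_slot=2, start_slot=0)
```

   Here result contains only slots 0 and 1; without the pass comparison, slots 3 and 4 would be appended after newer data.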

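   If the consumer doesn't close the trace at that point, the snapshot will be corrupted: the stale "pass n" sub-buffer would be appended after "pass n+1" data, so the event timestamps would jump back in time by a whole buffer cycle.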

Daniel U. Thibault
Protection des systèmes et contremesures (PSC) | Systems Protection & Countermeasures (SPC)
Cyber sécurité pour les missions essentielles (CME) | Mission Critical Cyber Security (MCCS)
R & D pour la défense Canada - Valcartier (RDDC Valcartier) | Defence R&D Canada - Valcartier (DRDC Valcartier)
2459 route de la Bravoure
Québec QC  G3J 1X5
CANADA
Vox : (418) 844-4000 x4245
Fax : (418) 844-4538
NAC : 918V QSDJ <http://www.travelgis.com/map.asp?addr=918V%20QSDJ>
Gouvernement du Canada | Government of Canada
<http://www.valcartier.drdc-rddc.gc.ca/>


