[lttng-dev] lttng snapshots and running traces

Julien Desfossez jdesfossez at efficios.com
Thu Sep 26 14:33:19 EDT 2013



On 13-09-26 02:13 PM, Thibault, Daniel wrote:
> -----Original Message-----
> From: Julien Desfossez [mailto:jdesfossez at efficios.com] 
> Sent: September 26, 2013 12:16
> 
>> When the consumer needs to read a subbuffer, it performs a get_subbuf operation. If this operation succeeds, 
>> the consumer has exclusive access to this subbuffer and so it can read it safely. If it fails, it means the tracer is 
>> currently using it and in the case of snapshots, the consumer will just skip to the next one (up to the end 
>> position it defined at the beginning of the snapshot).
>> So if there are gaps for a stream in the recorded snapshot, their size will be a multiple of the subbuffer size.
>>
>> Julien
> 
>    Ah, so it occurs at the sub-buffer level.  Simple and efficient.
> 
>    Is the following scenario possible?  The consumer tries to get_subbuf, and is denied access (because tracers are writing into it).  It then tries to get the next sub-buffer, but the luck of task scheduling is such that by the time it actually calls the request routine, the tracers have already overwritten the sub-buffer in question and gone on to the next one.  The get_subbuf succeeds and the consumer keeps doing its thing (the timestamps remain monotonically increasing, so there is no corruption).  The resulting snapshot on disk has a large jump in its timestamps, having lost a whole buffer cycle's worth of event records.
> 
>    To complicate things, the tracers then stop (for lack of event occurrences) in some sub-buffer ahead of the consumer's current position, but still short of the consumer's stop goal.  When the consumer skips over the busy sub-buffer (where the tracers are "stuck"), it would read events that apparently jump back in time (a whole buffer cycle's worth of time), so I guess it would stop there and close the snapshot?

One important design note helps to understand this situation: even though
we write into a ring buffer, the positions are free-running counters that
"never" wrap around (well, we are dealing with 64-bit counters). To map a
position to an actual subbuffer, we use a sort of modulo. This operation
gives us the actual subbuffer to use and also allows us to detect whether
the tracer has already reused this subbuffer.
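
To make the position-to-subbuffer mapping concrete, here is a minimal
sketch in Python. It is only an illustration under assumed values: the
names (subbuf_index, was_overwritten), the sizes and the exact comparison
are hypothetical and are not taken from the LTTng sources.

```python
# Illustrative sketch only: names and sizes are assumptions, not LTTng code.
SUBBUF_SIZE = 4096                     # bytes per subbuffer (assumed)
NUM_SUBBUF = 4                         # subbuffers in the ring (assumed)
BUF_SIZE = SUBBUF_SIZE * NUM_SUBBUF    # bytes in one full buffer cycle


def subbuf_index(pos: int) -> int:
    """Map a free-running byte position to a subbuffer slot in the ring."""
    return (pos // SUBBUF_SIZE) % NUM_SUBBUF


def was_overwritten(consumer_pos: int, producer_pos: int) -> bool:
    """Return True if the tracer has already reused the consumer's subbuffer.

    Positions never wrap, so the distance between the producer position and
    the start of the consumer's subbuffer tells us whether the producer has
    written at least one byte beyond a full lap of the ring.
    """
    subbuf_start = consumer_pos - (consumer_pos % SUBBUF_SIZE)
    return producer_pos - subbuf_start > BUF_SIZE


if __name__ == "__main__":
    # Position 5 * 4096 + 10 lands in slot 1, one full lap after position 4096.
    print(subbuf_index(5 * SUBBUF_SIZE + 10))
    # The producer is exactly one lap ahead: the slot is not yet reused.
    print(was_overwritten(0, BUF_SIZE))
    # One more byte written: the old content of slot 0 is gone.
    print(was_overwritten(0, BUF_SIZE + 1))
```

Because the counters are free-running, two positions that map to the same
slot still differ by a multiple of the buffer size, which is what lets the
reader tell "same data" apart from "same slot, newer data".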

> 
>    On a somewhat related topic, how does the session daemon handle the context added to a channel?  Is it a set of instructions (flags, really) sent to the tracer, similar to filter pseudo-code (but much less complicated)?
Contexts are a kind of "hook" enabled at the channel level and invoked
whenever an event is recorded. Unlike filters, they are not code sent by
the session daemon to the tracer: the code to fetch the information is
already in the tracer. So it is basically the same kind of operation
as activating an event in a channel.
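
As a rough illustration of that idea, here is a small Python sketch. It is
a toy model, not the tracer's implementation: the Channel class, the
CONTEXT_FETCHERS table and the context names are all hypothetical. The
point is only that enabling a context flips a per-channel switch, and the
fetching code is already present on the tracer side.

```python
import os
from typing import Callable, Dict, List

# Hypothetical table of context fetchers that already exist "in the tracer".
CONTEXT_FETCHERS: Dict[str, Callable[[], object]] = {
    "pid": os.getpid,
    "procname": lambda: "demo",  # placeholder value for the sketch
}


class Channel:
    """Toy channel: contexts are enabled by name, no code is transferred."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.enabled_contexts: List[str] = []
        self.records: List[dict] = []

    def add_context(self, ctx_name: str) -> None:
        """Enable a context hook, much like enabling an event."""
        if ctx_name not in CONTEXT_FETCHERS:
            raise ValueError(f"unknown context: {ctx_name}")
        if ctx_name not in self.enabled_contexts:
            self.enabled_contexts.append(ctx_name)

    def record_event(self, event_name: str, payload: dict) -> dict:
        """Record an event, running every enabled context hook first."""
        context = {n: CONTEXT_FETCHERS[n]() for n in self.enabled_contexts}
        record = {"event": event_name, "context": context, "payload": payload}
        self.records.append(record)
        return record


if __name__ == "__main__":
    chan = Channel("chan0")
    chan.add_context("pid")
    print(chan.record_event("sched_switch", {"cpu": 0}))
```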

Julien

> 
> Daniel U. Thibault
> Protection des systèmes et contremesures (PSC) | Systems Protection & Countermeasures (SPC)
> Cyber sécurité pour les missions essentielles (CME) | Mission Critical Cyber Security (MCCS)
> R & D pour la défense Canada - Valcartier (RDDC Valcartier) | Defence R&D Canada - Valcartier (DRDC Valcartier)
> 2459 route de la Bravoure
> Québec QC  G3J 1X5
> CANADA
> Vox : (418) 844-4000 x4245
> Fax : (418) 844-4538
> NAC : 918V QSDJ <http://www.travelgis.com/map.asp?addr=918V%20QSDJ>
> Gouvernement du Canada | Government of Canada
> <http://www.valcartier.drdc-rddc.gc.ca/>
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 


