Malformed Tracefiles

Kienan Stewart kstewart at efficios.com
Wed Jun 25 16:52:08 EDT 2025


Hi Georg,

On 6/25/25 1:27 PM, Georg Muck wrote:
> Hi Kienan,
> 
> thanks for your quick response. :)
> 
> I replaced "--daemonize" with "--background".
> 
> I guess I should briefly describe my goal to avoid confusion: I want to
> trace a ROS2 session online (I tried LTTng live, but it consumes too many
> resources, hence the aggressive rotation of 1s). LTTng gives me the path
> to a newly created trace file as soon as it is created. This path is
> forwarded to a thread pool that does the rest (analyses each trace file by
> itself). However, the trace files are removed in the end, because I'm not
> interested in them; only the program should be interested in them (I forgot
> to remove that from the example). The problem only occurs when I start too
> many ROS2 instances at once, meaning gathering too many tracepoints at once
> (at once meaning in a very short amount of time). It works fine for 1-2
> instances, but it doesn't work for 16 instances.
> 
>> 1. *Before any traced application is run, babeltrace2 has the following
>> error output:*
> Ah ok, I thought I would be on the right track.
>> 2. *After a traced application is run, whether it produces enabled events
>> or not, I see the events recorded.*
> What do you mean by that? There is no babeltrace2 error if traced instances
> are running, even though they are not creating events?

This is what I observed with a fresh start:

before running any traced application: bt2 error with no traced content

after running a traced application with no events: bt2 no longer 
produced an error, but output 0 events (as expected).

In this case I expect that it is because the consumer started and basic 
metadata was available, so that was written in the snapshot. If I recall 
correctly, the consumer daemon only starts when the first application 
connects.

after running a traced application with some events: as usual
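
For reference, a minimal shell sketch of that sequence, using the paths and
the test binary from my earlier example (adjust to your setup):

```
# Before any traced application has connected there is no metadata in the
# output directory yet, so babeltrace2 reports "No trace was found".
$ babeltrace2 /tmp/structural_traces

# Start any instrumented application; the consumer daemon starts on the
# first registration and the basic metadata gets written out.
$ ./gen-ust-events

# babeltrace2 can now parse the output, even when zero events were recorded.
$ babeltrace2 /tmp/structural_traces | wc -l
```
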
>> 3. *When your session script stops, the output folder is deleted.*
> As described above, I forgot to remove that from the example, but the
> deletion is on purpose.
> 
> I tried *tp:tptest*, but there is no output shown by babeltrace2 (is that
> intended?):
> 

I had the 100 events produced by the gen-ust-events binary that I invoked. 
You ran it but the events weren't recorded? Regardless, I think we're on the 
same page that the session setup program you are using is fine right now.

> [georg at KARTOFF8xE-zenbook /tmp/structural_traces]$ babeltrace2 archives
> [georg at KARTOFF8xE-zenbook /tmp/structural_traces]$
> 
> 
> However, I figured something out:
> When I trace the "ros2:rcl_node_init" tracepoint and start 12 ROS2
> instances, I receive the following output for the entire archive directory:
> [georg at KARTOFF8xE-zenbook /tmp/structural_traces/archives]$ babeltrace2 .
> 
> [19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(24897)
> ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897
> }, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name =
> "saturn_listener", namespace = "/hallo/saturn" }
> [19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898)
> ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 },
> { node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name =
> "erde_talker", namespace = "/hallo/erde" }
> 
> All coming from archive file number 10:
> [georg at KARTOFF8xE-zenbook
> /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10]$
> babeltrace2 .
> 
> [19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(*24897*)
> ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897
> }, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name =
> "saturn_listener", namespace = "/hallo/saturn" }
> [19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898)
> ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 },
> { node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name =
> "erde_talker", namespace = "/hallo/erde" }
> 
> However, looking deeper into that directory I saw that all 12 instances
> (plus some from ros2 itself) that I started are within trace directory
> number 10!
> [georg at KARTOFF8xE-zenbook
> /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
> ls
> 
> listener-24897-20250625-190141  listener-24903-20250625-190141
>   ros2-24447-20250625-190131  ros2-24887-20250625-190141
>   talker-24900-20250625-190141  talker-24916-20250625-190141
> listener-24899-20250625-190141  listener-24914-20250625-190141
>   ros2-24451-20250625-190131  talker-24896-20250625-190141
>   talker-24902-20250625-190141
> listener-24901-20250625-190141  listener-24917-20250625-190141
>   ros2-24834-20250625-190134  talker-24898-20250625-190141
>   talker-24904-20250625-190141
> 
> BUT:
> I looked into the specific directory of a traced instance (underlined it)
> [georg at KARTOFF8xE-zenbook
> /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
> ls listener-*24897*-20250625-190141/
> 
> index     structuralChannel_0  structuralChannel_2  structuralChannel_4
>   structuralChannel_6  structuralChannel_8  structuralChannel_10
> *metadata*  structuralChannel_1  structuralChannel_3  structuralChannel_5
>   structuralChannel_7  structuralChannel_9  structuralChannel_11
> 
> and of a non-traced instance:
> [georg at KARTOFF8xE-zenbook	
> /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
> ls listener-24899-20250625-190141
> 
> index                structuralChannel_1  structuralChannel_3
>   structuralChannel_5  structuralChannel_7  structuralChannel_9
> structuralChannel_11
> structuralChannel_0  structuralChannel_2  structuralChannel_4
>   structuralChannel_6  structuralChannel_8  structuralChannel_10
> 
> Turns out: There is no metadata file within the second one. Therefore
> babeltrace2 is not able to parse the file. Do you have an idea why?
> 

From your later e-mail with the updated sessiond log, which shows a 
PERROR for too many open files, it sounds like you have a thread to pull on.

You can try using the per-UID buffering scheme to reduce the number of 
FDs required when running 12 concurrent instances of ROS2; see the sketch 
below.
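
A minimal sketch of both angles. The channel name "structuralChannel" and the
per-PID layout come from your listing above (the archives are under ust/pid/);
everything else is illustrative rather than taken from your setup:

```
# Check the open-file limit of the environment running lttng-sessiond; a
# PERROR about "too many open files" usually means this limit was hit.
$ ulimit -n

# Create the user-space channel with per-UID buffer ownership instead of the
# per-PID scheme your session currently uses, so ring buffers (and their FDs)
# are shared across all processes of the same user rather than allocated for
# each traced process.
$ lttng enable-channel --userspace --buffers-uid structuralChannel
```

With the C API, the equivalent should be setting the lttng_domain's buf_type 
to LTTNG_BUFFER_PER_UID before creating the session's channels.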

> Best regards,
> Georg
> 
> Never burn the bridges before you
> If you've burned the bridges behind
> Never burn the one that you're walking upon
> Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
> 
> 
> Am Mi., 25. Juni 2025 um 17:50 Uhr schrieb Kienan Stewart <
> kstewart at efficios.com>:
> 
>> Hi Georg,
>>
>> On 6/25/25 9:36 AM, Georg Muck wrote:
>>> Hi Kienan,
>>>
>>> I did exactly that, without success.
>>>
>>> The log of the lttng-sessiond (lttng.log) did not change after starting
>>> the ROS2 instances. I have also added the output of "lttng list
>>> structuralSession".
>>> I start my tracing instance as follows:
>>>
>>> At first there is a bash-script that sets everything up, it does the
>>> following:
>>>
>>> lttng destroy -a # to destroy all existing sessions
>>> lttng-sessiond --daemonize --verbose-consumer -vvv &> "/tmp/lttng.log" #
>>
>> When started with '--daemonize', the file descriptors are closed[1]
>> after forking, which is why you don't see any further output to
>> /tmp/lttng.log. Try with '--background' instead.
>>
>>> starting a new session daemon
>>> ./exemplary_tracing_session # starting my exemplary tracing session
>>>
>>> The exemplary tracing session is written with the help of the C API
>>> (appendix).
>>>
>>
>> I modified your script to add 'tp:tptest' (a tracepoint from the
>> lttng-tools test suite program gen-ust-events) and did the following:
>>
>> ```
>> $ lttng-sessiond --background --verbose-consumer -vvv &> sessiond.log
>> $ ./exemplary_tracing_session &
>> $ lttng-tools/./tests/utils/testapp/gen-ust-events/gen-ust-events
>> $ babeltrace2 /tmp/structural_traces | wc -l
>> 100
>> ```
>>
>> It seems to be working fine for me. A couple of notes:
>>
>> 1. Before any traced application is run, babeltrace2 has the following
>> error output: No trace was found based on input `/tmp/structural_traces`
>> 2. After a traced application is run, whether it produces enabled events
>> or not, I see the events recorded.
>> 3. When your session script stops, the output folder is deleted.
>>
>>> I started the same procedure on another machine. The same fault
>>
>> The fault is that babeltrace2 has an error, right? Are you willing
>> to share your trace that has the problem and/or show the output of
>> babeltrace2's error?
>>
>>> occurred. I think I need to configure more within the `lttngDomain`, but
>>> I don't know how; would you have an idea/hint?
>>>
>>> Thanks for your help.
>>>
>>> Best regards,
>>> Georg
>>>
>>> Ps.: I also added the log of my tracing tool (structural.log)
>>>
>>> Never burn the bridges before you
>>> If you've burned the bridges behind
>>> Never burn the one that you're walking upon
>>> Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
>>>
>>>
>>> Am Mo., 23. Juni 2025 um 16:26 Uhr schrieb Kienan Stewart <
>>> kstewart at efficios.com>:
>>>
>>>> Hi Georg,
>>>>
>>>> a good first step is to run lttng-sessiond with `-vvv --verbose-consumer`
>>>> and `LTTNG_UST_DEBUG=1` in the environment, and inspect the log file.
>>>>
>>>> It may also be useful to run your traced applications with
>>>> `LTTNG_UST_DEBUG=1` set in the environment.
>>>>
>>>> Do you have steps that reproduce the issue? E.g., all the commands to
>>>> set up the tracing session and so forth.
>>>>
>>>> thanks,
>>>> kienan
>>>>
>>>> On 6/19/25 10:58 AM, Georg Muck via lttng-dev wrote:
>>>>> Hey,
>>>>>
>>>>> I am using the LTTng tracer (2.13.11, C API, rotation = 1s) to trace my
>>>>> ROS2 system (userspace only). Everything works just fine within my Docker
>>>>> container (Ubuntu 24.04), but if I start the same procedure on the host
>>>>> itself (also Ubuntu 24.04), it does not create a trace file if too many
>>>>> tracepoints occur (roughly 1500+ per second). Sometimes the trace file was
>>>>> malformed, but I was not able to reproduce that; if I remember correctly,
>>>>> Babeltrace2 threw an error that said there was no metadata file. I played
>>>>> around with some configurations (like increasing the buffer size from
>>>>> ~16 kB to ~512 kB or a less aggressive rotation), but the problem stayed.
>>>>>
>>>>> I would be thankful for help. I guess I provided too little information;
>>>>> please tell me what is interesting at this point, I am not sure about
>>>>> that :).
>>>>>
>>>>> Best regards,
>>>>> Georg
>>>>>
>>>>>
>>>>>
>>>>> Never burn the bridges before you
>>>>> If you've burned the bridges behind
>>>>> Never burn the one that you're walking upon
>>>>> Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
>>>>>
>>>>
>>>>
>>>
>>
>> thanks,
>> kienan
>>
>> [1]: https://lttng.org/man/8/lttng-sessiond/v2.13/#doc-opt--daemonize
>>
> 


