Malformed Tracefiles

Georg Muck georgmuck1 at gmail.com
Thu Jun 26 04:22:18 EDT 2025


Hi Kienan,

Switching to per-UID tracing solved the problem.
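
For anyone who hits this later: the change boils down to requesting per-UID
buffer ownership on the UST domain before creating the channel. A minimal
sketch with liblttng-ctl (session and channel names are the ones from this
thread; session creation, event rules and session start are left out):

```c
/* Build with: gcc per_uid_channel.c -o per_uid_channel -llttng-ctl */
#include <lttng/lttng.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct lttng_domain dom;
	struct lttng_channel chan;
	struct lttng_handle *handle;

	memset(&dom, 0, sizeof(dom));
	dom.type = LTTNG_DOMAIN_UST;
	/* One set of ring buffers (and file descriptors) per user instead of
	 * the default per-PID scheme, so the FD count no longer grows with
	 * the number of traced processes. */
	dom.buf_type = LTTNG_BUFFER_PER_UID;

	handle = lttng_create_handle("structuralSession", &dom);
	if (!handle) {
		fprintf(stderr, "Failed to create lttng handle\n");
		return 1;
	}

	memset(&chan, 0, sizeof(chan));
	lttng_channel_set_default_attr(&dom, &chan.attr);
	strcpy(chan.name, "structuralChannel");

	if (lttng_enable_channel(handle, &chan) < 0) {
		fprintf(stderr, "Failed to enable channel\n");
		lttng_destroy_handle(handle);
		return 1;
	}

	lttng_destroy_handle(handle);
	return 0;
}
```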

Huge thanks for your patience!


Best regards,

Georg


Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*

Kienan Stewart <kstewart at efficios.com> wrote on Wed., June 25, 2025, 22:52:

> Hi Georg,
>
> On 6/25/25 1:27 PM, Georg Muck wrote:
> > Hi Kienan,
> >
> > thanks for your quick response. :)
> >
> > I replaced "--daemonize" with "--background".
> >
> > I guess I should briefly describe my goal to avoid confusion: I want to
> > trace a ROS2 session online (I tried LTTng live, but it costs too many
> > resources, hence the aggressive rotation of 1 s). LTTng gives me the path
> > to a newly created trace file as soon as it is created. This path is
> > forwarded to a thread pool that does the rest (analyses each trace file by
> > itself). However, the trace files are removed in the end, because I'm not
> > interested in them myself, only the program is (I forgot to remove that
> > part in the example). The problem only occurs when I start too many
> > ROS2 instances at once, meaning too many tracepoints are gathered at once
> > (at once meaning in a very short amount of time). It works fine for 1-2
> > instances, but it doesn't work for 16 instances.
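
For reference, the 1 s rotation mentioned above can be set up as a periodic rotation schedule through liblttng-ctl. A rough sketch; the lttng/rotation.h call names here are written from memory and worth double-checking against the installed headers, and whether the appendix program does it exactly this way is not shown in the thread:

```c
/* Sketch: attach a 1 s periodic rotation schedule to an existing session.
 * Assumes <lttng/lttng.h> pulls in lttng/rotation.h (LTTng 2.11+). */
#include <lttng/lttng.h>

static int add_one_second_rotation(const char *session_name)
{
	struct lttng_rotation_schedule *schedule;
	int ret = -1;

	schedule = lttng_rotation_schedule_periodic_create();
	if (!schedule) {
		return -1;
	}

	/* The period is expressed in microseconds: 1 s == 1000000 us. */
	if (lttng_rotation_schedule_periodic_set_period(schedule, 1000000ULL) !=
			LTTNG_ROTATION_STATUS_OK) {
		goto end;
	}

	if (lttng_session_add_rotation_schedule(session_name, schedule) !=
			LTTNG_ROTATION_STATUS_OK) {
		goto end;
	}
	ret = 0;
end:
	lttng_rotation_schedule_destroy(schedule);
	return ret;
}
```

Each completed rotation then produces a self-contained archive sub-directory like the 20250625T190141+0200-...-10 ones below, which is what gets handed to the thread pool.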
> >
> >> 1. *Before any traced application is run, babeltrace2 has the following
> > error output:* Ah ok, I thought I was on the right track.
> >> 2. *After a traced application is run, whether it produces enabled events
> > or not, I see the events recorded*: What do you mean by that? There is no
> > babeltrace2 error if traced instances are running, even though they are not
> > creating events?
>
> This is what I observed with a fresh start:
>
> before running any traced application: bt2 error with no traced content
>
> after running a traced application with no events: bt2 no longer
> produced an error, but output 0 events (as expected).
>
> In this case I expect that it is because the consumer started and basic
> metadata was available, so that was written in the snapshot. If I recall
> correctly, the consumer daemon only starts when the first application
> connects.
>
> after running a traced application with some events: as usual
> >> 3. *When your session script stops, the output folder is deleted*: as
> > described, I forgot to remove that part, but it is on purpose
> >
> > I tried *tp:tptest*, but there is no output shown by babeltrace2 (is that
> > intended?):
> >
>
> I had the 100 events produced by the gen-ust-events binary that I invoked. You
> ran it but the events weren't recorded? Regardless, I think we're on the
> same page that the session setup program you are using is fine right now.
>
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces]$ babeltrace2 archives
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces]$
> >
> >
> > However, I figured something out:
> > When I trace the "ros2:rcl_node_init" tracepoint and start 12 ROS2
> > instances, I receive the following output for the entire archive directory:
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces/archives]$ babeltrace2 .
> >
> > [19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(24897) ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897 }, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name = "saturn_listener", namespace = "/hallo/saturn" }
> > [19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898) ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 }, { node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name = "erde_talker", namespace = "/hallo/erde" }
> >
> > All coming from archive file number 10:
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10]$ babeltrace2 .
> >
> > [19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(*24897*) ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897 }, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name = "saturn_listener", namespace = "/hallo/saturn" }
> > [19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898) ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 }, { node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name = "erde_talker", namespace = "/hallo/erde" }
> >
> > However, looking deeper into that directory I saw that all 12 instances I
> > started (plus some of ros2 itself) are within trace directory number 10!
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$ ls
> >
> > listener-24897-20250625-190141  listener-24903-20250625-190141
> >   ros2-24447-20250625-190131  ros2-24887-20250625-190141
> >   talker-24900-20250625-190141  talker-24916-20250625-190141
> > listener-24899-20250625-190141  listener-24914-20250625-190141
> >   ros2-24451-20250625-190131  talker-24896-20250625-190141
> >   talker-24902-20250625-190141
> > listener-24901-20250625-190141  listener-24917-20250625-190141
> >   ros2-24834-20250625-190134  talker-24898-20250625-190141
> >   talker-24904-20250625-190141
> >
> > BUT:
> > I looked into the specific directory of a traced instance (PID highlighted):
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$ ls listener-*24897*-20250625-190141/
> >
> > index     structuralChannel_0  structuralChannel_2  structuralChannel_4
> >   structuralChannel_6  structuralChannel_8  structuralChannel_10
> > *metadata*  structuralChannel_1  structuralChannel_3  structuralChannel_5
> >   structuralChannel_7  structuralChannel_9  structuralChannel_11
> >
> > and of a non-traced instance:
> > [georg at KARTOFF8xE-zenbook /tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$ ls listener-24899-20250625-190141
> >
> > index                structuralChannel_1  structuralChannel_3
> >   structuralChannel_5  structuralChannel_7  structuralChannel_9
> > structuralChannel_11
> > structuralChannel_0  structuralChannel_2  structuralChannel_4
> >   structuralChannel_6  structuralChannel_8  structuralChannel_10
> >
> > Turns out: There is no metadata file within the second one. Therefore
> > babeltrace2 is not able to parse the file. Do you have an idea why?
> >
>
>  From your later e-mail with the updated sessiond log, which shows the
> PERROR for too many open files, it sounds like you have a thread to pull on.
>
> You can try using the per-uid buffering scheme to reduce the number of
> FDs required when running 12 concurrent instances of ros.
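
(Rough numbers for context, treating this as an estimate rather than an exact count: the structuralChannel_0..11 files above suggest 12 per-CPU streams per channel, and with per-PID buffers every traced process gets its own set, so 12-16 ROS2 processes already mean on the order of 12-16 x 12 = 144-192 streams, each costing the consumer several descriptors: data file, index file, shared-memory and wakeup FDs. Together with everything else that is open, that quickly approaches the common 1024 soft limit. With per-UID buffers, all processes of the same user share one set of 12 streams, so the FD count stays essentially flat as more instances start; raising the open-file limit for lttng-sessiond before launching it, e.g. with ulimit -n, is another way to get headroom.)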
>
> > Best regards,
> > Georg
> >
> > Never burn the bridges before you
> > If you've burned the bridges behind
> > Never burn the one that you're walking upon
> > Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
> >
> >
> > On Wed., June 25, 2025 at 17:50, Kienan Stewart <kstewart at efficios.com> wrote:
> >
> >> Hi Georg,
> >>
> >> On 6/25/25 9:36 AM, Georg Muck wrote:
> >>> Hi Kienan,
> >>>
> >>> I did exactly that, without success.
> >>>
> >>> The log of the lttng-sessiond (lttng.log) did not change after starting
> >> the
> >>> ROS2 instances. I have also added the output of "lttng list
> >>> structuralSession".
> >>> I start my tracing instance as follows:
> >>>
> >>> At first there is a bash script that sets everything up; it does the
> >>> following:
> >>>
> >>> lttng destroy -a # to destroy all existing sessions
> >>> lttng-sessiond --daemonize --verbose-consumer -vvv &> "/tmp/lttng.log"
> #
> >>
> >> When started with '--daemonize', the file descriptors are closed[1]
> >> after forking, which is why you don't see any further output to
> >> /tmp/lttng.log. Try with '--background' instead.
> >>
> >>> starting a new session daemon
> >>> ./exemplary_tracing_session # starting my exemplary tracing session
> >>>
> >>> The exemplary tracing session is written with the help of the C API
> >>> (appendix).
> >>>
> >>
> >> I modified your script to add 'tp:tptest' (a tracepoint from the
> >> lttng-tools test suite program gen-ust-events) and did the following:
> >>
> >> ```
> >> $ lttng-sessiond --background --verbose-consumer -vvv &> sessiond.log
> >> $ ./exemplary_tracing_session &
> >> $ lttng-tools/./tests/utils/testapp/gen-ust-events/gen-ust-events
> >> $ babeltrace2 /tmp/structural_traces | wc -l
> >> 100
> >> ```
> >>
> >> It seems to be working fine for me. A couple of notes:
> >>
> >> 1. Before any traced application is run, babeltrace2 has the following
> >> error output: No trace was found based on input `/tmp/structural_traces`
> >> 2. After a traced application is run, whether it produces enabled events
> >> or not, I see the events recorded.
> >> 3. When your session script stops, the output folder is deleted.
> >>
> >>> I started the same procedure on another machine. The same fault
> >>
> >> The fault is that babeltrace2 has an error, right? Are you willing
> >> to share your trace that has the problem and/or show the output of
> >> babeltrace2's error?
> >>
> >>> occurred. I think I need to configure more within the `lttngDomain`, but
> >>> I don't know how. Would you have an idea/hint?
> >>>
> >>> Thanks for your help.
> >>>
> >>> Best regards,
> >>> Georg
> >>>
> >>> PS: I also added the log of my tracing tool (structural.log).
> >>>
> >>> Never burn the bridges before you
> >>> If you've burned the bridges behind
> >>> Never burn the one that you're walking upon
> >>> Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
> >>>
> >>>
> >>> On Mon., June 23, 2025 at 16:26, Kienan Stewart <kstewart at efficios.com> wrote:
> >>>
> >>>> Hi Georg,
> >>>>
> >>>> a good first step is to run lttng-sessiond with `-vvv --verbose-consumer`
> >>>> and inspect the log file, with `LTTNG_UST_DEBUG=1` set in the environment.
> >>>>
> >>>> It may also be useful to run your traced applications with
> >>>> `LTTNG_UST_DEBUG=1` set in the environment.
> >>>>
> >>>> Do you have steps that reproduce the issue? E.g., all the commands to
> >>>> set up the tracing session and so forth.
> >>>>
> >>>> thanks,
> >>>> kienan
> >>>>
> >>>> On 6/19/25 10:58 AM, Georg Muck via lttng-dev wrote:
> >>>>> Hey,
> >>>>>
> >>>>> I am using the LTTng tracer (2.13.11, C API, rotation=1s) to trace my ROS2
> >>>>> system (userspace only). Everything works just fine within my Docker
> >>>>> container (Ubuntu 24.04), but if I start the same procedure on the host
> >>>>> itself (also Ubuntu 24.04), it does not create a trace file if too many
> >>>>> tracepoints occur (speaking of around 1500+ per second). (Sometimes it
> >>>>> malformed the trace file, but I was not able to reproduce that. If I
> >>>>> remember correctly, Babeltrace2 threw an error that said that there was no
> >>>>> metadata file.) I played around with some configurations (like increasing
> >>>>> the buffer size from ~16 kB to ~512 kB or a less aggressive rotation), but
> >>>>> the problem stayed.
> >>>>>
> >>>>> I would be thankful for help. I guess I delivered too little information;
> >>>>> please reach out about what would be interesting at this point, I am not
> >>>>> sure about that :).
> >>>>>
> >>>>> Best regards,
> >>>>> Georg
> >>>>>
> >>>>>
> >>>>>
> >>>>> Never burn the bridges before you
> >>>>> If you've burned the bridges behind
> >>>>> Never burn the one that you're walking upon
> >>>>> Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >> thanks,
> >> kienan
> >>
> >> [1]: https://lttng.org/man/8/lttng-sessiond/v2.13/#doc-opt--daemonize
> >>
> >
>
>