[lttng-dev] Crash in application due to watchdog timeout with python3 lttng

Lakshmi Deverkonda laksd at nvidia.com
Tue Feb 13 10:35:19 EST 2024


Yes. We are trying to join only the threads related to the application. The timeout is happening while trying to join the threads started by the application.

Regards,
Lakshmi
________________________________
From: Kienan Stewart <kstewart at efficios.com>
Sent: 13 February 2024 20:50
To: Lakshmi Deverkonda <laksd at nvidia.com>; lttng-dev at lists.lttng.org <lttng-dev at lists.lttng.org>
Subject: Re: [lttng-dev] Crash in application due to watchdog timeout with python3 lttng


Hi Lakshmi,

When the lttngust Python agent starts, it attempts to connect to one or
more session daemons[1].

Each connection starts a thread that loops forever, retrying the
registration in case an exception occurs[2].

I don't think it's designed to have `join()` called on those
threads, which I assume is happening in some of the code you or your
team have written.

My initial thought is that you should `join()` only the threads that
are pertinent to your application, ignoring the lttngust agent threads,
and then exit the application as normal.
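
A minimal sketch of that suggestion, assuming the application keeps its
own list of the threads it starts. The names `join_app_threads`,
`app_threads`, and the "agent-like" stand-in thread are hypothetical and
are not part of lttngust; the stand-in only imitates a thread that loops
forever:

```python
import threading
import time


def join_app_threads(app_threads, timeout=5.0):
    """Join only the threads the application itself created.

    Returns the list of threads that are still alive once the shared
    timeout has elapsed, so the caller can log them instead of hanging.
    """
    deadline = time.monotonic() + timeout
    stuck = []
    for t in app_threads:
        remaining = max(0.0, deadline - time.monotonic())
        t.join(remaining)  # bounded wait, never blocks indefinitely
        if t.is_alive():
            stuck.append(t)
    return stuck


stop_worker = threading.Event()


def forever():
    # Stand-in for a thread that never returns (hypothetical, not lttngust code).
    while True:
        time.sleep(0.1)


def worker():
    # Application thread that exits once it is asked to stop.
    stop_worker.wait()


# Daemon flag is set here only so this sketch can exit on its own.
agent_like = threading.Thread(target=forever, name="agent-like", daemon=True)
app_thread = threading.Thread(target=worker, name="app-worker")
agent_like.start()
app_thread.start()

# At shutdown: signal the application threads, then join only those.
stop_worker.set()
stuck = join_app_threads([app_thread], timeout=2.0)
```

If `stuck` comes back non-empty, the names of those threads show which
application thread is not reacting to the stop request, which may help
narrow down where the watchdog timeout comes from.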

[1]:
https://github.com/lttng/lttng-ust/blob/3287f48be61ef3491aff0a80b7185ac57b3d8a5d/src/python-lttngust/lttngust/agent.py#L334
[2]:
https://github.com/lttng/lttng-ust/blob/3287f48be61ef3491aff0a80b7185ac57b3d8a5d/src/python-lttngust/lttngust/agent.py#L83

thanks,
kienan

On 2/13/24 09:23, Lakshmi Deverkonda via lttng-dev wrote:
> Hi,
>
> We are able to integrate the python3 lttng module in our application
> (python3 based). However, we are seeing that whenever the application
> terminates, there is a watchdog timeout due to a timeout in joining the
> threads. What could be the reason for this? Does the lttng module hold
> any thread event locks?
> We are completely blocked on this issue. Could you please help?
>
> Here is the snippet of the core dump
>
> (gdb) py-bt
> Traceback (most recent call first):
>    File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
>      elif lock.acquire(block, timeout):
>    File "/usr/lib/python3.7/threading.py", line 1032, in join
>      self._wait_for_tstate_lock()
>    File "/usr/lib/python3/dist-packages/h.py", line 231, in JoinThreads
>      self.TT.join()
>    File "/usr/sbin/c", line 1466, in do_exit
>      H.JoinThreads()
>    File "/usr/sbin/c", line 7201, in main
>      do_exit(nlm, status)
>    File "/usr/sbin/c", line 7233, in <module>
>      main()
> (gdb)
>
> On a parallel note, thanks to Kienan who has been trying to provide
> pointers on various issues reported so far.
>
> Need help on this issue as well.
> Thanks in advance,
>
> Regards,
> Lakshmi
>
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev