[lttng-dev] Crash in application due to watchdog timeout with python3 lttng

Lakshmi Deverkonda laksd at nvidia.com
Fri Feb 16 09:33:29 EST 2024


This is how we created the loggers. The first logger is for file logging, whereas the second one is for lttng.

  self.logger = logging.getLogger('cd')
  self.lttng_logger = logging.getLogger('cd-lttng')
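
For readers following the thread, here is a minimal sketch of how two such loggers could be kept apart. It is not the application's actual code: the `make_loggers` helper, the log path, and the levels are hypothetical. It assumes, as I understand the agent, that importing `lttngust` attaches the LTTng handler to the root logger, so the 'cd' logger is given its own file handler and stops propagating, while 'cd-lttng' propagates to the root.

```python
import logging

try:
    # Assumption: importing lttngust starts the agent and attaches its
    # handler to the root logger. Guarded so the sketch runs without it.
    import lttngust  # noqa: F401
    HAVE_LTTNG = True
except ImportError:
    HAVE_LTTNG = False


def make_loggers(log_path):
    """Hypothetical helper: one file-only logger, one LTTng-bound logger."""
    # File logger: own FileHandler, no propagation to the root logger,
    # so handlers attached to the root never see its records.
    file_logger = logging.getLogger('cd')
    file_logger.setLevel(logging.INFO)
    file_logger.propagate = False
    file_logger.addHandler(logging.FileHandler(log_path))

    # LTTng logger: no handler of its own; records propagate to the root
    # logger, where the agent's handler (if present) picks them up.
    lttng_logger = logging.getLogger('cd-lttng')
    lttng_logger.setLevel(logging.DEBUG)
    return file_logger, lttng_logger
```

With this split, only calls made through the 'cd-lttng' logger reach handlers on the root logger, and 'cd' records go to the file alone.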

It seems that if a thread is in the middle of logging data through lttng at the exact moment the application receives SIGTERM,
we are unable to join that particular thread. Can you please help?
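
One common shutdown pattern that may help narrow this down is sketched below. It does not confirm the root cause and is not taken from the application: all names here (`stop_event`, `worker`, `join_workers`, the timeouts) are hypothetical. The SIGTERM handler only sets a flag, workers poll that flag, and the joins are bounded so that a thread which fails to stop is reported instead of blocking the exit path.

```python
import signal
import threading

stop_event = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: set a flag and return.
    stop_event.set()


def install_sigterm_handler():
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)


def worker(done, idx):
    while not stop_event.is_set():
        # Application work and logging calls would go here.
        stop_event.wait(0.05)
    done.append(idx)


def start_workers(count):
    done = []
    threads = [threading.Thread(target=worker, args=(done, i),
                                name='app-%d' % i)
               for i in range(count)]
    for t in threads:
        t.start()
    return threads, done


def join_workers(threads, timeout=2.0):
    # Bounded join: report threads still alive instead of blocking forever.
    stuck = []
    for t in threads:
        t.join(timeout)
        if t.is_alive():
            stuck.append(t.name)
    return stuck
```

If `join_workers()` returns a non-empty list, the names in it identify which threads did not stop in time, which is more useful in a core dump than an unbounded `join()`.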

We also see that lttng performance is not good with python3. My application has around 24 threads, and when logging is enabled for each of the threads,
there is a delay of up to 24s in processing external events.
Please suggest how to proceed on these issues.
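
To put a number on the overhead, a rough measurement sketch such as the following could be used. It is only an illustration: `CountingHandler`, `time_logging`, and the call counts are hypothetical, and the stand-in handler does no real work, so the figures only become meaningful once the same function is run against the real handlers.

```python
import logging
import threading
import time


class CountingHandler(logging.Handler):
    """Stand-in handler that only counts records."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        # Handler.handle() calls emit() with the handler lock held,
        # so this increment is not racing with other threads.
        self.count += 1


def time_logging(logger, num_threads=24, calls_per_thread=1000):
    """Return the wall-clock seconds taken by concurrent logging calls."""
    def work():
        for i in range(calls_per_thread):
            logger.info('event %d', i)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Running `time_logging()` once against the file logger and once against the lttng logger, with the same thread and call counts, would show how much of the delay is attributable to each path.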

Regards,
Lakshmi

________________________________
From: Lakshmi Deverkonda <laksd at nvidia.com>
Sent: 13 February 2024 21:05
To: Kienan Stewart <kstewart at efficios.com>; lttng-dev at lists.lttng.org <lttng-dev at lists.lttng.org>
Subject: Re: [lttng-dev] Crash in application due to watchdog timeout with python3 lttng

Yes. We are trying to join only the threads related to the application. The timeout is happening while trying to join the threads started by the application.

Regards,
Lakshmi
________________________________
From: Kienan Stewart <kstewart at efficios.com>
Sent: 13 February 2024 20:50
To: Lakshmi Deverkonda <laksd at nvidia.com>; lttng-dev at lists.lttng.org <lttng-dev at lists.lttng.org>
Subject: Re: [lttng-dev] Crash in application due to watchdog timeout with python3 lttng


Hi Lakshmi,

when the lttngust python agent starts, it attempts to connect to one or
more session daemons[1].

Each connection starts a thread that loops forever, retrying the
registration in case an exception occurs[2].

I don't think it's designed to have `join()` called on those
threads, which I assume is happening in some of the code you or your
team have written.

My initial thought is that you should `join()` only the threads that
are pertinent to your application, ignoring the lttngust agent threads,
and then exit the application as normal.
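
A minimal sketch of that idea, assuming the application keeps its own registry of the threads it starts rather than walking `threading.enumerate()`. The `AppThreads` class and the `forever_loop` function are hypothetical; `forever_loop` only stands in for a background thread that never returns and is not the lttngust agent code.

```python
import threading
import time


class AppThreads:
    """Hypothetical registry of threads started by the application itself."""

    def __init__(self):
        self._threads = []
        self.stop = threading.Event()

    def start(self, target, name):
        t = threading.Thread(target=target, args=(self.stop,), name=name)
        self._threads.append(t)
        t.start()
        return t

    def join_all(self, timeout=2.0):
        # Signal our own threads, then join only those; threads started
        # by libraries are never touched.
        self.stop.set()
        leftover = []
        for t in self._threads:
            t.join(timeout)
            if t.is_alive():
                leftover.append(t.name)
        return leftover


def app_worker(stop):
    # Event.wait() returns True once the event is set.
    while not stop.wait(0.05):
        pass


def forever_loop():
    # Stand-in for a background thread that loops forever.
    while True:
        time.sleep(0.05)


def demo():
    background = threading.Thread(target=forever_loop,
                                  name='stand-in-agent', daemon=True)
    background.start()
    app = AppThreads()
    app.start(app_worker, 'app-0')
    app.start(app_worker, 'app-1')
    leftover = app.join_all()
    return leftover, background.is_alive()
```

In `demo()`, the application threads are joined cleanly while the stand-in background thread is still running; because it is a daemon thread, it does not prevent the interpreter from exiting.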

[1]:
https://github.com/lttng/lttng-ust/blob/3287f48be61ef3491aff0a80b7185ac57b3d8a5d/src/python-lttngust/lttngust/agent.py#L334
[2]:
https://github.com/lttng/lttng-ust/blob/3287f48be61ef3491aff0a80b7185ac57b3d8a5d/src/python-lttngust/lttngust/agent.py#L83

thanks,
kienan

On 2/13/24 09:23, Lakshmi Deverkonda via lttng-dev wrote:
> Hi,
>
> We were able to integrate the python3 lttng module into our
> python3-based application. However, whenever the application terminates,
> there is a watchdog timeout caused by a timeout while joining the
> threads. What could be the reason for this? Does the lttng module hold
> any thread event locks?
> We are completely blocked on this issue. Could you please help?
>
> Here is the snippet of the core dump
>
> (gdb) py-bt
> Traceback (most recent call first):
>    File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
>      elif lock.acquire(block, timeout):
>    File "/usr/lib/python3.7/threading.py", line 1032, in join
>      self._wait_for_tstate_lock()
>    File "/usr/lib/python3/dist-packages/h.py", line 231, in JoinThreads
>      self.TT.join()
>    File "/usr/sbin/c", line 1466, in do_exit
>      H.JoinThreads()
>    File "/usr/sbin/c", line 7201, in main
>      do_exit(nlm, status)
>    File "/usr/sbin/c", line 7233, in <module>
>      main()
> (gdb)
>
> On a parallel note, thanks to Kienan who has been trying to provide
> pointers on various issues reported so far.
>
> Need help on this issue as well.
> Thanks in advance,
>
> Regards,
> Lakshmi
>
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev