[lttng-dev] Crash in application due to watchdog timeout with python3 lttng

Kienan Stewart kstewart at efficios.com
Tue Feb 13 11:03:15 EST 2024


Hi Lakshmi,

On 2/13/24 10:35, Lakshmi Deverkonda wrote:
> Yes. We are trying to join only the threads related to the application. 
> The timeout is happening while trying to join the threads started by the 
> application.

In that case, I suspect that the issue is not related to lttngust. I 
can't help with your internal application code.
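
That said, one generic way to find out which thread is blocking the exit is to pass a timeout to `join()` and report what is still alive afterwards. The following is only a sketch under stated assumptions: `join_with_report` and the two example threads are hypothetical and have nothing to do with your application or with lttngust.

```python
import threading
import time


def join_with_report(threads, timeout=2.0):
    """Join each thread with a timeout; return the names of those still alive."""
    stuck = []
    for t in threads:
        # join() returns after `timeout` seconds even if the thread is still running
        t.join(timeout)
        if t.is_alive():
            stuck.append(t.name)
    return stuck


# Hypothetical application threads: one finishes on its own, the other
# waits on an event that is never set (a stand-in for a thread that hangs).
never_set = threading.Event()
quick = threading.Thread(target=time.sleep, args=(0.1,), name="quick")
blocked = threading.Thread(target=never_set.wait, name="blocked", daemon=True)
for t in (quick, blocked):
    t.start()

stuck = join_with_report([quick, blocked], timeout=0.5)
print("still alive after join timeout:", stuck)
```

Logging the names this way before the watchdog fires should at least tell you whether the thread in `JoinThreads` is one of yours.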

If you're able to produce a minimal example that reproduces a deadlock 
when lttngust is imported, but not when it's omitted, I think that 
would be very interesting.
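
A possible skeleton for such a reproducer is sketched below. It is not taken from your code: `run_once`, the worker thread, and the `REPRO_WITH_LTTNGUST` environment variable are all made up for illustration, and it assumes that importing the `lttngust` package is what starts the agent threads.

```python
import os
import threading
import time


def run_once(with_agent):
    """Start one worker thread, stop it, and report whether join() completed."""
    agent_loaded = False
    if with_agent:
        try:
            import lttngust  # noqa: F401  (assumed to start the agent threads on import)
            agent_loaded = True
        except ImportError:
            pass  # lttngust is not installed; this run is then just the baseline

    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="app-worker", daemon=True)
    worker.start()
    time.sleep(0.1)  # stand-in for the application's real work
    stop.set()
    worker.join(timeout=5.0)
    return {"agent_loaded": agent_loaded, "joined": not worker.is_alive()}


if __name__ == "__main__":
    # REPRO_WITH_LTTNGUST is a hypothetical switch for this sketch, not an LTTng setting.
    print(run_once(os.environ.get("REPRO_WITH_LTTNGUST") == "1"))
```

Running it once with the variable unset and once with it set to `1` gives the two cases to compare. If `joined` differs between them, that script is the kind of example worth attaching to a bug report.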

I would also recommend reviewing the bug reporting guidelines at 
https://lttng.org/community/ to ensure that all the necessary 
information is present.

thanks,
kienan

> 
> Regards,
> Lakshmi
> ------------------------------------------------------------------------
> *From:* Kienan Stewart <kstewart at efficios.com>
> *Sent:* 13 February 2024 20:50
> *To:* Lakshmi Deverkonda <laksd at nvidia.com>; lttng-dev at lists.lttng.org 
> <lttng-dev at lists.lttng.org>
> *Subject:* Re: [lttng-dev] Crash in application due to watchdog timeout 
> with python3 lttng
> 
> 
> Hi Lakshmi,
> 
> When the lttngust Python agent starts, it attempts to connect to one or
> more session daemons[1].
> 
> Each connection starts a thread that loops forever, retrying the
> registration in case an exception occurs[2].
> 
> I don't think it's designed to have `join()` called on those
> threads, which I assume is happening in some of the code you or your
> team have written.
> 
> My initial thought is that you should `join()` only the threads that
> are pertinent to your application, ignoring the lttngust agent threads,
> and then exit the application as normal.
> 
> [1]:
> https://github.com/lttng/lttng-ust/blob/3287f48be61ef3491aff0a80b7185ac57b3d8a5d/src/python-lttngust/lttngust/agent.py#L334
> [2]:
> https://github.com/lttng/lttng-ust/blob/3287f48be61ef3491aff0a80b7185ac57b3d8a5d/src/python-lttngust/lttngust/agent.py#L83
> 
> thanks,
> kienan
> 
> On 2/13/24 09:23, Lakshmi Deverkonda via lttng-dev wrote:
>> Hi,
>>
>> We are able to integrate the python3 lttng module into our
>> (python3-based) application. However, whenever the application
>> terminates, there is a watchdog timeout caused by a timeout in joining
>> the threads. What could be the reason for this? Does the lttng module
>> hold any thread event locks?
>> We are completely blocked on this issue. Could you please help?
>>
>> Here is the snippet of the core dump
>>
>> (gdb) py-bt
>> Traceback (most recent call first):
>>    File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
>>      elif lock.acquire(block, timeout):
>>    File "/usr/lib/python3.7/threading.py", line 1032, in join
>>      self._wait_for_tstate_lock()
>>    File "/usr/lib/python3/dist-packages/h.py", line 231, in JoinThreads
>>      self.TT.join()
>>    File "/usr/sbin/c", line 1466, in do_exit
>>      H.JoinThreads()
>>    File "/usr/sbin/c", line 7201, in main
>>      do_exit(nlm, status)
>>    File "/usr/sbin/c", line 7233, in <module>
>>      main()
>> (gdb)
>>
>> On a parallel note, thanks to Kienan who has been trying to provide
>> pointers on various issues reported so far.
>>
>> Need help on this issue as well.
>> Thanks in advance,
>>
>> Regards,
>> Lakshmi
>>
>>
>>
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

