[lttng-dev] File descriptor leak in consumerd

Chris Cole ccole at juniper.net
Wed Jan 11 22:58:11 UTC 2017


Greetings.

Background:

We've written a background daemon to implement a soft limit on the amount of disk space used by traces.

To a first approximation, it removes the oldest sessions first to keep disk usage below a fixed limit.
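
As a rough illustration of that policy (a minimal sketch, not the actual daemon): the function name prune_oldest is hypothetical, and treating each top-level directory's modification time as its age is an assumption.

```shell
# Hypothetical helper: delete the oldest top-level entries under "$1"
# until total usage is at or below "$2" kilobytes.
prune_oldest() {
    dir=$1
    limit_kb=$2
    while :; do
        # du -sk prints "<kilobytes><TAB><path>"; keep the first field.
        used_kb=$(du -sk "$dir" | cut -f1)
        [ "$used_kb" -le "$limit_kb" ] && break
        # Oldest entry by modification time (assumes names without newlines).
        oldest=$(ls -1tr "$dir" | head -n 1)
        # Nothing left to remove: stop instead of looping forever.
        [ -n "$oldest" ] || break
        rm -rf -- "${dir:?}/$oldest"
    done
}
```

Under those assumptions, something like "prune_oldest /var/log/lttng-traces 6291456" would correspond to a 6 GiB limit.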

In an attempt to stress the system and test the limiting daemon, I crash a particular (LTTng-using) application in a loop. Each crash causes a new trace directory to be created.

Eventually, trace usage bumps up against the limit and we start throwing traces overboard.

For about an hour after the limiter kicks in, everything looks fine. The file descriptor count for consumerd holds steady:

...
848
690
6.0G	/var/log/lttng-traces
848
690
6.0G	/var/log/lttng-traces
...

After about an hour, the file descriptor count starts to climb monotonically:

root@box:~# ls /proc/10623/fd | wc -l ; ls /proc/10633/fd | wc -l ; du -hs /var/log/lttng-traces

...
406
690
6.0G	/var/log/lttng-traces
412
690
6.0G	/var/log/lttng-traces
412
690
6.0G	/var/log/lttng-traces
412
690
6.0G	/var/log/lttng-traces
418
690
6.0G	/var/log/lttng-traces
418
690
6.0G	/var/log/lttng-traces
424
690
6.0G	/var/log/lttng-traces
...
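
To narrow down which kind of descriptor is accumulating, the same sampling could also group the open descriptors by what they point to. This is only a sketch: the helper names fd_count and fd_summary are hypothetical, and it relies on the Linux /proc/PID/fd layout.

```shell
# Hypothetical helper: number of open descriptors of PID "$1".
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Hypothetical helper: group the descriptors of PID "$1" by target kind.
# "socket:[1234]" and "pipe:[1234]" collapse to "socket" and "pipe";
# anything that resolves to a path is counted as "file".
fd_summary() {
    for fd in "/proc/$1/fd"/*; do
        readlink "$fd"
    done 2>/dev/null |
        sed -e 's/:\[[0-9]*\]$//' -e 's|^/.*|file|' |
        sort | uniq -c | sort -rn
}

# Example (not run here): sample one PID once a minute.
#   while sleep 60; do fd_count 10623; fd_summary 10623; done
```

If the growth is all in one category (sockets, say, rather than files), that would help tell a leaked connection from leaked trace file handles.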

I have captured logs (-v) from relayd, sessiond, and consumerd. The only thing that stands out is the following error:

lttng-sessiond.log:Jan 10 12:20:47 INFO:   PERROR - 12:20:47.240536 [10623/10630]: sendmsg inet: Connection reset by peer (in lttcomm_sendmsg_inet_sock() at inet.c:442)

Please let me know what further information you may need.

Regards,
Chris



