[lttng-dev] Large number of stream files in CTF trace -- too many file handles
Rocky Dunlap
dunlap at ucar.edu
Fri Mar 13 17:55:32 EDT 2020
I am attempting to use babeltrace2 to read a CTF trace that has ~2000
stream files. This is a custom trace collected from an MPI application on
an HPC platform. In this case, each MPI process opens and writes to its
own stream file, so you end up with one file per MPI task.
When I attempt to read the trace from the command line with babeltrace2, I
see the following error:
ERROR: [Babeltrace CLI] (babeltrace2.c:2548)
Graph failed to complete successfully
CAUSED BY [libbabeltrace2] (graph.c:473)
Component's "consume" method failed: status=ERROR, comp-addr=0x1beab20,
comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK,
comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages
(`text` fo", comp-class-is-frozen=0,
comp-class-so-handle-addr=0x174fc10,
comp-class-so-handle-path="/usr/lib/x86_64-linux-gnu/babeltrace2/plugins/babeltrace-plugin-text.so",
comp-input-port-count=1, comp-output-port-count=0
CAUSED BY [libbabeltrace2] (iterator.c:864)
Component input port message iterator's "next" method failed:
iter-addr=0x1c7cec0, iter-upstream-comp-name="muxer",
iter-upstream-comp-log-level=WARNING,
iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer",
iter-upstream-comp-class-partial-descr="Sort messages from multiple
inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out",
status=ERROR
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:991)
Cannot validate muxer's upstream message iterator wrapper:
muxer-msg-iter-addr=0x1c7d030, muxer-upstream-msg-iter-wrap-addr=0x1e23430
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:454)
Upstream iterator's next method returned an error: status=ERROR
CAUSED BY [libbabeltrace2] (iterator.c:864)
Component input port message iterator's "next" method failed:
iter-addr=0x1e22f00, iter-upstream-comp-name="auto-disc-source-ctf-fs",
iter-upstream-comp-log-level=WARNING,
iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="fs",
iter-upstream-comp-class-partial-descr="Read CTF traces from the file
sy", iter-upstream-port-type=OUTPUT,
iter-upstream-port-name="21c4e078-a5c7-11e8-8529-34f39aeaad30 | 0 |
/home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020", status=ERROR
CAUSED BY [auto-disc-source-ctf-fs (21c4e078-a5c7-11e8-8529-34f39aeaad30 |
0 | /home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020): 'source.ctf.fs']
(fs.c:109)
Failed to get next message from CTF message iterator.
CAUSED BY [auto-disc-source-ctf-fs: 'source.ctf.fs'] (msg-iter.c:2899)
Cannot handle state: msg-it-addr=0x1e230f0, state=SWITCH_PACKET
CAUSED BY [auto-disc-source-ctf-fs (21c4e078-a5c7-11e8-8529-34f39aeaad30 |
0 | /home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020): 'source.ctf.fs']
(data-stream-file.c:385)
failed to create ctf_fs_ds_file.
CAUSED BY [auto-disc-source-ctf-fs: 'source.ctf.fs'] (file.c:98)
Cannot open file: Too many open files:
path=/home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020,
mode=rb
No doubt the issue is the large number of file handles.
I see a similar error when I try to use bt2.TraceCollectionMessageIterator.
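"Too many open files" is the message for the per-process open-file limit (RLIMIT_NOFILE, the value `ulimit -n` reports in a shell). A minimal sketch, assuming the hard limit on the machine is high enough to cover roughly one descriptor per stream file, of raising the soft limit from Python before any trace reading starts in the same process. The helper name `raise_open_file_limit` is hypothetical and not part of bt2:

```python
import resource


def raise_open_file_limit(wanted):
    """Try to raise the soft RLIMIT_NOFILE to `wanted`, capped by the hard limit.

    Returns the soft limit in effect afterwards. Hypothetical helper for
    illustration; it does nothing if the soft limit is already high enough.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot exceed the hard limit.
        wanted = min(wanted, hard)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # Some platforms reject values the hard limit seems to allow;
            # keep the current limit in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    # ~2000 stream files plus some headroom (an assumed figure).
    print(raise_open_file_limit(4096))
```

Whether this is enough depends on the hard limit configured on the system. It only moves the ceiling and does not reduce the number of files opened.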
Having this many stream files is probably somewhat non-standard, but
writing them out this way works quite well on an HPC system: combining
the streams during the application run would require MPI communication,
which would degrade performance and make the tracing more complicated.
But now that I have the streams and am seeing the "too many open files"
system error, I am thinking maybe I should post-process the streams down
from 2000 to a much smaller number, maybe 20, each merging 100 of the
original streams. The good news is that each stream is not that big, so
the overall trace size should be manageable.
If this is the right approach, then what would be the best way to
post-process these streams down to a smaller number of files?
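To make the grouping arithmetic concrete, here is a minimal Python sketch that only plans which original streams would go into which merged output. It does not read or rewrite any CTF data. The `esmf_stream_N` naming for every rank is an assumption based on the single path in the error above, and `group_streams` is a hypothetical helper:

```python
def group_streams(paths, group_count):
    """Split `paths` into `group_count` contiguous groups of near-equal size."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    base, extra = divmod(len(paths), group_count)
    groups = []
    start = 0
    for i in range(group_count):
        # The first `extra` groups take one more path when the split is uneven.
        size = base + (1 if i < extra else 0)
        groups.append(paths[start:start + size])
        start += size
    return groups


# Assumed naming: one stream file per MPI rank, ranks 0..1999.
streams = ["esmf_stream_%d" % rank for rank in range(2000)]
groups = group_streams(streams, 20)
```

With 2000 streams and 20 groups, each group holds 100 consecutive ranks. Actually merging the events within a group is a separate step that this sketch does not attempt.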
If this is not the right approach, how should I proceed? E.g., should the
source.ctf.fs component manage a limited pool of file handles? I would
think this would be pretty inefficient, as you would need to constantly
open and close files, which is expensive.
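For illustration only, the limited-pool idea could look something like the following Python sketch: a least-recently-used pool that keeps at most `max_open` files open and remembers the read offset of any file it has to close. This is not how source.ctf.fs works; the class `FileHandlePool` and its methods are hypothetical names:

```python
from collections import OrderedDict


class FileHandlePool:
    """Keep at most `max_open` files open, evicting the least recently used."""

    def __init__(self, max_open):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._open = OrderedDict()  # path -> file object, oldest first
        self._offsets = {}          # path -> read offset saved at eviction

    def get(self, path):
        """Return an open binary file for `path`, positioned where it left off."""
        handle = self._open.get(path)
        if handle is not None:
            self._open.move_to_end(path)  # mark as most recently used
            return handle
        while len(self._open) >= self.max_open:
            old_path, old_handle = self._open.popitem(last=False)
            self._offsets[old_path] = old_handle.tell()
            old_handle.close()
        handle = open(path, "rb")
        handle.seek(self._offsets.pop(path, 0))
        self._open[path] = handle
        return handle

    def open_count(self):
        """Number of files currently held open by the pool."""
        return len(self._open)

    def close(self):
        """Close every file still held by the pool."""
        for handle in self._open.values():
            handle.close()
        self._open.clear()
```

The cost mentioned above shows up in `get`: when a reader cycles through more files than `max_open`, every access triggers a close, an open, and a seek.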
Any help is appreciated!
Rocky