[lttng-dev] Best way to analyze CTF files

Fri Oct 17 22:15:11 EDT 2014

On Fri, Oct 17, 2014 at 8:13 PM, Christian Babeux
<christian.babeux at efficios.com> wrote:
> Hi Sebastien,
>
> A simple Python script using the Babeltrace bindings would probably do
> the job here. See [1] for an example using the bindings.

Yes, here's a start Sébastien <http://pastebin.com/QXfev1ve>:

    import babeltrace
    import json
    import sys

    def get_msg_intervals(trace_path):
        col = babeltrace.TraceCollection()

        if col.add_trace(trace_path, 'ctf') is None:
            raise RuntimeError('Cannot add trace')

        msg_intervals = {}

        for ev in col.events:
            if ev.name not in ['message:actor_send', 'message:actor_receive']:
                continue

            msg_source_actor = ev['message_source_actor']
            msg_number = ev['message_number']

            if msg_source_actor not in msg_intervals:
                msg_intervals[msg_source_actor] = {}

            if msg_number not in msg_intervals[msg_source_actor]:
                msg_intervals[msg_source_actor][msg_number] = {}

            interval = msg_intervals[msg_source_actor][msg_number]

            if ev.name == 'message:actor_send':
                interval['send_ts'] = ev.timestamp
            else:
                interval['recv_ts'] = ev.timestamp

        return msg_intervals

    if __name__ == '__main__':
        msg_intervals = get_msg_intervals(sys.argv[1])
        print(json.dumps(msg_intervals, indent=2))

This will output a JSON dict mapping actors to message numbers, and then
each message number to a dict containing both the send and receive timestamps:

    python3 intervals.py /path/to/trace > intervals.json

Once you have that JSON database, you may either read it manually or query it
easily using another Python script <http://pastebin.com/fLwyvEz8>:

    import json
    import sys

    def print_msg_diff(filename, actor_id, msg_number):
        def get_diff(msg):
            send_ts = msg['send_ts']
            recv_ts = msg['recv_ts']

            if send_ts == -1 or recv_ts == -1:
                return None

            return recv_ts - send_ts

        with open(filename) as f:
            msg_intervals = json.load(f)

        actor = msg_intervals[actor_id]

        if msg_number is None:
            for msg_number in sorted(actor.keys()):
                msg = actor[msg_number]
                print('{}: {}'.format(msg_number, get_diff(msg)))
        else:
            print(get_diff(actor[msg_number]))

    if __name__ == '__main__':
        filename = sys.argv[1]
        actor_id = sys.argv[2]
        msg_number = None

        if len(sys.argv) > 3:
            msg_number = sys.argv[3]

        print_msg_diff(filename, actor_id, msg_number)

Use it like this:

    python3 query.py <JSON file path> <actor ID> [<message number>]

For example:

    python3 query.py intervals.json 1000019
    python3 query.py intervals.json 1000059
    python3 query.py intervals.json 1000019 14

Modify those scripts at will to fit your exact use case. Of course the versions
above could crash in many ways because they lack a few run time checks, etc.

Hope it helps,
Phil

>
> Thanks,
>
> Christian
>
> [1] - https://lttng.org/docs/#doc-viewing-and-analyzing-your-traces
>
> On Fri, Oct 17, 2014 at 6:23 PM, Boisvert, Sebastien <boisvert at anl.gov> wrote:
>> Bonjour,
>>
>> First, thank you for LTTng-UST. This is very useful and convenient.
>>
>> I just got started today using LTTng (LTTng-UST) for tracing a HPC application
>> that I am working on (I am a postdoc). I am impressed by how easy LTTng is to use it.
>>
>> In my system, an actor message is represented by a pair
>> <message_actor_source, message_number>.
>>
>> I want to list all messages that have a high delivery time (message:actor_receive - message:actor_send).
>>
>> I am doing this to get the messages of one actor (actor 1000019):
>>
>> [boisvert at bigmem biosal]$ babeltrace ~/lttng-traces/auto-20141017-181240|grep "message_source_actor = 1000019"  > actor_1000019
>>
>> Then, I can look at one message with (message <1000019, 14>):
>>
>> [boisvert at bigmem biosal]$ grep "message_number = 14," actor_1000019
>> [18:12:43.647017211] (+0.000005110) bigmem.knoxville.kbase.us message:actor_send: { cpu_id = 30 }, { message_number = 14, message_action = 31592, message_count = 8, message_source_actor = 1000019, message_destination_actor = 1000059, message_source_node = -1, message_destination_node = -1 }
>> [18:12:43.647025249] (+0.000002860) bigmem.knoxville.kbase.us message:actor_receive: { cpu_id = 49 }, { message_number = 14, message_action = 31592, message_count = 8, message_source_actor = 1000019, message_destination_actor = 1000059, message_source_node = 3, message_destination_node = 3 }
>>
>> If I substract the times:
>>
>> irb(main):003:0> (43.647025249-43.647017211)*10**9
>> => 8038.00000426236
>>
>> This message (<1000019, 14>) required 8038 ns for the delivery. This one is fine.
>>
>>
>> So basically my question is:
>>
>> Is there an easy way to analyze these tracepoint files ?
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev