[lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

Fri Jun 23 19:55:21 UTC 2017

On 2017-06-23 11:48, Thomas McGuire wrote:
>>> Any idea what can cause the corrupted trace?
>> Based on your babeltrace backtrace, the possible culprits would be the
>> events that have a sequence (variable-sized array):
>>
>> syscalls: select, poll, ppoll, pselect6, epoll_wait, epoll_pwait
>>
>> block_rq_issue, block_rq_insert, block_rq_complete, block_rq_requeue, block_rq_abort.
>>
>> There are a few approaches to cornering the issue. You can try reproducing
>> on your workload/config while enabling only one of these events at a time.
>> Just knowing which event or events trigger the problem would be a good start.
>>
>> Another possibility would be to send us a trace reproducing the issue
>> with only those events enabled, which should not contain confidential
>> info about your system.
> 
> I've added some debug statements to babeltrace now. The culprit in this
> particular case is the first block_rq_complete event: its __cmd_length
> field contains a large value (3040877592). __cmd_length is used as the
> length of the _cmd sequence, so allocating space for that sequence
> fails.
> 
> Any idea what can cause __cmd_length to be bogus?

Hi Thomas,

I see from the metadata file you provided that your kernel version is
4.9.28-20170428-1. Is it built from vanilla kernel sources? If not,
could you point us to a git repo or source archive? That would help a
lot in figuring this out.
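
To illustrate the failure mode you describe, here is a minimal sketch in
Python. It is not babeltrace's actual code: the function name, the
exception class, and the layout (a 4-byte little-endian length followed
by the elements) are assumptions made up for the example. The point is
only that a reader which trusts the length field will try to allocate
whatever it names, whereas comparing it with the bytes left in the buffer
rejects a value like 3040877592 before any allocation happens.

```python
import struct
from typing import Tuple


class CorruptSequenceError(ValueError):
    """Raised when a sequence length cannot fit in the remaining buffer."""


def read_sequence(buf: bytes, offset: int, elem_size: int = 1) -> Tuple[bytes, int]:
    """Read a length-prefixed sequence from buf, starting at offset.

    Hypothetical layout (an assumption for this sketch, not the CTF wire
    format): a 4-byte little-endian unsigned length, followed by
    length * elem_size bytes of payload.

    Returns (payload, offset_after_payload).
    """
    if elem_size <= 0:
        raise ValueError("elem_size must be positive")
    if offset < 0 or offset + 4 > len(buf):
        raise CorruptSequenceError("truncated length field")

    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    remaining = len(buf) - start

    # Validate the length against what is actually available before
    # using it to size anything.
    if length * elem_size > remaining:
        raise CorruptSequenceError(
            "sequence length %d needs %d bytes, only %d remain"
            % (length, length * elem_size, remaining)
        )

    end = start + length * elem_size
    return buf[start:end], end
```

With a well-formed buffer such as struct.pack("<I", 3) + b"abc", the
function returns the three payload bytes. With a length of 3040877592
followed by a handful of bytes, it raises CorruptSequenceError instead of
attempting the allocation.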

Thanks,

Michael