[lttng-dev] error with ctf_sequence_text when using non-utf8 encoded strings

Nathan Ricci naricc at microsoft.com
Wed Jan 31 12:10:55 EST 2024


I am trying to emit a trace that has a wchar_t string, using ctf_sequence_text for one of the fields. When the trace is recorded, everything but the first character appears to be truncated, and I think this is because the code assumes UTF-8 encoding and stops at the first null byte.  This is on Ubuntu 22.04, using this liblttng-ust package:

liblttng-ust-common1/jammy,now 2.13.1-1ubuntu1

I've boiled this down to a simple repro; I've included the code below, but you can also get it here: https://github.com/naricc/lttng-test

Here is the main file:
-----

#include <stdio.h>
#include <unistd.h>
#include <lttng/lttng.h>
#include <lttng/tracepoint.h>
#include <wchar.h>
#include "repro-tracepoint.h"



int main() {
    puts("Hello, World!\nPress Enter to continue...");
    getchar();

    const char* utf8_text_value = "Hello, UTF8 Sequence Text!";
    const wchar_t *wchar_text_value = L"Hello, WChar Sequence Text!";

    // Emit the tracepoint event with the sequence text field
    lttng_ust_tracepoint(naricc_test_provider, test_event, utf8_text_value, wchar_text_value);

    return 0;
}

----
Here is the tracepoint header (repro-tracepoint.h):
___

#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER naricc_test_provider

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./repro-tracepoint.h"

#if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TP_H

#include <lttng/tracepoint.h>
#include <string.h> /* strlen, used in TP_FIELDS below */
#include <wchar.h>

// Define the tracepoint event with a sequence text field
TRACEPOINT_EVENT(naricc_test_provider, test_event,
    TP_ARGS(
        const char*, utf8_text_value,
        const wchar_t*, wchar_text_value
    ),
    TP_FIELDS(
        ctf_sequence_text(char, utf8_text_sequence, utf8_text_value, size_t, strlen(utf8_text_value))
        ctf_sequence_text(wchar_t, wchar_text_sequence, wchar_text_value, size_t, wcslen(wchar_text_value) * 2 + 2)
    )
)

#endif /* _TP_H */

#include <lttng/tracepoint-event.h>

----
And here is the repro-tracepoint.cpp:
___

#define LTTNG_UST_TRACEPOINT_CREATE_PROBES
#define LTTNG_UST_TRACEPOINT_DEFINE

#include "repro-tracepoint.h"

---

I built it like so:


g++ -c -I. repro-tracepoint.cpp
g++ -c lttng-test.cpp
g++ -o lttng-test lttng-test.o repro-tracepoint.o -llttng-ust -ldl


---

After starting a session, running that program, and destroying the session, this is what I get with babeltrace2:

```
$ babeltrace2  ~/lttng-traces/my-user-space-session-20240131-161638
[16:16:43.553211421] (+?.?????????) TDC20748914 naricc_test_provider:test_event: { cpu_id = 6 }, { _utf8_text_sequence_length = 26, utf8_text_sequence = "Hello, UTF8 Sequence Text!", _wchar_text_sequence_length = 56, wchar_text_sequence = [ [0] = 72, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0, [8] = 0, [9] = 0, [10] = 0, [11] = 0, [12] = 0, [13] = 0, [14] = 0, [15] = 0, [16] = 0, [17] = 0, [18] = 0, [19] = 0, [20] = 0, [21] = 0, [22] = 0, [23] = 0, [24] = 0, [25] = 0, [26] = 0, [27] = 0, [28] = 0, [29] = 0, [30] = 0, [31] = 0, [32] = 0, [33] = 0, [34] = 0, [35] = 0, [36] = 0, [37] = 0, [38] = 0, [39] = 0, [40] = 0, [41] = 0, [42] = 0, [43] = 0, [44] = 0, [45] = 0, [46] = 0, [47] = 0, [48] = 0, [49] = 0, [50] = 0, [51] = 0, [52] = 0, [53] = 0, [54] = 0, [55] = 0 ] }

```

The UTF-8 sequence prints fine, but the wchar_t one is truncated to a single character followed by zeros.  To rule out an error in babeltrace2, I inspected the channel files with hexedit and found this:

```
85 58  E1 C1 43 EE  9A 93 9C 60  6D B7 2B B0  00 00 00 00  .......X..C....`m.+.....
00000018   06 00 00 00  00 00 00 00  6A 33 DA B4  2D AD 02 00  04 78 1C 56  30 AD 02 00  ........j3..-....x.V0...
00000030   60 0B 00 00  00 00 00 00  00 80 00 00  00 00 00 00  00 00 00 00  00 00 00 00  `.......................
00000048   00 00 00 00  00 00 00 00  06 00 00 00  FF FF 00 00  00 00 3C 10  F2 E2 2E AD  ..................<.....
00000060   02 00 1A 00  00 00 00 00  00 00 48 65  6C 6C 6F 2C  20 55 54 46  38 20 53 65  ..........Hello, UTF8 Se
00000078   71 75 65 6E  63 65 20 54  65 78 74 21  38 00 00 00  00 00 00 00  48 00 00 00  quence Text!8.......H...
00000090   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  ........................
```

So it seems the error is in the recording of the trace, not in the viewing.
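To make the reading of the dump explicit, here is a small sketch in C that decodes the two length fields. `read_le64` is a helper written only for this illustration, and it assumes the lengths are stored as 64-bit little-endian values (size_t on x86-64), which is what the bytes above suggest. The byte arrays are copied from the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (not from lttng-ust): read a 64-bit little-endian value. */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Length field that precedes "Hello, UTF8 Sequence Text!" in the dump. */
static const unsigned char utf8_len_field[8] = {
    0x1A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Length field of the wchar_t sequence, plus the first 4 payload bytes. */
static const unsigned char wchar_field[12] = {
    0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* length */
    0x48, 0x00, 0x00, 0x00                          /* payload starts here */
};
```

Under that assumption, `read_le64(utf8_len_field)` gives 26 and `read_le64(wchar_field)` gives 56, which matches the two `_length` values babeltrace2 prints. The only non-zero payload byte of the wchar_t field is 0x48 ('H', decimal 72).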

Looking into the lttng-ust code, it seems like ctf_sequence_text ends up mapped to this:

include/lttng/ust-tracepoint-event-write.h, line 73:
https://github.com/lttng/lttng-ust/blob/717c38f658248bc04ccfc6e7fdf5d03040c2a846/include/lttng/ust-tracepoint-event-write.h#L73

This assumes UTF-8 encoding, and ultimately writes into the ring buffer with a copy that stops at the first null byte:

src/common/ringbuffer/backend.h, line 126:
https://github.com/lttng/lttng-ust/blob/717c38f658248bc04ccfc6e7fdf5d03040c2a846/src/common/ringbuffer/backend.h#L126
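To illustrate the mechanism I suspect, here is a minimal sketch in plain C. It is not the lttng-ust code: `copy_text_padded` is a hypothetical helper that only mimics a "copy bytes up to the first null byte, then pad the rest of the reserved length" behaviour, which is my reading of the linked ring buffer function. The pad value of 0 is an assumption chosen to match the zeros seen in the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, NOT the lttng-ust implementation: copy at most `len`
 * bytes from `src`, stop at the first null byte, and fill the rest of `dst`
 * with `pad`. Returns the number of bytes actually copied from `src`.
 */
static size_t copy_text_padded(unsigned char *dst, const void *src,
                               size_t len, unsigned char pad)
{
    const unsigned char *bytes = (const unsigned char *)src;
    size_t copied = 0;

    assert(dst != NULL && src != NULL);

    /* Byte-wise scan: any zero byte is treated as the end of the text. */
    while (copied < len && bytes[copied] != 0) {
        dst[copied] = bytes[copied];
        copied++;
    }

    /* The field still occupies `len` bytes, so pad the remainder. */
    memset(dst + copied, pad, len - copied);
    return copied;
}
```

Called on a char string such as "Hello" with `len` 5, the helper copies all 5 bytes. Called on `L"Hello"` on a little-endian system where wchar_t is 4 bytes, the source bytes start with `48 00 00 00`, so only one byte is copied and the rest is padding. That is the same shape as the `[0] = 72` followed by zeros in the babeltrace2 output.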


If we agree this is an error, I believe I can produce a fix for it. Or if I am just using the APIs wrong, please let me know what I should do instead.
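For reference, one possible workaround is to convert the wide string to a multibyte string before tracing and record it with `ctf_sequence_text(char, ...)`, like the UTF-8 field that already records correctly in the repro. The sketch below is untested against lttng-ust, and `wide_to_multibyte` is a hypothetical helper name. The result is only UTF-8 if the process locale is a UTF-8 locale (for example after `setlocale(LC_ALL, "")` in a UTF-8 environment); plain ASCII text converts in any locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a wide string to a newly allocated multibyte
 * string using the current C locale. On success, stores the byte length
 * (without the terminator) in *out_len and returns the buffer, which the
 * caller must free(). Returns NULL if conversion or allocation fails.
 */
static char *wide_to_multibyte(const wchar_t *wide, size_t *out_len)
{
    size_t needed;
    char *buf;

    assert(wide != NULL && out_len != NULL);

    /* POSIX behaviour: with a NULL destination, only the size is computed. */
    needed = wcstombs(NULL, wide, 0);
    if (needed == (size_t)-1)
        return NULL; /* a character cannot be represented in this locale */

    buf = malloc(needed + 1);
    if (buf == NULL)
        return NULL;

    /* Second pass performs the conversion, including the terminator. */
    wcstombs(buf, wide, needed + 1);
    *out_len = needed;
    return buf;
}
```

The returned buffer and `*out_len` could then be passed as the char pointer and length arguments of the tracepoint, in place of the wchar_t pointer.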

             --Nathan Ricci




