[lttng-dev] error with ctf_sequence_text when using non-utf8 encoded strings
Nathan Ricci
naricc at microsoft.com
Wed Jan 31 12:10:55 EST 2024
I am trying to emit a trace that has a wchar string, using ctf_sequence_text for one of the fields. When the trace is recorded, it seems like everything but the first character is truncated, and I think this is because it is assuming UTF-8 encoding and stopping at the first null byte. This is on Ubuntu 22.04, using this liblttng-ust package:
liblttng-ust-common1/jammy,now 2.13.1-1ubuntu1
I've boiled this down to a simple repro; I've included the code below, but you can also get it here: https://github.com/naricc/lttng-test
Here is the main file:
-----
#include <stdio.h>
#include <unistd.h>
#include <lttng/lttng.h>
#include <lttng/tracepoint.h>
#include <wchar.h>
#include "repro-tracepoint.h"
int main() {
puts("Hello, World!\nPress Enter to continue...");
getchar();
const char* utf8_text_value = "Hello, UTF8 Sequence Text!";
const wchar_t *wchar_text_value = L"Hello, WChar Sequence Text!";
// Emit the tracepoint event with the sequence text field
lttng_ust_tracepoint(naricc_test_provider, test_event, utf8_text_value, wchar_text_value);
return 0;
}
----
Here is the tracepoint header (repro-tracepoint.h):
___
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER naricc_test_provider
#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./repro-tracepoint.h"
#if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TP_H
#include <lttng/tracepoint.h>
#include <wchar.h>
// Define the tracepoint event with a sequence text field
TRACEPOINT_EVENT(naricc_test_provider, test_event,
TP_ARGS(
const char*, utf8_text_value,
const wchar_t*, wchar_text_value
),
TP_FIELDS(
ctf_sequence_text(char, utf8_text_sequence, utf8_text_value, size_t, strlen(utf8_text_value))
ctf_sequence_text(wchar_t, wchar_text_sequence, wchar_text_value, size_t, wcslen(wchar_text_value) * 2 + 2)
)
)
#endif /* _TP_H */
#include <lttng/tracepoint-event.h>
----
And here is the repro-tracepoint.cpp:
___
#define LTTNG_UST_TRACEPOINT_CREATE_PROBES
#define LTTNG_UST_TRACEPOINT_DEFINE
#include "repro-tracepoint.h"
---
I built it like so:
g++ -c -I. repro-tracepoint.cpp
g++ -c lttng-test.cpp
g++ -o lttng-test lttng-test.o repro-tracepoint.o -llttng-ust -ldl
---
After starting a session, running that program, and destroying the session, this is what I get with babeltrace2:
```
$ babeltrace2 ~/lttng-traces/my-user-space-session-20240131-161638
[16:16:43.553211421] (+?.?????????) TDC20748914 naricc_test_provider:test_event: { cpu_id = 6 }, { _utf8_text_sequence_length = 26, utf8_text_sequence = "Hello, UTF8 Sequence Text!", _wchar_text_sequence_length = 56, wchar_text_sequence = [ [0] = 72, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0, [8] = 0, [9] = 0, [10] = 0, [11] = 0, [12] = 0, [13] = 0, [14] = 0, [15] = 0, [16] = 0, [17] = 0, [18] = 0, [19] = 0, [20] = 0, [21] = 0, [22] = 0, [23] = 0, [24] = 0, [25] = 0, [26] = 0, [27] = 0, [28] = 0, [29] = 0, [30] = 0, [31] = 0, [32] = 0, [33] = 0, [34] = 0, [35] = 0, [36] = 0, [37] = 0, [38] = 0, [39] = 0, [40] = 0, [41] = 0, [42] = 0, [43] = 0, [44] = 0, [45] = 0, [46] = 0, [47] = 0, [48] = 0, [49] = 0, [50] = 0, [51] = 0, [52] = 0, [53] = 0, [54] = 0, [55] = 0 ] }
```
The utf8 sequence prints fine, but the wchar one is truncated to a single character followed by zeros. To rule out an error in babeltrace, I inspected the channel files with hexedit and found this:
```
85 58 E1 C1 43 EE 9A 93 9C 60 6D B7 2B B0 00 00 00 00 .......X..C....`m.+.....
00000018 06 00 00 00 00 00 00 00 6A 33 DA B4 2D AD 02 00 04 78 1C 56 30 AD 02 00 ........j3..-....x.V0...
00000030 60 0B 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 `.......................
00000048 00 00 00 00 00 00 00 00 06 00 00 00 FF FF 00 00 00 00 3C 10 F2 E2 2E AD ..................<.....
00000060 02 00 1A 00 00 00 00 00 00 00 48 65 6C 6C 6F 2C 20 55 54 46 38 20 53 65 ..........Hello, UTF8 Se
00000078 71 75 65 6E 63 65 20 54 65 78 74 21 38 00 00 00 00 00 00 00 48 00 00 00 quence Text!8.......H...
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........................
```
So it seems the error is in the recording of the trace, not in the viewing.
Looking into the lttng-ust code, it seems like ctf_sequence_text ends up mapped to this macro in include/lttng/ust-tracepoint-event-write.h:
https://github.com/lttng/lttng-ust/blob/717c38f658248bc04ccfc6e7fdf5d03040c2a846/include/lttng/ust-tracepoint-event-write.h#L73
That macro assumes UTF-8 encoding, and the data is ultimately written into the ring buffer by a copy that terminates on the first null byte (src/common/ringbuffer/backend.h):
https://github.com/lttng/lttng-ust/blob/717c38f658248bc04ccfc6e7fdf5d03040c2a846/src/common/ringbuffer/backend.h#L126
If we agree this is an error, I believe I can produce a fix for it. Or if I am just using the APIs wrong, please let me know what I should do instead.
--Nathan Ricci