[lttng-dev] Unexport of kvm_x86_ops vs tracer modules
mathieu.desnoyers at efficios.com
Mon Apr 25 09:00:22 EDT 2022
----- On Apr 8, 2022, at 2:06 PM, Mathieu Desnoyers via lttng-dev lttng-dev at lists.lttng.org wrote:
> ----- On Apr 8, 2022, at 12:24 PM, Paolo Bonzini pbonzini at redhat.com wrote:
>> On 4/8/22 17:36, Mathieu Desnoyers wrote:
>>> LTTng is an out of tree kernel module, which currently relies on the export.
>>> Indeed, arch/x86/kvm/x86.c exports a set of tracepoints to kernel modules, e.g.:
>>> But any probe implementation hooking on that tracepoint would need kvm_x86_ops
>>> to translate the struct kvm_vcpu * into meaningful tracing data.
>>> I could work-around this on my side in ugly ways, but I would like to discuss
>>> how kernel module tracers are expected to implement kvm events probes without
>>> the kvm_x86_ops symbol ?
>> The conversion is done in the TP_fast_assign snippets, which are part of
>> kvm.ko and therefore do not need the export. As I understand it, the
>> issue is that LTTng cannot use the TP_fast_assign snippets, because they
>> are embedded in the trace_event_raw_event_* symbols?
> Indeed, the fact that the TP_fast_assign snippets are embedded in the
> trace_event_raw_event_* symbols is an issue for LTTng. This ties those
> to ftrace.
> AFAIK, TP_fast_assign copies directly into ftrace ring buffers, and then
> afterwards things like dynamic filters are applied, which then "uncommits" the
> events if need be (and if possible). Also, TP_fast_assign is tied to the
> ftrace ring buffer event layout. The fact that the TP_STRUCT__entry()
> and TP_fast_assign() (open-coded C) are separate fields really focuses on a
> use-case where all data is serialized to a ring buffer.
> In LTTng, the event fields are made available to a filter interpreter prior to
> being copied into LTTng's ring buffer. This is made possible by implementing
> our own LTTNG_TRACEPOINT_EVENT code generation headers. In addition, we have
> recently released an event notification mechanism (lttng 2.13) which captures
> specific event fields to send with an immediate notification (thus bypassing the
> tracer buffering). We are also currently working on a LTTng trace hit counters
> mechanism, which performs aggregation through per-cpu counters, which doesn't
> even allocate a ring buffer.
> For those reasons, LTTng reimplements its own tracepoint probe callbacks. All
> those sit within LTTng kernel modules, which means we currently need the
> kvm_x86_ops callbacks.
>> We cannot do the extraction before calling trace_kvm_exit, because it's
> I suspect that extracting relevant data prior to calling trace_kvm_exit
> is too expensive because it cannot be skipped when the tracepoint is
> disabled. This is because trace_kvm_exit() is a static inline function,
> and the check to figure out if the event is enabled is within that function.
> Unfortunately, even if the tracepoint is disabled, the side-effects of the
> parameters passed to trace_kvm_exit() must happen.
> I've solved this in LTTng-UST by implementing a lttng_ust_tracepoint()
> macro, which basically "lifts" the tracepoint enabled check before the
> evaluation of the arguments.
> You could achieve something similar by using trace_kvm_exit_enabled() in the
> kernel like so:
> if (trace_kvm_exit_enabled())
> Which would skip evaluation of the argument side-effects when the tracepoint is
> By doing that, when multiple tracers are attached to a kvm tracepoint, the
> translation from pointer-to-internal-structure to meaningful fields would only
> need to be done once when a tracepoint is hit. And this would remove the need
> for using kvm_x86_ops callbacks from tracer probe functions.
> Thoughts ?
We are at 5.18-rc4 now. Should I expect this unexport to stay in place
for 5.18 final and go ahead with using kallsyms to find this symbol from
lttng-modules instead ?
> Mathieu Desnoyers
> EfficiOS Inc.
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
More information about the lttng-dev