[lttng-dev] Capturing User-Level Function Calls/Returns

Thu Jul 16 12:20:16 EDT 2020

<p>Hi Michel,</p>
<p>Thanks for the detailed answer! DBI tools are really interesting, but 
I want to do this during normal execution and on multiple programs 
running simultaneously. This is not meant to be conventional tracing 
with multiple re-executions. I want to extract some information about 
the execution state at runtime and pass it to the lower levels of the 
software stack so that they can make smarter choices. 
Fortunately, only a few functions need to be traced. Still, 
any reduction in wasted cycles helps, especially when they are caused 
by privilege-level transitions.</p>
<p>Regards.</p>
<p> </p>
<p>On 2020-07-16 05:36, Michel Dagenais wrote:</p>
<blockquote>
<div class="pre"><br />
<blockquote>Without recompiling, how would that be 
implemented?</blockquote>
<br /> As you mentioned, this is possible when "jump patching" 5-byte 
instructions. Fast tracepoints in GDB and in kprobe do it. Kprobe goes 
further and patches sequences of instructions (when the target 
instruction is shorter than 5 bytes) if there is no incoming branch into 
the middle of the sequence. You can go even further, for instance using 
3-byte jumps to a trampoline installed in alignment nops. If you 
combine different strategies like this, you can eventually reach an almost 
100% success rate for "jump patching" tracepoints. This gets quite hairy, 
though. The short story, however, is that as far as I know there is 
currently no tool that does this easily and reliably in user space.<br /><br 
/><a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2746" 
target="_blank" rel="noopener 
noreferrer">https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.2746</a><br 
/><a href="https://dl.acm.org/doi/pdf/10.1145/3062341.3062344" 
target="_blank" rel="noopener 
noreferrer">https://dl.acm.org/doi/pdf/10.1145/3062341.3062344</a><br 
/><br /> If you can afford a more invasive tool, one that requires a lot of 
memory and stops your application for quite some time, you can look at 
approaches like dyninst that decompile the binary, insert 
instrumentation code and reassemble the code.<br /><br /><a 
href="https://dyninst.org/" target="_blank" rel="noopener 
noreferrer">https://dyninst.org/</a><br /><br />
<blockquote>You would need to insert a jump on top of code, and still be 
able to<br /> preserve that code. What a trap does is insert an int3, 
which will<br /> trap into the kernel; the kernel then emulates the code 
that the int3 was<br /> on, and also calls some code that can trace the 
current state.<br /><br /> To do it in user land, you would need to find 
a way to replace the code<br /> at the location you want to trace with a 
jump to the tracing<br /> infrastructure, which will also be able to 
emulate the code that the<br /> jump was inserted on top of. On x86, 
that jump will need to be 5<br /> bytes long (covering 5 bytes of text 
to emulate), whereas an int3 is a<br /> single byte.<br /><br /> Thus, 
you either recompile and insert nops where you want to place your<br /> 
jumps, or you trap using int3, which can do the work from within the<br /> 
kernel.<br /><br /> -- Steve</blockquote>
</div>
</blockquote>
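<p>To make the byte counts in the quoted discussion concrete, here is a minimal sketch of the arithmetic behind a 5-byte x86 jump patch. The opcode values (<code>0xE9</code> for <code>jmp rel32</code>, <code>0xCC</code> for <code>int3</code>) are standard x86 encodings; the function names, the addresses and the list of instruction lengths are hypothetical and only illustrate the idea. The sketch does not check the "no incoming branch into the middle of the sequence" condition, and it does not write to any process memory.</p>

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
INT3_OPCODE = 0xCC       # x86 one-byte breakpoint instruction
JMP_REL32_LEN = 5        # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(patch_addr: int, target_addr: int) -> bytes:
    """Return the 5 bytes of a `jmp rel32` placed at patch_addr that lands on target_addr."""
    # The displacement is measured from the end of the jump instruction itself.
    disp = target_addr - (patch_addr + JMP_REL32_LEN)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of rel32 range")
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<i", disp)


def bytes_to_relocate(insn_lengths, needed=JMP_REL32_LEN):
    """Return how many bytes of whole instructions the jump overwrites.

    insn_lengths is a hypothetical input: the lengths of the consecutive
    instructions starting at the patch site, as a disassembler would report them.
    """
    covered = 0
    for length in insn_lengths:
        if covered >= needed:
            break
        covered += length
    if covered < needed:
        raise ValueError("not enough instructions to cover the jump")
    return covered


if __name__ == "__main__":
    # Hypothetical addresses: patch site at 0x1000, trampoline at 0x2000.
    print(encode_jmp_rel32(0x1000, 0x2000).hex())
    # Hypothetical instruction lengths of 1, 3 and 2 bytes at the patch site.
    print(bytes_to_relocate([1, 3, 2, 4]))
    # An int3 breakpoint only ever overwrites a single byte.
    print(len(bytes([INT3_OPCODE])))
```

<p>When the first instruction at the patch site is shorter than 5 bytes, <code>bytes_to_relocate</code> shows how many bytes of following instructions would also have to be moved to the trampoline, which is the case where patching gets harder than a single-byte int3.</p>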