[lttng-dev] Userspace tracing in docker containers

Tue Apr 6 10:07:53 EDT 2021

Hi,

On Mon, Apr 05, 2021 at 11:09:39AM -0700, Eqbal via lttng-dev wrote:
> Hi,
> 
> I am trying to get user space tracing working for an application running in
> a docker container. I am running lttng session daemon in another container.
> I mounted the unix socket locations (either /var/run/lttng for root or
> $HOME/.lttng for another user). By doing that I can run commands like lttng
> create or lttng list <session-name>, but the tracepoint events from the
> application don't get registered and there is no trace output.
> 
> I enabled LTTNG_UST_DEBUG and ran lttng-sessiond in verbose mode (-vvv and
> --verbose-consumer) and got the following error message:
> 
> "*Unix socket credential pid=0. Refusing application in distinct,
> non-nested pid namespace.*"
> 
> It appears that for some calls to the session daemon there is a getsockopt
> syscall made with *SO_PEERCRED* which returns 0 for pid and the call is
> failed with *LTTNG_UST_ERR_PEERCRED_PID* error (see get_cred call in
> ustctl.c).
> 
> If I comment out the getsockopt call, my application tracing starts to work.
> 
> From what I found, docker cannot support getsockopt/SO_PEERCRED call to get
> peer pid on the unix socket which would make sense as it's in a separate
> namespace.
> 
> I have a few questions on this:
> 1. What is the reason for the get_cred/getsockopt call with SO_PEERCRED? I
> would like to understand why it's required for some and not other calls.

More information can be found in the commit that introduced this check:

  commit a834901f2890deadb815d7f9e3ab79c3ba673994
  Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
  Date:   Mon Oct 12 16:52:03 2020 -0400

    Fix: Use unix socket peercred for pid, uid, gid credentials

    Currently, the session daemon trust the pid, ppid, uid, and gid values
    passed by the application, but should really validate the uid using unix
    socket peercred. This fix uses the peercred values rather than the
    values provided by the application on registration for:

    - pid, uid and gid on Linux,
    - uid and gid on FreeBSD.

    This should improve how the session daemon deals with containerized
    applications on Linux as well. Applications are required to be either in
    the same pid namespace, or in a pid namespace nested within the pid
    namespace of the lttng-sessiond, so the session daemon can map the
    application pid to something meaningful within its own pid namespace.
    Applications in a unrelated (disjoint) pid namespace will be refused by
    the session daemon.

    About the uid and gid with user namespaces on Linux, those will provide
    meaningful IDs if the application user namespace is either the same as
    the user namespace of the session daemon, or a nested user namespace.
    Otherwise, the IDs will be that of /proc/sys/kernel/overflowuid and
    /proc/sys/kernel/overflowgid, which typically maps to nobody.nogroup on
    current distributions.

    Given that fetching the parent pid (ppid) of the application would
    require to use /proc/<pid>/status (which is racy wrt pid reuse), expose
    the ppid provided by the application on registration instead, but only
    in situations where the application sits in the same pid namespace as
    the session daemon (on Linux), which is detected by checking if the pid
    provided by the application matches the pid obtained using unix socket
    credentials. The ppid is only used for logging/debugging purposes in the
    session daemon anyway, so it is OK to use the value provided by the
    application for that purpose.

    Fixes: #1286
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
    Change-Id: I94742e57dad642106908d09e2c7e395993c2c48f
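
To make the mechanism concrete, here is a minimal sketch of reading unix socket
peer credentials with getsockopt(SO_PEERCRED) on Linux. It is not the code from
ustctl.c: the helper name peer_cred() and its return convention are made up for
this illustration. Only getsockopt(), SO_PEERCRED and struct ucred are real
interfaces.

```c
#define _GNU_SOURCE             /* exposes struct ucred in <sys/socket.h> */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the lttng-ust implementation): fetch the
 * credentials of the peer connected to the unix socket `sock`.
 *
 * Returns 0 on success, -1 if getsockopt() fails or if the kernel
 * reports pid 0 for the peer.
 */
static int peer_cred(int sock, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, out, &len) < 0)
        return -1;
    if (len != sizeof(*out))
        return -1;
    /*
     * pid 0 means the peer's pid cannot be expressed in the caller's
     * pid namespace, so refuse it in this sketch.
     */
    if (out->pid == 0)
        return -1;
    return 0;
}
```

Calling peer_cred() on one end of a socketpair(AF_UNIX, SOCK_STREAM, 0, sv)
created by the same process should return that process's own pid, effective
uid and effective gid. When the peer sits in a pid namespace that is not
visible from the caller, the kernel reports pid 0, which matches the "Unix
socket credential pid=0" message quoted above.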

As for "why it's required for some and not other calls": communicating with
lttng-sessiond through the lttng CLI is different from a userspace application
registering with it. These are essentially two distinct communication
interfaces. To be honest, I'm not certain what the complete "security" policy
is for the lttng-sessiond <-> CLI interface, or whether it should be stricter.

> 2. Is there any workaround for this problem, so that I can get this to work
> with the container topology I am working with (app in one container and
> lttng daemons in another).

Based on the commit message, lttng-ust explicitly cannot be used across
non-nested pid namespaces.
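
As a diagnostic aid, the following sketch checks whether two processes share a
pid namespace by comparing the identity of their /proc/<pid>/ns/pid entries. The
helper name same_pid_namespace() is made up for this example. It assumes /proc
is mounted, that both pids are visible from where it runs (typically the host),
and that the caller is allowed to inspect both processes. It only distinguishes
"same" from "different" and says nothing about whether one namespace is nested
in the other.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: compare the pid namespaces of processes `a` and `b`.
 *
 * Returns 1 if both are in the same pid namespace, 0 if they differ,
 * and -1 if either /proc/<pid>/ns/pid entry cannot be examined.
 */
static int same_pid_namespace(pid_t a, pid_t b)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/pid", (long) a);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/pid", (long) b);

    /* stat() follows the namespace link to the namespace object itself. */
    if (stat(path_a, &st_a) < 0 || stat(path_b, &st_b) < 0)
        return -1;

    /* Same device and inode number means same namespace. */
    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

Running this on the host with the pid of lttng-sessiond and the pid of the
traced application (both as seen from the host) gives a quick indication of
whether the two containers were started with separate pid namespaces.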

Could you give us more information on the goal of the topology you plan to use?
This could lead to further discussion and/or alternative solutions based on the
goals and constraints of your deployment.

> 3. Related to 2, are there any gotchas to bypassing the getsockopt call in
> get_cred?

Based on the content of the mentioned bug (#1286) [1], the principal concern is:

"
This means a non-root application could theoretically impersonate a root
application from a tracing perspective, and thus access root tracing buffers in
a per-uid configuration, which is unwanted.
"

[1] https://bugs.lttng.org/issues/1286

Cheers

-- 
Jonathan Rajotte-Julien
EfficiOS