[lttng-dev] [PATCH lttng-tools v3 2/2] man: document dead-peer detection for relayd

Jérémie Galarneau jeremie.galarneau at efficios.com
Mon Dec 4 15:23:28 UTC 2017


On 6 October 2017 at 14:22, Jonathan Rajotte
<jonathan.rajotte-julien at efficios.com> wrote:
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
>  doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
> index 218526a1..e37ce845 100644
> --- a/doc/man/lttng-relayd.8.txt
> +++ b/doc/man/lttng-relayd.8.txt
> @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES
>  `LTTNG_RELAYD_HEALTH`::
>      Path to relay daemon health's socket.
>
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
> +    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
> +    peers.

I think we should add an explanation of why this option matters.

I would enable the keep-alive mechanism by default considering the fairly
severe problems users can experience without this mechanism in place
(fd exhaustion
and what amounts to a memory leak) when connections are dropped.

Given this, it would make sense to change the variable to
'LTTNG_RELAYD_TCP_KEEP_ALIVE' which the user can set to 0/1 to override the
default policy (default to 'on').

> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
> +    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
> +    A value of -1 lets the operating system manage this parameter (default).

Pointing the user to tcp(7) is okay, but a sentence to explain what this
setting does would be helpful.

"The number of seconds a connection needs to be idle before TCP begins
sending out keep-alive probes."

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME.

Please add the time unit expected in this variable. Also, what would 0
mean here?

While I agree with -1 == OS default, I'm wondering if we should be
more aggressive.

The Linux default of two hours seems like an awfully long time... Let's keep
the platform defaults for now and I'll do some research to see what would be a
reasonable value here if we decide to shorten the interval.

> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
> +    See tcp(7) tcp_keepalive_probes.
> +    A value of -1 lets the operating system manage this parameter (default).

Same comment about the expected time unit, meaning and valid/expected values.

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT.

> +
> +NOTE: Not supported on Solaris.

Is this option supported by Cygwin?


> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
> +    See tcp(7) tcp_keepalive_intvl.
> +    A value of -1 lets the operating system manage this parameter (default).

Same comment about the expected time unit, meaning and valid/expected values.

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL.

> +
> +NOTE: Not supported on Solaris.

Same comment about Cygwin.


>
>  FILES
>  -----
> --
> 2.11.0
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com


More information about the lttng-dev mailing list