[lttng-dev] [PATCH lttng-tools v4 1/2] lttng-relayd: use TCP keep-alive mechanism to detect dead-peer

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Jan 22 16:54:06 UTC 2018


Please do the opposite order for easier patch merge:


Subject

Change description body

Signed-off-by....
---
Changes since v4:
- ....

Thanks,

Mathieu

----- On Jan 21, 2018, at 12:39 PM, Jonathan Rajotte jonathan.rajotte-julien at efficios.com wrote:

> v4:
> - Complete rework to expose an internal API for TCP keep-alive
>  functionality support.
> - keepalive -> keep_alive
> - Moved env. variables #define to defaults.h.
> - Introduced struct tcp_keep_alive_support and struct
>  tcp_keep_alive_config as suggested.
> - Use config_parse_value for LTTNG_RELAYD_TCP_KEEP_ALIVE.
>  Values supported: on,off,true,false,yes,no,1,0.
> - Support tcp_keepalive_abort_threshold on Solaris 11.
>  This is the more granular equivalent to tcp_keepalive_probe &
>  tcp_keepalive_intvl. The environment variable exposed is
>  LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD.
> - tcp_keep_alive_init is called at lttng-relayd start to validate
>  the values passed, if any, and error out if they are not valid. It
>  could be called lazily as proposed, but that would result in validation
>  only when a socket is prepared (on channel creation), which is
>  counter-intuitive and could happen a long time after lttng-relayd
>  start.
> - tcp_keepalive_setsockopt renamed to socket_apply_keep_alive_config
> 
> v3:
> - Fix inversion of definition for tcp_keepalive_time_valid.
> - Handle value of -1 and value < -1 in tcp_keepalive_time_valid
> - Allow value of -1 for LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME,
>  LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES, LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL
>  on Solaris 10 & 11.
> - Fix LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME=-1 cases for solaris which would
>  yield -1000 as final modifier value.
> 
> v2:
> -  Replace __sun -> __sun__
> -  Use MS_PER_SEC
> -  Modified definition of tcp_keepalive_time_valid to increase
>   readability.
> -  In tcp_keepalive_time_valid return true if value is -1 since it is
>   equivalent to letting the system manage the setting.
> -  Removed debugging getsockopt calls
> 
> --
> Allow relayd to clean up objects related to a dead connection
> for which the FIN packet was not emitted (unexpected shutdown,
> Ethernet blocking). Note that an idle peer is not considered dead as long
> as it responds to the keep-alive query once the idle time has elapsed.
> 
> Per RFC 1122, section 4.2.3.6, implementations must default to no less
> than two hours for the idle period. On Linux the default value is indeed
> 2 hours. This is problematic if relayd needs to be aggressive about
> detecting dead peers. Hence it is important to provide tuning knobs for
> the TCP keep-alive mechanism.
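
To make the motivation concrete, here is a minimal sketch of the arithmetic behind the tuning knobs, assuming Linux-style semantics (idle period, then a fixed number of probes sent at a constant interval). The function name is hypothetical and is not part of the patch.

```c
/*
 * Approximate worst-case delay, in seconds, before a silent peer is
 * declared dead: the connection stays idle for idle_s, then `probes`
 * unanswered probes are sent intvl_s apart.
 * Hypothetical helper, for illustration only.
 */
static long keep_alive_detection_delay_s(long idle_s, long intvl_s, long probes)
{
	return idle_s + intvl_s * probes;
}
```

With the Linux defaults documented in tcp(7) (7200 s idle, 75 s interval, 9 probes) this gives 7875 s, a little over two hours. With an idle time of 60 s, an interval of 10 s and 3 probes it drops to 90 s.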
> 
> The following environment variables can be used to enable and fine-tune
> it:
>    LTTNG_RELAYD_TCP_KEEP_ALIVE
>        Set to 1 to enable the use of TCP keep-alive, allowing the
>        detection of dead peers.
> 
>    LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME
>        See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on
>        Solaris 11.
>        A value of -1 lets the operating system manage this parameter
>        (default).
> 
>    LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT
>        See tcp(7) tcp_keepalive_probes.
>        A value of -1 lets the operating system manage this
>        parameter (default).
>        No effect on Solaris.
> 
>    LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL
>        See tcp(7) tcp_keepalive_intvl.
>        A value of -1 lets the operating system manage
>        this parameter (default).
>        No effect on Solaris.
> 
>    LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD
>        Solaris 11 only. See tcp_keepalive_abort_threshold.
>        A value of -1 lets the operating system manage
>        this parameter (default).
> 
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
> src/bin/lttng-relayd/Makefile.am      |   3 +-
> src/bin/lttng-relayd/main.c           |  15 +
> src/bin/lttng-relayd/tcp_keep_alive.c | 560 ++++++++++++++++++++++++++++++++++
> src/bin/lttng-relayd/tcp_keep_alive.h |  24 ++
> src/common/defaults.h                 |   7 +
> 5 files changed, 608 insertions(+), 1 deletion(-)
> create mode 100644 src/bin/lttng-relayd/tcp_keep_alive.c
> create mode 100644 src/bin/lttng-relayd/tcp_keep_alive.h
> 
> diff --git a/src/bin/lttng-relayd/Makefile.am b/src/bin/lttng-relayd/Makefile.am
> index c7dd37e..5125c72 100644
> --- a/src/bin/lttng-relayd/Makefile.am
> +++ b/src/bin/lttng-relayd/Makefile.am
> @@ -21,7 +21,8 @@ lttng_relayd_SOURCES = main.c lttng-relayd.h utils.h utils.c cmd.h \
>                        stream-fd.c stream-fd.h \
>                        connection.c connection.h \
>                        viewer-session.c viewer-session.h \
> -                       tracefile-array.c tracefile-array.h
> +                       tracefile-array.c tracefile-array.h \
> +                       tcp_keep_alive.c tcp_keep_alive.h
> 
> # link on liblttngctl for check if relayd is already alive.
> lttng_relayd_LDADD = -lurcu-common -lurcu \
> diff --git a/src/bin/lttng-relayd/main.c b/src/bin/lttng-relayd/main.c
> index 0eb8e28..b4f9457 100644
> --- a/src/bin/lttng-relayd/main.c
> +++ b/src/bin/lttng-relayd/main.c
> @@ -70,6 +70,7 @@
> #include "stream.h"
> #include "connection.h"
> #include "tracefile-array.h"
> +#include "tcp_keep_alive.h"
> 
> static const char *help_msg =
> #ifdef LTTNG_EMBED_HELP
> @@ -899,6 +900,14 @@ restart:
> 					lttcomm_destroy_sock(newsock);
> 					goto error;
> 				}
> +
> +				ret = socket_apply_keep_alive_config(newsock->fd);
> +				if (ret < 0) {
> +					PERROR("setsockopt tcp_keep_alive");
> +					lttcomm_destroy_sock(newsock);
> +					goto error;
> +				}
> +
> 				new_conn = connection_create(newsock, type);
> 				if (!new_conn) {
> 					lttcomm_destroy_sock(newsock);
> @@ -2755,6 +2764,12 @@ int main(int argc, char **argv)
> 		goto exit_options;
> 	}
> 
> +
> +	if (tcp_keep_alive_init()) {
> +		retval = -1;
> +		goto exit_options;
> +	}
> +
> 	if (set_signal_handler()) {
> 		retval = -1;
> 		goto exit_options;
> diff --git a/src/bin/lttng-relayd/tcp_keep_alive.c b/src/bin/lttng-relayd/tcp_keep_alive.c
> new file mode 100644
> index 0000000..e22f4a9
> --- /dev/null
> +++ b/src/bin/lttng-relayd/tcp_keep_alive.c
> @@ -0,0 +1,560 @@
> +/*
> + * Copyright (C) 2017 - Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2 only,
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#include <sys/types.h>
> +#include <netinet/tcp.h>
> +#include <stdbool.h>
> +#include <sys/socket.h>
> +#include <limits.h>
> +
> +#include <common/compat/getenv.h>
> +#include <common/time.h>
> +#include <common/defaults.h>
> +#include <common/config/session-config.h>
> +
> +#include "tcp_keep_alive.h"
> +
> +#define SOLARIS_IDLE_TIME_MIN_S 10
> +#define SOLARIS_IDLE_TIME_MAX_S 864000 /* 10 days */
> +#define SOLARIS_ABORT_THRESHOLD_MIN_S 1
> +#define SOLARIS_ABORT_THRESHOLD_MAX_S 480 /* 8 minutes */
> +
> +/* Per-platform definition of TCP socket option */
> +#if defined (__linux__)
> +#define COMPAT_SOCKET_LEVEL SOL_SOCKET /* SO_KEEPALIVE is a socket-level option */
> +#define COMPAT_TCP_LEVEL SOL_TCP
> +#define COMPAT_TCP_ABORT_THRESHOLD 0 /* Does not exist on Linux */
> +#define COMPAT_TCP_KEEPIDLE TCP_KEEPIDLE
> +#define COMPAT_TCP_KEEPINTVL TCP_KEEPINTVL
> +#define COMPAT_TCP_KEEPCNT TCP_KEEPCNT
> +
> +#elif defined (__sun__) /* ! defined (__linux__) */
> +#define COMPAT_SOCKET_LEVEL SOL_SOCKET
> +#define COMPAT_TCP_LEVEL IPPROTO_TCP
> +
> +#ifdef TCP_KEEPALIVE_THRESHOLD
> +#define COMPAT_TCP_KEEPIDLE TCP_KEEPALIVE_THRESHOLD
> +#else /* ! defined (TCP_KEEPALIVE_THRESHOLD) */
> +#define COMPAT_TCP_KEEPIDLE 0
> +#endif /* TCP_KEEPALIVE_THRESHOLD */
> +
> +#ifdef TCP_KEEPALIVE_ABORT_THRESHOLD
> +#define COMPAT_TCP_ABORT_THRESHOLD TCP_KEEPALIVE_ABORT_THRESHOLD
> +#else /* ! defined (TCP_KEEPALIVE_ABORT_THRESHOLD) */
> +#define COMPAT_TCP_ABORT_THRESHOLD 0
> +#endif /* TCP_KEEPALIVE_ABORT_THRESHOLD */
> +
> +#define COMPAT_TCP_KEEPINTVL 0 /* Does not exist on Solaris */
> +#define COMPAT_TCP_KEEPCNT 0 /* Does not exist on Solaris */
> +
> +#else /* ! defined (__linux__) && ! defined (__sun__) */
> +#define COMPAT_SOCKET_LEVEL 0
> +#define COMPAT_TCP_LEVEL 0
> +#define COMPAT_TCP_ABORT_THRESHOLD 0
> +#define COMPAT_TCP_KEEPIDLE 0
> +#define COMPAT_TCP_KEEPINTVL 0
> +#define COMPAT_TCP_KEEPCNT 0
> +#endif /* ! defined (__linux__) && ! defined (__sun__) */
> +
> +struct tcp_keep_alive_support {
> +	/* TCP keep-alive is supported by this platform. */
> +	bool supported;
> +	/* Overriding idle-time per socket is supported by this
> +	 * platform.
> +	 */
> +	bool idle_time_supported;
> +	/* Overriding probe interval per socket is supported by this
> +	 * platform.
> +	 */
> +	bool probe_interval_supported;
> +	/* Configuring max probe count per socket is supported by this
> +	 * platform.
> +	 */
> +	bool max_probe_count_supported;
> +	/* Overriding the abort threshold per socket is supported by this
> +	 * platform. Solaris-specific.
> +	 */
> +	bool abort_threshold_supported;
> +};
> +
> +struct tcp_keep_alive_config {
> +	bool initialized;
> +	/* Maps to the environment variable defined
> +	 * by LTTNG_RELAYD_TCP_KEEP_ALIVE_ENV
> +	 */
> +	bool enabled;
> +	/* Maps to the environment variable defined
> +	 * by LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV
> +	 */
> +	int idle_time;
> +	/* Maps to the environment variable defined
> +	 * by LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL_ENV
> +	 */
> +	int probe_interval;
> +	/* Maps to the environment variable defined
> +	 * by LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT_ENV
> +	 */
> +	int max_probe_count;
> +	/* Maps to the environment variable defined
> +	 * by LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV
> +	 */
> +	int abort_threshold;
> +};
> +
> +static struct tcp_keep_alive_config config = {
> +	.initialized = false,
> +	.enabled = false,
> +	.idle_time = -1,
> +	.probe_interval = -1,
> +	.max_probe_count = -1,
> +	.abort_threshold = -1
> +};
> +
> +static struct tcp_keep_alive_support support = {
> +	.supported = false,
> +	.idle_time_supported = false,
> +	.probe_interval_supported = false,
> +	.max_probe_count_supported = false,
> +	.abort_threshold_supported = false
> +};
> +
> +/*
> + * Common parser for string to positive int conversion where the value must be
> + * in range [-1, INT_MAX].
> + * Returns -2 on invalid value.
> + */
> +static
> +int tcp_keep_alive_string_to_pos_int_parser(const char *env_var, const char *value)
> +{
> +	int ret;
> +	long tmp;
> +	char *endptr = NULL;
> +
> +	errno = 0;
> +	tmp = strtol(value, &endptr, 0);
> +	if (errno != 0) {
> +		ERR("%s cannot be parsed.", env_var);
> +		PERROR("Errno for previous parsing failure.");
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	if (endptr == value || *endptr != '\0') {
> +		ERR("%s is not a valid number.", env_var);
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	if (tmp < -1) {
> +		ERR("%s must be greater or equal to -1.", env_var);
> +		ret = -2;
> +		goto end;
> +	}
> +	if (tmp > INT_MAX){
> +		ERR("%s is too big. Maximum value is %d.", env_var, INT_MAX);
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	ret = (int) tmp;
> +end:
> +	return ret;
> +
> +}
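
For reference, the contract described in the comment above (accept [-1, INT_MAX], return -2 for anything else, including non-numeric input) can be sketched as a standalone function. This is only an illustration: the function and macro names are hypothetical, and the logging done by the patch is left out.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical name for the "invalid value" sentinel used in this sketch. */
#define KEEP_ALIVE_PARSE_INVALID (-2)

/*
 * Parse `value` as an integer in [-1, INT_MAX].
 * -1 is a legal result meaning "let the operating system decide", so
 * every failure path must return a distinct sentinel (-2 here).
 */
static int parse_keep_alive_int(const char *value)
{
	char *endptr = NULL;
	long tmp;

	errno = 0;
	tmp = strtol(value, &endptr, 0);
	if (errno != 0) {
		/* Out of range for long (ERANGE) or unsupported base. */
		return KEEP_ALIVE_PARSE_INVALID;
	}

	if (endptr == value || *endptr != '\0') {
		/* Empty string, or trailing characters after the number. */
		return KEEP_ALIVE_PARSE_INVALID;
	}

	if (tmp < -1 || tmp > INT_MAX) {
		return KEEP_ALIVE_PARSE_INVALID;
	}

	return (int) tmp;
}
```

Since base 0 is passed to strtol, hexadecimal ("0x10") and octal ("010") spellings are accepted as well, which may or may not be desirable for an environment variable.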
> +
> +/*
> + * Per-platform implementation of tcp_keep_alive_idle_time_modifier.
> + * Returns -2 on invalid value.
> + */
> +#ifdef __sun__
> +static int tcp_keep_alive_idle_time_modifier(int value)
> +{
> +	int ret;
> +	unsigned int tmp_ms;
> +
> +	if (value == -1 || value == 0) {
> +		/* Use system defaults */
> +		ret = value;
> +		goto end;
> +	}
> +
> +	/*
> +	 * Additional constraints for Solaris 11.
> +	 * Minimum 10s, maximum 10 days. Defined by
> +	 * https://docs.oracle.com/cd/E23824_01/html/821-1475/tcp-7p.html#REFMAN7tcp-7p
> +	 */
> +	if ((value < SOLARIS_IDLE_TIME_MIN_S || value > SOLARIS_IDLE_TIME_MAX_S)) {
> +		ERR("%s must be between %d and %d inclusive on Solaris.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV,
> +				SOLARIS_IDLE_TIME_MIN_S,
> +				SOLARIS_IDLE_TIME_MAX_S);
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	/* On Solaris idle time is given in milliseconds. */
> +	tmp_ms = (unsigned int) value * MSEC_PER_SEC;
> +	if ((value != 0 && (tmp_ms / (unsigned int) value) != MSEC_PER_SEC) || tmp_ms > INT_MAX) {
> +		/* Overflow */
> +		int max_possible_value = INT_MAX / MSEC_PER_SEC;
> +		ERR("%s is too big. Maximum value is %d.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV,
> +				max_possible_value);
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	/* tmp_ms is > 0 and <= INT_MAX. Cast is safe */
> +	ret = (int) tmp_ms;
> +end:
> +	return ret;
> +}
> +#else /* ! defined(__sun__) */
> +static int tcp_keep_alive_idle_time_modifier(int value)
> +{
> +	return value;
> +}
> +#endif /* ! defined(__sun__) */
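
As an aside, the seconds-to-milliseconds conversion can also be guarded by dividing before multiplying, which avoids relying on unsigned wrap-around. The sketch below is an alternative formulation, not what the patch does; the function name is hypothetical and MSEC_PER_SEC is redefined locally so the snippet stands alone.

```c
#include <limits.h>

#define MSEC_PER_SEC 1000	/* Defined locally for this sketch only. */

/*
 * Convert a value in seconds to milliseconds.
 * Returns -1 unchanged (system default), -2 if `seconds` is outside
 * [min_s, max_s] or if the result would not fit in an int.
 */
static int seconds_to_ms_checked(int seconds, int min_s, int max_s)
{
	if (seconds == -1) {
		return -1;
	}

	if (seconds < min_s || seconds > max_s) {
		return -2;
	}

	/* Check before multiplying so the product cannot overflow. */
	if (seconds > INT_MAX / MSEC_PER_SEC) {
		return -2;
	}

	return seconds * MSEC_PER_SEC;
}
```

With the Solaris idle-time bounds used in the patch (10 s to 864000 s), the upper bound converts to 864000000 ms, which still fits in a 32-bit int.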
> +
> +/* Per-platform support of tcp_keep_alive functionality. */
> +#if defined (__linux__)
> +static
> +void tcp_keep_alive_init_support(struct tcp_keep_alive_support *support)
> +{
> +	support->supported = true;
> +	support->idle_time_supported = true;
> +	support->probe_interval_supported = true;
> +	support->max_probe_count_supported = true;
> +	/* Solaris specific */
> +	support->abort_threshold_supported = false;
> +}
> +#elif defined(__sun__) /* ! defined (__linux__) */
> +static
> +void tcp_keep_alive_init_support(struct tcp_keep_alive_support *support)
> +{
> +	support->supported = true;
> +#ifdef TCP_KEEPALIVE_THRESHOLD
> +	support->idle_time_supported = true;
> +#else
> +	support->idle_time_supported = false;
> +#endif /* TCP_KEEPALIVE_THRESHOLD */
> +
> +	/* Sun does not support either tcp_keepalive_probes or
> +	 * tcp_keepalive_intvl. Inferring a value for
> +	 * TCP_KEEP_ALIVE_ABORT_THRESHOLD doing
> +	 * tcp_keepalive_probes * tcp_keepalive_intvl could yield a good
> +	 * alternative but Solaris does not detail the algorithm used (constant
> +	 * time retry like linux or something fancier). Ignore those
> +	 * settings on Solaris 11. We prefer to expose an environment
> +	 * variable only used on Sun for the abort threshold.
> +	 */
> +	support->probe_interval_supported = false;
> +	support->max_probe_count_supported = false;
> +#ifdef TCP_KEEPALIVE_ABORT_THRESHOLD
> +	support->abort_threshold_supported = true;
> +#else
> +	support->abort_threshold_supported = false;
> +#endif /* TCP_KEEPALIVE_ABORT_THRESHOLD */
> +}
> +#else /* ! defined(__sun__) && ! defined(__linux__) */
> +/* Not supported */
> +static
> +void tcp_keep_alive_init_support(struct tcp_keep_alive_support *support)
> +{
> +	support->supported = false;
> +	support->idle_time_supported = false;
> +	support->probe_interval_supported = false;
> +	support->max_probe_count_supported = false;
> +	support->abort_threshold_supported = false;
> +}
> +#endif /* ! defined(__sun__) && ! defined(__linux__) */
> +
> +#ifdef __sun__
> +/*
> + * Sun specific modifier for abort threshold.
> + * Return -2 on error.
> + */
> +static int tcp_keep_alive_abort_threshold_modifier(int value)
> +{
> +	int ret;
> +	unsigned int tmp_ms;
> +
> +	if (value == -1) {
> +		/* Use system defaults */
> +		ret = value;
> +		goto end;
> +	}
> +
> +	/*
> +	 * Additional constraints for Solaris 11.
> +	 * Between 0 and 8 minutes.
> +	 * https://docs.oracle.com/cd/E19120-01/open.solaris/819-2724/fsvdh/index.html
> +	 * Restrict the range to 1 second through 8 minutes since a value of 0
> +	 * defeats the purpose of dead-peer detection: probing never times out.
> +	 * It does NOT mean that the connection times out immediately.
> +	 * (Counter-intuitive, but tested, validated, and matches the doc.)
> +	 */
> +	if ((value < SOLARIS_ABORT_THRESHOLD_MIN_S || value > SOLARIS_ABORT_THRESHOLD_MAX_S)) {
> +		ERR("%s must be between %d and %d inclusive on Solaris.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV,
> +				SOLARIS_ABORT_THRESHOLD_MIN_S,
> +				SOLARIS_ABORT_THRESHOLD_MAX_S);
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	/* Abort threshold is given in milliseconds. */
> +	tmp_ms = (unsigned int) value * MSEC_PER_SEC;
> +	if ((value != 0 && (tmp_ms / (unsigned int) value) != MSEC_PER_SEC) || tmp_ms > INT_MAX) {
> +		/* Overflow */
> +		int max_possible_value = INT_MAX / MSEC_PER_SEC;
> +		ERR("%s is too big. Maximum value is %d.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV,
> +				max_possible_value);
> +		ret = -2;
> +		goto end;
> +	}
> +
> +	/* tmp_ms is > 0 and <= INT_MAX. Cast is safe */
> +	ret = (int) tmp_ms;
> +end:
> +	return ret;
> +}
> +#else
> +static int tcp_keep_alive_abort_threshold_modifier(int value)
> +{
> +	return value;
> +}
> +#endif /* defined (__sun__) */
> +
> +
> +
> +/* Retrieve settings from env vars and check/warn if supported by platform. */
> +static
> +int tcp_keep_alive_init_config(struct tcp_keep_alive_support *support, struct tcp_keep_alive_config *config)
> +{
> +	int ret;
> +	const char *value;
> +
> +	/* config is already defined with default value */
> +	config->initialized = true;
> +
> +	value = lttng_secure_getenv(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ENV);
> +	if (!support->supported) {
> +		if (value) {
> +			WARN("Using per-socket TCP Keep-alive mechanism is not supported by this platform. Ignoring the %s environment variable.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ENV);
> +		}
> +		config->enabled = false;
> +	} else if (value) {
> +		ret = config_parse_value(value);
> +		if (ret < 0 || ret > 1) {
> +			ERR("Invalid value for %s", DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ENV);
> +			ret = 1;
> +			goto error;
> +		}
> +		config->enabled = ret;
> +	}
> +	DBG("TCP keep-alive mechanism %s", config->enabled ? "enabled": "disabled");
> +
> +	/* Get value for tcp_keepalive_time in seconds */
> +	value = lttng_secure_getenv(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV);
> +	if (!support->idle_time_supported) {
> +		if (value) {
> +			WARN("Overriding the TCP keep-alive idle time threshold per-socket is not supported by this platform. Ignoring the %s environment variable.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV);
> +		}
> +		config->idle_time = -1;
> +	} else if (value) {
> +		int idle_time_platform;
> +		int idle_time_seconds;
> +		idle_time_seconds = tcp_keep_alive_string_to_pos_int_parser(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV, value);
> +		if (idle_time_seconds < -1) {
> +			ret = 1;
> +			goto error;
> +		}
> +
> +		idle_time_platform = tcp_keep_alive_idle_time_modifier(idle_time_seconds);
> +		if (idle_time_platform < -1) {
> +			ret = 1;
> +			goto error;
> +		}
> +
> +		config->idle_time = idle_time_platform;
> +		DBG("Overriding %s to %d",
> +			DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV,
> +			idle_time_seconds);
> +	}
> +
> +	/* Get value for tcp_keepalive_intvl in seconds */
> +	value = lttng_secure_getenv(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL_ENV);
> +	if (!support->probe_interval_supported) {
> +		if (value) {
> +			WARN("Overriding the TCP keep-alive probe interval time per-socket is not supported by this platform. Ignoring the %s environment variable.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL_ENV);
> +		}
> +		config->probe_interval = -1;
> +	} else if (value) {
> +		int probe_interval;
> +		probe_interval = tcp_keep_alive_string_to_pos_int_parser(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL_ENV, value);
> +		if (probe_interval < -1) {
> +			ret = 1;
> +			goto error;
> +		}
> +		config->probe_interval = probe_interval;
> +		DBG("Overriding %s to %d",
> +			DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL_ENV,
> +			config->probe_interval);
> +	}
> +
> +	/* Get value for tcp_keepalive_probes */
> +	value = lttng_secure_getenv(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT_ENV);
> +	if (!support->max_probe_count_supported) {
> +		if (value) {
> +			WARN("Overriding the TCP keep-alive maximum probe count per-socket is not supported by this platform. Ignoring the %s environment variable.",
> +				 DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT_ENV);
> +		}
> +		config->max_probe_count = -1;
> +	} else if (value) {
> +		int max_probe_count;
> +		max_probe_count = tcp_keep_alive_string_to_pos_int_parser(
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT_ENV,
> +				value);
> +		if (max_probe_count < -1) {
> +			ret = 1;
> +			goto error;
> +		}
> +		config->max_probe_count = max_probe_count;
> +		DBG("Overriding %s to %d",
> +			DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT_ENV,
> +			config->max_probe_count);
> +	}
> +
> +	/* Get value for tcp_keepalive_abort_threshold in seconds */
> +	value = lttng_secure_getenv(DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV);
> +	if (!support->abort_threshold_supported) {
> +		if (value) {
> +			WARN("Overriding the TCP keep-alive abort threshold per-socket is not supported by this platform. Ignoring the %s environment variable.",
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV);
> +		}
> +		config->abort_threshold = -1;
> +	} else if (value) {
> +		int abort_threshold_platform;
> +		int abort_threshold_seconds;
> +
> +		abort_threshold_seconds = tcp_keep_alive_string_to_pos_int_parser(
> +				DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV,
> +				value);
> +		if (abort_threshold_seconds < -1) {
> +			ret = 1;
> +			goto error;
> +		}
> +
> +		abort_threshold_platform = tcp_keep_alive_abort_threshold_modifier(abort_threshold_seconds);
> +		if (abort_threshold_platform < -1) {
> +			ret = 1;
> +			goto error;
> +		}
> +
> +		config->abort_threshold = abort_threshold_platform;
> +		DBG("Overriding %s to %d",
> +			DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV,
> +			config->abort_threshold);
> +	}
> +
> +	ret = 0;
> +
> +error:
> +	return ret;
> +}
> +
> +/*
> + * Initialize the TCP keep-alive settings.
> + */
> +int tcp_keep_alive_init(void)
> +{
> +	tcp_keep_alive_init_support(&support);
> +	return tcp_keep_alive_init_config(&support, &config);
> +}
> +
> +/*
> + * Set the socket options regarding TCP keep-alive.
> + */
> +int socket_apply_keep_alive_config(int socket_fd)
> +{
> +	int ret;
> +	int val = 1;
> +
> +	if (!config.initialized) {
> +		ERR("TCP keep-alive configuration is not initialized.");
> +		abort();
> +	}
> +
> +	/* TCP keep-alive */
> +	if (!support.supported || !config.enabled) {
> +		ret = 0;
> +		goto end;
> +	}
> +
> +	ret = setsockopt(socket_fd, COMPAT_SOCKET_LEVEL, SO_KEEPALIVE, &val,
> +			sizeof(val));
> +	if (ret < 0) {
> +		PERROR("setsockopt so_keepalive");
> +		goto end;
> +	}
> +
> +	/* TCP keep-alive idle time */
> +	if (support.idle_time_supported && config.idle_time > 0) {
> +		ret = setsockopt(socket_fd, COMPAT_TCP_LEVEL, COMPAT_TCP_KEEPIDLE, &config.idle_time,
> +				sizeof(config.idle_time));
> +		if (ret < 0) {
> +			PERROR("setsockopt TCP_KEEPIDLE");
> +			goto end;
> +		}
> +	}
> +	/* TCP keep-alive probe interval */
> +	if (support.probe_interval_supported && config.probe_interval > 0) {
> +		ret = setsockopt(socket_fd, COMPAT_TCP_LEVEL, COMPAT_TCP_KEEPINTVL, &config.probe_interval,
> +				sizeof(config.probe_interval));
> +		if (ret < 0) {
> +			PERROR("setsockopt TCP_KEEPINTVL");
> +			goto end;
> +		}
> +	}
> +
> +	/* TCP keep-alive max probe count */
> +	if (support.max_probe_count_supported && config.max_probe_count > 0) {
> +		ret = setsockopt(socket_fd, COMPAT_TCP_LEVEL, COMPAT_TCP_KEEPCNT, &config.max_probe_count,
> +				sizeof(config.max_probe_count));
> +		if (ret < 0) {
> +			PERROR("setsockopt TCP_KEEPCNT");
> +			goto end;
> +		}
> +	}
> +
> +	/* TCP keep-alive abort threshold */
> +	if (support.abort_threshold_supported && config.abort_threshold > 0) {
> +		ret = setsockopt(socket_fd, COMPAT_TCP_LEVEL, COMPAT_TCP_ABORT_THRESHOLD, &config.abort_threshold,
> +				sizeof(config.abort_threshold));
> +		if (ret < 0) {
> +			PERROR("setsockopt TCP_KEEPALIVE_ABORT_THRESHOLD");
> +			goto end;
> +		}
> +	}
> +end:
> +	return ret;
> +}
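
For readers less familiar with the socket options involved, here is a minimal standalone sketch of the same sequence: SO_KEEPALIVE at the SOL_SOCKET level, then the per-socket TCP overrides. It assumes the Linux option names from tcp(7) and skips the overrides on other platforms. The function name is hypothetical, and values <= 0 are treated as "keep the system default", mirroring the `> 0` checks above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keep-alive on `fd` and optionally override its timing.
 * Returns 0 on success, -1 if any setsockopt() call fails (errno is set).
 * Hypothetical helper, for illustration only.
 */
static int apply_keep_alive_sketch(int fd, int idle_s, int intvl_s, int probes)
{
	int on = 1;

	/* SO_KEEPALIVE is a socket-level option, hence SOL_SOCKET. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
		return -1;
	}

#if defined(__linux__)
	/* The timing knobs are TCP-level options on Linux. */
	if (idle_s > 0 && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
			&idle_s, sizeof(idle_s)) < 0) {
		return -1;
	}
	if (intvl_s > 0 && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
			&intvl_s, sizeof(intvl_s)) < 0) {
		return -1;
	}
	if (probes > 0 && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
			&probes, sizeof(probes)) < 0) {
		return -1;
	}
#else
	/* Other platforms: keep the system-wide timing. */
	(void) idle_s;
	(void) intvl_s;
	(void) probes;
#endif

	return 0;
}
```

Calling it on a freshly created TCP socket with (60, 10, 3) should leave SO_KEEPALIVE set, which can be confirmed with getsockopt().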
> diff --git a/src/bin/lttng-relayd/tcp_keep_alive.h b/src/bin/lttng-relayd/tcp_keep_alive.h
> new file mode 100644
> index 0000000..fb81ecd
> --- /dev/null
> +++ b/src/bin/lttng-relayd/tcp_keep_alive.h
> @@ -0,0 +1,24 @@
> +#ifndef RELAYD_TCP_KEEP_ALIVE_H
> +#define RELAYD_TCP_KEEP_ALIVE_H
> +
> +/*
> + * Copyright (C) 2017 - Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2 only,
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +int tcp_keep_alive_init(void);
> +int socket_apply_keep_alive_config(int socket_fd);
> +
> +#endif /* RELAYD_TCP_KEEP_ALIVE_H */
> diff --git a/src/common/defaults.h b/src/common/defaults.h
> index 8669a84..0ee13b7 100644
> --- a/src/common/defaults.h
> +++ b/src/common/defaults.h
> @@ -328,6 +328,13 @@
> /* Default maximal size of message notification channel message payloads. */
> #define DEFAULT_CLIENT_MAX_QUEUED_NOTIFICATIONS_COUNT		100
> 
> +
> +#define DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE"
> +#define DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME"
> +#define DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT"
> +#define DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL"
> +#define DEFAULT_LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD_ENV "LTTNG_RELAYD_TCP_KEEP_ALIVE_ABORT_THRESHOLD"
> +
> /*
>  * Returns the default subbuf size.
>  *
> --
> 2.7.4
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

