diff mbox series

[mptcp-net,2/2] mptcp: fix full TCP keep-alive support

Message ID 20240508-mptcp-tcp-keepalive-sockopts-v1-2-fdf7e03e14c4@kernel.org (mailing list archive)
State Superseded, archived
Delegated to: Paolo Abeni
Headers show
Series mptcp: getsockopt(SO_KEEPALIVE) and TCP_KEEP* sockopts | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch warning total: 0 errors, 3 warnings, 0 checks, 96 lines checked
matttbe/shellcheck success MPTCP selftests files have not been modified
matttbe/KVM_Validation__normal success Success! ✅
matttbe/KVM_Validation__debug success Success! ✅
matttbe/KVM_Validation__btf__only_bpftest_all_ success Success! ✅

Commit Message

Matthieu Baerts May 8, 2024, 9:44 a.m. UTC
SO_KEEPALIVE support has been added a while ago, as part of a series
"adding SOL_SOCKET" support. To have a full control of this keep-alive
feature, it is important to also support TCP_KEEP* socket options at the
SOL_TCP level.

Supporting them on the setsockopt() part is easy, it is just a matter of
remembering each value in the MPTCP sock structure, and calling
tcp_sock_set_keep*() helpers on each subflow. If the value is not
modified (0), calling these helpers will not do anything. For the
getsockopt() part, the corresponding value from the MPTCP sock structure
or the default one is simply returned.

It looks important for kernels supporting SO_KEEPALIVE, to also support
TCP_KEEP* options as well: some apps seem to (wrongly) consider that if
the former is supported, the latter ones will be supported as well. But
also, not having this simple and isolated change is preventing MPTCP
support in some apps, and libraries like GoLang [1]. This is why this
patch is seen as a fix.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/383
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Link: https://github.com/golang/go/issues/56539 [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.h |  3 +++
 net/mptcp/sockopt.c  | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

Comments

Paolo Abeni May 8, 2024, 3:44 p.m. UTC | #1
On Wed, 2024-05-08 at 11:44 +0200, Matthieu Baerts (NGI0) wrote:
> SO_KEEPALIVE support has been added a while ago, as part of a series
> "adding SOL_SOCKET" support. To have a full control of this keep-alive
> feature, it is important to also support TCP_KEEP* socket options at the
> SOL_TCP level.
> 
> Supporting them on the setsockopt() part is easy, it is just a matter of
> remembering each value in the MPTCP sock structure, and calling
> tcp_sock_set_keep*() helpers on each subflow. If the value is not
> modified (0), calling these helpers will not do anything. For the
> getsockopt() part, the corresponding value from the MPTCP sock structure
> or the default one is simply returned.
> 
> It looks important for kernels supporting SO_KEEPALIVE, to also support
> TCP_KEEP* options as well: some apps seem to (wrongly) consider that if
> the former is supported, the latter ones will be supported as well. But
> also, not having this simple and isolated change is preventing MPTCP
> support in some apps, and libraries like GoLang [1]. This is why this
> patch is seen as a fix.
> 
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/383
> Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
> Link: https://github.com/golang/go/issues/56539 [1]
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  net/mptcp/protocol.h |  3 +++
>  net/mptcp/sockopt.c  | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index c09357e04f23..8c7ab1d50c75 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -309,6 +309,9 @@ struct mptcp_sock {
>  			in_accept_queue:1,
>  			free_first:1,
>  			rcvspace_init:1;
> +	u8		keepalive_cnt;
> +	unsigned int	keepalive_idle;
> +	unsigned int	keepalive_intvl;

It's a bit of a pity we have to duplicate such values in all the subflows
_and_ in the msk...

>  	u32		notsent_lowat;
>  	struct work_struct work;
>  	struct sk_buff  *ooo_last_skb;
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index d9f0d36e24ad..794d8b63024a 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -622,6 +622,30 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
>  	return ret;
>  }
>  
> +static int __mptcp_setsockopt_set_val(struct mptcp_sock *msk, int max,
> +				      int (*set_val)(struct sock *, int),
> +				      unsigned int *msk_val, int val)
> +{
> +	struct mptcp_subflow_context *subflow;
> +	int ret = 0;
> +
> +	if (val < 1 || val > max)
> +		return -EINVAL;

I think it would be better just skip this check, to avoid copying more
logic from TCP, and do the store and sockopt_seq_inc() only when the
operation is successful on all the subflows.

> +
> +	*msk_val = val;
> +	sockopt_seq_inc(msk);
> +
> +	mptcp_for_each_subflow(msk, subflow) {
> +		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> +		lock_sock(ssk);
> +		ret |= set_val(ssk, val);

the above produces 'reasonable' return code only if all 'set_val' calls
will return the same error code.

That expectation should be respected by the existing code, but what
about keeping the first error code only? e.g:

	int err = 0, ret;
	// ...

		ret = set_val(ssk, val);
		err = err ?: ret;

	// ...
	return err;

> +		release_sock(ssk);
> +	}
> +
> +	return ret;
> +}
> +
>  static int __mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, int val)
>  {
>  	struct mptcp_subflow_context *subflow;
> @@ -818,6 +842,22 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>  	case TCP_NODELAY:
>  		ret = __mptcp_setsockopt_sol_tcp_nodelay(msk, val);
>  		break;
> +	case TCP_KEEPIDLE:
> +		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPIDLE,
> +						 &tcp_sock_set_keepidle_locked,
> +						 &msk->keepalive_idle, val);
> +		break;
> +	case TCP_KEEPINTVL:
> +		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPINTVL,
> +						 &tcp_sock_set_keepintvl,
> +						 &msk->keepalive_intvl, val);
> +		break;
> +	case TCP_KEEPCNT:
> +		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPCNT,
> +						 &tcp_sock_set_keepcnt,
> +						 (unsigned int *)&msk->keepalive_cnt,

keepalive_cnt is 'u8' so the above will be technically a buffer
overflow ;) If we keep 'keepalive_cnt' in msk, it has to become
an 'unsigned int'

> +						 val);
> +		break;
>  	default:
>  		ret = -ENOPROTOOPT;
>  	}
> @@ -1332,6 +1372,8 @@ static int mptcp_put_int_option(struct mptcp_sock *msk, char __user *optval,
>  static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>  				    char __user *optval, int __user *optlen)
>  {
> +	struct sock *sk = (void *)msk;
> +
>  	switch (optname) {
>  	case TCP_ULP:
>  	case TCP_CONGESTION:
> @@ -1352,6 +1394,18 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>  		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
>  	case TCP_NOTSENT_LOWAT:
>  		return mptcp_put_int_option(msk, optval, optlen, msk->notsent_lowat);
> +	case TCP_KEEPIDLE:
> +		return mptcp_put_int_option(msk, optval, optlen,
> +					    msk->keepalive_idle ? :
> +					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_time) / HZ);

If we force the first subflow to be present/allocated at setsockopt
time, then here we could:

	case TCP_KEEPIDLE:
	case TCP_KEEPINTVL:
	case TCP_KEEPCNT:
		return do_tcp_getsockopt(msk->first, SOL_TCP,
optname, 
					USER_SOCKPTR(optval),
					USER_SOCKPTR(optlen));

> +	case TCP_KEEPINTVL:
> +		return mptcp_put_int_option(msk, optval, optlen,
> +					    msk->keepalive_intvl ? :
> +					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_intvl) / HZ);
> +	case TCP_KEEPCNT:
> +		return mptcp_put_int_option(msk, optval, optlen,
> +					    msk->keepalive_cnt ? :
> +					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_probes));
>  	}
>  	return -EOPNOTSUPP;
>  }
> @@ -1467,6 +1521,9 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
>  		tcp_set_congestion_control(ssk, msk->ca_name, false, true);
>  	__tcp_sock_set_cork(ssk, !!msk->cork);
>  	__tcp_sock_set_nodelay(ssk, !!msk->nodelay);
> +	tcp_sock_set_keepidle_locked(ssk, msk->keepalive_idle);
> +	tcp_sock_set_keepintvl(ssk, msk->keepalive_intvl);
> +	tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
>  
>  	inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
>  	inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
>
Matthieu Baerts May 9, 2024, 10:28 a.m. UTC | #2
Hi Paolo,

Thank you for the review!

On 08/05/2024 17:44, Paolo Abeni wrote:
> On Wed, 2024-05-08 at 11:44 +0200, Matthieu Baerts (NGI0) wrote:
>> SO_KEEPALIVE support has been added a while ago, as part of a series
>> "adding SOL_SOCKET" support. To have a full control of this keep-alive
>> feature, it is important to also support TCP_KEEP* socket options at the
>> SOL_TCP level.
>>
>> Supporting them on the setsockopt() part is easy, it is just a matter of
>> remembering each value in the MPTCP sock structure, and calling
>> tcp_sock_set_keep*() helpers on each subflow. If the value is not
>> modified (0), calling these helpers will not do anything. For the
>> getsockopt() part, the corresponding value from the MPTCP sock structure
>> or the default one is simply returned.
>>
>> It looks important for kernels supporting SO_KEEPALIVE, to also support
>> TCP_KEEP* options as well: some apps seem to (wrongly) consider that if
>> the former is supported, the latter ones will be supported as well. But
>> also, not having this simple and isolated change is preventing MPTCP
>> support in some apps, and libraries like GoLang [1]. This is why this
>> patch is seen as a fix.
>>
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/383
>> Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
>> Link: https://github.com/golang/go/issues/56539 [1]
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>>  net/mptcp/protocol.h |  3 +++
>>  net/mptcp/sockopt.c  | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 60 insertions(+)
>>
>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>> index c09357e04f23..8c7ab1d50c75 100644
>> --- a/net/mptcp/protocol.h
>> +++ b/net/mptcp/protocol.h
>> @@ -309,6 +309,9 @@ struct mptcp_sock {
>>  			in_accept_queue:1,
>>  			free_first:1,
>>  			rcvspace_init:1;
>> +	u8		keepalive_cnt;
>> +	unsigned int	keepalive_idle;
>> +	unsigned int	keepalive_intvl;
> 
> It's a bit of a pity we have to duplicate such values in all the subflows
> _and_ in the msk...

Yes, I agree, but I don't see how else we could do that :-/

(Or relying on what has been set in the first subflow? But that's not
what we are currently doing, and should be part of a separated
refactoring I think, see below.)

>>  	u32		notsent_lowat;
>>  	struct work_struct work;
>>  	struct sk_buff  *ooo_last_skb;
>> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
>> index d9f0d36e24ad..794d8b63024a 100644
>> --- a/net/mptcp/sockopt.c
>> +++ b/net/mptcp/sockopt.c
>> @@ -622,6 +622,30 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
>>  	return ret;
>>  }
>>  
>> +static int __mptcp_setsockopt_set_val(struct mptcp_sock *msk, int max,
>> +				      int (*set_val)(struct sock *, int),
>> +				      unsigned int *msk_val, int val)
>> +{
>> +	struct mptcp_subflow_context *subflow;
>> +	int ret = 0;
>> +
>> +	if (val < 1 || val > max)
>> +		return -EINVAL;
> 
> I think it would be better just skip this check, to avoid copying more
> logic from TCP, and do the store and sockopt_seq_inc() only when the
> operation is successful on all the subflows.

Good idea!

>> +	*msk_val = val;
>> +	sockopt_seq_inc(msk);
>> +
>> +	mptcp_for_each_subflow(msk, subflow) {
>> +		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>> +
>> +		lock_sock(ssk);
>> +		ret |= set_val(ssk, val);
> 
> the above produces 'reasonable' return code only if all 'set_val' calls
> will return the same error code.
> 
> That expectation should be respected by the existing code, but what
> about keeping the first error code only? e.g:
> 
> 	int err = 0, ret;
> 	// ...
> 
> 		ret = set_val(ssk, val);
> 		err = err ?: ret;
> 
> 	// ...
> 	return err;

Yes, sounds cleaner and similar to what we did with TCP_CONGESTION, thanks!

>> +		release_sock(ssk);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static int __mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, int val)
>>  {
>>  	struct mptcp_subflow_context *subflow;
>> @@ -818,6 +842,22 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>>  	case TCP_NODELAY:
>>  		ret = __mptcp_setsockopt_sol_tcp_nodelay(msk, val);
>>  		break;
>> +	case TCP_KEEPIDLE:
>> +		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPIDLE,
>> +						 &tcp_sock_set_keepidle_locked,
>> +						 &msk->keepalive_idle, val);
>> +		break;
>> +	case TCP_KEEPINTVL:
>> +		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPINTVL,
>> +						 &tcp_sock_set_keepintvl,
>> +						 &msk->keepalive_intvl, val);
>> +		break;
>> +	case TCP_KEEPCNT:
>> +		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPCNT,
>> +						 &tcp_sock_set_keepcnt,
>> +						 (unsigned int *)&msk->keepalive_cnt,
> 
> keepalive_cnt is 'u8' so the above will be technically a buffer
> overflow ;) If we keep 'keepalive_cnt' in msk, it has to become
> an 'unsigned int'

Argh, good catch! I checked the limits (1 <= val <= MAX_TCP_KEEPCNT), and
I missed the overflow later, sorry :-/
Easier to store the original value, as an 'int' then.

> 
>> +						 val);
>> +		break;
>>  	default:
>>  		ret = -ENOPROTOOPT;
>>  	}
>> @@ -1332,6 +1372,8 @@ static int mptcp_put_int_option(struct mptcp_sock *msk, char __user *optval,
>>  static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>>  				    char __user *optval, int __user *optlen)
>>  {
>> +	struct sock *sk = (void *)msk;
>> +
>>  	switch (optname) {
>>  	case TCP_ULP:
>>  	case TCP_CONGESTION:
>> @@ -1352,6 +1394,18 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
>>  		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
>>  	case TCP_NOTSENT_LOWAT:
>>  		return mptcp_put_int_option(msk, optval, optlen, msk->notsent_lowat);
>> +	case TCP_KEEPIDLE:
>> +		return mptcp_put_int_option(msk, optval, optlen,
>> +					    msk->keepalive_idle ? :
>> +					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_time) / HZ);
> 
> If we force the first subflow to be present/allocated at setsockopt
> time, then here we could:
> 
> 	case TCP_KEEPIDLE:
> 	case TCP_KEEPINTVL:
> 	case TCP_KEEPCNT:
> 		return do_tcp_getsockopt(msk->first, SOL_TCP,
> optname, 
> 					USER_SOCKPTR(optval),
> 					USER_SOCKPTR(optlen));

I initially did that -- by calling mptcp_getsockopt_first_sf_only() --
like it is done for TCP_FASTOPEN*, etc.

But then I changed that because TCP_FASTOPEN* and similar only make
sense on the first subflow, while the other ones in the list (TCP_INQ,
TCP_CORK, etc.) are set on all subflows. So I thought not to make an
exception here. (Also, socket options could be modified per subflow with
BPF, best not to have exceptions here).

But I still think it would make sense to do that if we also do the same
with the other socket options: I do think the way we handle the socket
options should be simplified, e.g. what I proposed a while ago, and I
failed to update so far:

  https://patchwork.kernel.org/project/mptcp/list/?series=763505&state=7

We could rely on the first subflow for the get as you suggested, but
also use {ip,ipv6,tcp,sock}_[gs]etsockopt(), instead of setting
variables manually. And maybe refactoring the 'sync' to "duplicate"
options set on the first subflow, and not to have to store TCP specific
options that have been set.

Would it be OK to look at that later, and keep this way here?

>> +	case TCP_KEEPINTVL:
>> +		return mptcp_put_int_option(msk, optval, optlen,
>> +					    msk->keepalive_intvl ? :
>> +					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_intvl) / HZ);
>> +	case TCP_KEEPCNT:
>> +		return mptcp_put_int_option(msk, optval, optlen,
>> +					    msk->keepalive_cnt ? :
>> +					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_probes));
>>  	}
>>  	return -EOPNOTSUPP;
>>  }
>> @@ -1467,6 +1521,9 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
>>  		tcp_set_congestion_control(ssk, msk->ca_name, false, true);
>>  	__tcp_sock_set_cork(ssk, !!msk->cork);
>>  	__tcp_sock_set_nodelay(ssk, !!msk->nodelay);
>> +	tcp_sock_set_keepidle_locked(ssk, msk->keepalive_idle);
>> +	tcp_sock_set_keepintvl(ssk, msk->keepalive_intvl);
>> +	tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
>>  
>>  	inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
>>  	inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
>>
> 

Cheers,
Matt
diff mbox series

Patch

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index c09357e04f23..8c7ab1d50c75 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -309,6 +309,9 @@  struct mptcp_sock {
 			in_accept_queue:1,
 			free_first:1,
 			rcvspace_init:1;
+	u8		keepalive_cnt;
+	unsigned int	keepalive_idle;
+	unsigned int	keepalive_intvl;
 	u32		notsent_lowat;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index d9f0d36e24ad..794d8b63024a 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -622,6 +622,30 @@  static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 	return ret;
 }
 
+static int __mptcp_setsockopt_set_val(struct mptcp_sock *msk, int max,
+				      int (*set_val)(struct sock *, int),
+				      unsigned int *msk_val, int val)
+{
+	struct mptcp_subflow_context *subflow;
+	int ret = 0;
+
+	if (val < 1 || val > max)
+		return -EINVAL;
+
+	*msk_val = val;
+	sockopt_seq_inc(msk);
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		lock_sock(ssk);
+		ret |= set_val(ssk, val);
+		release_sock(ssk);
+	}
+
+	return ret;
+}
+
 static int __mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, int val)
 {
 	struct mptcp_subflow_context *subflow;
@@ -818,6 +842,22 @@  static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 	case TCP_NODELAY:
 		ret = __mptcp_setsockopt_sol_tcp_nodelay(msk, val);
 		break;
+	case TCP_KEEPIDLE:
+		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPIDLE,
+						 &tcp_sock_set_keepidle_locked,
+						 &msk->keepalive_idle, val);
+		break;
+	case TCP_KEEPINTVL:
+		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPINTVL,
+						 &tcp_sock_set_keepintvl,
+						 &msk->keepalive_intvl, val);
+		break;
+	case TCP_KEEPCNT:
+		ret = __mptcp_setsockopt_set_val(msk, MAX_TCP_KEEPCNT,
+						 &tcp_sock_set_keepcnt,
+						 (unsigned int *)&msk->keepalive_cnt,
+						 val);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 	}
@@ -1332,6 +1372,8 @@  static int mptcp_put_int_option(struct mptcp_sock *msk, char __user *optval,
 static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 				    char __user *optval, int __user *optlen)
 {
+	struct sock *sk = (void *)msk;
+
 	switch (optname) {
 	case TCP_ULP:
 	case TCP_CONGESTION:
@@ -1352,6 +1394,18 @@  static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
 	case TCP_NOTSENT_LOWAT:
 		return mptcp_put_int_option(msk, optval, optlen, msk->notsent_lowat);
+	case TCP_KEEPIDLE:
+		return mptcp_put_int_option(msk, optval, optlen,
+					    msk->keepalive_idle ? :
+					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_time) / HZ);
+	case TCP_KEEPINTVL:
+		return mptcp_put_int_option(msk, optval, optlen,
+					    msk->keepalive_intvl ? :
+					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_intvl) / HZ);
+	case TCP_KEEPCNT:
+		return mptcp_put_int_option(msk, optval, optlen,
+					    msk->keepalive_cnt ? :
+					    READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_keepalive_probes));
 	}
 	return -EOPNOTSUPP;
 }
@@ -1467,6 +1521,9 @@  static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 		tcp_set_congestion_control(ssk, msk->ca_name, false, true);
 	__tcp_sock_set_cork(ssk, !!msk->cork);
 	__tcp_sock_set_nodelay(ssk, !!msk->nodelay);
+	tcp_sock_set_keepidle_locked(ssk, msk->keepalive_idle);
+	tcp_sock_set_keepintvl(ssk, msk->keepalive_intvl);
+	tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
 
 	inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
 	inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));