diff mbox series

[mptcp-next,3/3] mptcp: support TCP_CORK and TCP_NODELAY

Message ID 20211123070708.2897469-4-max@internet.ru (mailing list archive)
State Accepted, archived
Commit 54c2d9725869ad4cd708ccf445ace8b9f68aeac3
Delegated to: Matthieu Baerts
Headers show
Series mptcp: support disabling Nagle's algorithm | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch success total: 0 errors, 0 warnings, 0 checks, 104 lines checked
matttbe/KVM_Validation__normal warning Unstable: 1 failed test(s): selftest_mptcp_join
matttbe/KVM_Validation__debug warning Unstable: 2 failed test(s): selftest_diag selftest_mptcp_join

Commit Message

Maxim Galaganov Nov. 23, 2021, 7:07 a.m. UTC
First, add cork and nodelay fields to the mptcp_sock structure
so they can be used in sync_socket_options(), and fill them on setsockopt
while holding the msk socket lock.

Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows
by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk
while holding the ssk socket lock.

tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared
or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling
mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending()
being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle.

Also add getsockopt support for TCP_CORK and TCP_NODELAY.

Signed-off-by: Maxim Galaganov <max@internet.ru>
---
I've tested this by doing single-byte writes and looking at the length
of packets in tcpdump, also this is now running on my tproxy setup.
Enabling nodelay on a tproxy gave a decrease in latency to first byte
and lead to improved user experience in web browsing and voip
applications. Existing selftests run passed on a debug config, I'll add
coverage for new sockopts once I grasp how to do it.

Should these bit fields have an explicit zero initializer? I've looked
at recvmsg_inq and it doesn't seem to have one.

 net/mptcp/protocol.h |  4 ++-
 net/mptcp/sockopt.c  | 70 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 1 deletion(-)

Comments

Paolo Abeni Nov. 23, 2021, 12:21 p.m. UTC | #1
On Tue, 2021-11-23 at 10:07 +0300, Maxim Galaganov wrote:
> First, add cork and nodelay fields to the mptcp_sock structure
> so they can be used in sync_socket_options(), and fill them on setsockopt
> while holding the msk socket lock.
> 
> Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows
> by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk
> while holding the ssk socket lock.
> 
> tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared
> or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling
> mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending()
> being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle.
> 
> Also add getsockopt support for TCP_CORK and TCP_NODELAY.
> 
> Signed-off-by: Maxim Galaganov <max@internet.ru>
> ---
> I've tested this by doing single-byte writes and looking at the length
> of packets in tcpdump, also this is now running on my tproxy setup.
> Enabling nodelay on a tproxy gave a decrease in latency to first byte
> and lead to improved user experience in web browsing and voip
> applications. Existing selftests run passed on a debug config, I'll add
> coverage for new sockopts once I grasp how to do it.
> 
> Should these bit fields have an explicit zero initializer? I've looked
> at recvmsg_inq and it doesn't seem to have one.

Newly created sockets are zeroed at creation time by sk_alloc(). Cloned
(accepted) sockets inherit the nodelay/cork status from the partent, as
expected.

No need to explicitly clear the fields in [__]mptcp_init_sock().

self-tests should be doable with pktdrill, see:

https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/sockopts/sockopt_set_ip_tos_valid_v4.pkt

Otherwise you could implement a simple mptcp client setting the cork
flag and doing e.g.
for (;;) { write(MTU/10); sleep (x); }

the peer should do:

	read(..., mtu);
	// mesure elapsed time since previous read
	// delta should be ~ x * 10


yep, is very rough!

As for this series: patches LGTM! many thanks for implementing this!

Acked-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts Nov. 23, 2021, 4:09 p.m. UTC | #2
Hi Max, Paolo,

On 23/11/2021 13:21, Paolo Abeni wrote:
> On Tue, 2021-11-23 at 10:07 +0300, Maxim Galaganov wrote:
>> First, add cork and nodelay fields to the mptcp_sock structure
>> so they can be used in sync_socket_options(), and fill them on setsockopt
>> while holding the msk socket lock.
>>
>> Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows
>> by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk
>> while holding the ssk socket lock.
>>
>> tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared
>> or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling
>> mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending()
>> being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle.
>>
>> Also add getsockopt support for TCP_CORK and TCP_NODELAY.

Thank you for looking at that!

>> Signed-off-by: Maxim Galaganov <max@internet.ru>
>> ---
>> I've tested this by doing single-byte writes and looking at the length
>> of packets in tcpdump, also this is now running on my tproxy setup.
>> Enabling nodelay on a tproxy gave a decrease in latency to first byte
>> and lead to improved user experience in web browsing and voip
>> applications. Existing selftests run passed on a debug config, I'll add
>> coverage for new sockopts once I grasp how to do it.
> 
> self-tests should be doable with pktdrill, see:
> 
> https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/sockopts/sockopt_set_ip_tos_valid_v4.pkt

You can also imitate what is done with the Packetdrill upstream's TCP
Nagle tests:

https://github.com/multipath-tcp/packetdrill/tree/mptcp-net-next/gtests/net/tcp/nagle

Testing with packetdrill might be more reliable than with an E2E test in
selftests if your test needs to look at expected time to get data from
the other side. Indeed, some CI environments might be quite slow or busy
with parallel tasks.

Do you mind if we wait a bit to have these tests before applying your
patches?

Cheers,
Matt
Maxim Galaganov Nov. 24, 2021, 1:44 a.m. UTC | #3
Hi Matt and Paolo, thank you for clarifications!

On 23.11.2021 19:09, Matthieu Baerts wrote:
>>> Signed-off-by: Maxim Galaganov <max@internet.ru>
>>> ---
>>> I've tested this by doing single-byte writes and looking at the length
>>> of packets in tcpdump, also this is now running on my tproxy setup.
>>> Enabling nodelay on a tproxy gave a decrease in latency to first byte
>>> and lead to improved user experience in web browsing and voip
>>> applications. Existing selftests run passed on a debug config, I'll add
>>> coverage for new sockopts once I grasp how to do it.
>>
>> self-tests should be doable with pktdrill, see:
>>
>> https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/sockopts/sockopt_set_ip_tos_valid_v4.pkt
> 
> You can also imitate what is done with the Packetdrill upstream's TCP
> Nagle tests:
> 
> https://github.com/multipath-tcp/packetdrill/tree/mptcp-net-next/gtests/net/tcp/nagle
> 
> Testing with packetdrill might be more reliable than with an E2E test in
> selftests if your test needs to look at expected time to get data from
> the other side. Indeed, some CI environments might be quite slow or busy
> with parallel tasks.
> 

I'll try to cover all the cases from tcp/nagle, aside from MSG_MORE 
(which is silently ignored in MPTCP for now), and also add some 
MPTCP-specific ones.

> Do you mind if we wait a bit to have these tests before applying your
> patches?
> 

I don't mind at all and I think it's preferable to add more test 
coverage before applying this series. Though it may take some time to 
get ahold of the tool.
Matthieu Baerts Nov. 24, 2021, 9:18 a.m. UTC | #4
Hi Max,

On 24/11/2021 02:44, Maxim Galaganov wrote:
> On 23.11.2021 19:09, Matthieu Baerts wrote:
>> Do you mind if we wait a bit to have these tests before applying your
>> patches?
>>
> 
> I don't mind at all and I think it's preferable to add more test
> coverage before applying this series. Though it may take some time to
> get ahold of the tool.

Great!
Do not hesitate to ask for help on this ML or on IRC ;-)

Also, if you need a virtual environment to quickly test, you can have a
look at:

https://github.com/multipath-tcp/mptcp-upstream-virtme-docker#packetdrill

Cheers,
Matt
Maxim Galaganov Nov. 25, 2021, 11:34 a.m. UTC | #5
Hi Matt!

On 24.11.2021 12:18, Matthieu Baerts wrote:
> Hi Max,
> 
> On 24/11/2021 02:44, Maxim Galaganov wrote:
>> On 23.11.2021 19:09, Matthieu Baerts wrote:
>>> Do you mind if we wait a bit to have these tests before applying your
>>> patches?
>>>
>>
>> I don't mind at all and I think it's preferable to add more test
>> coverage before applying this series. Though it may take some time to
>> get ahold of the tool.
> 
> Great!

I've opened a pull request into the packetdrill repo with tests for 
TCP_CORK and TCP_NODELAY.

Link: https://github.com/multipath-tcp/packetdrill/pull/75
diff mbox series

Patch

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7f199fb720ff..47d24478763c 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -249,7 +249,9 @@  struct mptcp_sock {
 	bool		use_64bit_ack; /* Set when we received a 64-bit DSN */
 	bool		csum_enabled;
 	bool		allow_infinite_fallback;
-	u8		recvmsg_inq:1;
+	u8		recvmsg_inq:1,
+			cork:1,
+			nodelay:1;
 	spinlock_t	join_list_lock;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 11cda8629993..e0501f6bfff8 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -616,6 +616,66 @@  static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 	return ret;
 }
 
+static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optval,
+					 unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	int val;
+
+	if (optlen < sizeof(int))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&val, optval, sizeof(val)))
+		return -EFAULT;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	msk->cork = !!val;
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		lock_sock(ssk);
+		__tcp_sock_set_cork(ssk, !!val);
+		release_sock(ssk);
+	}
+	if (!val)
+		mptcp_check_and_set_pending(sk);
+	release_sock(sk);
+
+	return 0;
+}
+
+static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t optval,
+					    unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	int val;
+
+	if (optlen < sizeof(int))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&val, optval, sizeof(val)))
+		return -EFAULT;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	msk->nodelay = !!val;
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		lock_sock(ssk);
+		__tcp_sock_set_nodelay(ssk, !!val);
+		release_sock(ssk);
+	}
+	if (val)
+		mptcp_check_and_set_pending(sk);
+	release_sock(sk);
+
+	return 0;
+}
+
 static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int optname,
 						   sockptr_t optval, unsigned int optlen)
 {
@@ -717,6 +777,10 @@  static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		return -EOPNOTSUPP;
 	case TCP_CONGESTION:
 		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
+	case TCP_CORK:
+		return mptcp_setsockopt_sol_tcp_cork(msk, optval, optlen);
+	case TCP_NODELAY:
+		return mptcp_setsockopt_sol_tcp_nodelay(msk, optval, optlen);
 	}
 
 	return -EOPNOTSUPP;
@@ -1078,6 +1142,10 @@  static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 						      optval, optlen);
 	case TCP_INQ:
 		return mptcp_put_int_option(msk, optval, optlen, msk->recvmsg_inq);
+	case TCP_CORK:
+		return mptcp_put_int_option(msk, optval, optlen, msk->cork);
+	case TCP_NODELAY:
+		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
 	}
 	return -EOPNOTSUPP;
 }
@@ -1165,6 +1233,8 @@  static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 
 	if (inet_csk(sk)->icsk_ca_ops != inet_csk(ssk)->icsk_ca_ops)
 		tcp_set_congestion_control(ssk, msk->ca_name, false, true);
+	__tcp_sock_set_cork(ssk, !!msk->cork);
+	__tcp_sock_set_nodelay(ssk, !!msk->nodelay);
 
 	inet_sk(ssk)->transparent = inet_sk(sk)->transparent;
 	inet_sk(ssk)->freebind = inet_sk(sk)->freebind;