Message ID | 20211123070708.2897469-4-max@internet.ru (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | 54c2d9725869ad4cd708ccf445ace8b9f68aeac3 |
Delegated to: | Matthieu Baerts |
Headers | show |
Series | mptcp: support disabling Nagle's algorithm | expand |
Context | Check | Description |
---|---|---|
matttbe/build | success | Build and static analysis OK |
matttbe/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 104 lines checked |
matttbe/KVM_Validation__normal | warning | Unstable: 1 failed test(s): selftest_mptcp_join |
matttbe/KVM_Validation__debug | warning | Unstable: 2 failed test(s): selftest_diag selftest_mptcp_join |
On Tue, 2021-11-23 at 10:07 +0300, Maxim Galaganov wrote: > First, add cork and nodelay fields to the mptcp_sock structure > so they can be used in sync_socket_options(), and fill them on setsockopt > while holding the msk socket lock. > > Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows > by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk > while holding the ssk socket lock. > > tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared > or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling > mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending() > being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle. > > Also add getsockopt support for TCP_CORK and TCP_NODELAY. > > Signed-off-by: Maxim Galaganov <max@internet.ru> > --- > I've tested this by doing single-byte writes and looking at the length > of packets in tcpdump, also this is now running on my tproxy setup. > Enabling nodelay on a tproxy gave a decrease in latency to first byte > and lead to improved user experience in web browsing and voip > applications. Existing selftests run passed on a debug config, I'll add > coverage for new sockopts once I grasp how to do it. > > Should these bit fields have an explicit zero initializer? I've looked > at recvmsg_inq and it doesn't seem to have one. Newly created sockets are zeroed at creation time by sk_alloc(). Cloned (accepted) sockets inherit the nodelay/cork status from the partent, as expected. No need to explicitly clear the fields in [__]mptcp_init_sock(). self-tests should be doable with pktdrill, see: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/sockopts/sockopt_set_ip_tos_valid_v4.pkt Otherwise you could implement a simple mptcp client setting the cork flag and doing e.g. for (;;) { write(MTU/10); sleep (x); } the peer should do: read(..., mtu); // mesure elapsed time since previous read // delta should be ~ x * 10 yep, is very rough! As for this series: patches LGTM! many thanks for implementing this! Acked-by: Paolo Abeni <pabeni@redhat.com>
Hi Max, Paolo, On 23/11/2021 13:21, Paolo Abeni wrote: > On Tue, 2021-11-23 at 10:07 +0300, Maxim Galaganov wrote: >> First, add cork and nodelay fields to the mptcp_sock structure >> so they can be used in sync_socket_options(), and fill them on setsockopt >> while holding the msk socket lock. >> >> Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows >> by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk >> while holding the ssk socket lock. >> >> tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared >> or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling >> mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending() >> being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle. >> >> Also add getsockopt support for TCP_CORK and TCP_NODELAY. Thank you for looking at that! >> Signed-off-by: Maxim Galaganov <max@internet.ru> >> --- >> I've tested this by doing single-byte writes and looking at the length >> of packets in tcpdump, also this is now running on my tproxy setup. >> Enabling nodelay on a tproxy gave a decrease in latency to first byte >> and lead to improved user experience in web browsing and voip >> applications. Existing selftests run passed on a debug config, I'll add >> coverage for new sockopts once I grasp how to do it. > > self-tests should be doable with pktdrill, see: > > https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/sockopts/sockopt_set_ip_tos_valid_v4.pkt You can also imitate what is done with the Packetdrill upstream's TCP Nagle tests: https://github.com/multipath-tcp/packetdrill/tree/mptcp-net-next/gtests/net/tcp/nagle Testing with packetdrill might be more reliable than with an E2E test in selftests if your test needs to look at expected time to get data from the other side. Indeed, some CI environments might be quite slow or busy with parallel tasks. Do you mind if we wait a bit to have these tests before applying your patches? Cheers, Matt
Hi Matt and Paolo, thank you for clarifications! On 23.11.2021 19:09, Matthieu Baerts wrote: >>> Signed-off-by: Maxim Galaganov <max@internet.ru> >>> --- >>> I've tested this by doing single-byte writes and looking at the length >>> of packets in tcpdump, also this is now running on my tproxy setup. >>> Enabling nodelay on a tproxy gave a decrease in latency to first byte >>> and lead to improved user experience in web browsing and voip >>> applications. Existing selftests run passed on a debug config, I'll add >>> coverage for new sockopts once I grasp how to do it. >> >> self-tests should be doable with pktdrill, see: >> >> https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/sockopts/sockopt_set_ip_tos_valid_v4.pkt > > You can also imitate what is done with the Packetdrill upstream's TCP > Nagle tests: > > https://github.com/multipath-tcp/packetdrill/tree/mptcp-net-next/gtests/net/tcp/nagle > > Testing with packetdrill might be more reliable than with an E2E test in > selftests if your test needs to look at expected time to get data from > the other side. Indeed, some CI environments might be quite slow or busy > with parallel tasks. > I'll try to cover all the cases from tcp/nagle, aside from MSG_MORE (which is silently ignored in MPTCP for now), and also add some MPTCP-specific ones. > Do you mind if we wait a bit to have these tests before applying your > patches? > I don't mind at all and I think it's preferable to add more test coverage before applying this series. Though it may take some time to get ahold of the tool.
Hi Max, On 24/11/2021 02:44, Maxim Galaganov wrote: > On 23.11.2021 19:09, Matthieu Baerts wrote: >> Do you mind if we wait a bit to have these tests before applying your >> patches? >> > > I don't mind at all and I think it's preferable to add more test > coverage before applying this series. Though it may take some time to > get ahold of the tool. Great! Do not hesitate to ask for help on this ML or on IRC ;-) Also, if you need a virtual environment to quickly test, you can have a look at: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker#packetdrill Cheers, Matt
Hi Matt! On 24.11.2021 12:18, Matthieu Baerts wrote: > Hi Max, > > On 24/11/2021 02:44, Maxim Galaganov wrote: >> On 23.11.2021 19:09, Matthieu Baerts wrote: >>> Do you mind if we wait a bit to have these tests before applying your >>> patches? >>> >> >> I don't mind at all and I think it's preferable to add more test >> coverage before applying this series. Though it may take some time to >> get ahold of the tool. > > Great! I've opened a pull request into the packetdrill repo with tests for TCP_CORK and TCP_NODELAY. Link: https://github.com/multipath-tcp/packetdrill/pull/75
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 7f199fb720ff..47d24478763c 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -249,7 +249,9 @@ struct mptcp_sock { bool use_64bit_ack; /* Set when we received a 64-bit DSN */ bool csum_enabled; bool allow_infinite_fallback; - u8 recvmsg_inq:1; + u8 recvmsg_inq:1, + cork:1, + nodelay:1; spinlock_t join_list_lock; struct work_struct work; struct sk_buff *ooo_last_skb; diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 11cda8629993..e0501f6bfff8 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -616,6 +616,66 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t return ret; } +static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optval, + unsigned int optlen) +{ + struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; + int val; + + if (optlen < sizeof(int)) + return -EINVAL; + + if (copy_from_sockptr(&val, optval, sizeof(val))) + return -EFAULT; + + lock_sock(sk); + sockopt_seq_inc(msk); + msk->cork = !!val; + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + lock_sock(ssk); + __tcp_sock_set_cork(ssk, !!val); + release_sock(ssk); + } + if (!val) + mptcp_check_and_set_pending(sk); + release_sock(sk); + + return 0; +} + +static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t optval, + unsigned int optlen) +{ + struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; + int val; + + if (optlen < sizeof(int)) + return -EINVAL; + + if (copy_from_sockptr(&val, optval, sizeof(val))) + return -EFAULT; + + lock_sock(sk); + sockopt_seq_inc(msk); + msk->nodelay = !!val; + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + lock_sock(ssk); + __tcp_sock_set_nodelay(ssk, !!val); + release_sock(ssk); + } + if (val) + mptcp_check_and_set_pending(sk); + release_sock(sk); + + return 0; +} + static int mptcp_setsockopt_sol_ip_set_transparent(struct mptcp_sock *msk, int optname, sockptr_t optval, unsigned int optlen) { @@ -717,6 +777,10 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname, return -EOPNOTSUPP; case TCP_CONGESTION: return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen); + case TCP_CORK: + return mptcp_setsockopt_sol_tcp_cork(msk, optval, optlen); + case TCP_NODELAY: + return mptcp_setsockopt_sol_tcp_nodelay(msk, optval, optlen); } return -EOPNOTSUPP; @@ -1078,6 +1142,10 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname, optval, optlen); case TCP_INQ: return mptcp_put_int_option(msk, optval, optlen, msk->recvmsg_inq); + case TCP_CORK: + return mptcp_put_int_option(msk, optval, optlen, msk->cork); + case TCP_NODELAY: + return mptcp_put_int_option(msk, optval, optlen, msk->nodelay); } return -EOPNOTSUPP; } @@ -1165,6 +1233,8 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk) if (inet_csk(sk)->icsk_ca_ops != inet_csk(ssk)->icsk_ca_ops) tcp_set_congestion_control(ssk, msk->ca_name, false, true); + __tcp_sock_set_cork(ssk, !!msk->cork); + __tcp_sock_set_nodelay(ssk, !!msk->nodelay); inet_sk(ssk)->transparent = inet_sk(sk)->transparent; inet_sk(ssk)->freebind = inet_sk(sk)->freebind;
First, add cork and nodelay fields to the mptcp_sock structure so they can be used in sync_socket_options(), and fill them on setsockopt while holding the msk socket lock. Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk while holding the ssk socket lock. tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending() being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle. Also add getsockopt support for TCP_CORK and TCP_NODELAY. Signed-off-by: Maxim Galaganov <max@internet.ru> --- I've tested this by doing single-byte writes and looking at the length of packets in tcpdump, also this is now running on my tproxy setup. Enabling nodelay on a tproxy gave a decrease in latency to first byte and lead to improved user experience in web browsing and voip applications. Existing selftests run passed on a debug config, I'll add coverage for new sockopts once I grasp how to do it. Should these bit fields have an explicit zero initializer? I've looked at recvmsg_inq and it doesn't seem to have one. net/mptcp/protocol.h | 4 ++- net/mptcp/sockopt.c | 70 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 73 insertions(+), 1 deletion(-)