Message ID | 20230224184606.7101-1-fw@strlen.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] net: avoid indirect memory pressure calls |
On Fri, 24 Feb 2023 19:46:06 +0100 Florian Westphal wrote:
> There is a noticeable tcp performance regression (loopback or cross-netns),
> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
>
> With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
> memory pressure happen much more often. For TCP indirect calls are
> used.
>
> We can't remove the if-set-return short-circuit check in
> tcp_enter_memory_pressure because there are callers other than
> sk_enter_memory_pressure. Doing a check in the sk wrapper too
> reduces the indirect calls enough to recover some performance.
>
> Before,
>  0.00-60.00 sec  322 GBytes  46.1 Gbits/sec  receiver
>
> After:
>  0.00-60.04 sec  359 GBytes  51.4 Gbits/sec  receiver
>
> "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
>
> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
> Signed-off-by: Florian Westphal <fw@strlen.de>

Looks acceptable, Eric?
From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 27 Feb 2023 15:27:41 -0800

> On Fri, 24 Feb 2023 19:46:06 +0100 Florian Westphal wrote:
>> There is a noticeable tcp performance regression (loopback or cross-netns),
>> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
[...]
>> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>
> Looks acceptable, Eric?

I'm no Eric, but I'd only change this:

+	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)

to

+	if (!memory_pressure || !READ_ONCE(*memory_pressure))

:p

The perf boost looks gross, love that *_*

Thanks,
Olek
Alexander Lobakin <aleksander.lobakin@intel.com> wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Mon, 27 Feb 2023 15:27:41 -0800
[...]
> I'm no Eric, but I'd only change this:
>
> +	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
>
> to
>
> +	if (!memory_pressure || !READ_ONCE(*memory_pressure))

I intentionally used '== 0', I found it too easy to miss the '!' before
'R'. But maybe I just need better glasses.
On Tue, Feb 28, 2023 at 5:35 PM Florian Westphal <fw@strlen.de> wrote:
>
> Alexander Lobakin <aleksander.lobakin@intel.com> wrote:
> > I'm no Eric, but I'd only change this:
> >
> > +	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
> >
> > to
> >
> > +	if (!memory_pressure || !READ_ONCE(*memory_pressure))
>
> I intentionally used '== 0', I found it too easy to miss the '!' before
> 'R'. But maybe I just need better glasses.

Sorry for the delay, I will take a look.
On Fri, Feb 24, 2023 at 7:49 PM Florian Westphal <fw@strlen.de> wrote:
>
> There is a noticeable tcp performance regression (loopback or cross-netns),
> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
[...]
> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/core/sock.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 341c565dbc26..45d247112aa5 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2809,22 +2809,26 @@ EXPORT_SYMBOL(sock_cmsg_send);
>
>  static void sk_enter_memory_pressure(struct sock *sk)

This one is probably not called under normal circumstances.

>  {
> -	if (!sk->sk_prot->enter_memory_pressure)
> +	unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
> +
> +	if (!memory_pressure || READ_ONCE(*memory_pressure))
>  		return;
>
> -	sk->sk_prot->enter_memory_pressure(sk);
> +	if (sk->sk_prot->enter_memory_pressure)
> +		sk->sk_prot->enter_memory_pressure(sk);
>  }
>
>  static void sk_leave_memory_pressure(struct sock *sk)
>  {
> -	if (sk->sk_prot->leave_memory_pressure) {
> -		sk->sk_prot->leave_memory_pressure(sk);
> -	} else {
> -		unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
> +	unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
>
> -		if (memory_pressure && READ_ONCE(*memory_pressure))
> -			WRITE_ONCE(*memory_pressure, 0);
> -	}
> +	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
> +		return;
> +
> +	if (sk->sk_prot->leave_memory_pressure)
> +		sk->sk_prot->leave_memory_pressure(sk);
> +	else
> +		WRITE_ONCE(*memory_pressure, 0);
>  }
>

For this one, we have two callers.

First one (__sk_mem_reduce_allocated) is using

	if (sk_under_memory_pressure(sk)) ...

So I wonder if for consistency we could use the same heuristic [1]?

Note that [1] is memcg aware/ready.

Refactoring of sk_enter_memory_pressure() and sk_leave_memory_pressure()
could wait net-next maybe...

[1]

diff --git a/net/core/sock.c b/net/core/sock.c
index 341c565dbc262fcece1c5b410609d910a68edcb0..0472f8559a10136672ce6647f133367c81a93cb7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2993,7 +2993,8 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)

 	/* Under limit. */
 	if (allocated <= sk_prot_mem_limits(sk, 0)) {
-		sk_leave_memory_pressure(sk);
+		if (sk_under_memory_pressure(sk))
+			sk_leave_memory_pressure(sk);
 		return 1;
 	}
On Tue, Feb 28, 2023 at 5:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Feb 24, 2023 at 7:49 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > There is a noticeable tcp performance regression (loopback or cross-netns),
> > seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
[...]
> > Before,
> >  0.00-60.00 sec  322 GBytes  46.1 Gbits/sec  receiver
> >
> > After:
> >  0.00-60.04 sec  359 GBytes  51.4 Gbits/sec  receiver

BTW I was curious why Google was not seeing this, and it appears Brian Vazquez
forgot to upstream this change...

commit 5ea2f21d6c1078d2c563cb455ad5877b4ada94e1
Author: Brian Vazquez <brianvv@google.com>
Date:   Thu Mar 3 19:09:49 2022 -0800

    PRODKERNEL: net-directcall: annotate tcp_leave_memory_pressure and
    tcp_getsockopt

    Switch to upstream infra to make rebase easier

    Tested: built and booted on lab machine
    Upstream-Plan: 150254871

    Effort: net-directcall

diff --git a/net/core/sock.c b/net/core/sock.c
index 05032b399c873984e5297898d647905ca9f21f2e..54cb989dc162f3982380ac12cf5a150214e209a2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2647,10 +2647,13 @@ static void sk_enter_memory_pressure(struct sock *sk)
 	sk->sk_prot->enter_memory_pressure(sk);
 }

+INDIRECT_CALLABLE_DECLARE(void tcp_leave_memory_pressure(struct sock *sk));
+
 static void sk_leave_memory_pressure(struct sock *sk)
 {
 	if (sk->sk_prot->leave_memory_pressure) {
-		sk->sk_prot->leave_memory_pressure(sk);
+		INDIRECT_CALL_1(sk->sk_prot->leave_memory_pressure,
+				tcp_leave_memory_pressure, sk);
 	} else {
 		unsigned long *memory_pressure = sk->sk_prot->memory_pressure;

@@ -3439,6 +3442,10 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
 }
 EXPORT_SYMBOL(sock_recv_errqueue);

+INDIRECT_CALLABLE_DECLARE(int tcp_getsockopt(struct sock *sk, int level,
+					     int optname, char __user *optval,
+					     int __user *optlen));
+
 /*
  *	Get a socket option on an socket.
  *
@@ -3451,7 +3458,8 @@ int sock_common_getsockopt(struct socket *sock, int level, int optname,
 {
 	struct sock *sk = sock->sk;

-	return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
+	return INDIRECT_CALL_1(sk->sk_prot->getsockopt, tcp_getsockopt,
+			       sk, level, optname, optval, optlen);
 }
 EXPORT_SYMBOL(sock_common_getsockopt);
Eric Dumazet <edumazet@google.com> wrote:
> BTW I was curious why Google was not seeing this, and it appears Brian Vazquez
> forgot to upstream this change...
>
> commit 5ea2f21d6c1078d2c563cb455ad5877b4ada94e1
> Author: Brian Vazquez <brianvv@google.com>
> Date:   Thu Mar 3 19:09:49 2022 -0800
>
>     PRODKERNEL: net-directcall: annotate tcp_leave_memory_pressure and
>     tcp_getsockopt
[...]
>  static void sk_leave_memory_pressure(struct sock *sk)
>  {
>  	if (sk->sk_prot->leave_memory_pressure) {
> -		sk->sk_prot->leave_memory_pressure(sk);
> +		INDIRECT_CALL_1(sk->sk_prot->leave_memory_pressure,
> +				tcp_leave_memory_pressure, sk);
>  	} else {
>  		unsigned long *memory_pressure = sk->sk_prot->memory_pressure;

re-tested: this change also resolves the regression I was seeing.

If you prefer to upstream this instead of the proposed change then I'm
fine with that. Thanks.
On Wed, Mar 1, 2023 at 1:31 PM Florian Westphal <fw@strlen.de> wrote:
>
> Eric Dumazet <edumazet@google.com> wrote:
> > BTW I was curious why Google was not seeing this, and it appears Brian Vazquez
> > forgot to upstream this change...
[...]
> re-tested: this change also resolves the regression I was seeing.
>
> If you prefer to upstream this instead of the proposed change then I'm
> fine with that.

This seems a bit less risky, if the plan is to add this to stable trees.
(We had this mitigation for ~4 years at Google)

I will rebase Brian's patch (only the tcp_leave_memory_pressure part)
and send it.

Thanks!
diff --git a/net/core/sock.c b/net/core/sock.c
index 341c565dbc26..45d247112aa5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2809,22 +2809,26 @@ EXPORT_SYMBOL(sock_cmsg_send);

 static void sk_enter_memory_pressure(struct sock *sk)
 {
-	if (!sk->sk_prot->enter_memory_pressure)
+	unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
+
+	if (!memory_pressure || READ_ONCE(*memory_pressure))
 		return;

-	sk->sk_prot->enter_memory_pressure(sk);
+	if (sk->sk_prot->enter_memory_pressure)
+		sk->sk_prot->enter_memory_pressure(sk);
 }

 static void sk_leave_memory_pressure(struct sock *sk)
 {
-	if (sk->sk_prot->leave_memory_pressure) {
-		sk->sk_prot->leave_memory_pressure(sk);
-	} else {
-		unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
+	unsigned long *memory_pressure = sk->sk_prot->memory_pressure;

-		if (memory_pressure && READ_ONCE(*memory_pressure))
-			WRITE_ONCE(*memory_pressure, 0);
-	}
+	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
+		return;
+
+	if (sk->sk_prot->leave_memory_pressure)
+		sk->sk_prot->leave_memory_pressure(sk);
+	else
+		WRITE_ONCE(*memory_pressure, 0);
 }

 DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
There is a noticeable tcp performance regression (loopback or cross-netns),
seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.

With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
memory pressure happen much more often. For TCP indirect calls are
used.

We can't remove the if-set-return short-circuit check in
tcp_enter_memory_pressure because there are callers other than
sk_enter_memory_pressure. Doing a check in the sk wrapper too
reduces the indirect calls enough to recover some performance.

Before,
 0.00-60.00 sec  322 GBytes  46.1 Gbits/sec  receiver

After:
 0.00-60.04 sec  359 GBytes  51.4 Gbits/sec  receiver

"iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.

Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/core/sock.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)