diff mbox series

[net] net: avoid indirect memory pressure calls

Message ID 20230224184606.7101-1-fw@strlen.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] net: avoid indirect memory pressure calls

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 6 this patch: 6
netdev/cc_maintainers warning 1 maintainers not CCed: martin.lau@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 6 this patch: 6
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 35 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Florian Westphal Feb. 24, 2023, 6:46 p.m. UTC
There is a noticeable TCP performance regression (loopback or cross-netns),
seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.

With the SK_RECLAIM_THRESHOLD checks gone, calls to enter/leave memory
pressure happen much more often. For TCP, indirect calls are used.

We can't remove the if-set-return short-circuit check in
tcp_enter_memory_pressure because there are callers other than
sk_enter_memory_pressure.  Doing the check in the sk wrapper too
reduces the indirect calls enough to recover some performance.

Before:
0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver

After:
0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver

"iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.

Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/core/sock.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Jakub Kicinski Feb. 27, 2023, 11:27 p.m. UTC | #1
On Fri, 24 Feb 2023 19:46:06 +0100 Florian Westphal wrote:
> There is a noticeable tcp performance regression (loopback or cross-netns),
> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
> 
> With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
> memory pressure happen much more often. For TCP indirect calls are
> used.
> 
> We can't remove the if-set-return short-circuit check in
> tcp_enter_memory_pressure because there are callers other than
> sk_enter_memory_pressure.  Doing a check in the sk wrapper too
> reduces the indirect calls enough to recover some performance.
> 
> Before,
> 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver
> 
> After:
> 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver
> 
> "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
> 
> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
> Signed-off-by: Florian Westphal <fw@strlen.de>

Looks acceptable, Eric?
Alexander Lobakin Feb. 28, 2023, 4:28 p.m. UTC | #2
From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 27 Feb 2023 15:27:41 -0800

> On Fri, 24 Feb 2023 19:46:06 +0100 Florian Westphal wrote:
>> There is a noticeable tcp performance regression (loopback or cross-netns),
>> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
>>
>> With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
>> memory pressure happen much more often. For TCP indirect calls are
>> used.
>>
>> We can't remove the if-set-return short-circuit check in
>> tcp_enter_memory_pressure because there are callers other than
>> sk_enter_memory_pressure.  Doing a check in the sk wrapper too
>> reduces the indirect calls enough to recover some performance.
>>
>> Before,
>> 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver
>>
>> After:
>> 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver
>>
>> "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
>>
>> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
>> Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> Looks acceptable, Eric?
> 
I'm no Eric, but I'd only change this:

+	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)

to

+	if (!memory_pressure || !READ_ONCE(*memory_pressure))

:p

The perf boost looks gross, love that *_*

Thanks,
Olek
Florian Westphal Feb. 28, 2023, 4:34 p.m. UTC | #3
Alexander Lobakin <aleksander.lobakin@intel.com> wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Mon, 27 Feb 2023 15:27:41 -0800
> 
> > On Fri, 24 Feb 2023 19:46:06 +0100 Florian Westphal wrote:
> >> There is a noticeable tcp performance regression (loopback or cross-netns),
> >> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
> >>
> >> With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
> >> memory pressure happen much more often. For TCP indirect calls are
> >> used.
> >>
> >> We can't remove the if-set-return short-circuit check in
> >> tcp_enter_memory_pressure because there are callers other than
> >> sk_enter_memory_pressure.  Doing a check in the sk wrapper too
> >> reduces the indirect calls enough to recover some performance.
> >>
> >> Before,
> >> 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver
> >>
> >> After:
> >> 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver
> >>
> >> "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
> >>
> >> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
> >> Signed-off-by: Florian Westphal <fw@strlen.de>
> > 
> > Looks acceptable, Eric?
> > 
> I'm no Eric, but I'd only change this:
> 
> +	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
> 
> to
> 
> +	if (!memory_pressure || !READ_ONCE(*memory_pressure))

I intentionally used '== 0'; I found it too easy to miss the '!' before
'R'.  But maybe I just need better glasses.
Eric Dumazet Feb. 28, 2023, 4:35 p.m. UTC | #4
On Tue, Feb 28, 2023 at 5:35 PM Florian Westphal <fw@strlen.de> wrote:
>
> Alexander Lobakin <aleksander.lobakin@intel.com> wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Date: Mon, 27 Feb 2023 15:27:41 -0800
> >
> > > On Fri, 24 Feb 2023 19:46:06 +0100 Florian Westphal wrote:
> > >> There is a noticeable tcp performance regression (loopback or cross-netns),
> > >> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
> > >>
> > >> With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
> > >> memory pressure happen much more often. For TCP indirect calls are
> > >> used.
> > >>
> > >> We can't remove the if-set-return short-circuit check in
> > >> tcp_enter_memory_pressure because there are callers other than
> > >> sk_enter_memory_pressure.  Doing a check in the sk wrapper too
> > >> reduces the indirect calls enough to recover some performance.
> > >>
> > >> Before,
> > >> 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver
> > >>
> > >> After:
> > >> 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver
> > >>
> > >> "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
> > >>
> > >> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
> > >> Signed-off-by: Florian Westphal <fw@strlen.de>
> > >
> > > Looks acceptable, Eric?
> > >
> > I'm no Eric, but I'd only change this:
> >
> > +     if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
> >
> > to
> >
> > +     if (!memory_pressure || !READ_ONCE(*memory_pressure))
>
> I intentionally used '== 0'; I found it too easy to miss the '!' before
> 'R'.  But maybe I just need better glasses.

Sorry for the delay, I will take a look.
Eric Dumazet Feb. 28, 2023, 4:44 p.m. UTC | #5
On Fri, Feb 24, 2023 at 7:49 PM Florian Westphal <fw@strlen.de> wrote:
>
> There is a noticeable tcp performance regression (loopback or cross-netns),
> seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
>
> With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
> memory pressure happen much more often. For TCP indirect calls are
> used.
>
> We can't remove the if-set-return short-circuit check in
> tcp_enter_memory_pressure because there are callers other than
> sk_enter_memory_pressure.  Doing a check in the sk wrapper too
> reduces the indirect calls enough to recover some performance.
>
> Before,
> 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver
>
> After:
> 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver
>
> "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
>
> Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/core/sock.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 341c565dbc26..45d247112aa5 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2809,22 +2809,26 @@ EXPORT_SYMBOL(sock_cmsg_send);
>
>  static void sk_enter_memory_pressure(struct sock *sk)

This one is probably not called under normal circumstances.

>  {
> -       if (!sk->sk_prot->enter_memory_pressure)
> +       unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
> +
> +       if (!memory_pressure || READ_ONCE(*memory_pressure))
>                 return;
>
> -       sk->sk_prot->enter_memory_pressure(sk);
> +       if (sk->sk_prot->enter_memory_pressure)
> +               sk->sk_prot->enter_memory_pressure(sk);
>  }
>
>  static void sk_leave_memory_pressure(struct sock *sk)
>  {
> -       if (sk->sk_prot->leave_memory_pressure) {
> -               sk->sk_prot->leave_memory_pressure(sk);
> -       } else {
> -               unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
> +       unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
>
> -               if (memory_pressure && READ_ONCE(*memory_pressure))
> -                       WRITE_ONCE(*memory_pressure, 0);
> -       }
> +       if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
> +               return;
> +
> +       if (sk->sk_prot->leave_memory_pressure)
> +               sk->sk_prot->leave_memory_pressure(sk);
> +       else
> +               WRITE_ONCE(*memory_pressure, 0);
>  }
>

For this one, we have two callers.

The first one (__sk_mem_reduce_allocated) is using

   if (sk_under_memory_pressure(sk))
     ...

So I wonder if, for consistency, we could use the same heuristic [1]?

Note that [1] is memcg aware/ready.

Refactoring of sk_enter_memory_pressure() and
sk_leave_memory_pressure() could maybe wait for net-next...

[1]
diff --git a/net/core/sock.c b/net/core/sock.c
index 341c565dbc262fcece1c5b410609d910a68edcb0..0472f8559a10136672ce6647f133367c81a93cb7
100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2993,7 +2993,8 @@ int __sk_mem_raise_allocated(struct sock *sk,
int size, int amt, int kind)

        /* Under limit. */
        if (allocated <= sk_prot_mem_limits(sk, 0)) {
-               sk_leave_memory_pressure(sk);
+               if (sk_under_memory_pressure(sk))
+                       sk_leave_memory_pressure(sk);
                return 1;
        }
Eric Dumazet Feb. 28, 2023, 5:42 p.m. UTC | #6
On Tue, Feb 28, 2023 at 5:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Feb 24, 2023 at 7:49 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > There is a noticeable tcp performance regression (loopback or cross-netns),
> > seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
> >
> > With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
> > memory pressure happen much more often. For TCP indirect calls are
> > used.
> >
> > We can't remove the if-set-return short-circuit check in
> > tcp_enter_memory_pressure because there are callers other than
> > sk_enter_memory_pressure.  Doing a check in the sk wrapper too
> > reduces the indirect calls enough to recover some performance.
> >
> > Before,
> > 0.00-60.00  sec   322 GBytes  46.1 Gbits/sec                  receiver
> >
> > After:
> > 0.00-60.04  sec   359 GBytes  51.4 Gbits/sec                  receiver
> >

BTW I was curious why Google was not seeing this, and it appears Brian Vazquez
forgot to upstream this change...

commit 5ea2f21d6c1078d2c563cb455ad5877b4ada94e1
Author: Brian Vazquez <brianvv@google.com>
Date:   Thu Mar 3 19:09:49 2022 -0800

    PRODKERNEL: net-directcall: annotate tcp_leave_memory_pressure and
    tcp_getsockopt

    Switch to upstream infra to make rebase easier

    Tested: built and booted on lab machine

    Upstream-Plan: 150254871
    Effort: net-directcall

diff --git a/net/core/sock.c b/net/core/sock.c
index 05032b399c873984e5297898d647905ca9f21f2e..54cb989dc162f3982380ac12cf5a150214e209a2
100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2647,10 +2647,13 @@ static void sk_enter_memory_pressure(struct sock *sk)
        sk->sk_prot->enter_memory_pressure(sk);
 }

+INDIRECT_CALLABLE_DECLARE(void tcp_leave_memory_pressure(struct sock *sk));
+
 static void sk_leave_memory_pressure(struct sock *sk)
 {
        if (sk->sk_prot->leave_memory_pressure) {
-               sk->sk_prot->leave_memory_pressure(sk);
+               INDIRECT_CALL_1(sk->sk_prot->leave_memory_pressure,
+                               tcp_leave_memory_pressure, sk);
        } else {
                unsigned long *memory_pressure = sk->sk_prot->memory_pressure;

@@ -3439,6 +3442,10 @@ int sock_recv_errqueue(struct sock *sk, struct
msghdr *msg, int len,
 }
 EXPORT_SYMBOL(sock_recv_errqueue);

+INDIRECT_CALLABLE_DECLARE(int tcp_getsockopt(struct sock *sk, int level,
+                                            int optname, char __user *optval,
+                                            int __user *optlen));
+
 /*
  *     Get a socket option on an socket.
  *
@@ -3451,7 +3458,8 @@ int sock_common_getsockopt(struct socket *sock,
int level, int optname,
 {
        struct sock *sk = sock->sk;

-       return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
+       return INDIRECT_CALL_1(sk->sk_prot->getsockopt, tcp_getsockopt,
+                               sk, level, optname, optval, optlen);
 }
 EXPORT_SYMBOL(sock_common_getsockopt);
Florian Westphal March 1, 2023, 12:31 p.m. UTC | #7
Eric Dumazet <edumazet@google.com> wrote:
> BTW I was curious why Google was not seeing this, and it appears Brian Vazquez
> forgot to upstream this change...
> 
> commit 5ea2f21d6c1078d2c563cb455ad5877b4ada94e1
> Author: Brian Vazquez <brianvv@google.com>
> Date:   Thu Mar 3 19:09:49 2022 -0800
> 
>     PRODKERNEL: net-directcall: annotate tcp_leave_memory_pressure and
>     tcp_getsockopt
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 05032b399c873984e5297898d647905ca9f21f2e..54cb989dc162f3982380ac12cf5a150214e209a2
> 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2647,10 +2647,13 @@ static void sk_enter_memory_pressure(struct sock *sk)
>         sk->sk_prot->enter_memory_pressure(sk);
>  }
> 
> +INDIRECT_CALLABLE_DECLARE(void tcp_leave_memory_pressure(struct sock *sk));
> +
>  static void sk_leave_memory_pressure(struct sock *sk)
>  {
>         if (sk->sk_prot->leave_memory_pressure) {
> -               sk->sk_prot->leave_memory_pressure(sk);
> +               INDIRECT_CALL_1(sk->sk_prot->leave_memory_pressure,
> +                               tcp_leave_memory_pressure, sk);
>         } else {
>                 unsigned long *memory_pressure = sk->sk_prot->memory_pressure;

re-tested: this change also resolves the regression i was seeing.

If you prefer to upstream this instead of the proposed change then I'm
fine with that.

Thanks.
Eric Dumazet March 1, 2023, 12:51 p.m. UTC | #8
On Wed, Mar 1, 2023 at 1:31 PM Florian Westphal <fw@strlen.de> wrote:
>
> Eric Dumazet <edumazet@google.com> wrote:
> > BTW I was curious why Google was not seeing this, and it appears Brian Vasquez
> > forgot to upstream this change...
> >
> > commit 5ea2f21d6c1078d2c563cb455ad5877b4ada94e1
> > Author: Brian Vazquez <brianvv@google.com>
> > Date:   Thu Mar 3 19:09:49 2022 -0800
> >
> >     PRODKERNEL: net-directcall: annotate tcp_leave_memory_pressure and
> >     tcp_getsockopt
> >
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 05032b399c873984e5297898d647905ca9f21f2e..54cb989dc162f3982380ac12cf5a150214e209a2
> > 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2647,10 +2647,13 @@ static void sk_enter_memory_pressure(struct sock *sk)
> >         sk->sk_prot->enter_memory_pressure(sk);
> >  }
> >
> > +INDIRECT_CALLABLE_DECLARE(void tcp_leave_memory_pressure(struct sock *sk));
> > +
> >  static void sk_leave_memory_pressure(struct sock *sk)
> >  {
> >         if (sk->sk_prot->leave_memory_pressure) {
> > -               sk->sk_prot->leave_memory_pressure(sk);
> > +               INDIRECT_CALL_1(sk->sk_prot->leave_memory_pressure,
> > +                               tcp_leave_memory_pressure, sk);
> >         } else {
> >                 unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
>
> re-tested: this change also resolves the regression i was seeing.
>
> If you prefer to upstream this instead of the proposed change then I'm
> fine with that.

This seems a bit less risky, if the plan is to add this to stable trees.
(We had this mitigation for ~4 years at Google.)

I will rebase Brian's patch (only the tcp_leave_memory_pressure part) and send it.

Thanks !

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 341c565dbc26..45d247112aa5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2809,22 +2809,26 @@  EXPORT_SYMBOL(sock_cmsg_send);
 
 static void sk_enter_memory_pressure(struct sock *sk)
 {
-	if (!sk->sk_prot->enter_memory_pressure)
+	unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
+
+	if (!memory_pressure || READ_ONCE(*memory_pressure))
 		return;
 
-	sk->sk_prot->enter_memory_pressure(sk);
+	if (sk->sk_prot->enter_memory_pressure)
+		sk->sk_prot->enter_memory_pressure(sk);
 }
 
 static void sk_leave_memory_pressure(struct sock *sk)
 {
-	if (sk->sk_prot->leave_memory_pressure) {
-		sk->sk_prot->leave_memory_pressure(sk);
-	} else {
-		unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
+	unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
 
-		if (memory_pressure && READ_ONCE(*memory_pressure))
-			WRITE_ONCE(*memory_pressure, 0);
-	}
+	if (!memory_pressure || READ_ONCE(*memory_pressure) == 0)
+		return;
+
+	if (sk->sk_prot->leave_memory_pressure)
+		sk->sk_prot->leave_memory_pressure(sk);
+	else
+		WRITE_ONCE(*memory_pressure, 0);
 }
 
 DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);