diff mbox series

[1/2] Move hash calculation inside udp4_lib_lookup2()

Message ID 5eb8631d430248999116ce8ced13e4b2@AcuMS.aculab.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series udp: rescan hash2 list if chains crossed. | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1350 this patch: 1350
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1373 this patch: 1373
netdev/checkpatch warning CHECK: From:/Signed-off-by: email comments mismatch: 'From: David Laight <David.Laight@ACULAB.COM>' != 'Signed-off-by: David Laight <david.laight@aculab.com>'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

David Laight July 26, 2023, 12:05 p.m. UTC
Pass the udptable address into udp4_lib_lookup2() instead of the hash slot.

While ipv4_portaddr_hash(net, IP_ADDR_ANY, 0) is constant for each net
(the port is an xor) the value isn't saved.
Since the hash function doesn't get simplified when passed zero the hash
might as well be computed inside udp4_lib_lookup2().

This reduces the cache footprint and allows additional checks.

Signed-off-by: David Laight <david.laight@aculab.com>
---
 net/ipv4/udp.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

Comments

Paolo Abeni July 26, 2023, 1:44 p.m. UTC | #1
On Wed, 2023-07-26 at 12:05 +0000, David Laight wrote:
> Pass the udptable address into udp4_lib_lookup2() instead of the hash slot.
> 
> While ipv4_portaddr_hash(net, IP_ADDR_ANY, 0) is constant for each net
> (the port is an xor) the value isn't saved.
> Since the hash function doesn't get simplified when passed zero the hash

Are you sure? could you please objdump and compare the binary code
generated before and after the patch? In theory all the callers up to
__jhash_final() included should be inlined, and the compiler should be
able to optimze at least rol32(0, <n>).

Cheers,

Paolo
David Laight July 26, 2023, 2:02 p.m. UTC | #2
From: Paolo Abeni
> Sent: 26 July 2023 14:44
> 
> On Wed, 2023-07-26 at 12:05 +0000, David Laight wrote:
> > Pass the udptable address into udp4_lib_lookup2() instead of the hash slot.
> >
> > While ipv4_portaddr_hash(net, IP_ADDR_ANY, 0) is constant for each net
> > (the port is an xor) the value isn't saved.
> > Since the hash function doesn't get simplified when passed zero the hash
> 
> Are you sure? could you please objdump and compare the binary code
> generated before and after the patch? In theory all the callers up to
> __jhash_final() included should be inlined, and the compiler should be
> able to optimze at least rol32(0, <n>).

I looked the hash is 20+ instructions and pretty much all of
them appeared twice.
(I'm in the wrong building to have a buildable kernel tree.)

It has to be said that ipv4_portaddr_hash(net, IPADDR_ANY, port)
could just be net_hash_mix(net) ^ port.
(Or maybe you could use a different random value.)

I'm not even sure the relatively expensive mixing of 'saddr'
is needed - it is one of the local IPv4 addresses.
Mixing in the remote address for connected sockets might
be useful for any server code that uses a lot of connected
udp sockets - but that isn't done.

We will have hundreds of udp sockets with different ports that
are not connected (I don't know if they get bound to a local
address). So a remote address hash wouldn't help.

If you look at the generated code for __udp4_lib_lookup()
it is actually quite horrid.
Too many called functions with too many parameters.
Things spill to the stack all the time.

The reuse_port code made it a whole lot worse.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Eric Dumazet July 26, 2023, 2:03 p.m. UTC | #3
On Wed, Jul 26, 2023 at 4:02 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Paolo Abeni
> > Sent: 26 July 2023 14:44
> >
> > On Wed, 2023-07-26 at 12:05 +0000, David Laight wrote:
> > > Pass the udptable address into udp4_lib_lookup2() instead of the hash slot.
> > >
> > > While ipv4_portaddr_hash(net, IP_ADDR_ANY, 0) is constant for each net
> > > (the port is an xor) the value isn't saved.
> > > Since the hash function doesn't get simplified when passed zero the hash
> >
> > Are you sure? could you please objdump and compare the binary code
> > generated before and after the patch? In theory all the callers up to
> > __jhash_final() included should be inlined, and the compiler should be
> > able to optimze at least rol32(0, <n>).
>
> I looked the hash is 20+ instructions and pretty much all of
> them appeared twice.
> (I'm in the wrong building to have a buildable kernel tree.)
>
> It has to be said that ipv4_portaddr_hash(net, IPADDR_ANY, port)
> could just be net_hash_mix(net) ^ port.
> (Or maybe you could use a different random value.)
>
> I'm not even sure the relatively expensive mixing of 'saddr'
> is needed - it is one of the local IPv4 addresses.
> Mixing in the remote address for connected sockets might
> be useful for any server code that uses a lot of connected
> udp sockets - but that isn't done.
>
> We will have hundreds of udp sockets with different ports that
> are not connected (I don't know if they get bound to a local
> address). So a remote address hash wouldn't help.
>
> If you look at the generated code for __udp4_lib_lookup()
> it is actually quite horrid.
> Too many called functions with too many parameters.
> Things spill to the stack all the time.
>
> The reuse_port code made it a whole lot worse.
>

Maybe ... Please show us performance numbers.

If less than 1%, I would not bother changing this code, making future
backports more risky.
David Laight July 26, 2023, 2:06 p.m. UTC | #4
From: Eric Dumazet
> Sent: 26 July 2023 15:04
> 
> On Wed, Jul 26, 2023 at 4:02 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Paolo Abeni
> > > Sent: 26 July 2023 14:44
> > >
> > > On Wed, 2023-07-26 at 12:05 +0000, David Laight wrote:
> > > > Pass the udptable address into udp4_lib_lookup2() instead of the hash slot.
> > > >
> > > > While ipv4_portaddr_hash(net, IP_ADDR_ANY, 0) is constant for each net
> > > > (the port is an xor) the value isn't saved.
> > > > Since the hash function doesn't get simplified when passed zero the hash
> > >
> > > Are you sure? could you please objdump and compare the binary code
> > > generated before and after the patch? In theory all the callers up to
> > > __jhash_final() included should be inlined, and the compiler should be
> > > able to optimze at least rol32(0, <n>).
> >
> > I looked the hash is 20+ instructions and pretty much all of
> > them appeared twice.
> > (I'm in the wrong building to have a buildable kernel tree.)
> >
> > It has to be said that ipv4_portaddr_hash(net, IPADDR_ANY, port)
> > could just be net_hash_mix(net) ^ port.
> > (Or maybe you could use a different random value.)
> >
> > I'm not even sure the relatively expensive mixing of 'saddr'
> > is needed - it is one of the local IPv4 addresses.
> > Mixing in the remote address for connected sockets might
> > be useful for any server code that uses a lot of connected
> > udp sockets - but that isn't done.
> >
> > We will have hundreds of udp sockets with different ports that
> > are not connected (I don't know if they get bound to a local
> > address). So a remote address hash wouldn't help.
> >
> > If you look at the generated code for __udp4_lib_lookup()
> > it is actually quite horrid.
> > Too many called functions with too many parameters.
> > Things spill to the stack all the time.
> >
> > The reuse_port code made it a whole lot worse.
> >
> 
> Maybe ... Please show us performance numbers.
> 
> If less than 1%, I would not bother changing this code, making future
> backports more risky.

I didn't :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight July 26, 2023, 3:42 p.m. UTC | #5
From: Paolo Abeni
> Sent: 26 July 2023 14:44
> 
> On Wed, 2023-07-26 at 12:05 +0000, David Laight wrote:
> > Pass the udptable address into udp4_lib_lookup2() instead of the hash slot.
> >
> > While ipv4_portaddr_hash(net, IP_ADDR_ANY, 0) is constant for each net
> > (the port is an xor) the value isn't saved.
> > Since the hash function doesn't get simplified when passed zero the hash
> 
> Are you sure? could you please objdump and compare the binary code
> generated before and after the patch? In theory all the callers up to
> __jhash_final() included should be inlined, and the compiler should be
> able to optimze at least rol32(0, <n>).

From an oldish build I have lurking:
0000000000003960 <__udp4_lib_lookup>:
    3960:       e8 00 00 00 00          callq  3965 <__udp4_lib_lookup+0x5>
                        3961: R_X86_64_PC32     __fentry__-0x4
    3965:       48 83 ec 48             sub    $0x48,%rsp
    3969:       66 41 c1 c0 08          rol    $0x8,%r8w
    396e:       48 89 5c 24 18          mov    %rbx,0x18(%rsp)
    3973:       48 89 6c 24 20          mov    %rbp,0x20(%rsp)
    3978:       89 f5                   mov    %esi,%ebp
    397a:       4c 89 64 24 28          mov    %r12,0x28(%rsp)
    397f:       4c 89 6c 24 30          mov    %r13,0x30(%rsp)
    3984:       44 0f b7 e2             movzwl %dx,%r12d
    3988:       4c 89 74 24 38          mov    %r14,0x38(%rsp)
    398d:       4c 89 7c 24 40          mov    %r15,0x40(%rsp)
    3992:       49 89 fe                mov    %rdi,%r14
    3995:       8b bf 40 01 00 00       mov    0x140(%rdi),%edi
    399b:       45 0f b7 f8             movzwl %r8w,%r15d
    399f:       48 8b 5c 24 58          mov    0x58(%rsp),%rbx
    39a4:       48 8b 54 24 60          mov    0x60(%rsp),%rdx
    39a9:       45 89 cd                mov    %r9d,%r13d
    39ac:       81 ef 0d 41 52 21       sub    $0x2152410d,%edi
    39b2:       89 fe                   mov    %edi,%esi
    39b4:       8d 04 39                lea    (%rcx,%rdi,1),%eax
    39b7:       48 89 54 24 10          mov    %rdx,0x10(%rsp)
    39bc:       c1 c6 0e                rol    $0xe,%esi
    39bf:       44 89 e2                mov    %r12d,%edx
    39c2:       f7 de                   neg    %esi
    39c4:       41 89 f0                mov    %esi,%r8d
    39c7:       31 f0                   xor    %esi,%eax
    39c9:       41 c1 c0 0b             rol    $0xb,%r8d
    39cd:       44 29 c0                sub    %r8d,%eax
    39d0:       41 89 c0                mov    %eax,%r8d
    39d3:       31 c7                   xor    %eax,%edi
    39d5:       41 c1 c0 19             rol    $0x19,%r8d
    39d9:       44 29 c7                sub    %r8d,%edi
    39dc:       41 89 f8                mov    %edi,%r8d
    39df:       31 fe                   xor    %edi,%esi
    39e1:       41 c1 c0 10             rol    $0x10,%r8d
    39e5:       44 29 c6                sub    %r8d,%esi
    39e8:       41 89 f0                mov    %esi,%r8d
    39eb:       31 f0                   xor    %esi,%eax
    39ed:       41 c1 c0 04             rol    $0x4,%r8d
    39f1:       44 29 c0                sub    %r8d,%eax
    39f4:       45 89 f8                mov    %r15d,%r8d
    39f7:       31 c7                   xor    %eax,%edi
    39f9:       c1 c0 0e                rol    $0xe,%eax
    39fc:       29 c7                   sub    %eax,%edi
    39fe:       89 f8                   mov    %edi,%eax
    3a00:       c1 c7 18                rol    $0x18,%edi
    3a03:       31 f0                   xor    %esi,%eax
    3a05:       89 ee                   mov    %ebp,%esi
    3a07:       29 f8                   sub    %edi,%eax
    3a09:       4c 89 f7                mov    %r14,%rdi
    3a0c:       44 31 f8                xor    %r15d,%eax
    3a0f:       23 43 10                and    0x10(%rbx),%eax
    3a12:       48 c1 e0 04             shl    $0x4,%rax
    3a16:       48 03 43 08             add    0x8(%rbx),%rax
    3a1a:       48 89 44 24 08          mov    %rax,0x8(%rsp)
    3a1f:       8b 44 24 50             mov    0x50(%rsp),%eax
    3a23:       89 04 24                mov    %eax,(%rsp)
    3a26:       e8 05 f7 ff ff          callq  3130 <udp4_lib_lookup2>
    3a2b:       48 85 c0                test   %rax,%rax
    3a2e:       74 32                   je     3a62 <__udp4_lib_lookup+0x102>
    3a30:       48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
    3a36:       ba 00 00 00 00          mov    $0x0,%edx
    3a3b:       48 8b 5c 24 18          mov    0x18(%rsp),%rbx
    3a40:       48 0f 43 c2             cmovae %rdx,%rax
    3a44:       48 8b 6c 24 20          mov    0x20(%rsp),%rbp
    3a49:       4c 8b 64 24 28          mov    0x28(%rsp),%r12
    3a4e:       4c 8b 6c 24 30          mov    0x30(%rsp),%r13
    3a53:       4c 8b 74 24 38          mov    0x38(%rsp),%r14
    3a58:       4c 8b 7c 24 40          mov    0x40(%rsp),%r15
    3a5d:       48 83 c4 48             add    $0x48,%rsp
    3a61:       c3                      retq
    3a62:       41 8b 96 40 01 00 00    mov    0x140(%r14),%edx
    3a69:       45 89 e9                mov    %r13d,%r9d
    3a6c:       45 89 f8                mov    %r15d,%r8d
    3a6f:       4c 89 f7                mov    %r14,%rdi
    3a72:       81 ea 0d 41 52 21       sub    $0x2152410d,%edx
    3a78:       89 d0                   mov    %edx,%eax
    3a7a:       c1 c0 0e                rol    $0xe,%eax
    3a7d:       f7 d8                   neg    %eax
    3a7f:       89 c1                   mov    %eax,%ecx
    3a81:       89 c6                   mov    %eax,%esi
    3a83:       31 d1                   xor    %edx,%ecx
    3a85:       c1 c6 0b                rol    $0xb,%esi
    3a88:       29 f1                   sub    %esi,%ecx
    3a8a:       89 ce                   mov    %ecx,%esi
    3a8c:       31 ca                   xor    %ecx,%edx
    3a8e:       c1 c6 19                rol    $0x19,%esi
    3a91:       29 f2                   sub    %esi,%edx
    3a93:       89 d6                   mov    %edx,%esi
    3a95:       31 d0                   xor    %edx,%eax
    3a97:       c1 c6 10                rol    $0x10,%esi
    3a9a:       29 f0                   sub    %esi,%eax
    3a9c:       89 c6                   mov    %eax,%esi
    3a9e:       31 c1                   xor    %eax,%ecx
    3aa0:       c1 c6 04                rol    $0x4,%esi
    3aa3:       29 f1                   sub    %esi,%ecx
    3aa5:       89 ee                   mov    %ebp,%esi
    3aa7:       31 ca                   xor    %ecx,%edx
    3aa9:       c1 c1 0e                rol    $0xe,%ecx
    3aac:       29 ca                   sub    %ecx,%edx
    3aae:       48 8b 4c 24 60          mov    0x60(%rsp),%rcx
    3ab3:       31 d0                   xor    %edx,%eax
    3ab5:       c1 c2 18                rol    $0x18,%edx
    3ab8:       29 d0                   sub    %edx,%eax
    3aba:       44 89 e2                mov    %r12d,%edx
    3abd:       48 89 4c 24 10          mov    %rcx,0x10(%rsp)
    3ac2:       44 31 f8                xor    %r15d,%eax
    3ac5:       23 43 10                and    0x10(%rbx),%eax
    3ac8:       31 c9                   xor    %ecx,%ecx
    3aca:       48 c1 e0 04             shl    $0x4,%rax
    3ace:       48 03 43 08             add    0x8(%rbx),%rax
    3ad2:       48 89 44 24 08          mov    %rax,0x8(%rsp)
    3ad7:       8b 44 24 50             mov    0x50(%rsp),%eax
    3adb:       89 04 24                mov    %eax,(%rsp)
    3ade:       e8 4d f6 ff ff          callq  3130 <udp4_lib_lookup2>
    3ae3:       e9 48 ff ff ff          jmpq   3a30 <__udp4_lib_lookup+0xd0>

The two copies of the hash are pretty much the same.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 42a96b3547c9..ad64d6c4cd99 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -439,12 +439,18 @@  static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 saddr, __be16 sport,
 				     __be32 daddr, unsigned int hnum,
 				     int dif, int sdif,
-				     struct udp_hslot *hslot2,
+				     struct udp_table *udptable,
 				     struct sk_buff *skb)
 {
+	unsigned int hash2, slot2;
+	struct udp_hslot *hslot2;
 	struct sock *sk, *result;
 	int score, badness;
 
+	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
 	result = NULL;
 	badness = 0;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
@@ -495,18 +501,12 @@  struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash2, slot2;
-	struct udp_hslot *hslot2;
 	struct sock *result, *sk;
 
-	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
-
 	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
-				  hslot2, skb);
+				  udptable, skb);
 	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
 		goto done;
 
@@ -525,13 +525,9 @@  struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		goto done;
 
 	/* Lookup wildcard sockets */
-	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
-
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  htonl(INADDR_ANY), hnum, dif, sdif,
-				  hslot2, skb);
+				  udptable, skb);
 done:
 	if (IS_ERR(result))
 		return NULL;