Message ID | 20250302124237.3913746-2-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | ae9d5b19b322d4b557efda684da2f4df21670ef8 |
Delegated to: | Netdev Maintainers |
Series | tcp: scale connect() under pressure |
On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition when the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed later in this series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Yesterday, I did a few tests on this single patch and managed to see a
~7% increase in performance on my virtual machine[1] :)

Thanks!

Tested-by: Jason Xing <kerneljasonxing@gmail.com>

[1]: https://lore.kernel.org/all/CAL+tcoBAVmTk_JBX=OEBqZZuoSzZd8bjuw9rgwRLMd9fvZOSkA@mail.gmail.com/
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 2 Mar 2025 12:42:34 +0000
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition when the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed later in this series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18275fb50fea8d77adc8a64fb12494..46d39aa2199ec3a405b50e8e85130e990d2c26b7 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -551,11 +551,24 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	unsigned int hash = inet_ehashfn(net, daddr, lport,
 					 saddr, inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet_match(net, sk2, acookie, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 9ec05e354baa69d14e88da37f5a9fce11e874e35..3604a5cae5d29a25d24f9513308334ff8e64b083 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -276,11 +276,24 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
 	const unsigned int hash = inet6_ehashfn(net, daddr, lport, saddr,
 						inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet6_match(net, sk2, saddr, daddr, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {
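
For readers unfamiliar with the pattern, here is a minimal user-space sketch of the idea in the hunks above: probe the table without taking the lock, fail fast if the key is already present, and take the lock only when an insertion looks possible. It is not kernel code and does not use RCU. C11 atomics and a pthread mutex stand in for the RCU read side and the ehash bucket spinlock, and every name in it (`table`, `claim_key`, `key_in_use_lockless`, `lock_acquisitions`, `NSLOTS`) is hypothetical. It also omits the TIME_WAIT case, where the patch breaks out of the lockless walk and falls through to the locked path.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 64	/* hypothetical table size */

struct slot {
	_Atomic uint32_t key;	/* 0 means empty; stands in for a 4-tuple */
};

static struct slot table[NSLOTS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_acquisitions;	/* illustration only, updated under table_lock */

/* Read-only probe that takes no lock (loosely analogous to the RCU walk). */
static bool key_in_use_lockless(uint32_t key)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (atomic_load_explicit(&table[i].key,
					 memory_order_acquire) == key)
			return true;
	}
	return false;
}

/* Returns 0 on success, -EADDRNOTAVAIL if the key is taken, -ENOSPC if full. */
static int claim_key(uint32_t key)
{
	int free_idx = -1;

	assert(key != 0);	/* 0 is reserved as the "empty" marker */

	/* Fast path: a conflict is detected without touching the lock. */
	if (key_in_use_lockless(key))
		return -EADDRNOTAVAIL;

	/* Slow path: recheck under the lock, since another thread may
	 * have inserted the same key after the lockless probe. */
	pthread_mutex_lock(&table_lock);
	lock_acquisitions++;
	for (int i = 0; i < NSLOTS; i++) {
		uint32_t cur = atomic_load_explicit(&table[i].key,
						    memory_order_relaxed);
		if (cur == key) {
			pthread_mutex_unlock(&table_lock);
			return -EADDRNOTAVAIL;
		}
		if (cur == 0 && free_idx < 0)
			free_idx = i;
	}
	if (free_idx < 0) {
		pthread_mutex_unlock(&table_lock);
		return -ENOSPC;
	}
	atomic_store_explicit(&table[free_idx].key, key, memory_order_release);
	pthread_mutex_unlock(&table_lock);
	return 0;
}
```

In this sketch a second `claim_key()` call with the same key returns `-EADDRNOTAVAIL` and leaves `lock_acquisitions` unchanged. That mirrors the case the commit message targets: when many candidate 4-tuples are already in use, each rejected candidate no longer costs a lock acquisition.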