Message ID | 20230330192534.2307410-1-edumazet@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] tcp: tsq: avoid using a tasklet if possible |
On Thu, 30 Mar 2023 19:25:34 +0000 Eric Dumazet wrote:
> Instead of always defer TCP Small Queue handling to a tasklet
> we can try to lock the socket and immediately call tcp_tsq_handler()
>
> In my experiments on a 100Gbit NIC with FQ qdisc, number of times
> tcp_tasklet_func() is fired per second goes from ~70,000 to ~1,000.
>
> This reduces number of ksoftirqd wakeups and thus extra cpu
> costs and latencies.

Oh, you posted already. LMK what you think about other locks,
I reckon we should apply this patch... just maybe around -rc1
to give ourselves plenty of time to find the broken drivers?
On Thu, Mar 30, 2023 at 10:08 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 30 Mar 2023 19:25:34 +0000 Eric Dumazet wrote:
> > Instead of always defer TCP Small Queue handling to a tasklet
> > we can try to lock the socket and immediately call tcp_tsq_handler()
> >
> > In my experiments on a 100Gbit NIC with FQ qdisc, number of times
> > tcp_tasklet_func() is fired per second goes from ~70,000 to ~1,000.
> >
> > This reduces number of ksoftirqd wakeups and thus extra cpu
> > costs and latencies.
>
> Oh, you posted already. LMK what you think about other locks,
> I reckon we should apply this patch... just maybe around -rc1
> to give ourselves plenty of time to find the broken drivers?

Agreed, this can wait.

I will post a v2 RFC with the additional logic, then I will audit
napi_consume_skb() callers.

(BTW this logic removes one indirect call to tcp_wfree())

Thanks

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a0a91a988272710470cd20f22e02e49476513239..7efdda889b001fdd2cb941e40d1318a6b80b78bb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -339,6 +339,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags);
 void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle,
	      int size_goal);
 void tcp_release_cb(struct sock *sk);
+void __tcp_wfree(struct sk_buff *skb, bool trylock);
 void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
 void tcp_delack_timer_handler(struct sock *sk);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 050a875d09c5535e0cb2e52c9c65f3febd0385d6..2b09c247721144696a937a928a7b8af7f1ad01db 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1242,6 +1242,9 @@ void napi_skb_free_stolen_head(struct sk_buff *skb)
 	napi_skb_cache_put(skb);
 }
 
+/*
+ * Note: callers must not lock the TX queue.
+ */
 void napi_consume_skb(struct sk_buff *skb, int budget)
 {
 	/* Zero budget indicate non-NAPI context called us, like netpoll */
@@ -1260,6 +1263,10 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 
 	/* if SKB is a clone, don't handle this case */
 	if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
+		if (skb->destructor == tcp_wfree) {
+			__tcp_wfree(skb, true);
+			skb->destructor = NULL;
+		}
 		__kfree_skb(skb);
 		return;
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 470bb840bb6b6fb457f96d5c00fdfd11b414482f..63ff79f92a2d44c403bd521850875839f18a0c20 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1139,7 +1139,7 @@ void __init tcp_tasklet_init(void)
  * We can't xmit new skbs from this context, as we might already
  * hold qdisc lock.
  */
-void tcp_wfree(struct sk_buff *skb)
+void __tcp_wfree(struct sk_buff *skb, bool trylock)
 {
 	struct sock *sk = skb->sk;
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -1171,7 +1171,7 @@ void tcp_wfree(struct sk_buff *skb)
 	} while (!try_cmpxchg(&sk->sk_tsq_flags, &oval, nval));
 
 	/* attempt to grab socket lock. */
-	if (spin_trylock_bh(&sk->sk_lock.slock)) {
+	if (trylock && spin_trylock_bh(&sk->sk_lock.slock)) {
 		clear_bit(TSQ_QUEUED, &sk->sk_tsq_flags);
 		tcp_tsq_handler_locked(sk);
 		spin_unlock_bh(&sk->sk_lock.slock);
@@ -1190,6 +1190,11 @@ void tcp_wfree(struct sk_buff *skb)
 	sk_free(sk);
 }
 
+void tcp_wfree(struct sk_buff *skb)
+{
+	__tcp_wfree(skb, false);
+}
+
 /* Note: Called under soft irq.
  * We can call TCP stack right away, unless socket is owned by user.
  */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cfe128b81a010339b486dd6a40b077cee9570d08..470bb840bb6b6fb457f96d5c00fdfd11b414482f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1023,13 +1023,18 @@ static void tcp_tsq_write(struct sock *sk)
 	}
 }
 
-static void tcp_tsq_handler(struct sock *sk)
+static void tcp_tsq_handler_locked(struct sock *sk)
 {
-	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		tcp_tsq_write(sk);
 	else if (!test_and_set_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags))
 		sock_hold(sk);
+}
+
+static void tcp_tsq_handler(struct sock *sk)
+{
+	bh_lock_sock(sk);
+	tcp_tsq_handler_locked(sk);
 	bh_unlock_sock(sk);
 }
 /*
@@ -1165,6 +1170,13 @@ void tcp_wfree(struct sk_buff *skb)
 		nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
 	} while (!try_cmpxchg(&sk->sk_tsq_flags, &oval, nval));
 
+	/* attempt to grab socket lock. */
+	if (spin_trylock_bh(&sk->sk_lock.slock)) {
+		clear_bit(TSQ_QUEUED, &sk->sk_tsq_flags);
+		tcp_tsq_handler_locked(sk);
+		spin_unlock_bh(&sk->sk_lock.slock);
+		goto out;
+	}
 	/* queue this socket to tasklet queue */
 	local_irq_save(flags);
 	tsq = this_cpu_ptr(&tsq_tasklet);
Instead of always deferring TCP Small Queue handling to a tasklet,
we can try to lock the socket and immediately call tcp_tsq_handler().

In my experiments on a 100Gbit NIC with FQ qdisc, the number of times
tcp_tasklet_func() is fired per second goes from ~70,000 to ~1,000.

This reduces the number of ksoftirqd wakeups and thus extra cpu
costs and latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)