Message ID | 20211214102647.7734-1-magnus.karlsson@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | [bpf-next] xsk: add test for tx_writeable to batched path |
On Tue, Dec 14, 2021 at 11:26:47AM +0100, Magnus Karlsson wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
>
> Add a test for the tx_writeable condition to the batched Tx processing
> path. This test is in the skb and non-batched code paths but not in the
> batched code path. So add it there. This test makes sure that a
> process is not woken up until there are a sufficiently large number of
> free entries in the Tx ring. Currently, any driver using the batched
> interface will be woken up even if there is only one free entry,
> impacting performance negatively.

I gave this patch a shot on ice driver with the Tx batching patch that i'm
about to send which is using the xsk_tx_peek_release_desc_batch(). I ran
the 2 core setup with no busy poll and it turned out that this change has
a negative impact on performance - it degrades by 5%.

After a short chat with Magnus he said it's due to the touch to the global
state of a ring that xsk_tx_writeable() is doing.

So maintainers, please do not apply this yet, we'll come up with a
solution.

Also, should this be sent to bpf tree (not bpf-next) ?

Thanks!

>
> Fixes: 3413f04141aa ("xsk: Change the tx writeable condition")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  net/xdp/xsk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 28ef3f4465ae..3772fcaa76ed 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -392,7 +392,8 @@ u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *
>
>  	xskq_cons_release_n(xs->tx, nb_pkts);
>  	__xskq_cons_release(xs->tx);
> -	xs->sk.sk_write_space(&xs->sk);
> +	if (xsk_tx_writeable(xs))
> +		xs->sk.sk_write_space(&xs->sk);
>
>  out:
>  	rcu_read_unlock();
>
> base-commit: d27a662290963a1cde26cdfdbac71a546c06e94a
> --
> 2.29.0
>
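For context on what the check under discussion computes, below is a minimal userspace model of the "only wake the producer when enough of the Tx ring is free" idea. This is not the kernel's xsk_tx_writeable() from net/xdp/xsk.c; the struct, the helper names and the half-ring threshold are assumptions made purely for illustration.

#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model of the wakeup gating discussed in this thread.
 * NOT kernel code: the ring layout, names and the half-ring threshold
 * are assumptions made for illustration only.
 */
struct tx_ring_model {
	unsigned int nentries;	/* ring size */
	unsigned int producer;	/* descriptors filled by the application */
	unsigned int consumer;	/* descriptors completed by the driver */
};

static unsigned int free_entries(const struct tx_ring_model *r)
{
	return r->nentries - (r->producer - r->consumer);
}

/* Report the socket as writeable only once at least half the ring is free. */
static bool tx_writeable_model(const struct tx_ring_model *r)
{
	return free_entries(r) >= r->nentries / 2;
}

int main(void)
{
	struct tx_ring_model r = { .nentries = 512, .producer = 600, .consumer = 100 };

	/* Only 12 slots free: without the gating, the batched path would
	 * still call sk_write_space() and wake the producer for almost
	 * nothing. */
	printf("free=%u writeable=%d\n", free_entries(&r), tx_writeable_model(&r));

	r.consumer = 400;	/* the driver completed a large batch */
	printf("free=%u writeable=%d\n", free_entries(&r), tx_writeable_model(&r));

	return 0;
}

The 5% regression reported above is attributed to the real check having to read ring state that is shared across cores on every batch, a touch the unconditional sk_write_space() call did not need.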
On Tue, Dec 14, 2021 at 3:28 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Tue, Dec 14, 2021 at 11:26:47AM +0100, Magnus Karlsson wrote:
> > From: Magnus Karlsson <magnus.karlsson@intel.com>
> >
> > Add a test for the tx_writeable condition to the batched Tx processing
> > path. This test is in the skb and non-batched code paths but not in the
> > batched code path. So add it there. This test makes sure that a
> > process is not woken up until there are a sufficiently large number of
> > free entries in the Tx ring. Currently, any driver using the batched
> > interface will be woken up even if there is only one free entry,
> > impacting performance negatively.
>
> I gave this patch a shot on ice driver with the Tx batching patch that i'm
> about to send which is using the xsk_tx_peek_release_desc_batch(). I ran
> the 2 core setup with no busy poll and it turned out that this change has
> a negative impact on performance - it degrades by 5%.
>
> After a short chat with Magnus he said it's due to the touch to the global
> state of a ring that xsk_tx_writeable() is doing.
>
> So maintainers, please do not apply this yet, we'll come up with a
> solution.
>
> Also, should this be sent to bpf tree (not bpf-next) ?

It is just a performance fix, so I would say bpf-next.

> Thanks!
>
> >
> > Fixes: 3413f04141aa ("xsk: Change the tx writeable condition")
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >  net/xdp/xsk.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 28ef3f4465ae..3772fcaa76ed 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -392,7 +392,8 @@ u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *
> >
> >  	xskq_cons_release_n(xs->tx, nb_pkts);
> >  	__xskq_cons_release(xs->tx);
> > -	xs->sk.sk_write_space(&xs->sk);
> > +	if (xsk_tx_writeable(xs))
> > +		xs->sk.sk_write_space(&xs->sk);
> >
> >  out:
> >  	rcu_read_unlock();
> >
> > base-commit: d27a662290963a1cde26cdfdbac71a546c06e94a
> > --
> > 2.29.0
> >
On 12/14/21 3:29 PM, Magnus Karlsson wrote:
> On Tue, Dec 14, 2021 at 3:28 PM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
>> On Tue, Dec 14, 2021 at 11:26:47AM +0100, Magnus Karlsson wrote:
>>> From: Magnus Karlsson <magnus.karlsson@intel.com>
>>>
>>> Add a test for the tx_writeable condition to the batched Tx processing
>>> path. This test is in the skb and non-batched code paths but not in the
>>> batched code path. So add it there. This test makes sure that a
>>> process is not woken up until there are a sufficiently large number of
>>> free entries in the Tx ring. Currently, any driver using the batched
>>> interface will be woken up even if there is only one free entry,
>>> impacting performance negatively.
>>
>> I gave this patch a shot on ice driver with the Tx batching patch that i'm
>> about to send which is using the xsk_tx_peek_release_desc_batch(). I ran
>> the 2 core setup with no busy poll and it turned out that this change has
>> a negative impact on performance - it degrades by 5%.
>>
>> After a short chat with Magnus he said it's due to the touch to the global
>> state of a ring that xsk_tx_writeable() is doing.
>>
>> So maintainers, please do not apply this yet, we'll come up with a
>> solution.
>>
>> Also, should this be sent to bpf tree (not bpf-next) ?
>
> It is just a performance fix, so I would say bpf-next.

Ok, np. I'm tossing it from patchwork in that case now, and wait for a v2.
Given the fix is a two-liner, I'll leave it up to you which tree you think
it would be best.

Thanks,
Daniel
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 28ef3f4465ae..3772fcaa76ed 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -392,7 +392,8 @@ u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *
 
 	xskq_cons_release_n(xs->tx, nb_pkts);
 	__xskq_cons_release(xs->tx);
-	xs->sk.sk_write_space(&xs->sk);
+	if (xsk_tx_writeable(xs))
+		xs->sk.sk_write_space(&xs->sk);
 
 out:
 	rcu_read_unlock();
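To show why the wakeup condition matters to applications, here is a sketch of a blocking AF_XDP Tx producer built on the libbpf/libxdp xsk helpers. Socket and umem setup is omitted, the header path differs between library versions, and the function itself is an assumption-laden sketch rather than a reference implementation; the wakeup that sk_write_space() provides is what lets the poll(POLLOUT) below return, which is the call the patch makes conditional.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <linux/types.h>
#include <bpf/xsk.h>	/* may be <xdp/xsk.h> with newer libxdp */

/*
 * Sketch of a blocking Tx burst: assumes an already created xsk_socket and
 * its Tx ring; the frame addresses in 'addrs' point into a bound umem.
 */
static int tx_burst_blocking(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
			     const __u64 *addrs, __u32 len, __u32 nb)
{
	struct pollfd pfd = { .fd = xsk_socket__fd(xsk), .events = POLLOUT };
	__u32 idx, i;

	while (xsk_ring_prod__reserve(tx, nb, &idx) < nb) {
		/* Not enough free Tx slots: sleep until the socket becomes
		 * writeable again, i.e. until sk_write_space() fires. */
		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			return -errno;
	}

	for (i = 0; i < nb; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = addrs[i];
		desc->len = len;
	}
	xsk_ring_prod__submit(tx, nb);

	/* In need_wakeup mode the driver asks for an explicit kick. */
	if (xsk_ring_prod__needs_wakeup(tx))
		sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

	return 0;
}

With an unconditional wakeup, such a producer can be woken even when only a single slot has freed up, as the commit message notes; gating the call on xsk_tx_writeable() batches those wakeups, which is the stated performance motivation for the patch.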