Message ID | 20230512025740.1068965-1-kuba@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] net/mlx5e: do as little as possible in napi poll when budget is 0 |
On 12/05/2023 5:57, Jakub Kicinski wrote:
> NAPI gets called with budget of 0 from netpoll, which has interrupts
> disabled. We should try to free some space on Tx rings and nothing
> else.
>
> Specifically do not try to handle XDP TX or try to refill Rx buffers -
> we can't use the page pool from IRQ context. Don't check if IRQs moved,
> either, that makes no sense in netpoll. Netpoll calls _all_ the rings
> from whatever CPU it happens to be invoked on.
>
> In general do as little as possible, the work quickly adds up when
> there's tens of rings to poll.
>
> The immediate stack trace I was seeing is:
>
>   __do_softirq+0xd1/0x2c0
>   __local_bh_enable_ip+0xc7/0x120
>   </IRQ>
>   <TASK>
>   page_pool_put_defragged_page+0x267/0x320
>   mlx5e_free_xdpsq_desc+0x99/0xd0
>   mlx5e_poll_xdpsq_cq+0x138/0x3b0
>   mlx5e_napi_poll+0xc3/0x8b0
>   netpoll_poll_dev+0xce/0x150
>
> AFAIU page pool takes a BH lock, releases it and since BH is now
> enabled tries to run softirqs.
>
> Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I'm pointing Fixes at where page_pool was added, although that's
> probably not 100% fair.
>
> CC: saeedm@nvidia.com
> CC: leon@kernel.org
> CC: brouer@redhat.com
> CC: tariqt@mellanox.com
> ---

Thanks for your patch.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
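A side note on the mechanism behind the trace quoted above, since it is the crux of the fix: as the commit message says, page_pool can take a BH-protected lock on its recycle path. Releasing a _bh spinlock re-enables bottom halves, and local_bh_enable() runs any pending softirqs right there when the caller is not in hard-IRQ context, which is the __local_bh_enable_ip() -> __do_softirq() step in the trace. The snippet below is a minimal illustrative sketch of that pattern, not the real page_pool code.

```c
#include <linux/spinlock.h>

/* Minimal illustration of the hazard only -- not page_pool internals. */
static void recycle_page_sketch(spinlock_t *producer_lock)
{
        spin_lock_bh(producer_lock);    /* disables bottom halves */

        /* ... put the page back on a recycle ring ... */

        spin_unlock_bh(producer_lock);  /* re-enables BH: any pending softirqs
                                         * run here, which is what must not
                                         * happen when netpoll called the
                                         * driver with interrupts disabled.
                                         */
}
```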
On Fri, May 12, 2023 at 4:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> NAPI gets called with budget of 0 from netpoll, which has interrupts
> disabled. We should try to free some space on Tx rings and nothing
> else.
>
> Specifically do not try to handle XDP TX or try to refill Rx buffers -
> we can't use the page pool from IRQ context. Don't check if IRQs moved,
> either, that makes no sense in netpoll. Netpoll calls _all_ the rings
> from whatever CPU it happens to be invoked on.
>
> In general do as little as possible, the work quickly adds up when
> there's tens of rings to poll.
>
> The immediate stack trace I was seeing is:
>
>   __do_softirq+0xd1/0x2c0
>   __local_bh_enable_ip+0xc7/0x120
>   </IRQ>
>   <TASK>
>   page_pool_put_defragged_page+0x267/0x320
>   mlx5e_free_xdpsq_desc+0x99/0xd0
>   mlx5e_poll_xdpsq_cq+0x138/0x3b0
>   mlx5e_napi_poll+0xc3/0x8b0
>   netpoll_poll_dev+0xce/0x150
>
> AFAIU page pool takes a BH lock, releases it and since BH is now
> enabled tries to run softirqs.
>
> Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I'm pointing Fixes at where page_pool was added, although that's
> probably not 100% fair.
>
> CC: saeedm@nvidia.com
> CC: leon@kernel.org
> CC: brouer@redhat.com
> CC: tariqt@mellanox.com
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_txrx.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> index a50bfda18e96..bd4294dd72da 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> @@ -161,20 +161,25 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
>                  }
>          }
>
> +        /* budget=0 means we may be in IRQ context, do as little as possible */
> +        if (unlikely(!budget)) {
> +                /* no work done, can't be asked to re-enable IRQs */
> +                WARN_ON_ONCE(napi_complete_done(napi, work_done));

This is not clear why you call napi_complete_done() here ?

Note the fine doc ( https://www.kernel.org/doc/html/next/networking/napi.html )
says:

<quote>If the budget is 0 napi_complete_done() should never be called.</quote>

> +                goto out;
> +        }
> +
>          busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
>
>          if (c->xdp)
>                  busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
>
> -        if (likely(budget)) { /* budget=0 means: don't poll rx rings */
> -                if (xsk_open)
> -                        work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
> +        if (xsk_open)
> +                work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
>
> -                if (likely(budget - work_done))
> -                        work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
> +        if (likely(budget - work_done))
> +                work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
>
> -                busy |= work_done == budget;
> -        }
> +        busy |= work_done == budget;
>
>          mlx5e_poll_ico_cq(&c->icosq.cq);
>          if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
> --
> 2.40.1
>
On Mon, 15 May 2023 18:39:20 +0200 Eric Dumazet wrote:
> > +        /* budget=0 means we may be in IRQ context, do as little as possible */
> > +        if (unlikely(!budget)) {
> > +                /* no work done, can't be asked to re-enable IRQs */
> > +                WARN_ON_ONCE(napi_complete_done(napi, work_done));
>
> This is not clear why you call napi_complete_done() here ?
>
> Note the fine doc ( https://www.kernel.org/doc/html/next/networking/napi.html )
> says:
>
> <quote>If the budget is 0 napi_complete_done() should never be called.</quote>

fair point, I was trying to be conservative because I think it would have gotten called before
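For readers following along, the rule Eric quotes boils down to the poll-callback shape sketched below. This is a generic, hedged sketch of a hypothetical driver (the foo_* names and struct are placeholders, not mlx5e or any real API): with budget 0, only reclaim Tx space, report zero work, and never call napi_complete_done().

```c
#include <linux/netdevice.h>

/* Hypothetical driver state; foo_* helpers are placeholders. */
struct foo_ring {
        struct napi_struct napi;
        /* ... descriptor rings, device pointers ... */
};

static int foo_napi_poll(struct napi_struct *napi, int budget)
{
        struct foo_ring *ring = container_of(napi, struct foo_ring, napi);
        int work_done;

        /* Tx completion reaping is always allowed, even from netpoll. */
        foo_clean_tx_ring(ring);

        /* budget == 0 (netpoll, IRQs may be off): no Rx work, no
         * napi_complete_done(), just report that nothing was consumed.
         */
        if (!budget)
                return 0;

        work_done = foo_clean_rx_ring(ring, budget);

        /* Only an instance that did not exhaust its budget may complete
         * and have device interrupts re-enabled.
         */
        if (work_done < budget && napi_complete_done(napi, work_done))
                foo_enable_irqs(ring);

        return work_done;
}
```

This is also what the patch ends up doing with its early `goto out` when budget is 0: reap Tx completions, skip the XDP and Rx work, and return with work_done still zero.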
```diff
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index a50bfda18e96..bd4294dd72da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -161,20 +161,25 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
                 }
         }
 
+        /* budget=0 means we may be in IRQ context, do as little as possible */
+        if (unlikely(!budget)) {
+                /* no work done, can't be asked to re-enable IRQs */
+                WARN_ON_ONCE(napi_complete_done(napi, work_done));
+                goto out;
+        }
+
         busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
 
         if (c->xdp)
                 busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
 
-        if (likely(budget)) { /* budget=0 means: don't poll rx rings */
-                if (xsk_open)
-                        work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
+        if (xsk_open)
+                work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
 
-                if (likely(budget - work_done))
-                        work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
+        if (likely(budget - work_done))
+                work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
 
-                busy |= work_done == budget;
-        }
+        busy |= work_done == budget;
 
         mlx5e_poll_ico_cq(&c->icosq.cq);
         if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
```
NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.

Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.

In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.

The immediate stack trace I was seeing is:

    __do_softirq+0xd1/0x2c0
    __local_bh_enable_ip+0xc7/0x120
    </IRQ>
    <TASK>
    page_pool_put_defragged_page+0x267/0x320
    mlx5e_free_xdpsq_desc+0x99/0xd0
    mlx5e_poll_xdpsq_cq+0x138/0x3b0
    mlx5e_napi_poll+0xc3/0x8b0
    netpoll_poll_dev+0xce/0x150

AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.

Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
I'm pointing Fixes at where page_pool was added, although that's
probably not 100% fair.

CC: saeedm@nvidia.com
CC: leon@kernel.org
CC: brouer@redhat.com
CC: tariqt@mellanox.com
---
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
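For context on where that zero budget comes from: netpoll (used by netconsole among others) polls all of a device's NAPI instances directly, with interrupts disabled, passing a budget of 0 so that drivers only clean their Tx rings. The sketch below paraphrases the idea of the poll path in net/core/netpoll.c; it is a simplified illustration, not the verbatim kernel source.

```c
#include <linux/netdevice.h>

/* Simplified paraphrase of the netpoll poll path -- not verbatim kernel
 * code. It illustrates why a driver's poll callback can run with
 * budget == 0, IRQs disabled, on whatever CPU netpoll happens to use.
 */
static void poll_one_napi_sketch(struct napi_struct *napi)
{
        int work;

        /* Budget 0: the driver must only reclaim Tx space, report no
         * work done, and must not call napi_complete_done().
         */
        work = napi->poll(napi, 0);
        WARN_ONCE(work, "%pS exceeded budget in poll\n", napi->poll);
}

static void netpoll_poll_dev_sketch(struct net_device *dev)
{
        struct napi_struct *napi;

        /* Every NAPI instance of the device is polled from whichever CPU
         * netpoll runs on -- "did the IRQ move" checks mean nothing here.
         */
        list_for_each_entry(napi, &dev->napi_list, dev_list)
                poll_one_napi_sketch(napi);
}
```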