
[net] mlx5e: add missing BH locking around napi_schedule()

Message ID 20210505202026.778635-1-kuba@kernel.org (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Series [net] mlx5e: add missing BH locking around napi_schedule()

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 2 blamed authors not CCed: ayal@nvidia.com tariqt@nvidia.com; 5 maintainers not CCed: ayal@nvidia.com tariqt@nvidia.com leon@kernel.org davem@davemloft.net linux-rdma@vger.kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Jakub Kicinski May 5, 2021, 8:20 p.m. UTC
It's not correct to call napi_schedule() in pure process
context. Because we use __raise_softirq_irqoff() we require
callers to be in a context which will eventually lead to
softirq handling (hardirq, bh disabled, etc.).

With code as is users will see:

 NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
We may want to patch net-next once it opens to switch
from __raise_softirq_irqoff() to raise_softirq_irqoff().
The irq_count() check is probably negligible and we'd need
to split the hardirq / non-hardirq paths completely to
keep the current behaviour. Plus what's hardirq is murky
with RT enabled..

Eric WDYT?
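
For reference, the distinction between the two primitives mentioned above
comes down to the ksoftirqd wakeup. In v5.12-era kernel/softirq.c,
raise_softirq_irqoff() is __raise_softirq_irqoff() plus a wakeup when not
called from interrupt context, which is what makes it usable from process
context:

 void raise_softirq_irqoff(unsigned int nr)
 {
 	__raise_softirq_irqoff(nr);

 	/*
 	 * If we're in an interrupt or softirq, we're done
 	 * (this also catches softirq-disabled code). We will
 	 * actually run the softirq once we return from
 	 * the irq or softirq.
 	 *
 	 * Otherwise we wake up ksoftirqd to make sure we
 	 * schedule the softirq soon.
 	 */
 	if (!in_interrupt())
 		wakeup_softirqd();
 }

__raise_softirq_irqoff(), by contrast, only marks the softirq pending, so
the caller must be in a context that will eventually run do_softirq() --
hence the BH bracketing in this patch.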

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Saeed Mahameed May 18, 2021, 7:23 p.m. UTC | #1
On Wed, 2021-05-05 at 13:20 -0700, Jakub Kicinski wrote:
> It's not correct to call napi_schedule() in pure process
> context. Because we use __raise_softirq_irqoff() we require
> callers to be in a context which will eventually lead to
> softirq handling (hardirq, bh disabled, etc.).
> 
> With code as is users will see:
> 
>  NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
> 
> Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> We may want to patch net-next once it opens to switch
> from __raise_softirq_irqoff() to raise_softirq_irqoff().
> The irq_count() check is probably negligible and we'd need
> to split the hardirq / non-hardirq paths completely to
> keep the current behaviour. Plus what's hardirq is murky
> with RT enabled..
> 
> Eric WDYT?
> 

I was waiting for Eric to reply. Anyway, I think this patch is correct
as is.

Jakub, do you want me to submit it to net via the net-mlx5 branch?

Another valid solution is for the driver to avoid calling
napi_schedule() altogether from process context. We have the
mlx5e_trigger_irq() mechanism, which can be utilized here, but it needs
some refactoring to move the icosq object from the main RX RQ to the
containing channel object.
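
A rough sketch of what that could look like, purely for illustration --
the two-argument mlx5e_activate_rq() signature and the channel-owned
icosq field are hypothetical here, not the actual driver layout:

 /* Hypothetical sketch: assumes the icosq has been moved from the RQ
  * into the containing mlx5e_channel, as suggested above. */
 static void mlx5e_activate_rq(struct mlx5e_channel *c, struct mlx5e_rq *rq)
 {
 	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
 	/* Ringing the ICOSQ doorbell makes the device raise a completion
 	 * IRQ, so NAPI is scheduled from hardirq context and the
 	 * process-context napi_schedule() call disappears entirely. */
 	mlx5e_trigger_irq(&c->icosq);
 }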

>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index bca832cdc4cb..11e50f5b3a1e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -889,10 +889,13 @@ int mlx5e_open_rq(struct mlx5e_params *params, struct mlx5e_rq_param *param,
>  void mlx5e_activate_rq(struct mlx5e_rq *rq)
>  {
>         set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
> -       if (rq->icosq)
> +       if (rq->icosq) {
>                 mlx5e_trigger_irq(rq->icosq);
> -       else
> +       } else {
> +               local_bh_disable();
>                 napi_schedule(rq->cq.napi);
> +               local_bh_enable();
> +       }
>  }
>  
>  void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
Jakub Kicinski May 18, 2021, 7:33 p.m. UTC | #2
On Tue, 18 May 2021 12:23:54 -0700 Saeed Mahameed wrote:
> On Wed, 2021-05-05 at 13:20 -0700, Jakub Kicinski wrote:
> > It's not correct to call napi_schedule() in pure process
> > context. Because we use __raise_softirq_irqoff() we require
> > callers to be in a context which will eventually lead to
> > softirq handling (hardirq, bh disabled, etc.).
> > 
> > With code as is users will see:
> > 
> >  NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
> > 
> > Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > We may want to patch net-next once it opens to switch
> > from __raise_softirq_irqoff() to raise_softirq_irqoff().
> > The irq_count() check is probably negligible and we'd need
> > to split the hardirq / non-hardirq paths completely to
> > keep the current behaviour. Plus what's hardirq is murky
> > with RT enabled..
> > 
> > Eric WDYT?
> 
> I was waiting for Eric to reply. Anyway, I think this patch is correct
> as is.
> 
> Jakub, do you want me to submit it to net via the net-mlx5 branch?

Yes, please. FWIW we had a short exchange with RT folks last Friday,
and it doesn't seem like RT is an issue, so tglx will likely take
care of just adding a lockdep check and maybe a helper for scheduling
from process ctx.
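
Such a helper might look something like the following; the name here is
made up, and only the BH bracketing is taken from this patch:

 /* Hypothetical helper for scheduling NAPI from process context.
  * Disabling BHs guarantees the raised NET_RX softirq actually runs
  * when local_bh_enable() re-enables them. */
 static inline void napi_schedule_process_ctx(struct napi_struct *n)
 {
 	local_bh_disable();
 	napi_schedule(n);
 	local_bh_enable();
 }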

> Another valid solution is for the driver to avoid calling
> napi_schedule() altogether from process context. We have the
> mlx5e_trigger_irq() mechanism, which can be utilized here, but it needs
> some refactoring to move the icosq object from the main RX RQ to the
> containing channel object.

Yeah, someone on your side would probably need to take care of that
kind of surgery. Apart from that, no preference on which fix gets
applied.

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bca832cdc4cb..11e50f5b3a1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -889,10 +889,13 @@  int mlx5e_open_rq(struct mlx5e_params *params, struct mlx5e_rq_param *param,
 void mlx5e_activate_rq(struct mlx5e_rq *rq)
 {
 	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
-	if (rq->icosq)
+	if (rq->icosq) {
 		mlx5e_trigger_irq(rq->icosq);
-	else
+	} else {
+		local_bh_disable();
 		napi_schedule(rq->cq.napi);
+		local_bh_enable();
+	}
 }
 
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq)