diff mbox series

[net,v3] octeon_ep_vf: Resolve netdevice usage count issue

Message ID 20250416102533.9959-1-sedara@marvell.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v3] octeon_ep_vf: Resolve netdevice usage count issue | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 6 this patch: 6
netdev/source_inline success Was 0 now: 0
netdev/contest warning net-next-2025-04-16--12-00 (tests: 794)

Commit Message

Sathesh B Edara April 16, 2025, 10:25 a.m. UTC
The netdevice usage count increases during transmit queue timeouts
because netdev_hold is called in ndo_tx_timeout, scheduling a task
to reinitialize the card. Although netdev_put is called at the end
of the scheduled work, rtnl_unlock checks the reference count during
cleanup. This could cause issues if transmit timeout is called on
multiple queues. Therefore, netdev_hold and netdev_put have been removed.

Fixes: cb7dd712189f ("octeon_ep_vf: Add driver framework and device initialization")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
---
Changes:
V3:
  - Added more description to commit message
V2:
  - Removed redundant call

 drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Jacob Keller April 16, 2025, 8:26 p.m. UTC | #1
On 4/16/2025 3:25 AM, Sathesh B Edara wrote:
> The netdevice usage count increases during transmit queue timeouts
> because netdev_hold is called in ndo_tx_timeout, scheduling a task
> to reinitialize the card. Although netdev_put is called at the end
> of the scheduled work, rtnl_unlock checks the reference count during
> cleanup. This could cause issues if transmit timeout is called on
> multiple queues. Therefore, netdev_hold and netdev_put have been removed.
> 
> Fixes: cb7dd712189f ("octeon_ep_vf: Add driver framework and device initialization")
> Signed-off-by: Sathesh B Edara <sedara@marvell.com>
> ---
> Changes:
> V3:
>   - Added more description to commit message
> V2:
>   - Removed redundant call
> 
>  drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c b/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c
> index 18c922dd5fc6..5d033bc66bdf 100644
> --- a/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c
> +++ b/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c
> @@ -819,7 +819,6 @@ static void octep_vf_tx_timeout_task(struct work_struct *work)
>  		octep_vf_open(netdev);
>  	}
>  	rtnl_unlock();
> -	netdev_put(netdev, NULL);
>  }
>  
>  /**
> @@ -834,7 +833,6 @@ static void octep_vf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
>  {
>  	struct octep_vf_device *oct = netdev_priv(netdev);
>  
> -	netdev_hold(netdev, NULL, GFP_ATOMIC);
>  	schedule_work(&oct->tx_timeout_task);
>  }
I guess the thought was that we need to hold because we scheduled a work
item?

Presumably the driver would simply cancel_work_sync() on this timeout
task before it attempts to release its own reference on the netdev, so
this really doesn't protect anything.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>
Jakub Kicinski April 18, 2025, 2:25 a.m. UTC | #2
On Wed, 16 Apr 2025 13:26:43 -0700 Jacob Keller wrote:
> > @@ -834,7 +833,6 @@ static void octep_vf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> >  {
> >  	struct octep_vf_device *oct = netdev_priv(netdev);
> >  
> > -	netdev_hold(netdev, NULL, GFP_ATOMIC);
> >  	schedule_work(&oct->tx_timeout_task);
> >  }  
> I guess the thought was that we need to hold because we scheduled a work
> item?

Looks like something I would have asked them to do :)
But it was probably merged before I could review next version ?

I mean, passing NULL for the tracker is... quite something.

> Presumably the driver would simply cancel_work_sync() on this timeout
> task before it attempts to release its own reference on the netdev, so
> this really doesn't protect anything.

It does, but before unregistering :/

Sathesh, schedule_work() returns a value. You should use it.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c b/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c
index 18c922dd5fc6..5d033bc66bdf 100644
--- a/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_main.c
@@ -819,7 +819,6 @@  static void octep_vf_tx_timeout_task(struct work_struct *work)
 		octep_vf_open(netdev);
 	}
 	rtnl_unlock();
-	netdev_put(netdev, NULL);
 }
 
 /**
@@ -834,7 +833,6 @@  static void octep_vf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
 {
 	struct octep_vf_device *oct = netdev_priv(netdev);
 
-	netdev_hold(netdev, NULL, GFP_ATOMIC);
 	schedule_work(&oct->tx_timeout_task);
 }