[net,v4,1/4] octeon_ep: fix race conditions in ndo_get_stats64

Message ID 20250102112246.2494230-2-srasheed@marvell.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series Fix race conditions in ndo_get_stats64

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 9 of 9 maintainers
netdev/build_clang              success  Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1 this patch: 1
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 17 this patch: 17
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-01-04--00-00 (tests: 885)

Commit Message

Shinas Rasheed Jan. 2, 2025, 11:22 a.m. UTC
ndo_get_stats64() can race with ndo_stop(), which frees input and
output queue resources. Check whether the netdev is running before
accessing per-queue resources.

Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V4:
  - In ndo_get_stats64(), check whether the netdev is running before
    accessing queue resources, instead of adding locking

V3: https://lore.kernel.org/all/20241218115111.2407958-2-srasheed@marvell.com/
  - No changes 

V2: https://lore.kernel.org/all/20241216075842.2394606-2-srasheed@marvell.com/
  - Changed the sync mechanism used to fix the race conditions from
    atomic set_bit ops to a much simpler synchronize_net()

V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/

 drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Jakub Kicinski Jan. 4, 2025, 5:01 p.m. UTC | #1
On Thu, 2 Jan 2025 03:22:43 -0800 Shinas Rasheed wrote:
> diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> index 549436efc204..a452ee3b9a98 100644
> --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> @@ -995,16 +995,14 @@ static void octep_get_stats64(struct net_device *netdev,
>  	struct octep_device *oct = netdev_priv(netdev);
>  	int q;
>  
> -	if (netif_running(netdev))
> -		octep_ctrl_net_get_if_stats(oct,
> -					    OCTEP_CTRL_NET_INVALID_VFID,
> -					    &oct->iface_rx_stats,
> -					    &oct->iface_tx_stats);
> -
>  	tx_packets = 0;
>  	tx_bytes = 0;
>  	rx_packets = 0;
>  	rx_bytes = 0;
> +
> +	if (!netif_running(netdev))
> +		return;

So we'll provide no stats when the device is down? That's not correct.
The driver should save the stats from the freed queues (somewhere in
the oct structure). Also please mention how this is synchronized
against netif_running() changing its state, device may get closed while
we're running..
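
Illustrative sketch, not part of the posted patch: the "save the stats from the freed queues" suggestion above can be modelled in plain userspace C. Every name here (model_dev, model_queue, model_q_stats, model_open, model_stop, model_get_stats, MODEL_MAX_QUEUES) is hypothetical and does not correspond to the octeon_ep driver's structures. The model is single-threaded, so it only shows where the counters are kept; it says nothing about the separate question of how a stats reader is synchronized against the device being closed.

```c
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_QUEUES 4	/* hypothetical limit for this model */

/* Per-queue counters (hypothetical, not the driver's layout). */
struct model_q_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* A queue that exists only while the device is up. */
struct model_queue {
	struct model_q_stats stats;
};

struct model_dev {
	int running;
	int num_queues;
	struct model_queue *q[MODEL_MAX_QUEUES];
	struct model_q_stats saved;	/* totals from queues already freed */
};

/* Bring the device up: allocate zeroed queues. Returns 0 or -1. */
int model_open(struct model_dev *dev, int num_queues)
{
	int i;

	if (num_queues < 0 || num_queues > MODEL_MAX_QUEUES)
		return -1;
	for (i = 0; i < num_queues; i++) {
		dev->q[i] = calloc(1, sizeof(*dev->q[i]));
		if (!dev->q[i]) {
			/* Undo the partial allocation. */
			while (i--) {
				free(dev->q[i]);
				dev->q[i] = NULL;
			}
			return -1;
		}
	}
	dev->num_queues = num_queues;
	dev->running = 1;
	return 0;
}

/* Bring the device down: fold each queue's counters into the
 * device-level totals before the queue memory is released.
 */
void model_stop(struct model_dev *dev)
{
	int i;

	dev->running = 0;
	for (i = 0; i < dev->num_queues; i++) {
		dev->saved.packets += dev->q[i]->stats.packets;
		dev->saved.bytes += dev->q[i]->stats.bytes;
		free(dev->q[i]);
		dev->q[i] = NULL;
	}
	dev->num_queues = 0;
}

/* Report saved totals plus whatever the live queues hold, so the
 * numbers do not drop to zero while the device is down.
 */
void model_get_stats(const struct model_dev *dev, struct model_q_stats *out)
{
	int i;

	*out = dev->saved;
	if (!dev->running)
		return;
	for (i = 0; i < dev->num_queues; i++) {
		out->packets += dev->q[i]->stats.packets;
		out->bytes += dev->q[i]->stats.bytes;
	}
}
```

In this model the reported totals are the same immediately before and after model_stop(), and they keep growing across a later model_open(), which is the behaviour the review comment asks for.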
Shinas Rasheed Jan. 6, 2025, 5:57 a.m. UTC | #2
Hi Jakub,

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, January 4, 2025 10:31 PM
> To: Shinas Rasheed <srasheed@marvell.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Haseeb Gani
> <hgani@marvell.com>; Sathesh B Edara <sedara@marvell.com>; Vimlesh
> Kumar <vimleshk@marvell.com>; thaller@redhat.com; wizhao@redhat.com;
> kheib@redhat.com; konguyen@redhat.com; horms@kernel.org;
> einstein.xue@synaxg.com; Veerasenareddy Burru <vburru@marvell.com>;
> Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller
> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Paolo
> Abeni <pabeni@redhat.com>; Abhijit Ayarekar <aayarekar@marvell.com>;
> Satananda Burla <sburla@marvell.com>
> Subject: [EXTERNAL] Re: [PATCH net v4 1/4] octeon_ep: fix race conditions in
> ndo_get_stats64
> 
> On Thu, 2 Jan 2025 03:22:43 -0800 Shinas Rasheed wrote:
> > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > index 549436efc204..a452ee3b9a98 100644
> > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > @@ -995,16 +995,14 @@ static void octep_get_stats64(struct net_device *netdev,
> >  	struct octep_device *oct = netdev_priv(netdev);
> >  	int q;
> >
> > -	if (netif_running(netdev))
> > -		octep_ctrl_net_get_if_stats(oct,
> > -					    OCTEP_CTRL_NET_INVALID_VFID,
> > -					    &oct->iface_rx_stats,
> > -					    &oct->iface_tx_stats);
> > -
> >  	tx_packets = 0;
> >  	tx_bytes = 0;
> >  	rx_packets = 0;
> >  	rx_bytes = 0;
> > +
> > +	if (!netif_running(netdev))
> > +		return;
> 
> So we'll provide no stats when the device is down? That's not correct.
> The driver should save the stats from the freed queues (somewhere in
> the oct structure). Also please mention how this is synchronized
> against netif_running() changing its state, device may get closed while
> we're running..

I ACK the 'save stats from freed queues and emit out stats when device is down'.

About the synchronization, the reason I changed to simple netif_running check was to avoid
locks (as per previous patch version comments). Please do correct me if I'm wrong, but isn't the case
you mentioned protected by the rtnl_lock held by the netdev stack when it calls the ndo_op ?

> --
> pw-bot: cr
Jakub Kicinski Jan. 6, 2025, 8:57 p.m. UTC | #3
On Mon, 6 Jan 2025 05:57:09 +0000 Shinas Rasheed wrote:
> > >  	struct octep_device *oct = netdev_priv(netdev);
> > >  	int q;
> > >
> > > -	if (netif_running(netdev))
> > > -		octep_ctrl_net_get_if_stats(oct,
> > > -					    OCTEP_CTRL_NET_INVALID_VFID,
> > > -					    &oct->iface_rx_stats,
> > > -					    &oct->iface_tx_stats);
> > > -
> > >  	tx_packets = 0;
> > >  	tx_bytes = 0;
> > >  	rx_packets = 0;
> > >  	rx_bytes = 0;
> > > +
> > > +	if (!netif_running(netdev))
> > > +		return;  
> > 
> > So we'll provide no stats when the device is down? That's not correct.
> > The driver should save the stats from the freed queues (somewhere in
> > the oct structure). Also please mention how this is synchronized
> > against netif_running() changing its state, device may get closed while
> > we're running..  
> 
> I ACK the 'save stats from freed queues and emit out stats when device is down'.
> 
> About the synchronization, the reason I changed to simple netif_running check was to avoid
> locks (as per previous patch version comments). Please do correct me if I'm wrong, but isn't the case
> you mentioned protected by the rtnl_lock held by the netdev stack when it calls the ndo_op ?

I don't see rtnl_lock being taken in the procfs path.

FWIW I posted a test for the problem you're fixing in octeon, 
since it's relatively common among drivers:
 https://lore.kernel.org/20250105011525.1718380-1-kuba@kernel.org
see also:
 https://github.com/linux-netdev/nipa/wiki/Running-driver-tests
Shinas Rasheed Jan. 7, 2025, 6:11 a.m. UTC | #4
Hi Jakub,

Thanks for the reply, will revert

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, January 7, 2025 2:27 AM
> To: Shinas Rasheed <srasheed@marvell.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Haseeb Gani
> <hgani@marvell.com>; Sathesh B Edara <sedara@marvell.com>; Vimlesh
> Kumar <vimleshk@marvell.com>; thaller@redhat.com; wizhao@redhat.com;
> kheib@redhat.com; konguyen@redhat.com; horms@kernel.org;
> einstein.xue@synaxg.com; Veerasenareddy Burru <vburru@marvell.com>;
> Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller
> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Paolo
> Abeni <pabeni@redhat.com>; Abhijit Ayarekar <aayarekar@marvell.com>;
> Satananda Burla <sburla@marvell.com>
> Subject: Re: [EXTERNAL] Re: [PATCH net v4 1/4] octeon_ep: fix race conditions
> in ndo_get_stats64
> 
> On Mon, 6 Jan 2025 05:57:09 +0000 Shinas Rasheed wrote:
> > > >  	struct octep_device *oct = netdev_priv(netdev);
> > > >  	int q;
> > > >
> > > > -	if (netif_running(netdev))
> > > > -		octep_ctrl_net_get_if_stats(oct,
> > > > -					    OCTEP_CTRL_NET_INVALID_VFID,
> > > > -					    &oct->iface_rx_stats,
> > > > -					    &oct->iface_tx_stats);
> > > > -
> > > >  	tx_packets = 0;
> > > >  	tx_bytes = 0;
> > > >  	rx_packets = 0;
> > > >  	rx_bytes = 0;
> > > > +
> > > > +	if (!netif_running(netdev))
> > > > +		return;
> > >
> > > So we'll provide no stats when the device is down? That's not correct.
> > > The driver should save the stats from the freed queues (somewhere in
> > > the oct structure). Also please mention how this is synchronized
> > > against netif_running() changing its state, device may get closed while
> > > we're running..
> >
> > I ACK the 'save stats from freed queues and emit out stats when device is down'.
> >
> > About the synchronization, the reason I changed to simple netif_running check was to avoid
> > locks (as per previous patch version comments). Please do correct me if I'm wrong, but isn't the case
> > you mentioned protected by the rtnl_lock held by the netdev stack when it calls the ndo_op ?
> 
> I don't see rtnl_lock being taken in the procfs path.
> 
> FWIW I posted a test for the problem you're fixing in octeon,
> since it's relatively common among drivers:
>  https://lore.kernel.org/20250105011525.1718380-1-kuba@kernel.org
> see also:
>  https://github.com/linux-netdev/nipa/wiki/Running-driver-tests

Patch

diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 549436efc204..a452ee3b9a98 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -995,16 +995,14 @@  static void octep_get_stats64(struct net_device *netdev,
 	struct octep_device *oct = netdev_priv(netdev);
 	int q;
 
-	if (netif_running(netdev))
-		octep_ctrl_net_get_if_stats(oct,
-					    OCTEP_CTRL_NET_INVALID_VFID,
-					    &oct->iface_rx_stats,
-					    &oct->iface_tx_stats);
-
 	tx_packets = 0;
 	tx_bytes = 0;
 	rx_packets = 0;
 	rx_bytes = 0;
+
+	if (!netif_running(netdev))
+		return;
+
 	for (q = 0; q < oct->num_oqs; q++) {
 		struct octep_iq *iq = oct->iq[q];
 		struct octep_oq *oq = oct->oq[q];