diff mbox series

[net,v2,1/4] octeon_ep: fix race conditions in ndo_get_stats64

Message ID 20241216075842.2394606-2-srasheed@marvell.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Fix race conditions in ndo_get_stats64 | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 17 this patch: 17
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-12-16--09-00 (tests: 795)

Commit Message

Shinas Rasheed Dec. 16, 2024, 7:58 a.m. UTC
ndo_get_stats64() can race with ndo_stop(), which frees input and
output queue resources. Call synchronize_net() to avoid such races.

Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V2:
  - Changed sync mechanism to fix race conditions from using an atomic
    set_bit ops to a much simpler synchronize_net()

V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/

 drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Larysa Zaremba Dec. 16, 2024, 2:30 p.m. UTC | #1
On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> ndo_get_stats64() can race with ndo_stop(), which frees input and
> output queue resources. Call synchronize_net() to avoid such races.
> 
> Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> ---
> V2:
>   - Changed sync mechanism to fix race conditions from using an atomic
>     set_bit ops to a much simpler synchronize_net()
> 
> V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/
> 
>  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> index 549436efc204..941bbaaa67b5 100644
> --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
>  {
>  	struct octep_device *oct = netdev_priv(netdev);
>  
> +	synchronize_net();

You should have elaborated on the fact that this synchronize_net() is for 
__LINK_STATE_START flag in the commit message, this is not obvious. Also, is 
octep_get_stats64() called from RCU-safe context?

>  	netdev_info(netdev, "Stopping the device ...\n");
>  
>  	octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,
> -- 
> 2.25.1
> 
>
Larysa Zaremba Dec. 16, 2024, 2:46 p.m. UTC | #2
On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > output queue resources. Call synchronize_net() to avoid such races.
> > 
> > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > ---
> > V2:
> >   - Changed sync mechanism to fix race conditions from using an atomic
> >     set_bit ops to a much simpler synchronize_net()
> > 
> > V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/
> > 
> >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > index 549436efc204..941bbaaa67b5 100644
> > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> >  {
> >  	struct octep_device *oct = netdev_priv(netdev);
> >  
> > +	synchronize_net();
> 
> You should have elaborated on the fact that this synchronize_net() is for 
> __LINK_STATE_START flag in the commit message, this is not obvious. Also, is 
> octep_get_stats64() called from RCU-safe context?
>

Now I see that in case !netif_running(), you do not bail out of 
octep_get_stats64() fully (or at all after the second patch). So, could you 
explain, how are you utilizing RCU here?

> >  	netdev_info(netdev, "Stopping the device ...\n");
> >  
> >  	octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,
> > -- 
> > 2.25.1
> > 
> > 
>
Shinas Rasheed Dec. 16, 2024, 6:24 p.m. UTC | #3
Hi Larysa,

> On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > output queue resources. Call synchronize_net() to avoid such races.
> >
> > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > ---
> > V2:
> >   - Changed sync mechanism to fix race conditions from using an atomic
> >     set_bit ops to a much simpler synchronize_net()
> >
> > V1: https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__lore.kernel.org_all_20241203072130.2316913-2D2-2Dsrasheed-
> 40marvell.com_&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1OxLD4y-
> oxrlgQ1rjXgWtmLz1pnaDjD96sDq-
> cKUwK4&m=SfdHxslanlPmhVsD4m3qSuUI2eout0Tqi9o-
> DnPbfjVzv5Nte11HnOw2-aIeEN3i&s=0iv3vwi5W-
> le8dxI6EtazU7q4RlnvElcVk5MsyMm_0k&e=
> >
> >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > index 549436efc204..941bbaaa67b5 100644
> > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> >  {
> >  	struct octep_device *oct = netdev_priv(netdev);
> >
> > +	synchronize_net();
> 
> You should have elaborated on the fact that this synchronize_net() is for
> __LINK_STATE_START flag in the commit message, this is not obvious. Also, is
> octep_get_stats64() called from RCU-safe context?

The octep_get_stats64() is an instance of the ndo_get_stats64() netdev op. As I understand, the netdev op is supposed to be
called from RCU-safe contexts. Please do correct me if I'm wrong.
Shinas Rasheed Dec. 16, 2024, 6:28 p.m. UTC | #4
Hi Larysa,

> On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > output queue resources. Call synchronize_net() to avoid such races.
> > >
> > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > > ---
> > > V2:
> > >   - Changed sync mechanism to fix race conditions from using an atomic
> > >     set_bit ops to a much simpler synchronize_net()
> > >
> > > V1: https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__lore.kernel.org_all_20241203072130.2316913-2D2-2Dsrasheed-
> 40marvell.com_&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1OxLD4y-
> oxrlgQ1rjXgWtmLz1pnaDjD96sDq-cKUwK4&m=Dh7BH5wsuCdQnE-
> 4erjptaJnM42YsLU2tY4wPn5NWqwsymkNOllPfQAkomj1mXPN&s=IjWHk3SOqr
> ibgv6kz-WTL8VfGVInSu5DzKSbcjCFIvk&e=
> > >
> > >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > index 549436efc204..941bbaaa67b5 100644
> > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> > >  {
> > >  	struct octep_device *oct = netdev_priv(netdev);
> > >
> > > +	synchronize_net();
> >
> > You should have elaborated on the fact that this synchronize_net() is for
> > __LINK_STATE_START flag in the commit message, this is not obvious. Also,
> is
> > octep_get_stats64() called from RCU-safe context?
> >
> 
> Now I see that in case !netif_running(), you do not bail out of
> octep_get_stats64() fully (or at all after the second patch). So, could you
> explain, how are you utilizing RCU here?
> 

The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is called from RCU safe contexts, and
that the netdev op is never called after the ndo_stop().

Thanks for the comments
Larysa Zaremba Dec. 17, 2024, 5:06 p.m. UTC | #5
On Mon, Dec 16, 2024 at 06:28:13PM +0000, Shinas Rasheed wrote:
> Hi Larysa,
> 
> > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > > output queue resources. Call synchronize_net() to avoid such races.
> > > >
> > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > > > ---
> > > > V2:
> > > >   - Changed sync mechanism to fix race conditions from using an atomic
> > > >     set_bit ops to a much simpler synchronize_net()
> > > >
> > > > V1: https://urldefense.proofpoint.com/v2/url?u=https-
> > 3A__lore.kernel.org_all_20241203072130.2316913-2D2-2Dsrasheed-
> > 40marvell.com_&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1OxLD4y-
> > oxrlgQ1rjXgWtmLz1pnaDjD96sDq-cKUwK4&m=Dh7BH5wsuCdQnE-
> > 4erjptaJnM42YsLU2tY4wPn5NWqwsymkNOllPfQAkomj1mXPN&s=IjWHk3SOqr
> > ibgv6kz-WTL8VfGVInSu5DzKSbcjCFIvk&e=
> > > >
> > > >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > index 549436efc204..941bbaaa67b5 100644
> > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> > > >  {
> > > >  	struct octep_device *oct = netdev_priv(netdev);
> > > >
> > > > +	synchronize_net();
> > >
> > > You should have elaborated on the fact that this synchronize_net() is for
> > > __LINK_STATE_START flag in the commit message, this is not obvious. Also,
> > is
> > > octep_get_stats64() called from RCU-safe context?
> > >
> > 
> > Now I see that in case !netif_running(), you do not bail out of
> > octep_get_stats64() fully (or at all after the second patch). So, could you
> > explain, how are you utilizing RCU here?
> > 
> 
> The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is called from RCU safe contexts, and
> that the netdev op is never called after the ndo_stop().

As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the 
thing, but there are a lot more callers and for example veth_get_stats64() 
explicitly calls rcu_read_lock().

Also, even with RCU-protected section, I am not sure prevents the 
octep_get_stats64() to be called after synchronize_net() finishes. Again, the 
callers seem too diverse to definitely say that we can rely on built-in flags 
for this to not happen :/

> 
> Thanks for the comments
Shinas Rasheed Dec. 17, 2024, 5:49 p.m. UTC | #6
> > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > > > output queue resources. Call synchronize_net() to avoid such races.
> > > > >
> > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > > > > ---
> > > > > V2:
> > > > >   - Changed sync mechanism to fix race conditions from using an
> atomic
> > > > >     set_bit ops to a much simpler synchronize_net()
> > > > >
> > > > >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > index 549436efc204..941bbaaa67b5 100644
> > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device
> *netdev)
> > > > >  {
> > > > >  	struct octep_device *oct = netdev_priv(netdev);
> > > > >
> > > > > +	synchronize_net();
> > > >
> > > > You should have elaborated on the fact that this synchronize_net() is for
> > > > __LINK_STATE_START flag in the commit message, this is not obvious.
> Also,
> > > is
> > > > octep_get_stats64() called from RCU-safe context?
> > > >
> > >
> > > Now I see that in case !netif_running(), you do not bail out of
> > > octep_get_stats64() fully (or at all after the second patch). So, could you
> > > explain, how are you utilizing RCU here?
> > >
> >
> > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is
> called from RCU safe contexts, and
> > that the netdev op is never called after the ndo_stop().
> 
> As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the
> thing, but there are a lot more callers and for example veth_get_stats64()
> explicitly calls rcu_read_lock().
> 
> Also, even with RCU-protected section, I am not sure prevents the
> octep_get_stats64() to be called after synchronize_net() finishes. Again, the
> callers seem too diverse to definitely say that we can rely on built-in flags
> for this to not happen :/

Usually, the understanding is that ndo_get_stats won't be called by the network stack after the interface is put down. As long as that is the case, I don't think we should keep adding checks until there is a strong reason to do so. What do you think?

> > Thanks for the comments
Larysa Zaremba Dec. 18, 2024, 1:25 p.m. UTC | #7
On Tue, Dec 17, 2024 at 05:49:13PM +0000, Shinas Rasheed wrote:
> 
> 
> > > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > > > > output queue resources. Call synchronize_net() to avoid such races.
> > > > > >
> > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > > > > > ---
> > > > > > V2:
> > > > > >   - Changed sync mechanism to fix race conditions from using an
> > atomic
> > > > > >     set_bit ops to a much simpler synchronize_net()
> > > > > >
> > > > > >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > index 549436efc204..941bbaaa67b5 100644
> > > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device
> > *netdev)
> > > > > >  {
> > > > > >  	struct octep_device *oct = netdev_priv(netdev);
> > > > > >
> > > > > > +	synchronize_net();
> > > > >
> > > > > You should have elaborated on the fact that this synchronize_net() is for
> > > > > __LINK_STATE_START flag in the commit message, this is not obvious.
> > Also,
> > > > is
> > > > > octep_get_stats64() called from RCU-safe context?
> > > > >
> > > >
> > > > Now I see that in case !netif_running(), you do not bail out of
> > > > octep_get_stats64() fully (or at all after the second patch). So, could you
> > > > explain, how are you utilizing RCU here?
> > > >
> > >
> > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is
> > called from RCU safe contexts, and
> > > that the netdev op is never called after the ndo_stop().
> > 
> > As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the
> > thing, but there are a lot more callers and for example veth_get_stats64()
> > explicitly calls rcu_read_lock().
> > 
> > Also, even with RCU-protected section, I am not sure prevents the
> > octep_get_stats64() to be called after synchronize_net() finishes. Again, the
> > callers seem too diverse to definitely say that we can rely on built-in flags
> > for this to not happen :/
> 
> Usually, the understanding is that ndo_get_stats won't be called by the network stack after the interface is put down. As long as that is the case, I don't think we should keep adding checks until there is a strong reason to do so. What do you think?
>

It is hard to know without testing (but testing should not be hard). I think the 
phrase "Statistics must persist across routine operations like bringing the 
interface down and up." [0] implies that bringing the interface down may not 
necessarily prevent stats calls.

[0] https://docs.kernel.org/networking/statistics.html
 
> > > Thanks for the comments
Shinas Rasheed Dec. 18, 2024, 1:58 p.m. UTC | #8
Hi Larysa,

> > > > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > > > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > > > > > output queue resources. Call synchronize_net() to avoid such
> races.
> > > > > > >
> > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > > > > > > ---
> > > > > > > V2:
> > > > > > >   - Changed sync mechanism to fix race conditions from using an
> > > atomic
> > > > > > >     set_bit ops to a much simpler synchronize_net()
> > > > > > >
> > > > > > >  drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > > index 549436efc204..941bbaaa67b5 100644
> > > > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device
> > > *netdev)
> > > > > > >  {
> > > > > > >  	struct octep_device *oct = netdev_priv(netdev);
> > > > > > >
> > > > > > > +	synchronize_net();
> > > > > >
> > > > > > You should have elaborated on the fact that this synchronize_net() is
> for
> > > > > > __LINK_STATE_START flag in the commit message, this is not obvious.
> > > Also,
> > > > > is
> > > > > > octep_get_stats64() called from RCU-safe context?
> > > > > >
> > > > >
> > > > > Now I see that in case !netif_running(), you do not bail out of
> > > > > octep_get_stats64() fully (or at all after the second patch). So, could
> you
> > > > > explain, how are you utilizing RCU here?
> > > > >
> > > >
> > > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in
> turn) is
> > > called from RCU safe contexts, and
> > > > that the netdev op is never called after the ndo_stop().
> > >
> > > As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around
> the
> > > thing, but there are a lot more callers and for example veth_get_stats64()
> > > explicitly calls rcu_read_lock().
> > >
> > > Also, even with RCU-protected section, I am not sure prevents the
> > > octep_get_stats64() to be called after synchronize_net() finishes. Again,
> the
> > > callers seem too diverse to definitely say that we can rely on built-in flags
> > > for this to not happen :/
> >
> > Usually, the understanding is that ndo_get_stats won't be called by the
> network stack after the interface is put down. As long as that is the case, I
> don't think we should keep adding checks until there is a strong reason to do
> so. What do you think?
> >
> 
> It is hard to know without testing (but testing should not be hard). I think the
> phrase "Statistics must persist across routine operations like bringing the
> interface down and up." [0] implies that bringing the interface down may not
> necessarily prevent stats calls.
> 
> [0] https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__docs.kernel.org_networking_statistics.html&d=DwIBAg&c=nKjWec2b6R0
> mOyPaz7xtfQ&r=1OxLD4y-oxrlgQ1rjXgWtmLz1pnaDjD96sDq-
> cKUwK4&m=DJzJNo9WT10pSHikJhCBbN7-CfB-
> O2kz9OVGsmiIRQXvcIIWDK6034tMmzZGvlFs&s=essE01suLWF42taNi0yJ3H3YC
> 0Et8GofMj5wxor9yD4&e=
>

Sorry, I misworded my previous statement. Of course ndo_get_stats can get called while the netdev is down. This is tested code, and the reason why there is no issue in this scenario is because for a ndo_get_stats call that happens after ndo_stop() happens, the oct->num_oqs will be seen as 0, and hence octep_iq nor octep_oq is not accessed in the for loop that follows. Octep_iq and octep_oq are the resources that we're trying to protect from race conditions. Hope that clarifies things.


> > > > Thanks for the comments
Eric Dumazet Dec. 18, 2024, 2:21 p.m. UTC | #9
On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote:

>
> It is hard to know without testing (but testing should not be hard). I think the
> phrase "Statistics must persist across routine operations like bringing the
> interface down and up." [0] implies that bringing the interface down may not
> necessarily prevent stats calls.

Please don't  add workarounds to individual drivers.

I think the core networking stack should handle the possible races.

Most dev_get_stats() callers are correctly testing dev_isalive() or
are protected by RTNL.

There are few nested cases that are not properly handled, the
following patch should take care of them.


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2593019ad5b1614f3b8c037afb4ba4fa740c7d51..768afc2a18d343d051e7a1b631124910af9922d2
100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -5342,6 +5342,12 @@ static inline const char
*netdev_reg_state(const struct net_device *dev)
        return " (unknown)";
 }

+/* Caller holds RTNL or RCU */
+static inline int dev_isalive(const struct net_device *dev)
+{
+       return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
+}
+
 #define MODULE_ALIAS_NETDEV(device) \
        MODULE_ALIAS("netdev-" device)

diff --git a/net/core/dev.c b/net/core/dev.c
index c7f3dea3e0eb9eb05865e49dd7a8535afb974149..f11f305f3136f208fcb285c7b314914aef20dfad
100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11044,8 +11044,13 @@ struct rtnl_link_stats64
*dev_get_stats(struct net_device *dev,
        const struct net_device_ops *ops = dev->netdev_ops;
        const struct net_device_core_stats __percpu *p;

+       memset(storage, 0, sizeof(*storage));
+       rcu_read_lock();
+
+       if (unlikely(!dev_isalive(dev)))
+               goto unlock;
+
        if (ops->ndo_get_stats64) {
-               memset(storage, 0, sizeof(*storage));
                ops->ndo_get_stats64(dev, storage);
        } else if (ops->ndo_get_stats) {
                netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev));
@@ -11071,6 +11076,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct
net_device *dev,
                        storage->rx_otherhost_dropped +=
READ_ONCE(core_stats->rx_otherhost_dropped);
                }
        }
+unlock:
+       rcu_read_unlock();
        return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 2d9afc6e2161efa51ffa62813ec10c8f43944bce..3f4851d67015c959dd531c571c46fc2ac18beb65
100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -36,12 +36,6 @@ static const char fmt_uint[] = "%u\n";
 static const char fmt_ulong[] = "%lu\n";
 static const char fmt_u64[] = "%llu\n";

-/* Caller holds RTNL or RCU */
-static inline int dev_isalive(const struct net_device *dev)
-{
-       return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
-}
-
 /* use same locking rules as GIF* ioctl's */
 static ssize_t netdev_show(const struct device *dev,
                           struct device_attribute *attr, char *buf,
Larysa Zaremba Dec. 18, 2024, 2:49 p.m. UTC | #10
On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote:
> On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
> 
> >
> > It is hard to know without testing (but testing should not be hard). I think the
> > phrase "Statistics must persist across routine operations like bringing the
> > interface down and up." [0] implies that bringing the interface down may not
> > necessarily prevent stats calls.
> 
> Please don't  add workarounds to individual drivers.
> 
> I think the core networking stack should handle the possible races.
> 
> Most dev_get_stats() callers are correctly testing dev_isalive() or
> are protected by RTNL.
> 
> There are few nested cases that are not properly handled, the
> following patch should take care of them.
>

I was under the impression that .ndo_stop() being called does not mean the 
device stops being NETREG_REGISTERED, such link would be required to solve the 
original problem  with your patch alone (though it is generally a good change).
Could you please explain this relation?

> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2593019ad5b1614f3b8c037afb4ba4fa740c7d51..768afc2a18d343d051e7a1b631124910af9922d2
> 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5342,6 +5342,12 @@ static inline const char
> *netdev_reg_state(const struct net_device *dev)
>         return " (unknown)";
>  }
> 
> +/* Caller holds RTNL or RCU */
> +static inline int dev_isalive(const struct net_device *dev)
> +{
> +       return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
> +}
> +
>  #define MODULE_ALIAS_NETDEV(device) \
>         MODULE_ALIAS("netdev-" device)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c7f3dea3e0eb9eb05865e49dd7a8535afb974149..f11f305f3136f208fcb285c7b314914aef20dfad
> 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -11044,8 +11044,13 @@ struct rtnl_link_stats64
> *dev_get_stats(struct net_device *dev,
>         const struct net_device_ops *ops = dev->netdev_ops;
>         const struct net_device_core_stats __percpu *p;
> 
> +       memset(storage, 0, sizeof(*storage));
> +       rcu_read_lock();
> +
> +       if (unlikely(!dev_isalive(dev)))
> +               goto unlock;
> +
>         if (ops->ndo_get_stats64) {
> -               memset(storage, 0, sizeof(*storage));
>                 ops->ndo_get_stats64(dev, storage);
>         } else if (ops->ndo_get_stats) {
>                 netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev));
> @@ -11071,6 +11076,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct
> net_device *dev,
>                         storage->rx_otherhost_dropped +=
> READ_ONCE(core_stats->rx_otherhost_dropped);
>                 }
>         }
> +unlock:
> +       rcu_read_unlock();
>         return storage;
>  }
>  EXPORT_SYMBOL(dev_get_stats);
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 2d9afc6e2161efa51ffa62813ec10c8f43944bce..3f4851d67015c959dd531c571c46fc2ac18beb65
> 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -36,12 +36,6 @@ static const char fmt_uint[] = "%u\n";
>  static const char fmt_ulong[] = "%lu\n";
>  static const char fmt_u64[] = "%llu\n";
> 
> -/* Caller holds RTNL or RCU */
> -static inline int dev_isalive(const struct net_device *dev)
> -{
> -       return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
> -}
> -
>  /* use same locking rules as GIF* ioctl's */
>  static ssize_t netdev_show(const struct device *dev,
>                            struct device_attribute *attr, char *buf,
Eric Dumazet Dec. 18, 2024, 3:03 p.m. UTC | #11
On Wed, Dec 18, 2024 at 3:49 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
>
> On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote:
> > On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote:
> >
> > >
> > > It is hard to know without testing (but testing should not be hard). I think the
> > > phrase "Statistics must persist across routine operations like bringing the
> > > interface down and up." [0] implies that bringing the interface down may not
> > > necessarily prevent stats calls.
> >
> > Please don't  add workarounds to individual drivers.
> >
> > I think the core networking stack should handle the possible races.
> >
> > Most dev_get_stats() callers are correctly testing dev_isalive() or
> > are protected by RTNL.
> >
> > There are few nested cases that are not properly handled, the
> > following patch should take care of them.
> >
>
> I was under the impression that .ndo_stop() being called does not mean the
> device stops being NETREG_REGISTERED, such link would be required to solve the
> original problem  with your patch alone (though it is generally a good change).
> Could you please explain this relation?
>

ndo_stop() being called must have no impact on statistics :

# ip -s link sh dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX:  bytes packets errors dropped  missed   mcast
       3473568   41352      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
       3473568   41352      0       0       0       0

# ip link set dev lo down  # would call ndo_stop() if loopback had one

# ip -s link sh dev lo
1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT
group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX:  bytes packets errors dropped  missed   mcast
       3473568   41352      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
       3473568   41352      0       0       0       0


So perhaps the problem with this driver is that its ndo_stop() is
doing things it should not.
Shinas Rasheed Dec. 18, 2024, 3:25 p.m. UTC | #12
> > On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote:
> > > On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba
> <larysa.zaremba@intel.com> wrote:
> > >
> > > >
> > > > It is hard to know without testing (but testing should not be hard). I
> think the
> > > > phrase "Statistics must persist across routine operations like bringing the
> > > > interface down and up." [0] implies that bringing the interface down
> may not
> > > > necessarily prevent stats calls.
> > >
> > > Please don't  add workarounds to individual drivers.
> > >
> > > I think the core networking stack should handle the possible races.
> > >
> > > Most dev_get_stats() callers are correctly testing dev_isalive() or
> > > are protected by RTNL.
> > >
> > > There are few nested cases that are not properly handled, the
> > > following patch should take care of them.
> > >
> >
> > I was under the impression that .ndo_stop() being called does not mean the
> > device stops being NETREG_REGISTERED, such link would be required to
> solve the
> > original problem  with your patch alone (though it is generally a good
> change).
> > Could you please explain this relation?
> >
> 
> ndo_stop() being called must have no impact on statistics :
> 
> # ip -s link sh dev lo
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> mode DEFAULT group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     RX:  bytes packets errors dropped  missed   mcast
>        3473568   41352      0       0       0       0
>     TX:  bytes packets errors dropped carrier collsns
>        3473568   41352      0       0       0       0
> 
> # ip link set dev lo down  # would call ndo_stop() if loopback had one
> 
> # ip -s link sh dev lo
> 1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT
> group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     RX:  bytes packets errors dropped  missed   mcast
>        3473568   41352      0       0       0       0
>     TX:  bytes packets errors dropped carrier collsns
>        3473568   41352      0       0       0       0
> 
> 
> So perhaps the problem with this driver is that its ndo_stop() is
> doing things it should not.

Hi Eric,

This patch is not a workaround. In some setups, we were seeing races with regards
to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync with the view of
resources, a synchronize_net() is called in ndo_stop(). Please let me know if you see anything wrong here.

Thanks
Eric Dumazet Dec. 18, 2024, 3:39 p.m. UTC | #13
On Wed, Dec 18, 2024 at 4:25 PM Shinas Rasheed <srasheed@marvell.com> wrote:

> Hi Eric,
>
> This patch is not a workaround. In some setups, we were seeing races with regards
> to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync with the view of
> resources, a synchronize_net() is called in ndo_stop(). Please let me know if you see anything wrong here.

We do not add a synchronize_net() without a very strong explanation
(details, not a weak sentence in the changelog).

Where is the opposite barrier in your patch ?

I am saying you do not need this, unless you can show evidence.

If your ndo_get_stats() needs to call netif_running(), this would be
the fix IMO.
Shinas Rasheed Dec. 19, 2024, 4:28 p.m. UTC | #14
Hi Eric,

> On Wed, Dec 18, 2024 at 4:25 PM Shinas Rasheed <srasheed@marvell.com>
> wrote:
> 
> > Hi Eric,
> >
> > This patch is not a workaround. In some setups, we were seeing races with
> regards
> > to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync
> with the view of
> > resources, a synchronize_net() is called in ndo_stop(). Please let me know if
> you see anything wrong here.
> 
> We do not add a synchronize_net() without a very strong explanation
> (details, not a weak sentence in the changelog).
> 
> Where is the opposite barrier in your patch ?
> 
> I am saying you do not need this, unless you can show evidence.
> 
> If your ndo_get_stats() needs to call netif_running(), this would be
> the fix IMO.

The synchronize_net() is supposed to sync all previous calls of ndo_get_stats() and wait for their completion before closing the device.
Again this seems to be the v2 of this patch. In the v3, I have provided the warn log as well in the commit message for reference, in answer to
the changelog comment for more clarification.

As I stated, this is needed because ndo_stop() races with ndo_get_stats(), and a 'lock' or a similar mechanism seems required to alleviate this.
Fixes in the same vein seem to be common, as I do see other drivers utilizing a lock mechanism while retrieving statistics to resolve the same 
(ie; race with resource destruction in ndo_stop()). So, just to state, I'm not trying to do anything new
here.

Can we please comment further on the v3 of this patch? https://lore.kernel.org/all/20241218115111.2407958-1-srasheed@marvell.com/

Thanks a lot for your time and comments
Eric Dumazet Dec. 19, 2024, 4:35 p.m. UTC | #15
On Thu, Dec 19, 2024 at 5:28 PM Shinas Rasheed <srasheed@marvell.com> wrote:
>
> Hi Eric,
>
> > On Wed, Dec 18, 2024 at 4:25 PM Shinas Rasheed <srasheed@marvell.com>
> > wrote:
> >
> > > Hi Eric,
> > >
> > > This patch is not a workaround. In some setups, we were seeing races with
> > regards
> > > to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync
> > with the view of
> > > resources, a synchronize_net() is called in ndo_stop(). Please let me know if
> > you see anything wrong here.
> >
> > We do not add a synchronize_net() without a very strong explanation
> > (details, not a weak sentence in the changelog).
> >
> > Where is the opposite barrier in your patch ?
> >
> > I am saying you do not need this, unless you can show evidence.
> >
> > If your ndo_get_stats() needs to call netif_running(), this would be
> > the fix IMO.
>
> The synchronize_net() is supposed to sync all previous calls of ndo_get_stats() and wait for their completion before closing the device.
> Again this seems to be the v2 of this patch. In the v3, I have provided the warn log as well in the commit message for reference, in answer to
> the changelog comment for more clarification.
>
> As I stated, this is needed because ndo_stop() races with ndo_get_stats(), and a 'lock' or a similar mechanism seems required to alleviate this.
> Fixes in the same vein seem to be common, as I do see other drivers utilizing a lock mechanism while retrieving statistics to resolve the same
> (ie; race with resource destruction in ndo_stop()). So, just to state, I'm not trying to do anything new
> here.

A precise lock is better, it is easy to grep, if really other drivers
were not able to fix this in a different way.

A synchronize_net() without explicit details is very very weak.

>
> Can we please comment further on the v3 of this patch? https://lore.kernel.org/all/20241218115111.2407958-1-srasheed@marvell.com/
>
> Thanks a lot for your time and comments

I tried to say that adding a synchronize_net() call in ndo_stop() was
not needed.

I do not particularly care for this driver, my concern is that
copy/pasting is going to happen.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 549436efc204..941bbaaa67b5 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -757,6 +757,7 @@  static int octep_stop(struct net_device *netdev)
 {
 	struct octep_device *oct = netdev_priv(netdev);
 
+	synchronize_net();
 	netdev_info(netdev, "Stopping the device ...\n");
 
 	octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,