Message ID | 20241216075842.2394606-2-srasheed@marvell.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Fix race conditions in ndo_get_stats64 | expand |
On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > ndo_get_stats64() can race with ndo_stop(), which frees input and > output queue resources. Call synchronize_net() to avoid such races. > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > --- > V2: > - Changed sync mechanism to fix race conditions from using an atomic > set_bit ops to a much simpler synchronize_net() > > V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/ > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > index 549436efc204..941bbaaa67b5 100644 > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev) > { > struct octep_device *oct = netdev_priv(netdev); > > + synchronize_net(); You should have elaborated on the fact that this synchronize_net() is for __LINK_STATE_START flag in the commit message, this is not obvious. Also, is octep_get_stats64() called from RCU-safe context? > netdev_info(netdev, "Stopping the device ...\n"); > > octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false, > -- > 2.25.1 > >
On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote: > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > output queue resources. Call synchronize_net() to avoid such races. > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > --- > > V2: > > - Changed sync mechanism to fix race conditions from using an atomic > > set_bit ops to a much simpler synchronize_net() > > > > V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/ > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > index 549436efc204..941bbaaa67b5 100644 > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev) > > { > > struct octep_device *oct = netdev_priv(netdev); > > > > + synchronize_net(); > > You should have elaborated on the fact that this synchronize_net() is for > __LINK_STATE_START flag in the commit message, this is not obvious. Also, is > octep_get_stats64() called from RCU-safe context? > Now I see that in case !netif_running(), you do not bail out of octep_get_stats64() fully (or at all after the second patch). So, could you explain, how are you utilizing RCU here? > > netdev_info(netdev, "Stopping the device ...\n"); > > > > octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false, > > -- > > 2.25.1 > > > > >
Hi Larysa, > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > output queue resources. Call synchronize_net() to avoid such races. > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > --- > > V2: > > - Changed sync mechanism to fix race conditions from using an atomic > > set_bit ops to a much simpler synchronize_net() > > > > V1: https://urldefense.proofpoint.com/v2/url?u=https- > 3A__lore.kernel.org_all_20241203072130.2316913-2D2-2Dsrasheed- > 40marvell.com_&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1OxLD4y- > oxrlgQ1rjXgWtmLz1pnaDjD96sDq- > cKUwK4&m=SfdHxslanlPmhVsD4m3qSuUI2eout0Tqi9o- > DnPbfjVzv5Nte11HnOw2-aIeEN3i&s=0iv3vwi5W- > le8dxI6EtazU7q4RlnvElcVk5MsyMm_0k&e= > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > index 549436efc204..941bbaaa67b5 100644 > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev) > > { > > struct octep_device *oct = netdev_priv(netdev); > > > > + synchronize_net(); > > You should have elaborated on the fact that this synchronize_net() is for > __LINK_STATE_START flag in the commit message, this is not obvious. Also, is > octep_get_stats64() called from RCU-safe context? The octep_get_stats64() is an instance of the ndo_get_stats64() netdev op. As I understand, the netdev op is supposed to be called from RCU-safe contexts. Please do correct me if I'm wrong.
Hi Larysa, > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote: > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > > output queue resources. Call synchronize_net() to avoid such races. > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > > --- > > > V2: > > > - Changed sync mechanism to fix race conditions from using an atomic > > > set_bit ops to a much simpler synchronize_net() > > > > > > V1: https://urldefense.proofpoint.com/v2/url?u=https- > 3A__lore.kernel.org_all_20241203072130.2316913-2D2-2Dsrasheed- > 40marvell.com_&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1OxLD4y- > oxrlgQ1rjXgWtmLz1pnaDjD96sDq-cKUwK4&m=Dh7BH5wsuCdQnE- > 4erjptaJnM42YsLU2tY4wPn5NWqwsymkNOllPfQAkomj1mXPN&s=IjWHk3SOqr > ibgv6kz-WTL8VfGVInSu5DzKSbcjCFIvk&e= > > > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > index 549436efc204..941bbaaa67b5 100644 > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev) > > > { > > > struct octep_device *oct = netdev_priv(netdev); > > > > > > + synchronize_net(); > > > > You should have elaborated on the fact that this synchronize_net() is for > > __LINK_STATE_START flag in the commit message, this is not obvious. Also, > is > > octep_get_stats64() called from RCU-safe context? > > > > Now I see that in case !netif_running(), you do not bail out of > octep_get_stats64() fully (or at all after the second patch). So, could you > explain, how are you utilizing RCU here? > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is called from RCU safe contexts, and that the netdev op is never called after the ndo_stop(). Thanks for the comments
On Mon, Dec 16, 2024 at 06:28:13PM +0000, Shinas Rasheed wrote: > Hi Larysa, > > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote: > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > > > output queue resources. Call synchronize_net() to avoid such races. > > > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > > > --- > > > > V2: > > > > - Changed sync mechanism to fix race conditions from using an atomic > > > > set_bit ops to a much simpler synchronize_net() > > > > > > > > V1: https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__lore.kernel.org_all_20241203072130.2316913-2D2-2Dsrasheed- > > 40marvell.com_&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=1OxLD4y- > > oxrlgQ1rjXgWtmLz1pnaDjD96sDq-cKUwK4&m=Dh7BH5wsuCdQnE- > > 4erjptaJnM42YsLU2tY4wPn5NWqwsymkNOllPfQAkomj1mXPN&s=IjWHk3SOqr > > ibgv6kz-WTL8VfGVInSu5DzKSbcjCFIvk&e= > > > > > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > index 549436efc204..941bbaaa67b5 100644 > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev) > > > > { > > > > struct octep_device *oct = netdev_priv(netdev); > > > > > > > > + synchronize_net(); > > > > > > You should have elaborated on the fact that this synchronize_net() is for > > > __LINK_STATE_START flag in the commit message, this is not obvious. Also, > > is > > > octep_get_stats64() called from RCU-safe context? > > > > > > > Now I see that in case !netif_running(), you do not bail out of > > octep_get_stats64() fully (or at all after the second patch). So, could you > > explain, how are you utilizing RCU here? > > > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is called from RCU safe contexts, and > that the netdev op is never called after the ndo_stop(). As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the thing, but there are a lot more callers and for example veth_get_stats64() explicitly calls rcu_read_lock(). Also, even with RCU-protected section, I am not sure prevents the octep_get_stats64() to be called after synchronize_net() finishes. Again, the callers seem too diverse to definitely say that we can rely on built-in flags for this to not happen :/ > > Thanks for the comments
> > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote: > > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > > > > output queue resources. Call synchronize_net() to avoid such races. > > > > > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > > > > --- > > > > > V2: > > > > > - Changed sync mechanism to fix race conditions from using an > atomic > > > > > set_bit ops to a much simpler synchronize_net() > > > > > > > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > index 549436efc204..941bbaaa67b5 100644 > > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device > *netdev) > > > > > { > > > > > struct octep_device *oct = netdev_priv(netdev); > > > > > > > > > > + synchronize_net(); > > > > > > > > You should have elaborated on the fact that this synchronize_net() is for > > > > __LINK_STATE_START flag in the commit message, this is not obvious. > Also, > > > is > > > > octep_get_stats64() called from RCU-safe context? > > > > > > > > > > Now I see that in case !netif_running(), you do not bail out of > > > octep_get_stats64() fully (or at all after the second patch). So, could you > > > explain, how are you utilizing RCU here? > > > > > > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is > called from RCU safe contexts, and > > that the netdev op is never called after the ndo_stop(). > > As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the > thing, but there are a lot more callers and for example veth_get_stats64() > explicitly calls rcu_read_lock(). > > Also, even with RCU-protected section, I am not sure prevents the > octep_get_stats64() to be called after synchronize_net() finishes. Again, the > callers seem too diverse to definitely say that we can rely on built-in flags > for this to not happen :/ Usually, the understanding is that ndo_get_stats won't be called by the network stack after the interface is put down. As long as that is the case, I don't think we should keep adding checks until there is a strong reason to do so. What do you think? > > Thanks for the comments
On Tue, Dec 17, 2024 at 05:49:13PM +0000, Shinas Rasheed wrote: > > > > > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote: > > > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > > > > > output queue resources. Call synchronize_net() to avoid such races. > > > > > > > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > > > > > --- > > > > > > V2: > > > > > > - Changed sync mechanism to fix race conditions from using an > > atomic > > > > > > set_bit ops to a much simpler synchronize_net() > > > > > > > > > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > > index 549436efc204..941bbaaa67b5 100644 > > > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device > > *netdev) > > > > > > { > > > > > > struct octep_device *oct = netdev_priv(netdev); > > > > > > > > > > > > + synchronize_net(); > > > > > > > > > > You should have elaborated on the fact that this synchronize_net() is for > > > > > __LINK_STATE_START flag in the commit message, this is not obvious. > > Also, > > > > is > > > > > octep_get_stats64() called from RCU-safe context? > > > > > > > > > > > > > Now I see that in case !netif_running(), you do not bail out of > > > > octep_get_stats64() fully (or at all after the second patch). So, could you > > > > explain, how are you utilizing RCU here? > > > > > > > > > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is > > called from RCU safe contexts, and > > > that the netdev op is never called after the ndo_stop(). > > > > As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the > > thing, but there are a lot more callers and for example veth_get_stats64() > > explicitly calls rcu_read_lock(). > > > > Also, even with RCU-protected section, I am not sure prevents the > > octep_get_stats64() to be called after synchronize_net() finishes. Again, the > > callers seem too diverse to definitely say that we can rely on built-in flags > > for this to not happen :/ > > Usually, the understanding is that ndo_get_stats won't be called by the network stack after the interface is put down. As long as that is the case, I don't think we should keep adding checks until there is a strong reason to do so. What do you think? > It is hard to know without testing (but testing should not be hard). I think the phrase "Statistics must persist across routine operations like bringing the interface down and up." [0] implies that bringing the interface down may not necessarily prevent stats calls. [0] https://docs.kernel.org/networking/statistics.html > > > Thanks for the comments
Hi Larysa, > > > > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote: > > > > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote: > > > > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and > > > > > > > output queue resources. Call synchronize_net() to avoid such > races. > > > > > > > > > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") > > > > > > > Signed-off-by: Shinas Rasheed <srasheed@marvell.com> > > > > > > > --- > > > > > > > V2: > > > > > > > - Changed sync mechanism to fix race conditions from using an > > > atomic > > > > > > > set_bit ops to a much simpler synchronize_net() > > > > > > > > > > > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > > > index 549436efc204..941bbaaa67b5 100644 > > > > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c > > > > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device > > > *netdev) > > > > > > > { > > > > > > > struct octep_device *oct = netdev_priv(netdev); > > > > > > > > > > > > > > + synchronize_net(); > > > > > > > > > > > > You should have elaborated on the fact that this synchronize_net() is > for > > > > > > __LINK_STATE_START flag in the commit message, this is not obvious. > > > Also, > > > > > is > > > > > > octep_get_stats64() called from RCU-safe context? > > > > > > > > > > > > > > > > Now I see that in case !netif_running(), you do not bail out of > > > > > octep_get_stats64() fully (or at all after the second patch). So, could > you > > > > > explain, how are you utilizing RCU here? > > > > > > > > > > > > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in > turn) is > > > called from RCU safe contexts, and > > > > that the netdev op is never called after the ndo_stop(). > > > > > > As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around > the > > > thing, but there are a lot more callers and for example veth_get_stats64() > > > explicitly calls rcu_read_lock(). > > > > > > Also, even with RCU-protected section, I am not sure prevents the > > > octep_get_stats64() to be called after synchronize_net() finishes. Again, > the > > > callers seem too diverse to definitely say that we can rely on built-in flags > > > for this to not happen :/ > > > > Usually, the understanding is that ndo_get_stats won't be called by the > network stack after the interface is put down. As long as that is the case, I > don't think we should keep adding checks until there is a strong reason to do > so. What do you think? > > > > It is hard to know without testing (but testing should not be hard). I think the > phrase "Statistics must persist across routine operations like bringing the > interface down and up." [0] implies that bringing the interface down may not > necessarily prevent stats calls. > > [0] https://urldefense.proofpoint.com/v2/url?u=https- > 3A__docs.kernel.org_networking_statistics.html&d=DwIBAg&c=nKjWec2b6R0 > mOyPaz7xtfQ&r=1OxLD4y-oxrlgQ1rjXgWtmLz1pnaDjD96sDq- > cKUwK4&m=DJzJNo9WT10pSHikJhCBbN7-CfB- > O2kz9OVGsmiIRQXvcIIWDK6034tMmzZGvlFs&s=essE01suLWF42taNi0yJ3H3YC > 0Et8GofMj5wxor9yD4&e= > Sorry, I misworded my previous statement. Of course ndo_get_stats can get called while the netdev is down. This is tested code, and the reason why there is no issue in this scenario is because for a ndo_get_stats call that happens after ndo_stop() happens, the oct->num_oqs will be seen as 0, and hence octep_iq nor octep_oq is not accessed in the for loop that follows. Octep_iq and octep_oq are the resources that we're trying to protect from race conditions. Hope that clarifies things. > > > > Thanks for the comments
On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > It is hard to know without testing (but testing should not be hard). I think the > phrase "Statistics must persist across routine operations like bringing the > interface down and up." [0] implies that bringing the interface down may not > necessarily prevent stats calls. Please don't add workarounds to individual drivers. I think the core networking stack should handle the possible races. Most dev_get_stats() callers are correctly testing dev_isalive() or are protected by RTNL. There are few nested cases that are not properly handled, the following patch should take care of them. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2593019ad5b1614f3b8c037afb4ba4fa740c7d51..768afc2a18d343d051e7a1b631124910af9922d2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -5342,6 +5342,12 @@ static inline const char *netdev_reg_state(const struct net_device *dev) return " (unknown)"; } +/* Caller holds RTNL or RCU */ +static inline int dev_isalive(const struct net_device *dev) +{ + return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED; +} + #define MODULE_ALIAS_NETDEV(device) \ MODULE_ALIAS("netdev-" device) diff --git a/net/core/dev.c b/net/core/dev.c index c7f3dea3e0eb9eb05865e49dd7a8535afb974149..f11f305f3136f208fcb285c7b314914aef20dfad 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -11044,8 +11044,13 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, const struct net_device_ops *ops = dev->netdev_ops; const struct net_device_core_stats __percpu *p; + memset(storage, 0, sizeof(*storage)); + rcu_read_lock(); + + if (unlikely(!dev_isalive(dev))) + goto unlock; + if (ops->ndo_get_stats64) { - memset(storage, 0, sizeof(*storage)); ops->ndo_get_stats64(dev, storage); } else if (ops->ndo_get_stats) { netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev)); @@ -11071,6 +11076,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, storage->rx_otherhost_dropped += READ_ONCE(core_stats->rx_otherhost_dropped); } } +unlock: + rcu_read_unlock(); return storage; } EXPORT_SYMBOL(dev_get_stats); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 2d9afc6e2161efa51ffa62813ec10c8f43944bce..3f4851d67015c959dd531c571c46fc2ac18beb65 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -36,12 +36,6 @@ static const char fmt_uint[] = "%u\n"; static const char fmt_ulong[] = "%lu\n"; static const char fmt_u64[] = "%llu\n"; -/* Caller holds RTNL or RCU */ -static inline int dev_isalive(const struct net_device *dev) -{ - return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED; -} - /* use same locking rules as GIF* ioctl's */ static ssize_t netdev_show(const struct device *dev, struct device_attribute *attr, char *buf,
On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote: > On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > > > > It is hard to know without testing (but testing should not be hard). I think the > > phrase "Statistics must persist across routine operations like bringing the > > interface down and up." [0] implies that bringing the interface down may not > > necessarily prevent stats calls. > > Please don't add workarounds to individual drivers. > > I think the core networking stack should handle the possible races. > > Most dev_get_stats() callers are correctly testing dev_isalive() or > are protected by RTNL. > > There are few nested cases that are not properly handled, the > following patch should take care of them. > I was under the impression that .ndo_stop() being called does not mean the device stops being NETREG_REGISTERED, such link would be required to solve the original problem with your patch alone (though it is generally a good change). Could you please explain this relation? > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 2593019ad5b1614f3b8c037afb4ba4fa740c7d51..768afc2a18d343d051e7a1b631124910af9922d2 > 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -5342,6 +5342,12 @@ static inline const char > *netdev_reg_state(const struct net_device *dev) > return " (unknown)"; > } > > +/* Caller holds RTNL or RCU */ > +static inline int dev_isalive(const struct net_device *dev) > +{ > + return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED; > +} > + > #define MODULE_ALIAS_NETDEV(device) \ > MODULE_ALIAS("netdev-" device) > > diff --git a/net/core/dev.c b/net/core/dev.c > index c7f3dea3e0eb9eb05865e49dd7a8535afb974149..f11f305f3136f208fcb285c7b314914aef20dfad > 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -11044,8 +11044,13 @@ struct rtnl_link_stats64 > *dev_get_stats(struct net_device *dev, > const struct net_device_ops *ops = dev->netdev_ops; > const struct net_device_core_stats __percpu *p; > > + memset(storage, 0, sizeof(*storage)); > + rcu_read_lock(); > + > + if (unlikely(!dev_isalive(dev))) > + goto unlock; > + > if (ops->ndo_get_stats64) { > - memset(storage, 0, sizeof(*storage)); > ops->ndo_get_stats64(dev, storage); > } else if (ops->ndo_get_stats) { > netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev)); > @@ -11071,6 +11076,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct > net_device *dev, > storage->rx_otherhost_dropped += > READ_ONCE(core_stats->rx_otherhost_dropped); > } > } > +unlock: > + rcu_read_unlock(); > return storage; > } > EXPORT_SYMBOL(dev_get_stats); > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c > index 2d9afc6e2161efa51ffa62813ec10c8f43944bce..3f4851d67015c959dd531c571c46fc2ac18beb65 > 100644 > --- a/net/core/net-sysfs.c > +++ b/net/core/net-sysfs.c > @@ -36,12 +36,6 @@ static const char fmt_uint[] = "%u\n"; > static const char fmt_ulong[] = "%lu\n"; > static const char fmt_u64[] = "%llu\n"; > > -/* Caller holds RTNL or RCU */ > -static inline int dev_isalive(const struct net_device *dev) > -{ > - return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED; > -} > - > /* use same locking rules as GIF* ioctl's */ > static ssize_t netdev_show(const struct device *dev, > struct device_attribute *attr, char *buf,
On Wed, Dec 18, 2024 at 3:49 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote: > > On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba <larysa.zaremba@intel.com> wrote: > > > > > > > > It is hard to know without testing (but testing should not be hard). I think the > > > phrase "Statistics must persist across routine operations like bringing the > > > interface down and up." [0] implies that bringing the interface down may not > > > necessarily prevent stats calls. > > > > Please don't add workarounds to individual drivers. > > > > I think the core networking stack should handle the possible races. > > > > Most dev_get_stats() callers are correctly testing dev_isalive() or > > are protected by RTNL. > > > > There are few nested cases that are not properly handled, the > > following patch should take care of them. > > > > I was under the impression that .ndo_stop() being called does not mean the > device stops being NETREG_REGISTERED, such link would be required to solve the > original problem with your patch alone (though it is generally a good change). > Could you please explain this relation? > ndo_stop() being called must have no impact on statistics : # ip -s link sh dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 RX: bytes packets errors dropped missed mcast 3473568 41352 0 0 0 0 TX: bytes packets errors dropped carrier collsns 3473568 41352 0 0 0 0 # ip link set dev lo down # would call ndo_stop() if loopback had one # ip -s link sh dev lo 1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 RX: bytes packets errors dropped missed mcast 3473568 41352 0 0 0 0 TX: bytes packets errors dropped carrier collsns 3473568 41352 0 0 0 0 So perhaps the problem with this driver is that its ndo_stop() is doing things it should not.
> > On Wed, Dec 18, 2024 at 03:21:12PM +0100, Eric Dumazet wrote: > > > On Wed, Dec 18, 2024 at 2:25 PM Larysa Zaremba > <larysa.zaremba@intel.com> wrote: > > > > > > > > > > > It is hard to know without testing (but testing should not be hard). I > think the > > > > phrase "Statistics must persist across routine operations like bringing the > > > > interface down and up." [0] implies that bringing the interface down > may not > > > > necessarily prevent stats calls. > > > > > > Please don't add workarounds to individual drivers. > > > > > > I think the core networking stack should handle the possible races. > > > > > > Most dev_get_stats() callers are correctly testing dev_isalive() or > > > are protected by RTNL. > > > > > > There are few nested cases that are not properly handled, the > > > following patch should take care of them. > > > > > > > I was under the impression that .ndo_stop() being called does not mean the > > device stops being NETREG_REGISTERED, such link would be required to > solve the > > original problem with your patch alone (though it is generally a good > change). > > Could you please explain this relation? > > > > ndo_stop() being called must have no impact on statistics : > > # ip -s link sh dev lo > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN > mode DEFAULT group default qlen 1000 > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > RX: bytes packets errors dropped missed mcast > 3473568 41352 0 0 0 0 > TX: bytes packets errors dropped carrier collsns > 3473568 41352 0 0 0 0 > > # ip link set dev lo down # would call ndo_stop() if loopback had one > > # ip -s link sh dev lo > 1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT > group default qlen 1000 > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > RX: bytes packets errors dropped missed mcast > 3473568 41352 0 0 0 0 > TX: bytes packets errors dropped carrier collsns > 3473568 41352 0 0 0 0 > > > So perhaps the problem with this driver is that its ndo_stop() is > doing things it should not. Hi Eric, This patch is not a workaround. In some setups, we were seeing races with regards to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync with the view of resources, a synchronize_net() is called in ndo_stop(). Please let me know if you see anything wrong here. Thanks
On Wed, Dec 18, 2024 at 4:25 PM Shinas Rasheed <srasheed@marvell.com> wrote: > Hi Eric, > > This patch is not a workaround. In some setups, we were seeing races with regards > to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync with the view of > resources, a synchronize_net() is called in ndo_stop(). Please let me know if you see anything wrong here. We do not add a synchronize_net() without a very strong explanation (details, not a weak sentence in the changelog). Where is the opposite barrier in your patch ? I am saying you do not need this, unless you can show evidence. If your ndo_get_stats() needs to call netif_running(), this would be the fix IMO.
Hi Eric, > On Wed, Dec 18, 2024 at 4:25 PM Shinas Rasheed <srasheed@marvell.com> > wrote: > > > Hi Eric, > > > > This patch is not a workaround. In some setups, we were seeing races with > regards > > to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync > with the view of > > resources, a synchronize_net() is called in ndo_stop(). Please let me know if > you see anything wrong here. > > We do not add a synchronize_net() without a very strong explanation > (details, not a weak sentence in the changelog). > > Where is the opposite barrier in your patch ? > > I am saying you do not need this, unless you can show evidence. > > If your ndo_get_stats() needs to call netif_running(), this would be > the fix IMO. The synchronize_net() is supposed to sync all previous calls of ndo_get_stats() and wait for their completion before closing the device. Again this seems to be the v2 of this patch. In the v3, I have provided the warn log as well in the commit message for reference, in answer to the changelog comment for more clarification. As I stated, this is needed because ndo_stop() races with ndo_get_stats(), and a 'lock' or a similar mechanism seems required to alleviate this. Fixes in the same vein seem to be common, as I do see other drivers utilizing a lock mechanism while retrieving statistics to resolve the same (ie; race with resource destruction in ndo_stop()). So, just to state, I'm not trying to do anything new here. Can we please comment further on the v3 of this patch? https://lore.kernel.org/all/20241218115111.2407958-1-srasheed@marvell.com/ Thanks a lot for your time and comments
On Thu, Dec 19, 2024 at 5:28 PM Shinas Rasheed <srasheed@marvell.com> wrote: > > Hi Eric, > > > On Wed, Dec 18, 2024 at 4:25 PM Shinas Rasheed <srasheed@marvell.com> > > wrote: > > > > > Hi Eric, > > > > > > This patch is not a workaround. In some setups, we were seeing races with > > regards > > > to resource freeing between ndo_stop() and ndo_get_stats(). Hence to sync > > with the view of > > > resources, a synchronize_net() is called in ndo_stop(). Please let me know if > > you see anything wrong here. > > > > We do not add a synchronize_net() without a very strong explanation > > (details, not a weak sentence in the changelog). > > > > Where is the opposite barrier in your patch ? > > > > I am saying you do not need this, unless you can show evidence. > > > > If your ndo_get_stats() needs to call netif_running(), this would be > > the fix IMO. > > The synchronize_net() is supposed to sync all previous calls of ndo_get_stats() and wait for their completion before closing the device. > Again this seems to be the v2 of this patch. In the v3, I have provided the warn log as well in the commit message for reference, in answer to > the changelog comment for more clarification. > > As I stated, this is needed because ndo_stop() races with ndo_get_stats(), and a 'lock' or a similar mechanism seems required to alleviate this. > Fixes in the same vein seem to be common, as I do see other drivers utilizing a lock mechanism while retrieving statistics to resolve the same > (ie; race with resource destruction in ndo_stop()). So, just to state, I'm not trying to do anything new > here. A precise lock is better, it is easy to grep, if really other drivers were not able to fix this in a different way. A synchronize_net() without explicit details is very very weak. > > Can we please comment further on the v3 of this patch? https://lore.kernel.org/all/20241218115111.2407958-1-srasheed@marvell.com/ > > Thanks a lot for your time and comments I tried to say that adding a synchronize_net() call in ndo_stop() was not needed. I do not particularly care for this driver, my concern is that copy/pasting is going to happen.
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c index 549436efc204..941bbaaa67b5 100644 --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev) { struct octep_device *oct = netdev_priv(netdev); + synchronize_net(); netdev_info(netdev, "Stopping the device ...\n"); octep_ctrl_net_set_link_status(oct, OCTEP_CTRL_NET_INVALID_VFID, false,
ndo_get_stats64() can race with ndo_stop(), which frees input and output queue resources. Call synchronize_net() to avoid such races. Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> --- V2: - Changed sync mechanism to fix race conditions from using an atomic set_bit ops to a much simpler synchronize_net() V1: https://lore.kernel.org/all/20241203072130.2316913-2-srasheed@marvell.com/ drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 + 1 file changed, 1 insertion(+)