Message ID | 85228e43f4771609b290964a8983e8c567e22509.1722211917.git.jamie.bainbridge@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net,v2] net-sysfs: check device is present when showing duplex |
Hi,

On 7/29/24 02:12, Jamie Bainbridge wrote:
> A sysfs reader can race with a device reset or removal, attempting to
> read device state when the device is not actually present.
>
> This is the same sort of panic as observed in commit 4224cfd7fb65
> ("net-sysfs: add check for netdevice being present to speed_show"):
>
> [exception RIP: qed_get_current_link+17]
> #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
> #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
> #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
> #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
> #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
> #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
> #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
> #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
> #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
> #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb
>
> crash> struct net_device.state ffff9a9d21336000
> state = 5,
>
> state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
> The device is not present, note lack of __LINK_STATE_PRESENT (0b10).
>
> Resolve by adding the same netif_device_present() check to duplex_show.
>
> Fixes: 8ae6daca85c8 ("ethtool: Call ethtool's get/set_settings callbacks with cleaned data")

The patch LGTM, but it looks like the issue predates the blamed commit above. Possibly:

Fixes: d519e17e2d01 ("net: export device speed and duplex via sysfs")

Also, please explicitly CC the people who gave feedback on previous revisions.

Thanks,

Paolo
Hi Jamie,

On Mon, 29 Jul 2024 10:12:10 +1000, Jamie Bainbridge wrote:
> A sysfs reader can race with a device reset or removal, attempting to
> read device state when the device is not actually present.
>
> This is the same sort of panic as observed in commit 4224cfd7fb65
> ("net-sysfs: add check for netdevice being present to speed_show"):
>
> [exception RIP: qed_get_current_link+17]
> #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
> #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
> #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
> #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
> #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
> #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
> #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
> #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
> #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
> #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb
>
> crash> struct net_device.state ffff9a9d21336000
> state = 5,
>
> state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
> The device is not present, note lack of __LINK_STATE_PRESENT (0b10).
>
> Resolve by adding the same netif_device_present() check to duplex_show.
>
> Fixes: 8ae6daca85c8 ("ethtool: Call ethtool's get/set_settings callbacks with cleaned data")
> Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
> ---
> v2: Restrict patch to just required path and describe problem in more
> detail as suggested by Johannes Berg. Improve commit message format
> as suggested by Shigeru Yoshida.
> ---
> net/core/net-sysfs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 0e2084ce7b7572bff458ed7e02358d9258c74628..22801d165d852a6578ca625b9674090519937be5 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -261,7 +261,7 @@ static ssize_t duplex_show(struct device *dev,
>  	if (!rtnl_trylock())
>  		return restart_syscall();
>
> -	if (netif_running(netdev)) {
> +	if (netif_running(netdev) && netif_device_present(netdev)) {
>  		struct ethtool_link_ksettings cmd;
>
>  		if (!__ethtool_get_link_ksettings(netdev, &cmd)) {

As for the qede driver mentioned in the commit log, I assume the race is
between duplex_show() and qede_recovery_handler().
qede_recovery_handler() clears __LINK_STATE_PRESENT on recovery failure,
and it is called with the rtnl lock held, so I think the patch works
correctly.

As Paolo mentioned, I think the issue has existed since
duplex_show()/show_duplex() was first introduced.

Anyway,

Reviewed-by: Shigeru Yoshida <syoshida@redhat.com>

> --
> 2.39.2
>
>
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 0e2084ce7b7572bff458ed7e02358d9258c74628..22801d165d852a6578ca625b9674090519937be5 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -261,7 +261,7 @@ static ssize_t duplex_show(struct device *dev,
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-	if (netif_running(netdev)) {
+	if (netif_running(netdev) && netif_device_present(netdev)) {
 		struct ethtool_link_ksettings cmd;
 
 		if (!__ethtool_get_link_ksettings(netdev, &cmd)) {
A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present.

This is the same sort of panic as observed in commit 4224cfd7fb65
("net-sysfs: add check for netdevice being present to speed_show"):

[exception RIP: qed_get_current_link+17]
#8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
#9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
#10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
#11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
#12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
#13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
#14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
#15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
#16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
#17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

crash> struct net_device.state ffff9a9d21336000
state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

Resolve by adding the same netif_device_present() check to duplex_show.

Fixes: 8ae6daca85c8 ("ethtool: Call ethtool's get/set_settings callbacks with cleaned data")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
---
v2: Restrict patch to just required path and describe problem in more
detail as suggested by Johannes Berg. Improve commit message format
as suggested by Shigeru Yoshida.
---
net/core/net-sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
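
For readers following the state decoding in the commit message, here is a minimal userspace sketch of why state = 5 fails the new check. It is not kernel code. The bit positions mirror the first three entries of the kernel's enum netdev_state_t, but the names are written without the leading underscores, and the helpers state_test_bit() and may_query_link() are hypothetical names made up for this illustration. In the kernel, netif_running() tests __LINK_STATE_START and netif_device_present() tests __LINK_STATE_PRESENT.

```c
#include <assert.h>
#include <limits.h>

/* Bit positions mirroring the start of the kernel's enum netdev_state_t.
 * These are bit numbers, not masks: bit 0 -> 0b1, bit 1 -> 0b10, bit 2 -> 0b100. */
enum {
	LINK_STATE_START     = 0,
	LINK_STATE_PRESENT   = 1,
	LINK_STATE_NOCARRIER = 2,
};

/* Hypothetical userspace stand-in for the kernel's test_bit(): returns 1 if set. */
static int state_test_bit(unsigned long state, int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(state) * CHAR_BIT));
	return (int)((state >> bit) & 1UL);
}

/* Hypothetical helper modelling the patched condition in duplex_show():
 * netif_running(netdev) && netif_device_present(netdev). */
static int may_query_link(unsigned long state)
{
	return state_test_bit(state, LINK_STATE_START) &&
	       state_test_bit(state, LINK_STATE_PRESENT);
}
```

With the value from the crash dump, may_query_link(5) evaluates to 0: START and NOCARRIER are set but PRESENT is not, so the patched duplex_show() would skip the __ethtool_get_link_ksettings() call. The unpatched condition only tested the START bit, which is set in 5, so the call went through to the driver.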