Message ID | 20240911144006.48481-1-alexander.sverdlin@siemens.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: dsa: lan9303: avoid dsa_switch_shutdown() | expand |
Hi Alexander, On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote: > From: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects > slaves from master. Packets still come afterwards into master port and the > ports are being polled for link status. This leads to crashes: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 > Workqueue: events_power_efficient phy_state_machine > pc : lan9303_mdio_phy_read > lr : lan9303_phy_read > Call trace: > lan9303_mdio_phy_read > lan9303_phy_read > dsa_slave_phy_read > __mdiobus_read > mdiobus_read > genphy_update_link > genphy_read_status > phy_check_link_status > phy_state_machine > process_one_work > worker_thread > > Call lan9303_remove() instead to really unregister all ports before zeroing > drvdata and dsa_ptr. > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") > Cc: stable@vger.kernel.org > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > --- > drivers/net/dsa/lan9303-core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c > index 268949939636..ecd507355f51 100644 > --- a/drivers/net/dsa/lan9303-core.c > +++ b/drivers/net/dsa/lan9303-core.c > @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove); > > void lan9303_shutdown(struct lan9303 *chip) > { > - dsa_switch_shutdown(chip->ds); > + lan9303_remove(chip); > } > EXPORT_SYMBOL(lan9303_shutdown); > > -- > 2.46.0 > You've said here that a similar change still does not protect against packets received after shutdown: https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/ The difference between that and this is the extra lan9303_disable_processing_port() calls here. But while that does disable RX on switch ports, it still doesn't wait for pending RX frames to be processed. So the race is still open. No?
Hello Vladimir! On Wed, 2024-09-11 at 19:28 +0300, Vladimir Oltean wrote: > On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote: > > From: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > > > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects > > slaves from master. Packets still come afterwards into master port and the > > ports are being polled for link status. This leads to crashes: > > > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 > > Workqueue: events_power_efficient phy_state_machine > > pc : lan9303_mdio_phy_read > > lr : lan9303_phy_read > > Call trace: > > lan9303_mdio_phy_read > > lan9303_phy_read > > dsa_slave_phy_read > > __mdiobus_read > > mdiobus_read > > genphy_update_link > > genphy_read_status > > phy_check_link_status > > phy_state_machine > > process_one_work > > worker_thread > > > > Call lan9303_remove() instead to really unregister all ports before zeroing > > drvdata and dsa_ptr. > > > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") > > Cc: stable@vger.kernel.org > > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > --- > > drivers/net/dsa/lan9303-core.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c > > index 268949939636..ecd507355f51 100644 > > --- a/drivers/net/dsa/lan9303-core.c > > +++ b/drivers/net/dsa/lan9303-core.c > > @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove); > > > > void lan9303_shutdown(struct lan9303 *chip) > > { > > - dsa_switch_shutdown(chip->ds); > > + lan9303_remove(chip); > > } > > EXPORT_SYMBOL(lan9303_shutdown); > > > > -- > > 2.46.0 > > > > You've said here that a similar change still does not protect against > packets received after shutdown: > https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/ > > The difference between that and this is the extra lan9303_disable_processing_port() > calls here. But while that does disable RX on switch ports, it still doesn't wait > for pending RX frames to be processed. So the race is still open. No? This patch addresses the race of zeroing drvdata in static void lan9303_mdio_shutdown(struct mdio_device *mdiodev) { struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev); if (!sw_dev) return; lan9303_shutdown(&sw_dev->chip); dev_set_drvdata(&mdiodev->dev, NULL); } versus static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy, int reg) { struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev); what you refer to is another race, zeroing of dsa_ptr in struct net_device versus the whole network stack, which I addressed in https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@siemens.com/
Hello Vladimir, On Wed, 2024-09-11 at 18:55 +0200, Alexander Sverdlin wrote: > > You've said here that a similar change still does not protect against > > packets received after shutdown: > > https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/ > > > > The difference between that and this is the extra lan9303_disable_processing_port() > > calls here. But while that does disable RX on switch ports, it still doesn't wait > > for pending RX frames to be processed. So the race is still open. No? besides from the below, I've expected this question... In the meanwhile I've tested mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown. After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read pattern therefore I've posted this tested patch. If you'd prefer to solve this centrally for all drivers, I can test your patch from the MDIO-drvdata PoV. > This patch addresses the race of zeroing drvdata in > > static void lan9303_mdio_shutdown(struct mdio_device *mdiodev) > { > struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev); > > if (!sw_dev) > return; > > lan9303_shutdown(&sw_dev->chip); > > dev_set_drvdata(&mdiodev->dev, NULL); > } > > versus > > static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy, int reg) > { > struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev); > > what you refer to is another race, zeroing of dsa_ptr in struct net_device versus > the whole network stack, which I addressed in > https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@siemens.com/
On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote: > > > The difference between that and this is the extra lan9303_disable_processing_port() > > > calls here. But while that does disable RX on switch ports, it still doesn't wait > > > for pending RX frames to be processed. So the race is still open. No? > > besides from the below, I've expected this question... In the meanwhile I've tested > mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown. > After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read > pattern therefore I've posted this tested patch. > > If you'd prefer to solve this centrally for all drivers, I can test your patch from > the MDIO-drvdata PoV. But this would mean throwing away the whole "net: dsa: be compatible with masters which unregister on shutdown" work?
On Wed, Sep 11, 2024 at 05:09:08PM +0000, Sverdlin, Alexander wrote: > On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote: > > > > The difference between that and this is the extra lan9303_disable_processing_port() > > > > calls here. But while that does disable RX on switch ports, it still doesn't wait > > > > for pending RX frames to be processed. So the race is still open. No? > > > > besides from the below, I've expected this question... In the meanwhile I've tested > > mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown. > > After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read > > pattern therefore I've posted this tested patch. > > > > If you'd prefer to solve this centrally for all drivers, I can test your patch from > > the MDIO-drvdata PoV. > > But this would mean throwing away the whole > "net: dsa: be compatible with masters which unregister on shutdown" work? I did not propose anything (yet). I'm still trying to form a mental model of what is broken and what works. Hence the request to test that change. OTOH, this patch is equally throwing away the whole "net: dsa: be compatible with masters which unregister on shutdown" work (for lan9303).
Hi Alexander, On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote: > From: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects > slaves from master. Packets still come afterwards into master port and the > ports are being polled for link status. This leads to crashes: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 > Workqueue: events_power_efficient phy_state_machine > pc : lan9303_mdio_phy_read > lr : lan9303_phy_read > Call trace: > lan9303_mdio_phy_read > lan9303_phy_read > dsa_slave_phy_read > __mdiobus_read > mdiobus_read > genphy_update_link > genphy_read_status > phy_check_link_status > phy_state_machine > process_one_work > worker_thread > > Call lan9303_remove() instead to really unregister all ports before zeroing > drvdata and dsa_ptr. > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") > Cc: stable@vger.kernel.org > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > --- Could you please test this alternative solution (patch attached) for both reported problems? Thanks.
Hi Vladimir! Thank you for the quick fix! On Thu, 2024-09-12 at 13:15 +0300, Vladimir Oltean wrote: > > From: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > > > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects > > slaves from master. Packets still come afterwards into master port and the > > ports are being polled for link status. This leads to crashes: > > > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1 > > Workqueue: events_power_efficient phy_state_machine > > pc : lan9303_mdio_phy_read > > lr : lan9303_phy_read > > Call trace: > > lan9303_mdio_phy_read > > lan9303_phy_read > > dsa_slave_phy_read > > __mdiobus_read > > mdiobus_read > > genphy_update_link > > genphy_read_status > > phy_check_link_status > > phy_state_machine > > process_one_work > > worker_thread > > > > Call lan9303_remove() instead to really unregister all ports before zeroing > > drvdata and dsa_ptr. > > > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Do you think it would make sense to add the same Fixes: tag as above? (That's the earlier one of the two) > > Cc: stable@vger.kernel.org > > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > --- > > Could you please test this alternative solution (patch attached) for both reported problems? We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without, and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW. All of the above for a backport to v6.1, but this part should be OK, I believe. Overall looks very good, you can add my Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> when you officially publish the patch.
Hi Alexander, On Fri, Sep 13, 2024 at 04:16:57PM +0000, Sverdlin, Alexander wrote: > > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") > > Do you think it would make sense to add the same Fixes: tag as above? > (That's the earlier one of the two) > > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > > > --- > > > > Could you please test this alternative solution (patch attached) for both reported problems? > > We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without, > and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW. > All of the above for a backport to v6.1, but this part should be OK, I believe. > > Overall looks very good, you can add my > Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> > when you officially publish the patch. Thanks for testing, I really appreciate it. I will add all the required tags for the formal patch submission.
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index 268949939636..ecd507355f51 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove); void lan9303_shutdown(struct lan9303 *chip) { - dsa_switch_shutdown(chip->ds); + lan9303_remove(chip); } EXPORT_SYMBOL(lan9303_shutdown);