diff mbox series

[net] net: dsa: lan9303: avoid dsa_switch_shutdown()

Message ID 20240911144006.48481-1-alexander.sverdlin@siemens.com (mailing list archive)
State Superseded, archived
Delegated to: Netdev Maintainers
Headers show
Series [net] net: dsa: lan9303: avoid dsa_switch_shutdown() | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-12--06-00 (tests: 764)

Commit Message

Sverdlin, Alexander Sept. 11, 2024, 2:40 p.m. UTC
From: Alexander Sverdlin <alexander.sverdlin@siemens.com>

dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
slaves from master. Packets still come afterwards into master port and the
ports are being polled for link status. This leads to crashes:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read
lr : lan9303_phy_read
Call trace:
 lan9303_mdio_phy_read
 lan9303_phy_read
 dsa_slave_phy_read
 __mdiobus_read
 mdiobus_read
 genphy_update_link
 genphy_read_status
 phy_check_link_status
 phy_state_machine
 process_one_work
 worker_thread

Call lan9303_remove() instead to really unregister all ports before zeroing
drvdata and dsa_ptr.

Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
---
 drivers/net/dsa/lan9303-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Vladimir Oltean Sept. 11, 2024, 4:28 p.m. UTC | #1
Hi Alexander,

On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> 
> dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> slaves from master. Packets still come afterwards into master port and the
> ports are being polled for link status. This leads to crashes:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> Workqueue: events_power_efficient phy_state_machine
> pc : lan9303_mdio_phy_read
> lr : lan9303_phy_read
> Call trace:
>  lan9303_mdio_phy_read
>  lan9303_phy_read
>  dsa_slave_phy_read
>  __mdiobus_read
>  mdiobus_read
>  genphy_update_link
>  genphy_read_status
>  phy_check_link_status
>  phy_state_machine
>  process_one_work
>  worker_thread
> 
> Call lan9303_remove() instead to really unregister all ports before zeroing
> drvdata and dsa_ptr.
> 
> Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> ---
>  drivers/net/dsa/lan9303-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index 268949939636..ecd507355f51 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
>  
>  void lan9303_shutdown(struct lan9303 *chip)
>  {
> -	dsa_switch_shutdown(chip->ds);
> +	lan9303_remove(chip);
>  }
>  EXPORT_SYMBOL(lan9303_shutdown);
>  
> -- 
> 2.46.0
> 

You've said here that a similar change still does not protect against
packets received after shutdown:
https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/

The difference between that and this is the extra lan9303_disable_processing_port()
calls here. But while that does disable RX on switch ports, it still doesn't wait
for pending RX frames to be processed. So the race is still open. No?
Sverdlin, Alexander Sept. 11, 2024, 4:55 p.m. UTC | #2
Hello Vladimir!

On Wed, 2024-09-11 at 19:28 +0300, Vladimir Oltean wrote:
> On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
> > From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > 
> > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> > slaves from master. Packets still come afterwards into master port and the
> > ports are being polled for link status. This leads to crashes:
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> > Workqueue: events_power_efficient phy_state_machine
> > pc : lan9303_mdio_phy_read
> > lr : lan9303_phy_read
> > Call trace:
> >   lan9303_mdio_phy_read
> >   lan9303_phy_read
> >   dsa_slave_phy_read
> >   __mdiobus_read
> >   mdiobus_read
> >   genphy_update_link
> >   genphy_read_status
> >   phy_check_link_status
> >   phy_state_machine
> >   process_one_work
> >   worker_thread
> > 
> > Call lan9303_remove() instead to really unregister all ports before zeroing
> > drvdata and dsa_ptr.
> > 
> > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > ---
> >   drivers/net/dsa/lan9303-core.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> > index 268949939636..ecd507355f51 100644
> > --- a/drivers/net/dsa/lan9303-core.c
> > +++ b/drivers/net/dsa/lan9303-core.c
> > @@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
> >   
> >   void lan9303_shutdown(struct lan9303 *chip)
> >   {
> > -	dsa_switch_shutdown(chip->ds);
> > +	lan9303_remove(chip);
> >   }
> >   EXPORT_SYMBOL(lan9303_shutdown);
> >   
> > -- 
> > 2.46.0
> > 
> 
> You've said here that a similar change still does not protect against
> packets received after shutdown:
> https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/
> 
> The difference between that and this is the extra lan9303_disable_processing_port()
> calls here. But while that does disable RX on switch ports, it still doesn't wait
> for pending RX frames to be processed. So the race is still open. No?

This patch addresses the race of zeroing drvdata in

static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
{
        struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
        
        if (!sw_dev)
                return;
 
        lan9303_shutdown(&sw_dev->chip);
        
        dev_set_drvdata(&mdiodev->dev, NULL);
}

versus 

static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy,  int reg)
{
        struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);

what you refer to is another race, zeroing of dsa_ptr in struct net_device versus
the whole network stack, which I addressed in
https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@siemens.com/
Sverdlin, Alexander Sept. 11, 2024, 5:04 p.m. UTC | #3
Hello Vladimir,

On Wed, 2024-09-11 at 18:55 +0200, Alexander Sverdlin wrote:
> > You've said here that a similar change still does not protect against
> > packets received after shutdown:
> > https://lore.kernel.org/netdev/c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com/
> > 
> > The difference between that and this is the extra lan9303_disable_processing_port()
> > calls here. But while that does disable RX on switch ports, it still doesn't wait
> > for pending RX frames to be processed. So the race is still open. No?

besides from the below, I've expected this question... In the meanwhile I've tested
mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown.
After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read
pattern therefore I've posted this tested patch.

If you'd prefer to solve this centrally for all drivers, I can test your patch from
the MDIO-drvdata PoV.

> This patch addresses the race of zeroing drvdata in
> 
> static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
> {
>         struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
>         
>         if (!sw_dev)
>                 return;
>  
>         lan9303_shutdown(&sw_dev->chip);
>         
>         dev_set_drvdata(&mdiodev->dev, NULL);
> }
> 
> versus 
> 
> static int lan9303_mdio_phy_read(struct lan9303 *chip, int phy,  int reg)
> {
>         struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
> 
> what you refer to is another race, zeroing of dsa_ptr in struct net_device versus
> the whole network stack, which I addressed in
> https://lore.kernel.org/netdev/20240910130321.337154-2-alexander.sverdlin@siemens.com/
Sverdlin, Alexander Sept. 11, 2024, 5:09 p.m. UTC | #4
On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote:
> > > The difference between that and this is the extra lan9303_disable_processing_port()
> > > calls here. But while that does disable RX on switch ports, it still doesn't wait
> > > for pending RX frames to be processed. So the race is still open. No?
> 
> besides from the below, I've expected this question... In the meanwhile I've tested
> mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown.
> After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read
> pattern therefore I've posted this tested patch.
> 
> If you'd prefer to solve this centrally for all drivers, I can test your patch from
> the MDIO-drvdata PoV.

But this would mean throwing away the whole
"net: dsa: be compatible with masters which unregister on shutdown" work?
Vladimir Oltean Sept. 11, 2024, 5:19 p.m. UTC | #5
On Wed, Sep 11, 2024 at 05:09:08PM +0000, Sverdlin, Alexander wrote:
> On Wed, 2024-09-11 at 19:04 +0200, Alexander Sverdlin wrote:
> > > > The difference between that and this is the extra lan9303_disable_processing_port()
> > > > calls here. But while that does disable RX on switch ports, it still doesn't wait
> > > > for pending RX frames to be processed. So the race is still open. No?
> > 
> > besides from the below, I've expected this question... In the meanwhile I've tested
> > mv88e6xxx driver, but it (accidentally) has no MDIO race vs shutdown.
> > After some shallow review of the drivers I didn't find dev_get_drvdata <= mdio_read
> > pattern therefore I've posted this tested patch.
> > 
> > If you'd prefer to solve this centrally for all drivers, I can test your patch from
> > the MDIO-drvdata PoV.
> 
> But this would mean throwing away the whole
> "net: dsa: be compatible with masters which unregister on shutdown" work?

I did not propose anything (yet). I'm still trying to form a mental model
of what is broken and what works. Hence the request to test that change.

OTOH, this patch is equally throwing away the whole "net: dsa: be
compatible with masters which unregister on shutdown" work (for lan9303).
Vladimir Oltean Sept. 12, 2024, 10:15 a.m. UTC | #6
Hi Alexander,

On Wed, Sep 11, 2024 at 04:40:03PM +0200, A. Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> 
> dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> slaves from master. Packets still come afterwards into master port and the
> ports are being polled for link status. This leads to crashes:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> Workqueue: events_power_efficient phy_state_machine
> pc : lan9303_mdio_phy_read
> lr : lan9303_phy_read
> Call trace:
>  lan9303_mdio_phy_read
>  lan9303_phy_read
>  dsa_slave_phy_read
>  __mdiobus_read
>  mdiobus_read
>  genphy_update_link
>  genphy_read_status
>  phy_check_link_status
>  phy_state_machine
>  process_one_work
>  worker_thread
> 
> Call lan9303_remove() instead to really unregister all ports before zeroing
> drvdata and dsa_ptr.
> 
> Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> ---

Could you please test this alternative solution (patch attached) for both reported problems?

Thanks.
Sverdlin, Alexander Sept. 13, 2024, 4:16 p.m. UTC | #7
Hi Vladimir!

Thank you for the quick fix!

On Thu, 2024-09-12 at 13:15 +0300, Vladimir Oltean wrote:
> > From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > 
> > dsa_switch_shutdown() doesn't bring down any ports, but only disconnects
> > slaves from master. Packets still come afterwards into master port and the
> > ports are being polled for link status. This leads to crashes:
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
> > Workqueue: events_power_efficient phy_state_machine
> > pc : lan9303_mdio_phy_read
> > lr : lan9303_phy_read
> > Call trace:
> >   lan9303_mdio_phy_read
> >   lan9303_phy_read
> >   dsa_slave_phy_read
> >   __mdiobus_read
> >   mdiobus_read
> >   genphy_update_link
> >   genphy_read_status
> >   phy_check_link_status
> >   phy_state_machine
> >   process_one_work
> >   worker_thread
> > 
> > Call lan9303_remove() instead to really unregister all ports before zeroing
> > drvdata and dsa_ptr.
> > 
> > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")

Do you think it would make sense to add the same Fixes: tag as above?
(That's the earlier one of the two)

> > Cc: stable@vger.kernel.org
> > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > ---
> 
> Could you please test this alternative solution (patch attached) for both reported problems?

We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without,
and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW.
All of the above for a backport to v6.1, but this part should be OK, I believe.

Overall looks very good, you can add my
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> 
when you officially publish the patch.
Vladimir Oltean Sept. 13, 2024, 6:57 p.m. UTC | #8
Hi Alexander,

On Fri, Sep 13, 2024 at 04:16:57PM +0000, Sverdlin, Alexander wrote:
> > > Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
> 
> Do you think it would make sense to add the same Fixes: tag as above?
> (That's the earlier one of the two)
> 
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> > > ---
> > 
> > Could you please test this alternative solution (patch attached) for both reported problems?
> 
> We had two LAN9303-equipped systems running overnight with PROVE_LOCKING+PROVE_RCU and without,
> and I also ran couple of reboots with PROVE_LOCKING on a Marvell mv6xxx equipped HW.
> All of the above for a backport to v6.1, but this part should be OK, I believe.
> 
> Overall looks very good, you can add my
> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> 
> when you officially publish the patch.

Thanks for testing, I really appreciate it.

I will add all the required tags for the formal patch submission.
diff mbox series

Patch

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 268949939636..ecd507355f51 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1477,7 +1477,7 @@  EXPORT_SYMBOL(lan9303_remove);
 
 void lan9303_shutdown(struct lan9303 *chip)
 {
-	dsa_switch_shutdown(chip->ds);
+	lan9303_remove(chip);
 }
 EXPORT_SYMBOL(lan9303_shutdown);