diff mbox series

[net-next,2/9] net: fec: Don't return early on error in .remove()

Message ID 20230313103653.2753139-3-u.kleine-koenig@pengutronix.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: freescale: Convert to platform remove callback returning void | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 17 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Uwe Kleine-König March 13, 2023, 10:36 a.m. UTC
If waking up the device in .remove() fails, exiting early results in
strange state: The platform device will be unbound but not all resources
are freed. E.g. the network device continues to exist without an parent.

Instead of an early error return, only skip the cleanup that was already
done by suspend and release the remaining resources.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/net/ethernet/freescale/fec_main.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Andrew Lunn March 13, 2023, 3:07 p.m. UTC | #1
On Mon, Mar 13, 2023 at 11:36:46AM +0100, Uwe Kleine-König wrote:
> If waking up the device in .remove() fails, exiting early results in
> strange state: The platform device will be unbound but not all resources
> are freed. E.g. the network device continues to exist without an parent.
> 
> Instead of an early error return, only skip the cleanup that was already
> done by suspend and release the remaining resources.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index c73e25f8995e..31d1dc5e9196 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -4465,15 +4465,13 @@ fec_drv_remove(struct platform_device *pdev)
>  	struct device_node *np = pdev->dev.of_node;
>  	int ret;
>  
> -	ret = pm_runtime_resume_and_get(&pdev->dev);
> -	if (ret < 0)
> -		return ret;
> +	ret = pm_runtime_get_sync(&pdev->dev);
>  
>  	cancel_work_sync(&fep->tx_timeout_work);
>  	fec_ptp_stop(pdev);
>  	unregister_netdev(ndev);
>  	fec_enet_mii_remove(fep);
> -	if (fep->reg_phy)
> +	if (ret >= 0 && fep->reg_phy)
>  		regulator_disable(fep->reg_phy);
>  
>  	if (of_phy_is_fixed_link(np))

I'm not sure this is correct. My experience with the FEC is that if
the device is run time suspended, access to the hardware does not
work. In the case i was debugging, MDIO bus reads/writes time out. I
think IO reads and writes turn into NOPs, but i don't actually know.

So if pm_runtime_resume_and_get() fails, fec_ptp_stop() probably does
not work if it touches the hardware. I guess fec_enet_mii_remove()
unregisters any PHYs, which could cause MDIO bus access to shut down
the PHYs, so i expect that also does not work. regulator_disable()
probably does actually work because that is a different hardware block
unaffected by the suspend.

So i think you need to decide:

exit immediately if resume fails, leaving dangling PHYs, netdev,
regulator etc

Keep going, but maybe everything is going to grind to a halt soon
afterwards when accessing the hardware.

You seem to prefer keep going, so i would also suggest you disable the
regulator.

	   Andrew
Uwe Kleine-König March 13, 2023, 4:21 p.m. UTC | #2
Hello Andrew,

On Mon, Mar 13, 2023 at 04:07:12PM +0100, Andrew Lunn wrote:
> On Mon, Mar 13, 2023 at 11:36:46AM +0100, Uwe Kleine-König wrote:
> > If waking up the device in .remove() fails, exiting early results in
> > strange state: The platform device will be unbound but not all resources
> > are freed. E.g. the network device continues to exist without an parent.
> > 
> > Instead of an early error return, only skip the cleanup that was already
> > done by suspend and release the remaining resources.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > ---
> >  drivers/net/ethernet/freescale/fec_main.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> > index c73e25f8995e..31d1dc5e9196 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -4465,15 +4465,13 @@ fec_drv_remove(struct platform_device *pdev)
> >  	struct device_node *np = pdev->dev.of_node;
> >  	int ret;
> >  
> > -	ret = pm_runtime_resume_and_get(&pdev->dev);
> > -	if (ret < 0)
> > -		return ret;
> > +	ret = pm_runtime_get_sync(&pdev->dev);
> >  
> >  	cancel_work_sync(&fep->tx_timeout_work);
> >  	fec_ptp_stop(pdev);
> >  	unregister_netdev(ndev);
> >  	fec_enet_mii_remove(fep);
> > -	if (fep->reg_phy)
> > +	if (ret >= 0 && fep->reg_phy)
> >  		regulator_disable(fep->reg_phy);
> >  
> >  	if (of_phy_is_fixed_link(np))
> 
> I'm not sure this is correct. My experience with the FEC is that if
> the device is run time suspended, access to the hardware does not
> work. In the case i was debugging, MDIO bus reads/writes time out. I
> think IO reads and writes turn into NOPs, but i don't actually know.
> 
> So if pm_runtime_resume_and_get() fails, fec_ptp_stop() probably does
> not work if it touches the hardware.

While creating the patch I checked, and just from looking at the code
I'd say it doesn't access hardware.

> I guess fec_enet_mii_remove()
> unregisters any PHYs, which could cause MDIO bus access to shut down
> the PHYs, so i expect that also does not work.

fec_enet_mii_remove only calls mdiobus_unregister + mdiobus_free which
should be software only?!

> regulator_disable() probably does actually work because that is a
> different hardware block unaffected by the suspend.

fec_suspend() calls

        if (fep->reg_phy && !(fep->wol_flag & FEC_WOL_FLAG_ENABLE))
                regulator_disable(fep->reg_phy);

and if resume failed I have to assume that fec_resume() didn't come
around reenabling it. So not disabling the regulator in .remove() is
correct.

> So i think you need to decide:
> 
> exit immediately if resume fails, leaving dangling PHYs, netdev,
> regulator etc

I think keeping netdev is very prone to surprises. You'd still have eth0
(or however your device is called), it might even work somewhat, or it
might oops because devm_platform_ioremap_resource is undone.
 
> Keep going, but maybe everything is going to grind to a halt soon
> afterwards when accessing the hardware.
> 
> You seem to prefer keep going, so i would also suggest you disable the
> regulator.

(Described above why I didn't.)

Best regards
Uwe
Andrew Lunn March 14, 2023, 12:30 a.m. UTC | #3
> > > -	ret = pm_runtime_resume_and_get(&pdev->dev);
> > > -	if (ret < 0)
> > > -		return ret;
> > regulator_disable() probably does actually work because that is a
> > different hardware block unaffected by the suspend.
 
> fec_suspend() calls
> 
>         if (fep->reg_phy && !(fep->wol_flag & FEC_WOL_FLAG_ENABLE))
>                 regulator_disable(fep->reg_phy);

There are two different types of suspend here.

pm_runtime_resume_and_get() is about runtime suspend. It calls
fec_runtime_suspend() which just turns some clocks off/on.

fec_suspend() is for system sleep, where the whole system is put to
sleep, except what is needed to trigger a wake up, such as Wake on
LAN. The regulator is being used to power the PHY, so you obviously
don't want to turn the PHY off when doing WoL.

But if you are unloading the FEC, WoL is not going to work, so you
should turn the PHY off. And turning the PHY off should not have any
dependencies on first turning on FEC clocks in
pm_runtime_resume_and_get().

    Andrew
Uwe Kleine-König March 14, 2023, 10:13 p.m. UTC | #4
On Tue, Mar 14, 2023 at 01:30:35AM +0100, Andrew Lunn wrote:
> > > > -	ret = pm_runtime_resume_and_get(&pdev->dev);
> > > > -	if (ret < 0)
> > > > -		return ret;
> > > regulator_disable() probably does actually work because that is a
> > > different hardware block unaffected by the suspend.
>  
> > fec_suspend() calls
> > 
> >         if (fep->reg_phy && !(fep->wol_flag & FEC_WOL_FLAG_ENABLE))
> >                 regulator_disable(fep->reg_phy);
> 
> There are two different types of suspend here.
> 
> pm_runtime_resume_and_get() is about runtime suspend. It calls
> fec_runtime_suspend() which just turns some clocks off/on.

Oh indeed.

I won't resend this series in this cycle with this issue fixed.
Converting all other drivers to .remove_new() will take quite some time
I guess so there is no pressure. If you apply patches 1, 3, 5 - 9 anyhow
that would be great. (Patch 4 depends on this one.) I'll address the fec
driver at a later point in time.

Best regards
Uwe
diff mbox series

Patch

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index c73e25f8995e..31d1dc5e9196 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -4465,15 +4465,13 @@  fec_drv_remove(struct platform_device *pdev)
 	struct device_node *np = pdev->dev.of_node;
 	int ret;
 
-	ret = pm_runtime_resume_and_get(&pdev->dev);
-	if (ret < 0)
-		return ret;
+	ret = pm_runtime_get_sync(&pdev->dev);
 
 	cancel_work_sync(&fep->tx_timeout_work);
 	fec_ptp_stop(pdev);
 	unregister_netdev(ndev);
 	fec_enet_mii_remove(fep);
-	if (fep->reg_phy)
+	if (ret >= 0 && fep->reg_phy)
 		regulator_disable(fep->reg_phy);
 
 	if (of_phy_is_fixed_link(np))