Message ID | 20230313103653.2753139-3-u.kleine-koenig@pengutronix.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: freescale: Convert to platform remove callback returning void | expand |
Context | Check | Description |
---|---|---|
netdev/series_format | success | Posting correctly formatted |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 18 this patch: 18 |
netdev/cc_maintainers | success | CCed 9 of 9 maintainers |
netdev/build_clang | success | Errors and warnings before: 18 this patch: 18 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/deprecated_api | success | None detected |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 18 this patch: 18 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 17 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
On Mon, Mar 13, 2023 at 11:36:46AM +0100, Uwe Kleine-König wrote: > If waking up the device in .remove() fails, exiting early results in > strange state: The platform device will be unbound but not all resources > are freed. E.g. the network device continues to exist without an parent. > > Instead of an early error return, only skip the cleanup that was already > done by suspend and release the remaining resources. > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > --- > drivers/net/ethernet/freescale/fec_main.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c > index c73e25f8995e..31d1dc5e9196 100644 > --- a/drivers/net/ethernet/freescale/fec_main.c > +++ b/drivers/net/ethernet/freescale/fec_main.c > @@ -4465,15 +4465,13 @@ fec_drv_remove(struct platform_device *pdev) > struct device_node *np = pdev->dev.of_node; > int ret; > > - ret = pm_runtime_resume_and_get(&pdev->dev); > - if (ret < 0) > - return ret; > + ret = pm_runtime_get_sync(&pdev->dev); > > cancel_work_sync(&fep->tx_timeout_work); > fec_ptp_stop(pdev); > unregister_netdev(ndev); > fec_enet_mii_remove(fep); > - if (fep->reg_phy) > + if (ret >= 0 && fep->reg_phy) > regulator_disable(fep->reg_phy); > > if (of_phy_is_fixed_link(np)) I'm not sure this is correct. My experience with the FEC is that if the device is run time suspended, access to the hardware does not work. In the case i was debugging, MDIO bus reads/writes time out. I think IO reads and writes turn into NOPs, but i don't actually know. So if pm_runtime_resume_and_get() fails, fec_ptp_stop() probably does not work if it touches the hardware. I guess fec_enet_mii_remove() unregisters any PHYs, which could cause MDIO bus access to shut down the PHYs, so i expect that also does not work. regulator_disable() probably does actually work because that is a different hardware block unaffected by the suspend. So i think you need to decide: exit immediately if resume fails, leaving dangling PHYs, netdev, regulator etc Keep going, but maybe everything is going to grind to a halt soon afterwards when accessing the hardware. You seem to prefer keep going, so i would also suggest you disable the regulator. Andrew
Hello Andrew, On Mon, Mar 13, 2023 at 04:07:12PM +0100, Andrew Lunn wrote: > On Mon, Mar 13, 2023 at 11:36:46AM +0100, Uwe Kleine-König wrote: > > If waking up the device in .remove() fails, exiting early results in > > strange state: The platform device will be unbound but not all resources > > are freed. E.g. the network device continues to exist without an parent. > > > > Instead of an early error return, only skip the cleanup that was already > > done by suspend and release the remaining resources. > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > > --- > > drivers/net/ethernet/freescale/fec_main.c | 6 ++---- > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c > > index c73e25f8995e..31d1dc5e9196 100644 > > --- a/drivers/net/ethernet/freescale/fec_main.c > > +++ b/drivers/net/ethernet/freescale/fec_main.c > > @@ -4465,15 +4465,13 @@ fec_drv_remove(struct platform_device *pdev) > > struct device_node *np = pdev->dev.of_node; > > int ret; > > > > - ret = pm_runtime_resume_and_get(&pdev->dev); > > - if (ret < 0) > > - return ret; > > + ret = pm_runtime_get_sync(&pdev->dev); > > > > cancel_work_sync(&fep->tx_timeout_work); > > fec_ptp_stop(pdev); > > unregister_netdev(ndev); > > fec_enet_mii_remove(fep); > > - if (fep->reg_phy) > > + if (ret >= 0 && fep->reg_phy) > > regulator_disable(fep->reg_phy); > > > > if (of_phy_is_fixed_link(np)) > > I'm not sure this is correct. My experience with the FEC is that if > the device is run time suspended, access to the hardware does not > work. In the case i was debugging, MDIO bus reads/writes time out. I > think IO reads and writes turn into NOPs, but i don't actually know. > > So if pm_runtime_resume_and_get() fails, fec_ptp_stop() probably does > not work if it touches the hardware. While creating the patch I checked, and just from looking at the code I'd say it doesn't access hardware. > I guess fec_enet_mii_remove() > unregisters any PHYs, which could cause MDIO bus access to shut down > the PHYs, so i expect that also does not work. fec_enet_mii_remove only calls mdiobus_unregister + mdiobus_free which should be software only?! > regulator_disable() probably does actually work because that is a > different hardware block unaffected by the suspend. fec_suspend() calls if (fep->reg_phy && !(fep->wol_flag & FEC_WOL_FLAG_ENABLE)) regulator_disable(fep->reg_phy); and if resume failed I have to assume that fec_resume() didn't come around reenabling it. So not disabling the regulator in .remove() is correct. > So i think you need to decide: > > exit immediately if resume fails, leaving dangling PHYs, netdev, > regulator etc I think keeping netdev is very prone to surprises. You'd still have eth0 (or however your device is called), it might even work somewhat, or it might oops because devm_platform_ioremap_resource is undone. > Keep going, but maybe everything is going to grind to a halt soon > afterwards when accessing the hardware. > > You seem to prefer keep going, so i would also suggest you disable the > regulator. (Described above why I didn't.) Best regards Uwe
> > > - ret = pm_runtime_resume_and_get(&pdev->dev); > > > - if (ret < 0) > > > - return ret; > > regulator_disable() probably does actually work because that is a > > different hardware block unaffected by the suspend. > fec_suspend() calls > > if (fep->reg_phy && !(fep->wol_flag & FEC_WOL_FLAG_ENABLE)) > regulator_disable(fep->reg_phy); There are two different types of suspend here. pm_runtime_resume_and_get() is about runtime suspend. It calls fec_runtime_suspend() which just turns some clocks off/on. fec_suspend() is for system sleep, where the whole system is put to sleep, except what is needed to trigger a wake up, such as Wake on LAN. The regulator is being used to power the PHY, so you obviously don't want to turn the PHY off when doing WoL. But if you are unloading the FEC, WoL is not going to work, so you should turn the PHY off. And turning the PHY off should not have any dependencies on first turning on FEC clocks in pm_runtime_resume_and_get(). Andrew
On Tue, Mar 14, 2023 at 01:30:35AM +0100, Andrew Lunn wrote: > > > > - ret = pm_runtime_resume_and_get(&pdev->dev); > > > > - if (ret < 0) > > > > - return ret; > > > regulator_disable() probably does actually work because that is a > > > different hardware block unaffected by the suspend. > > > fec_suspend() calls > > > > if (fep->reg_phy && !(fep->wol_flag & FEC_WOL_FLAG_ENABLE)) > > regulator_disable(fep->reg_phy); > > There are two different types of suspend here. > > pm_runtime_resume_and_get() is about runtime suspend. It calls > fec_runtime_suspend() which just turns some clocks off/on. Oh indeed. I won't resend this series in this cycle with this issue fixed. Converting all other drivers to .remove_new() will take quite some time I guess so there is no pressure. If you apply patches 1, 3, 5 - 9 anyhow that would be great. (Patch 4 depends on this one.) I'll address the fec driver at a later point in time. Best regards Uwe
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index c73e25f8995e..31d1dc5e9196 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -4465,15 +4465,13 @@ fec_drv_remove(struct platform_device *pdev) struct device_node *np = pdev->dev.of_node; int ret; - ret = pm_runtime_resume_and_get(&pdev->dev); - if (ret < 0) - return ret; + ret = pm_runtime_get_sync(&pdev->dev); cancel_work_sync(&fep->tx_timeout_work); fec_ptp_stop(pdev); unregister_netdev(ndev); fec_enet_mii_remove(fep); - if (fep->reg_phy) + if (ret >= 0 && fep->reg_phy) regulator_disable(fep->reg_phy); if (of_phy_is_fixed_link(np))
If waking up the device in .remove() fails, exiting early results in strange state: The platform device will be unbound but not all resources are freed. E.g. the network device continues to exist without an parent. Instead of an early error return, only skip the cleanup that was already done by suspend and release the remaining resources. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> --- drivers/net/ethernet/freescale/fec_main.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)