Message ID | 20240212105010.2258421-1-john.ernberg@actia.se (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: fec: Always call fec_restart() in resume path |
On Mon, 12 Feb 2024 10:50:30 +0000 John Ernberg wrote:
> Tested on 6.1 kernel and forward ported. I discovered this when we
> upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
> this imbalance since at least 2009.
>
> This is also why I target the -next tree, I can't identify a proper commit
> to blame with a Fixes. Let me know if this should be the net tree anyway.

I thought you bisected it to one or two specific changes?
I'd put those down as Fixes tags and target net.
On 2/14/24 03:44, Jakub Kicinski wrote:
> On Mon, 12 Feb 2024 10:50:30 +0000 John Ernberg wrote:
>> Tested on 6.1 kernel and forward ported. I discovered this when we
>> upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
>> this imbalance since at least 2009.
>>
>> This is also why I target the -next tree, I can't identify a proper commit
>> to blame with a Fixes. Let me know if this should be the net tree anyway.
>
> I thought you bisected it to one or two specific changes?
> I'd put those down as Fixes tags and target net.

Hi Jakub,

You are correct, we thought so too at [1], but bisection is really hard
because we need a whole bunch of patches on top to even boot the system
(imx8qxp specific stuff in the NXP vendor tree that's difficult to
rebase), we left it a bit open ended.

Over the course of the weekend I lost all confidence in my bisection
after being confident for 4-5 days, because the more I thought about it
the less it made sense for that commit to be the culprit.

I should probably have both followed up on that mail with that, and been
clearer here. I apologize for failing that.

Best regards // John Ernberg

[1]: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
On Wed, 14 Feb 2024 08:27:02 +0000 John Ernberg wrote:
> You are correct, we thought so too at [1], but bisection is really hard
> because we need a whole bunch of patches on top to even boot the system
> (imx8qxp specific stuff in the NXP vendor tree that's difficult to
> rebase), we left it a bit open ended.
>
> Over the course of the weekend I lost all confidence in my bisection
> after being confident for 4-5 days, because the more I thought about it
> the less it made sense for that commit to be the culprit.
>
> I should probably have both followed up on that mail with that, and been
> clearer here. I apologize for failing that.

Is it perhaps possible that upstream 5.10 also didn't work?
I'm not saying the change itself is incorrect, indeed there
is fec_restart() on probe and open paths, as you say.
Did you try reverting as many of the changes that happened
in the meantime as possible (instead of bisection)?

The other question is whether we need to enable any of the
clocks or runtime resume before calling fec_restart()?
On 2/14/24 15:52, Jakub Kicinski wrote:
> On Wed, 14 Feb 2024 08:27:02 +0000 John Ernberg wrote:
>> You are correct, we thought so too at [1], but bisection is really hard
>> because we need a whole bunch of patches on top to even boot the system
>> (imx8qxp specific stuff in the NXP vendor tree that's difficult to
>> rebase), we left it a bit open ended.
>>
>> Over the course of the weekend I lost all confidence in my bisection
>> after being confident for 4-5 days, because the more I thought about it
>> the less it made sense for that commit to be the culprit.
>>
>> I should probably have both followed up on that mail with that, and been
>> clearer here. I apologize for failing that.
>
> Is it perhaps possible that upstream 5.10 also didn't work?
> I'm not saying the change itself is incorrect, indeed there
> is fec_restart() on probe and open paths, as you say.
> Did you try reverting as many of the changes that happened
> in the meantime as possible (instead of bisection)?
>

That's a really good point. I'll make some time for this in the next weeks.

Please mark it with changes requested in the meantime, as I expect to
make changes to the patch when I have a result.

> The other question is whether we need to enable any of the
> clocks or runtime resume before calling fec_restart()?

On our board it works fine without it, I don't know enough about this
SoC or other NXP SoCs to know if it's necessary in other situations.

The clocks are re-enabled in the open call which appears to be enough to
get traffic going again when the link is brought up.

Perhaps NXP can fill us in?

Thanks! // John Ernberg
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 42bdc01a304e..e6804c068d6b 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -4706,6 +4706,8 @@ static int __maybe_unused fec_resume(struct device *dev)
 		napi_enable(&fep->napi);
 		phy_init_hw(ndev->phydev);
 		phy_start(ndev->phydev);
+	} else {
+		fec_restart(ndev);
 	}
 	rtnl_unlock();
When trying to resume from suspend the following can be observed:

fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110

This is because the MAC is left powered down after resuming from suspend.
The MAC is brought up in both probe and open, so leaving it off in resume
from suspend is an imbalance.

This imbalance combined with a LAN8700R that is permanently powered results
in unusable networking if the board would happen to suspend before the link
is brought up, and the only way to get out of it would be a full power
cycle.

NOTE: With this change the PHY ends up taking different resume paths when
the link has never been up compared to once the link has been up. Currently
the resume process is identical and just happens at different times, so
this *should* not have any unforeseen consequences.

Signed-off-by: John Ernberg <john.ernberg@actia.se>
---
Tested on 6.1 kernel and forward ported. I discovered this when we
upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
this imbalance since at least 2009.

This is also why I target the -next tree, I can't identify a proper commit
to blame with a Fixes. Let me know if this should be the net tree anyway.

 drivers/net/ethernet/freescale/fec_main.c | 2 ++
 1 file changed, 2 insertions(+)
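
For readers following the thread, the imbalance described in the commit message can be sketched as a small user-space model. This is not driver code: `struct toy_fec` and every `toy_*` function are hypothetical names invented for illustration, and the model assumes that suspend always leaves the MAC powered down and that one call (standing in for fec_restart()) is enough to bring it back. It only shows why a device that was probed but never opened stays down after resume unless the new else branch runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; none of these names exist in the kernel. */
struct toy_fec {
	bool running;     /* stands in for netif_running(ndev) */
	bool mac_enabled; /* true once the modelled fec_restart() has run */
};

/* Models the step that brings the MAC up (fec_restart() in the driver). */
static void toy_restart(struct toy_fec *dev)
{
	assert(dev != NULL);
	dev->mac_enabled = true;
}

/* Probe brings the MAC up, but the interface is not yet running. */
static void toy_probe(struct toy_fec *dev)
{
	dev->running = false;
	toy_restart(dev);
}

/* Open brings the MAC up and marks the interface as running. */
static void toy_open(struct toy_fec *dev)
{
	toy_restart(dev);
	dev->running = true;
}

/* Assumption: suspend leaves the MAC powered down in every case. */
static void toy_suspend(struct toy_fec *dev)
{
	dev->mac_enabled = false;
}

/*
 * Resume path. Without the patch only a running interface is restarted;
 * with the patch the else branch restarts the MAC as well.
 */
static void toy_resume(struct toy_fec *dev, bool with_patch)
{
	if (dev->running)
		toy_restart(dev);
	else if (with_patch)
		toy_restart(dev);
}
```

In this model, probe followed by suspend and an unpatched resume leaves `mac_enabled` false, which corresponds to the MDIO read timeout reported above. The model says nothing about clocks or runtime PM, which the thread leaves as an open question.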