Message ID | Z8m-CRucPxDW5zZK@shell.armlinux.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | net: stmmac: approach 2 to solve EEE LPI reset issues | expand |
Hi Russell, On 06/03/2025 15:23, Russell King (Oracle) wrote: > Hi, > > This is a second approach to solving the STMMAC reset issues caused by > the lack of receive clock from the PHY where the media is in low power > mode with a PHY that supports receive clock-stop. > > The first approach centred around only addressing the issue in the > resume path, but it seems to also happen when the platform glue module > is removed and re-inserted (Jon - can you check whether that's also > the case for you please?) > > As this is more targetted, I've dropped the patches from this series > which move the call to phylink_resume(), so the link may still come > up too early on resume - but that's something I also intend to fix. > > This is experimental - so I value test reports for this change. The subject indicates 3 patches, but I only see 2 patches? Can you confirm if there are 2 or 3? So far I have only tested to resume case with the 2 patches to make that that is working but on Tegra186, which has been the most problematic, it is not working reliably on top of next-20250305. Cheers, Jon
On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote: > Hi Russell, > > On 06/03/2025 15:23, Russell King (Oracle) wrote: > > Hi, > > > > This is a second approach to solving the STMMAC reset issues caused by > > the lack of receive clock from the PHY where the media is in low power > > mode with a PHY that supports receive clock-stop. > > > > The first approach centred around only addressing the issue in the > > resume path, but it seems to also happen when the platform glue module > > is removed and re-inserted (Jon - can you check whether that's also > > the case for you please?) > > > > As this is more targetted, I've dropped the patches from this series > > which move the call to phylink_resume(), so the link may still come > > up too early on resume - but that's something I also intend to fix. > > > > This is experimental - so I value test reports for this change. > > > The subject indicates 3 patches, but I only see 2 patches? Can you confirm > if there are 2 or 3? Yes, 2 patches is correct. > So far I have only tested to resume case with the 2 patches to make that > that is working but on Tegra186, which has been the most problematic, it is > not working reliably on top of next-20250305. To confirm, you're seeing stmmac_reset() sporadically timing out on resume even with these patches appled? That's rather disappointing. Do either of the two attached diffs make any difference? Thanks for testing!
On 07/03/2025 17:07, Russell King (Oracle) wrote: > On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote: >> Hi Russell, >> >> On 06/03/2025 15:23, Russell King (Oracle) wrote: >>> Hi, >>> >>> This is a second approach to solving the STMMAC reset issues caused by >>> the lack of receive clock from the PHY where the media is in low power >>> mode with a PHY that supports receive clock-stop. >>> >>> The first approach centred around only addressing the issue in the >>> resume path, but it seems to also happen when the platform glue module >>> is removed and re-inserted (Jon - can you check whether that's also >>> the case for you please?) >>> >>> As this is more targetted, I've dropped the patches from this series >>> which move the call to phylink_resume(), so the link may still come >>> up too early on resume - but that's something I also intend to fix. >>> >>> This is experimental - so I value test reports for this change. >> >> >> The subject indicates 3 patches, but I only see 2 patches? Can you confirm >> if there are 2 or 3? > > Yes, 2 patches is correct. > >> So far I have only tested to resume case with the 2 patches to make that >> that is working but on Tegra186, which has been the most problematic, it is >> not working reliably on top of next-20250305. > > To confirm, you're seeing stmmac_reset() sporadically timing out on > resume even with these patches appled? That's rather disappointing. So I am no longer seeing the reset fail, from what I can see, but now NFS is not responding after resume ... [ 49.825094] Enabling non-boot CPUs ... [ 49.829760] Detected PIPT I-cache on CPU1 [ 49.832694] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004 [ 49.844120] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116 [ 49.856231] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066 [ 49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030] [ 49.875389] CPU1 is up [ 49.877187] Detected PIPT I-cache on CPU2 [ 49.880824] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004 [ 49.892266] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116 [ 49.904467] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066 [ 49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030] [ 49.923610] CPU2 is up [ 49.925194] Detected PIPT I-cache on CPU3 [ 49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073] [ 49.935866] CPU3 is up [ 49.937983] Detected PIPT I-cache on CPU4 [ 49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073] [ 49.948593] CPU4 is up [ 49.950810] Detected PIPT I-cache on CPU5 [ 49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] [ 49.961431] CPU5 is up [ 50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 50.077634] dwmac4: Master AXI performs any burst length [ 50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found [ 50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported [ 50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device [ 50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC [ 50.147552] OOM killer enabled. [ 50.148441] Restarting tasks ... done. [ 50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed [ 50.154761] random: crng reseeded on system resumption [ 50.162912] PM: suspend exit [ 50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed [ 50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed [ 50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed [ 234.474848] nfs: server 10.26.51.252 not responding, still trying [ 234.538769] nfs: server 10.26.51.252 not responding, still trying [ 237.546922] nfs: server 10.26.51.252 not responding, still trying [ 254.762753] nfs: server 10.26.51.252 not responding, timed out [ 254.762771] nfs: server 10.26.51.252 not responding, timed out [ 254.766376] nfs: server 10.26.51.252 not responding, timed out [ 254.766392] nfs: server 10.26.51.252 not responding, timed out [ 254.783778] nfs: server 10.26.51.252 not responding, timed out [ 254.789582] nfs: server 10.26.51.252 not responding, timed out [ 254.795421] nfs: server 10.26.51.252 not responding, timed out [ 254.801193] nfs: server 10.26.51.252 not responding, timed out > Do either of the two attached diffs make any difference? I will try these next. Thanks Jon
On 10/03/2025 14:20, Jon Hunter wrote: > > On 07/03/2025 17:07, Russell King (Oracle) wrote: >> On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote: >>> Hi Russell, >>> >>> On 06/03/2025 15:23, Russell King (Oracle) wrote: >>>> Hi, >>>> >>>> This is a second approach to solving the STMMAC reset issues caused by >>>> the lack of receive clock from the PHY where the media is in low power >>>> mode with a PHY that supports receive clock-stop. >>>> >>>> The first approach centred around only addressing the issue in the >>>> resume path, but it seems to also happen when the platform glue module >>>> is removed and re-inserted (Jon - can you check whether that's also >>>> the case for you please?) >>>> >>>> As this is more targetted, I've dropped the patches from this series >>>> which move the call to phylink_resume(), so the link may still come >>>> up too early on resume - but that's something I also intend to fix. >>>> >>>> This is experimental - so I value test reports for this change. >>> >>> >>> The subject indicates 3 patches, but I only see 2 patches? Can you >>> confirm >>> if there are 2 or 3? >> >> Yes, 2 patches is correct. >> >>> So far I have only tested to resume case with the 2 patches to make that >>> that is working but on Tegra186, which has been the most problematic, >>> it is >>> not working reliably on top of next-20250305. >> >> To confirm, you're seeing stmmac_reset() sporadically timing out on >> resume even with these patches appled? That's rather disappointing. > > So I am no longer seeing the reset fail, from what I can see, but now > NFS is not responding after resume ... > > [ 49.825094] Enabling non-boot CPUs ... > [ 49.829760] Detected PIPT I-cache on CPU1 > [ 49.832694] CPU features: SANITY CHECK: Unexpected variation in > SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004 > [ 49.844120] CPU features: SANITY CHECK: Unexpected variation in > SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116 > [ 49.856231] CPU features: SANITY CHECK: Unexpected variation in > SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066 > [ 49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030] > [ 49.875389] CPU1 is up > [ 49.877187] Detected PIPT I-cache on CPU2 > [ 49.880824] CPU features: SANITY CHECK: Unexpected variation in > SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004 > [ 49.892266] CPU features: SANITY CHECK: Unexpected variation in > SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116 > [ 49.904467] CPU features: SANITY CHECK: Unexpected variation in > SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066 > [ 49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030] > [ 49.923610] CPU2 is up > [ 49.925194] Detected PIPT I-cache on CPU3 > [ 49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073] > [ 49.935866] CPU3 is up > [ 49.937983] Detected PIPT I-cache on CPU4 > [ 49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073] > [ 49.948593] CPU4 is up > [ 49.950810] Detected PIPT I-cache on CPU5 > [ 49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] > [ 49.961431] CPU5 is up > [ 50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/ > rgmii link mode > [ 50.077634] dwmac4: Master AXI performs any burst length > [ 50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features > support found > [ 50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 > Advanced Timestamp supported > [ 50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/ > Full - flow control rx/tx > [ 50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: > repeated role: device > [ 50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 > 13:39:28 UTC > [ 50.147552] OOM killer enabled. > [ 50.148441] Restarting tasks ... done. > [ 50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 50.154761] random: crng reseeded on system resumption > [ 50.162912] PM: suspend exit > [ 50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 234.474848] nfs: server 10.26.51.252 not responding, still trying > [ 234.538769] nfs: server 10.26.51.252 not responding, still trying > [ 237.546922] nfs: server 10.26.51.252 not responding, still trying > [ 254.762753] nfs: server 10.26.51.252 not responding, timed out > [ 254.762771] nfs: server 10.26.51.252 not responding, timed out > [ 254.766376] nfs: server 10.26.51.252 not responding, timed out > [ 254.766392] nfs: server 10.26.51.252 not responding, timed out > [ 254.783778] nfs: server 10.26.51.252 not responding, timed out > [ 254.789582] nfs: server 10.26.51.252 not responding, timed out > [ 254.795421] nfs: server 10.26.51.252 not responding, timed out > [ 254.801193] nfs: server 10.26.51.252 not responding, timed out > >> Do either of the two attached diffs make any difference? > > I will try these next. I tried both of the diffs, but both had the same problem as above and I see these nfs timeouts after resuming. What works the best is the original change you proposed (this is based upon the latest two patches) ... diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index e2146d3aee74..48a646b76a29 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv) if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE)) priv->plat->dma_cfg->atds = 1; - /* Note that the PHY clock must be running for reset to complete. */ - phylink_rx_clk_stop_block(priv->phylink); ret = stmmac_reset(priv, priv->ioaddr); - phylink_rx_clk_stop_unblock(priv->phylink); if (ret) { netdev_err(priv->dev, "Failed to reset the dma\n"); return ret; @@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev) rtnl_lock(); mutex_lock(&priv->lock); + /* Note that the PHY clock must be running for reset to complete. */ + phylink_rx_clk_stop_block(priv->phylink); stmmac_reset_queues_param(priv); stmmac_free_tx_skbufs(priv); @@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev) stmmac_set_rx_mode(ndev); stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); + phylink_rx_clk_stop_unblock(priv->phylink); stmmac_enable_all_queues(priv); stmmac_enable_all_dma_irq(priv);
On Tue, Mar 11, 2025 at 01:25:58PM +0000, Jon Hunter wrote: > > On 10/03/2025 14:20, Jon Hunter wrote: > > > > On 07/03/2025 17:07, Russell King (Oracle) wrote: > > > On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote: > > > > Hi Russell, > > > > > > > > On 06/03/2025 15:23, Russell King (Oracle) wrote: > > > > > Hi, > > > > > > > > > > This is a second approach to solving the STMMAC reset issues caused by > > > > > the lack of receive clock from the PHY where the media is in low power > > > > > mode with a PHY that supports receive clock-stop. > > > > > > > > > > The first approach centred around only addressing the issue in the > > > > > resume path, but it seems to also happen when the platform glue module > > > > > is removed and re-inserted (Jon - can you check whether that's also > > > > > the case for you please?) > > > > > > > > > > As this is more targetted, I've dropped the patches from this series > > > > > which move the call to phylink_resume(), so the link may still come > > > > > up too early on resume - but that's something I also intend to fix. > > > > > > > > > > This is experimental - so I value test reports for this change. > > > > > > > > > > > > The subject indicates 3 patches, but I only see 2 patches? Can > > > > you confirm > > > > if there are 2 or 3? > > > > > > Yes, 2 patches is correct. > > > > > > > So far I have only tested to resume case with the 2 patches to make that > > > > that is working but on Tegra186, which has been the most > > > > problematic, it is > > > > not working reliably on top of next-20250305. > > > > > > To confirm, you're seeing stmmac_reset() sporadically timing out on > > > resume even with these patches appled? That's rather disappointing. > > > > So I am no longer seeing the reset fail, from what I can see, but now > > NFS is not responding after resume ... > > > > [ 49.825094] Enabling non-boot CPUs ... > > [ 49.829760] Detected PIPT I-cache on CPU1 > > [ 49.832694] CPU features: SANITY CHECK: Unexpected variation in > > SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004 > > [ 49.844120] CPU features: SANITY CHECK: Unexpected variation in > > SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116 > > [ 49.856231] CPU features: SANITY CHECK: Unexpected variation in > > SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066 > > [ 49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030] > > [ 49.875389] CPU1 is up > > [ 49.877187] Detected PIPT I-cache on CPU2 > > [ 49.880824] CPU features: SANITY CHECK: Unexpected variation in > > SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004 > > [ 49.892266] CPU features: SANITY CHECK: Unexpected variation in > > SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116 > > [ 49.904467] CPU features: SANITY CHECK: Unexpected variation in > > SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066 > > [ 49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030] > > [ 49.923610] CPU2 is up > > [ 49.925194] Detected PIPT I-cache on CPU3 > > [ 49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073] > > [ 49.935866] CPU3 is up > > [ 49.937983] Detected PIPT I-cache on CPU4 > > [ 49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073] > > [ 49.948593] CPU4 is up > > [ 49.950810] Detected PIPT I-cache on CPU5 > > [ 49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] > > [ 49.961431] CPU5 is up > > [ 50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/ > > rgmii link mode > > [ 50.077634] dwmac4: Master AXI performs any burst length > > [ 50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features > > support found > > [ 50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 > > Advanced Timestamp supported > > [ 50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/ > > Full - flow control rx/tx > > [ 50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: > > repeated role: device > > [ 50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 > > 13:39:28 UTC > > [ 50.147552] OOM killer enabled. > > [ 50.148441] Restarting tasks ... done. > > [ 50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed > > [ 50.154761] random: crng reseeded on system resumption > > [ 50.162912] PM: suspend exit > > [ 50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed > > [ 50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed > > [ 50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed > > [ 234.474848] nfs: server 10.26.51.252 not responding, still trying > > [ 234.538769] nfs: server 10.26.51.252 not responding, still trying > > [ 237.546922] nfs: server 10.26.51.252 not responding, still trying > > [ 254.762753] nfs: server 10.26.51.252 not responding, timed out > > [ 254.762771] nfs: server 10.26.51.252 not responding, timed out > > [ 254.766376] nfs: server 10.26.51.252 not responding, timed out > > [ 254.766392] nfs: server 10.26.51.252 not responding, timed out > > [ 254.783778] nfs: server 10.26.51.252 not responding, timed out > > [ 254.789582] nfs: server 10.26.51.252 not responding, timed out > > [ 254.795421] nfs: server 10.26.51.252 not responding, timed out > > [ 254.801193] nfs: server 10.26.51.252 not responding, timed out > > > > > Do either of the two attached diffs make any difference? > > > > I will try these next. > > > I tried both of the diffs, but both had the same problem as above and > I see these nfs timeouts after resuming. What works the best is the > original change you proposed (this is based upon the latest two > patches) ... I'm wondering whether there's something else which needs the RX clock running in order to take effect. > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index e2146d3aee74..48a646b76a29 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv) > if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE)) > priv->plat->dma_cfg->atds = 1; > - /* Note that the PHY clock must be running for reset to complete. */ > - phylink_rx_clk_stop_block(priv->phylink); > ret = stmmac_reset(priv, priv->ioaddr); > - phylink_rx_clk_stop_unblock(priv->phylink); > if (ret) { > netdev_err(priv->dev, "Failed to reset the dma\n"); > return ret; > @@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev) > rtnl_lock(); > mutex_lock(&priv->lock); > + /* Note that the PHY clock must be running for reset to complete. */ > + phylink_rx_clk_stop_block(priv->phylink); > stmmac_reset_queues_param(priv); > stmmac_free_tx_skbufs(priv); > @@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev) > stmmac_set_rx_mode(ndev); > stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); > + phylink_rx_clk_stop_unblock(priv->phylink); > stmmac_enable_all_queues(priv); > stmmac_enable_all_dma_irq(priv); If you haven't already, can you try shrinking down the number of functions that are within the block..unblock region please? Looking at the functions called: stmmac_reset_queues_param() stmmac_free_tx_skbufs() stmmac_clear_descriptors() These look like it's only manipulating software state stmmac_hw_setup() We know this calls stmmac_reset() and thus needs the blocking stmmac_init_coalesce() Looks like it's only manipulating software state stmmac_set_rx_mode() This manipulates GMAC_RXQ_CTRL4, GMAC_HASH_TAB(*), GMAC_ADDR_HIGH(*), GMAC_ADDR_LOW(*), and GMAC_PACKET_FILTER. stmmac_restore_hw_vlan_rx_fltr() This manipulates GMAC_VLAN_TAG_DATA, GMAC_VLAN_TAG, GMAC_VLAN_HASH_TABLE, GMAC_VLAN_TAG I wonder whether the last two also require the RX clock to be running. The reason I want to track this down is that we may need to add block..unblock elsewhere in the driver to ensure that the RX clock is running when configuration is done elsewhere. Thanks.
On 11/03/2025 13:58, Russell King (Oracle) wrote: ... > I'm wondering whether there's something else which needs the RX clock > running in order to take effect. > >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> index e2146d3aee74..48a646b76a29 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv) >> if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE)) >> priv->plat->dma_cfg->atds = 1; >> - /* Note that the PHY clock must be running for reset to complete. */ >> - phylink_rx_clk_stop_block(priv->phylink); >> ret = stmmac_reset(priv, priv->ioaddr); >> - phylink_rx_clk_stop_unblock(priv->phylink); >> if (ret) { >> netdev_err(priv->dev, "Failed to reset the dma\n"); >> return ret; >> @@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev) >> rtnl_lock(); >> mutex_lock(&priv->lock); >> + /* Note that the PHY clock must be running for reset to complete. */ >> + phylink_rx_clk_stop_block(priv->phylink); >> stmmac_reset_queues_param(priv); >> stmmac_free_tx_skbufs(priv); >> @@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev) >> stmmac_set_rx_mode(ndev); >> stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); >> + phylink_rx_clk_stop_unblock(priv->phylink); >> stmmac_enable_all_queues(priv); >> stmmac_enable_all_dma_irq(priv); > > If you haven't already, can you try shrinking down the number of > functions that are within the block..unblock region please? It seems that at a minimum I need to block/unblock around the following functions ... diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index e2146d3aee74..46c343088b1f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv) if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE)) priv->plat->dma_cfg->atds = 1; - /* Note that the PHY clock must be running for reset to complete. */ - phylink_rx_clk_stop_block(priv->phylink); ret = stmmac_reset(priv, priv->ioaddr); - phylink_rx_clk_stop_unblock(priv->phylink); if (ret) { netdev_err(priv->dev, "Failed to reset the dma\n"); return ret; @@ -7953,10 +7950,13 @@ int stmmac_resume(struct device *dev) stmmac_reset_queues_param(priv); + /* Note that the PHY clock must be running for reset to complete. */ + phylink_rx_clk_stop_block(priv->phylink); stmmac_free_tx_skbufs(priv); stmmac_clear_descriptors(priv, &priv->dma_conf); stmmac_hw_setup(ndev, false); + phylink_rx_clk_stop_unblock(priv->phylink); stmmac_init_coalesce(priv); stmmac_set_rx_mode(ndev); > Looking at the functions called: > > stmmac_reset_queues_param() > stmmac_free_tx_skbufs() > stmmac_clear_descriptors() > These look like it's only manipulating software state So it appears that the last two need to be in the block/unblock region and ... > stmmac_hw_setup() > We know this calls stmmac_reset() and thus needs the blocking ... this one, which is no surprise, but the others are OK. Please note so far I have only tested on the Tegra186 board which seems to be the most sensitive. Cheers Jon