Message ID | 1394823605-31883-1-git-send-email-ben.dooks@codethink.co.uk (mailing list archive) |
---|---|
State | Awaiting Upstream |
Headers | show |
On 14/03/14 19:00, Ben Dooks wrote: > It seems the pm_rumtime work queue is causing the device to be suspended > during initialisation, thus the initialisation may not be able to access > registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() > to ensure that the pm system does not suspend it during the probe() call. If this work's for Geert then I'll look into updating the header and sending it to the netdev list.
On 14/03/14 20:06, Sergei Shtylyov wrote: > Hello. > > On 03/14/2014 10:01 PM, Ben Dooks wrote: > >>> It seems the pm_rumtime work queue is causing the device to be suspended >>> during initialisation, thus the initialisation may not be able to access >>> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() >>> to ensure that the pm system does not suspend it during the probe() >>> call. > >> If this work's for Geert then I'll look into updating the header and >> sending it to the netdev list. > > I keep wondering why I'm seeing no issues in my setup... I don't > have drivers/sh/ patch and have CONFIG_PM_RUNTIME=y. Seeing no issues on > multiplatform kernels on neither Lager nor Koelsch (however, other > platforms are not enabled). Without drivers/sh there is no specification to the pm code to do anything with the platform or OF devices. This means the clocks are not being auto-managed, which in turn means the clock is probably being left on from the boot-loader and thus the unit works correctly. Unfortunately, without drivers/sh patched a number of other drivers fail to work. This means we need it or something similar in the kernel. I will spend some time over the weekend moving the code out of the drivers/sh to somewhere else so that this does not occur again.
Hello. On 03/14/2014 10:01 PM, Ben Dooks wrote: >> It seems the pm_rumtime work queue is causing the device to be suspended >> during initialisation, thus the initialisation may not be able to access >> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() >> to ensure that the pm system does not suspend it during the probe() call. > If this work's for Geert then I'll look into updating the header and > sending it to the netdev list. I keep wondering why I'm seeing no issues in my setup... I don't have drivers/sh/ patch and have CONFIG_PM_RUNTIME=y. Seeing no issues on multiplatform kernels on neither Lager nor Koelsch (however, other platforms are not enabled). WBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello. On 03/14/2014 10:19 PM, Ben Dooks wrote: >>>> It seems the pm_rumtime work queue is causing the device to be suspended >>>> during initialisation, thus the initialisation may not be able to access >>>> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() >>>> to ensure that the pm system does not suspend it during the probe() >>>> call. >>> If this work's for Geert then I'll look into updating the header and >>> sending it to the netdev list. >> I keep wondering why I'm seeing no issues in my setup... I don't >> have drivers/sh/ patch and have CONFIG_PM_RUNTIME=y. Seeing no issues on >> multiplatform kernels on neither Lager nor Koelsch (however, other >> platforms are not enabled). > Without drivers/sh there is no specification to the pm code to do > anything with the platform or OF devices. This means the clocks are > not being auto-managed, which in turn means the clock is probably > being left on from the boot-loader and thus the unit works correctly. Yeah, I TFTP-boot over Ether from U-Boot. I wonder how others' setup is different from mine. > Unfortunately, without drivers/sh patched a number of other drivers > fail to work. This means we need it or something similar in the kernel. I wonder how drivers/sh/ came to existence at all. :-) Although there's also drivers/parisc/... > I will spend some time over the weekend moving the code out of the > drivers/sh to somewhere else so that this does not occur again. Thanks for your work. I just wasn't aware of these issues when I ported Ether to DT -- it Just Worked (TM). WBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ben, Thank you for the patch. On Friday 14 March 2014 19:00:05 Ben Dooks wrote: > It seems the pm_rumtime work queue is causing the device to be suspended > during initialisation, Have you investigated through which call path(s) this happens ? If device can be runtime suspended during their probe function despite calling pm_runtime_resume() then I don't see the point of that function at all. > thus the initialisation may not be able to access registers properly. Use > pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm system > does not suspend it during the probe() call. > > Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > --- > drivers/net/ethernet/renesas/sh_eth.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/ethernet/renesas/sh_eth.c > b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 > --- a/drivers/net/ethernet/renesas/sh_eth.c > +++ b/drivers/net/ethernet/renesas/sh_eth.c > @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device > *pdev) mdp->pdev = pdev; > pm_runtime_enable(&pdev->dev); > pm_runtime_resume(&pdev->dev); > + pm_runtime_get_sync(&pdev->dev); I believe pm_runtime_resume() isn't needed anymore if we call pm_runtime_get_sync(). > if (pdev->dev.of_node) > pd = sh_eth_parse_dt(&pdev->dev); > @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device > *pdev) pr_info("Base address at 0x%x, %pM, IRQ %d.\n", > (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); > > + pm_runtime_put_sync(&pdev->dev); > platform_set_drvdata(pdev, ndev); > > return ret;
Hi Ben, On Fri, Mar 14, 2014 at 8:00 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote: > It seems the pm_rumtime work queue is causing the device to be suspended > during initialisation, thus the initialisation may not be able to access > registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() > to ensure that the pm system does not suspend it during the probe() call. > > Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Thanks, this fixes the "imprecise external abort" on Koelsch. Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/03/14 09:22, Geert Uytterhoeven wrote: > Hi Ben, > > On Fri, Mar 14, 2014 at 8:00 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote: >> It seems the pm_rumtime work queue is causing the device to be suspended >> during initialisation, thus the initialisation may not be able to access >> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() >> to ensure that the pm system does not suspend it during the probe() call. >> >> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > > Thanks, this fixes the "imprecise external abort" on Koelsch. > > Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Thanks, will consider sending this to the netdev list later.
On 15/03/14 11:19, Laurent Pinchart wrote: > Hi Ben, > > Thank you for the patch. > > On Friday 14 March 2014 19:00:05 Ben Dooks wrote: >> It seems the pm_rumtime work queue is causing the device to be suspended >> during initialisation, > > Have you investigated through which call path(s) this happens ? If device can > be runtime suspended during their probe function despite calling > pm_runtime_resume() then I don't see the point of that function at all. > >> thus the initialisation may not be able to access registers properly. Use >> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm system >> does not suspend it during the probe() call. >> >> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> >> --- >> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/net/ethernet/renesas/sh_eth.c >> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 >> --- a/drivers/net/ethernet/renesas/sh_eth.c >> +++ b/drivers/net/ethernet/renesas/sh_eth.c >> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device >> *pdev) mdp->pdev = pdev; >> pm_runtime_enable(&pdev->dev); >> pm_runtime_resume(&pdev->dev); >> + pm_runtime_get_sync(&pdev->dev); > > I believe pm_runtime_resume() isn't needed anymore if we call > pm_runtime_get_sync(). > >> if (pdev->dev.of_node) >> pd = sh_eth_parse_dt(&pdev->dev); >> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device >> *pdev) pr_info("Base address at 0x%x, %pM, IRQ %d.\n", >> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); >> >> + pm_runtime_put_sync(&pdev->dev); >> platform_set_drvdata(pdev, ndev); >> >> return ret; I will look at removing the pm_runtime_resume call as well in this patch.
Hi Ben, On Monday 17 March 2014 11:20:32 Ben Dooks wrote: > On 15/03/14 11:19, Laurent Pinchart wrote: > > On Friday 14 March 2014 19:00:05 Ben Dooks wrote: > >> It seems the pm_rumtime work queue is causing the device to be suspended > >> during initialisation, > > > > Have you investigated through which call path(s) this happens ? If device > > can be runtime suspended during their probe function despite calling > > pm_runtime_resume() then I don't see the point of that function at all. > > > >> thus the initialisation may not be able to access registers properly. Use > >> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm > >> system does not suspend it during the probe() call. > >> > >> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > >> --- > >> > >> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/drivers/net/ethernet/renesas/sh_eth.c > >> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 > >> --- a/drivers/net/ethernet/renesas/sh_eth.c > >> +++ b/drivers/net/ethernet/renesas/sh_eth.c > >> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device > >> *pdev) > >> mdp->pdev = pdev; > >> pm_runtime_enable(&pdev->dev); > >> pm_runtime_resume(&pdev->dev); > >> + pm_runtime_get_sync(&pdev->dev); > > > > I believe pm_runtime_resume() isn't needed anymore if we call > > pm_runtime_get_sync(). > > > >> if (pdev->dev.of_node) > >> pd = sh_eth_parse_dt(&pdev->dev); > >> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device > >> *pdev) > >> pr_info("Base address at 0x%x, %pM, IRQ %d.\n", > >> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); > >> > >> + pm_runtime_put_sync(&pdev->dev); > >> platform_set_drvdata(pdev, ndev); > >> > >> return ret; > > I will look at removing the pm_runtime_resume call as well in this patch. Thank you. It would also be nice if you could find out what causes the PM runtime work queue to suspend the device. pm_runtime_resume() is documented as being the function to call at probe time. I'd like to know whether the problem comes from the PM runtime core itself, in which case the documentation should be updated or the PM runtime core should be fixed, or from operations performed by the sh-eth driver, in which case we might need a different fix in the driver.
On 17/03/14 11:35, Laurent Pinchart wrote: > Hi Ben, > > On Monday 17 March 2014 11:20:32 Ben Dooks wrote: >> On 15/03/14 11:19, Laurent Pinchart wrote: >>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote: >>>> It seems the pm_rumtime work queue is causing the device to be suspended >>>> during initialisation, >>> >>> Have you investigated through which call path(s) this happens ? If device >>> can be runtime suspended during their probe function despite calling >>> pm_runtime_resume() then I don't see the point of that function at all. >>> >>>> thus the initialisation may not be able to access registers properly. Use >>>> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm >>>> system does not suspend it during the probe() call. >>>> >>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> >>>> --- >>>> >>>> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ >>>> 1 file changed, 2 insertions(+) >>>> >>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c >>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 >>>> --- a/drivers/net/ethernet/renesas/sh_eth.c >>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c >>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device >>>> *pdev) >>>> mdp->pdev = pdev; >>>> pm_runtime_enable(&pdev->dev); >>>> pm_runtime_resume(&pdev->dev); >>>> + pm_runtime_get_sync(&pdev->dev); >>> >>> I believe pm_runtime_resume() isn't needed anymore if we call >>> pm_runtime_get_sync(). >>> >>>> if (pdev->dev.of_node) >>>> pd = sh_eth_parse_dt(&pdev->dev); >>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device >>>> *pdev) >>>> pr_info("Base address at 0x%x, %pM, IRQ %d.\n", >>>> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); >>>> >>>> + pm_runtime_put_sync(&pdev->dev); >>>> platform_set_drvdata(pdev, ndev); >>>> >>>> return ret; >> >> I will look at removing the pm_runtime_resume call as well in this patch. > > Thank you. It would also be nice if you could find out what causes the PM > runtime work queue to suspend the device. pm_runtime_resume() is documented as > being the function to call at probe time. I'd like to know whether the problem > comes from the PM runtime core itself, in which case the documentation should > be updated or the PM runtime core should be fixed, or from operations > performed by the sh-eth driver, in which case we might need a different fix in > the driver. I'm not entirely sure, pm_runtime_resume() seems like something to call during a resume operation. It probably does not increment device use count, and thus the pm worker thread is allowed to re-suspend the device when it runs. pm_runtime_get_sync() is the right thing to do as we are dealing with access device registers during the probe and thus should ensure the device does not get shutdown until the probe has finished.
Hi Ben, On Monday 17 March 2014 11:40:11 Ben Dooks wrote: > On 17/03/14 11:35, Laurent Pinchart wrote: > > On Monday 17 March 2014 11:20:32 Ben Dooks wrote: > >> On 15/03/14 11:19, Laurent Pinchart wrote: > >>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote: > >>>> It seems the pm_rumtime work queue is causing the device to be > >>>> suspended during initialisation, > >>> > >>> Have you investigated through which call path(s) this happens ? If > >>> device can be runtime suspended during their probe function despite > >>> calling pm_runtime_resume() then I don't see the point of that function > >>> at all. > >>> > >>>> thus the initialisation may not be able to access registers properly. > >>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the > >>>> pm system does not suspend it during the probe() call. > >>>> > >>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > >>>> --- > >>>> > >>>> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ > >>>> 1 file changed, 2 insertions(+) > >>>> > >>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c > >>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 > >>>> --- a/drivers/net/ethernet/renesas/sh_eth.c > >>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c > >>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct > >>>> platform_device *pdev) > >>>> mdp->pdev = pdev; > >>>> pm_runtime_enable(&pdev->dev); > >>>> pm_runtime_resume(&pdev->dev); > >>>> + pm_runtime_get_sync(&pdev->dev); > >>> > >>> I believe pm_runtime_resume() isn't needed anymore if we call > >>> pm_runtime_get_sync(). > >>> > >>>> if (pdev->dev.of_node) > >>>> pd = sh_eth_parse_dt(&pdev->dev); > >>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct > >>>> platform_device *pdev) > >>>> pr_info("Base address at 0x%x, %pM, IRQ %d.\n", > >>>> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); > >>>> > >>>> + pm_runtime_put_sync(&pdev->dev); > >>>> platform_set_drvdata(pdev, ndev); > >>>> > >>>> return ret; > >> > >> I will look at removing the pm_runtime_resume call as well in this patch. > > > > Thank you. It would also be nice if you could find out what causes the PM > > runtime work queue to suspend the device. pm_runtime_resume() is > > documented as being the function to call at probe time. I'd like to know > > whether the problem comes from the PM runtime core itself, in which case > > the documentation should be updated or the PM runtime core should be > > fixed, or from operations performed by the sh-eth driver, in which case > > we might need a different fix in the driver. > > I'm not entirely sure, pm_runtime_resume() seems like something to call > during a resume operation. It probably does not increment device use > count, and thus the pm worker thread is allowed to re-suspend the device > when it runs. Yes, but why does it do so ? The runtime PM work queue doesn't seem to go through devices to try and suspend them, it seems to only process delayed work. What queues the work that makes runtime PM suspend the device ? > pm_runtime_get_sync() is the right thing to do as we are dealing with > access device registers during the probe and thus should ensure the > device does not get shutdown until the probe has finished. pm_runtime_resume() is documented as being the function to call in the probe handler. I'd like to get a clarification on this from the runtime PM developers. If the documentation is wrong it needs to be fixed.
On 17/03/14 11:53, Laurent Pinchart wrote: > Hi Ben, > > On Monday 17 March 2014 11:40:11 Ben Dooks wrote: >> On 17/03/14 11:35, Laurent Pinchart wrote: >>> On Monday 17 March 2014 11:20:32 Ben Dooks wrote: >>>> On 15/03/14 11:19, Laurent Pinchart wrote: >>>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote: >>>>>> It seems the pm_rumtime work queue is causing the device to be >>>>>> suspended during initialisation, >>>>> >>>>> Have you investigated through which call path(s) this happens ? If >>>>> device can be runtime suspended during their probe function despite >>>>> calling pm_runtime_resume() then I don't see the point of that function >>>>> at all. >>>>> >>>>>> thus the initialisation may not be able to access registers properly. >>>>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the >>>>>> pm system does not suspend it during the probe() call. >>>>>> >>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> >>>>>> --- >>>>>> >>>>>> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ >>>>>> 1 file changed, 2 insertions(+) >>>>>> >>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c >>>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 >>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c >>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c >>>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct >>>>>> platform_device *pdev) >>>>>> mdp->pdev = pdev; >>>>>> pm_runtime_enable(&pdev->dev); >>>>>> pm_runtime_resume(&pdev->dev); >>>>>> + pm_runtime_get_sync(&pdev->dev); >>>>> >>>>> I believe pm_runtime_resume() isn't needed anymore if we call >>>>> pm_runtime_get_sync(). >>>>> >>>>>> if (pdev->dev.of_node) >>>>>> pd = sh_eth_parse_dt(&pdev->dev); >>>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct >>>>>> platform_device *pdev) >>>>>> pr_info("Base address at 0x%x, %pM, IRQ %d.\n", >>>>>> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); >>>>>> >>>>>> + pm_runtime_put_sync(&pdev->dev); >>>>>> platform_set_drvdata(pdev, ndev); >>>>>> >>>>>> return ret; >>>> >>>> I will look at removing the pm_runtime_resume call as well in this patch. >>> >>> Thank you. It would also be nice if you could find out what causes the PM >>> runtime work queue to suspend the device. pm_runtime_resume() is >>> documented as being the function to call at probe time. I'd like to know >>> whether the problem comes from the PM runtime core itself, in which case >>> the documentation should be updated or the PM runtime core should be >>> fixed, or from operations performed by the sh-eth driver, in which case >>> we might need a different fix in the driver. >> >> I'm not entirely sure, pm_runtime_resume() seems like something to call >> during a resume operation. It probably does not increment device use >> count, and thus the pm worker thread is allowed to re-suspend the device >> when it runs. > > Yes, but why does it do so ? The runtime PM work queue doesn't seem to go > through devices to try and suspend them, it seems to only process delayed > work. What queues the work that makes runtime PM suspend the device ? > >> pm_runtime_get_sync() is the right thing to do as we are dealing with >> access device registers during the probe and thus should ensure the >> device does not get shutdown until the probe has finished. > > pm_runtime_resume() is documented as being the function to call in the probe > handler. I'd like to get a clarification on this from the runtime PM > developers. If the documentation is wrong it needs to be fixed. Hmm, wonder if it works for things that take a long time to run, or for things that create and probe sub-devices such as the phy? I annotated the clk-mstp.c driver to test for removal of the eth clock and the culprit in this case was a pm thread running before the device finished probing and suspending. I do not have the exact debug output as I deleted it after looking in to it.
Hi Ben, On Monday 17 March 2014 12:56:00 Ben Dooks wrote: > On 17/03/14 11:53, Laurent Pinchart wrote: > > On Monday 17 March 2014 11:40:11 Ben Dooks wrote: > >> On 17/03/14 11:35, Laurent Pinchart wrote: > >>> On Monday 17 March 2014 11:20:32 Ben Dooks wrote: > >>>> On 15/03/14 11:19, Laurent Pinchart wrote: > >>>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote: > >>>>>> It seems the pm_rumtime work queue is causing the device to be > >>>>>> suspended during initialisation, > >>>>> > >>>>> Have you investigated through which call path(s) this happens ? If > >>>>> device can be runtime suspended during their probe function despite > >>>>> calling pm_runtime_resume() then I don't see the point of that > >>>>> function at all. > >>>>> > >>>>>> thus the initialisation may not be able to access registers properly. > >>>>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that > >>>>>> the pm system does not suspend it during the probe() call. > >>>>>> > >>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> > >>>>>> --- > >>>>>> > >>>>>> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ > >>>>>> 1 file changed, 2 insertions(+) > >>>>>> > >>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c > >>>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 > >>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c > >>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c > >>>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct > >>>>>> platform_device *pdev) > >>>>>> mdp->pdev = pdev; > >>>>>> pm_runtime_enable(&pdev->dev); > >>>>>> pm_runtime_resume(&pdev->dev); > >>>>>> + pm_runtime_get_sync(&pdev->dev); > >>>>> > >>>>> I believe pm_runtime_resume() isn't needed anymore if we call > >>>>> pm_runtime_get_sync(). > >>>>> > >>>>>> if (pdev->dev.of_node) > >>>>>> pd = sh_eth_parse_dt(&pdev->dev); > >>>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct > >>>>>> platform_device *pdev) > >>>>>> pr_info("Base address at 0x%x, %pM, IRQ %d.\n", > >>>>>> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); > >>>>>> > >>>>>> + pm_runtime_put_sync(&pdev->dev); > >>>>>> platform_set_drvdata(pdev, ndev); > >>>>>> > >>>>>> return ret; > >>>> > >>>> I will look at removing the pm_runtime_resume call as well in this > >>>> patch. > >>> > >>> Thank you. It would also be nice if you could find out what causes the > >>> PM runtime work queue to suspend the device. pm_runtime_resume() is > >>> documented as being the function to call at probe time. I'd like to know > >>> whether the problem comes from the PM runtime core itself, in which case > >>> the documentation should be updated or the PM runtime core should be > >>> fixed, or from operations performed by the sh-eth driver, in which case > >>> we might need a different fix in the driver. > >> > >> I'm not entirely sure, pm_runtime_resume() seems like something to call > >> during a resume operation. It probably does not increment device use > >> count, and thus the pm worker thread is allowed to re-suspend the device > >> when it runs. > > > > Yes, but why does it do so ? The runtime PM work queue doesn't seem to go > > through devices to try and suspend them, it seems to only process delayed > > work. What queues the work that makes runtime PM suspend the device ? > > > >> pm_runtime_get_sync() is the right thing to do as we are dealing with > >> access device registers during the probe and thus should ensure the > >> device does not get shutdown until the probe has finished. > > > > pm_runtime_resume() is documented as being the function to call in the > > probe handler. I'd like to get a clarification on this from the runtime > > PM developers. If the documentation is wrong it needs to be fixed. > > Hmm, wonder if it works for things that take a long time to run, or for > things that create and probe sub-devices such as the phy? That's exactly what I'd like to know :-) > I annotated the clk-mstp.c driver to test for removal of the eth clock and > the culprit in this case was a pm thread running before the device finished > probing and suspending. I do not have the exact debug output as I deleted it > after looking in to it. Would you be able to perform more tests ? I'd like to get to the bottom of this problem and find out whether the best solution would be to fix the runtime PM core, the runtime PM documentation or the driver (or a combination of them).
On 17/03/14 13:27, Laurent Pinchart wrote: > Hi Ben, > > On Monday 17 March 2014 12:56:00 Ben Dooks wrote: >> On 17/03/14 11:53, Laurent Pinchart wrote: >>> On Monday 17 March 2014 11:40:11 Ben Dooks wrote: >>>> On 17/03/14 11:35, Laurent Pinchart wrote: >>>>> On Monday 17 March 2014 11:20:32 Ben Dooks wrote: >>>>>> On 15/03/14 11:19, Laurent Pinchart wrote: >>>>>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote: >>>>>>>> It seems the pm_rumtime work queue is causing the device to be >>>>>>>> suspended during initialisation, >>>>>>> >>>>>>> Have you investigated through which call path(s) this happens ? If >>>>>>> device can be runtime suspended during their probe function despite >>>>>>> calling pm_runtime_resume() then I don't see the point of that >>>>>>> function at all. >>>>>>> >>>>>>>> thus the initialisation may not be able to access registers properly. >>>>>>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that >>>>>>>> the pm system does not suspend it during the probe() call. >>>>>>>> >>>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> >>>>>>>> --- >>>>>>>> >>>>>>>> drivers/net/ethernet/renesas/sh_eth.c | 2 ++ >>>>>>>> 1 file changed, 2 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c >>>>>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 >>>>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c >>>>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c >>>>>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct >>>>>>>> platform_device *pdev) >>>>>>>> mdp->pdev = pdev; >>>>>>>> pm_runtime_enable(&pdev->dev); >>>>>>>> pm_runtime_resume(&pdev->dev); >>>>>>>> + pm_runtime_get_sync(&pdev->dev); >>>>>>> >>>>>>> I believe pm_runtime_resume() isn't needed anymore if we call >>>>>>> pm_runtime_get_sync(). >>>>>>> >>>>>>>> if (pdev->dev.of_node) >>>>>>>> pd = sh_eth_parse_dt(&pdev->dev); >>>>>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct >>>>>>>> platform_device *pdev) >>>>>>>> pr_info("Base address at 0x%x, %pM, IRQ %d.\n", >>>>>>>> (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); >>>>>>>> >>>>>>>> + pm_runtime_put_sync(&pdev->dev); >>>>>>>> platform_set_drvdata(pdev, ndev); >>>>>>>> >>>>>>>> return ret; >>>>>> >>>>>> I will look at removing the pm_runtime_resume call as well in this >>>>>> patch. >>>>> >>>>> Thank you. It would also be nice if you could find out what causes the >>>>> PM runtime work queue to suspend the device. pm_runtime_resume() is >>>>> documented as being the function to call at probe time. I'd like to know >>>>> whether the problem comes from the PM runtime core itself, in which case >>>>> the documentation should be updated or the PM runtime core should be >>>>> fixed, or from operations performed by the sh-eth driver, in which case >>>>> we might need a different fix in the driver. >>>> >>>> I'm not entirely sure, pm_runtime_resume() seems like something to call >>>> during a resume operation. It probably does not increment device use >>>> count, and thus the pm worker thread is allowed to re-suspend the device >>>> when it runs. >>> >>> Yes, but why does it do so ? The runtime PM work queue doesn't seem to go >>> through devices to try and suspend them, it seems to only process delayed >>> work. What queues the work that makes runtime PM suspend the device ? >>> >>>> pm_runtime_get_sync() is the right thing to do as we are dealing with >>>> access device registers during the probe and thus should ensure the >>>> device does not get shutdown until the probe has finished. >>> >>> pm_runtime_resume() is documented as being the function to call in the >>> probe handler. I'd like to get a clarification on this from the runtime >>> PM developers. If the documentation is wrong it needs to be fixed. >> >> Hmm, wonder if it works for things that take a long time to run, or for >> things that create and probe sub-devices such as the phy? > > That's exactly what I'd like to know :-) > >> I annotated the clk-mstp.c driver to test for removal of the eth clock and >> the culprit in this case was a pm thread running before the device finished >> probing and suspending. I do not have the exact debug output as I deleted it >> after looking in to it. > > Would you be able to perform more tests ? I'd like to get to the bottom of > this problem and find out whether the best solution would be to fix the > runtime PM core, the runtime PM documentation or the driver (or a combination > of them). From pm_runtime.h: static inline int pm_runtime_get_sync(struct device *dev) { return __pm_runtime_resume(dev, RPM_GET_PUT); } static inline int pm_request_resume(struct device *dev) { return __pm_runtime_resume(dev, RPM_ASYNC); } So it looks like pm_runtime_resume() does not protect against the possibility that something else may re-suspend the device. I have yet to ascertain how this ends up happening with device probe, it seems to be very dependant on the code.
On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote: > I have yet to ascertain how this ends up happening with device probe, > it seems to be very dependant on the code. Adding a WARN() in cpg_mstp_clock_endisable(): [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>] (cpg_mstp_clock_disable+0x14/0 x18) r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00 r4:eec23a00 [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>] (__clk_disable+0x68/0x78) [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c) r4:60000193 r3:00000002 [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78) r5:eed41500 r4:eed414c0 [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c) r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84) r5:eed42810 r4:eec7a000 [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4) r6:00000000 r5:00000000 r4:eed42810 r3:eed42810 [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c) r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000 r4:eed42810 [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c) r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870 [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] (sh_eth_get_stats+0x228/0x23c) r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] (dev_get_stats+0x5c/0x84) r4:eec7bc30 r3:c023ec68 [<c028c138>] (dev_get_stats) from [<c029cb08>] (rtnl_fill_ifinfo+0x410/0x8dc) r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 r4:00000000 [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] (register_netdevice+0x3dc/0x428) r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 [<c02918f4>] (register_netdevice) from [<c0291d38>] (register_netdev+0x1c/0x2c) r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 [<c0291d1c>] (register_netdev) from [<c0240528>] (sh_eth_drv_probe+0x994/0xb7c) r4:ee428000 r3:00000040 [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] (platform_drv_probe+0x20/0x50) r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 r4:eed42810 [<c01f3e4c>] (platform_drv_probe) from [<c01f2428>] (driver_probe_device+0xc8/0x1f8) r5:eed42844 r4:eed42810 [<c01f2360>] (driver_probe_device) from [<c01f25c0>] (__driver_attach+0x68/0x8c) r7:00000000 r6:c047d830 r5:eed42844 r4:eed42810 [<c01f2558>] (__driver_attach) from [<c01f0cd4>] (bus_for_each_dev+0x5c/0x94) r6:c01f2558 r5:eec7be48 r4:c047d830 r3:c01f2558 [<c01f0c78>] (bus_for_each_dev) from [<c01f1f84>] (driver_attach+0x20/0x28) r7:00000000 r6:c047a148 r5:ee461480 r4:c047d830 [<c01f1f64>] (driver_attach) from [<c01f1b98>] (bus_add_driver+0xb4/0x1c8) [<c01f1ae4>] (bus_add_driver) from [<c01f2c98>] (driver_register+0xa4/0xe8) r7:c0484780 r6:00000000 r5:c0454c18 r4:c047d830 [<c01f2bf4>] (driver_register) from [<c01f3854>] (__platform_driver_register+0x50/0x64) r5:c0454c18 r4:c0442acc [<c01f3804>] (__platform_driver_register) from [<c0442ae4>] (sh_eth_driver_init+0x18/0x20) [<c0442acc>] (sh_eth_driver_init) from [<c0009668>] (do_one_initcall+0x98/0x13c) [<c00095d0>] (do_one_initcall) from [<c042bc60>] (kernel_init_freeable+0x100/0x1cc) r9:c045c544 r8:00000065 r7:c0484780 r6:c0454bf8 r5:c0454c18 r4:00000006 [<c042bb60>] (kernel_init_freeable) from [<c032afa8>] (kernel_init+0x10/0xec) r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c032af98 r4:00000000 [<c032af98>] (kernel_init) from [<c000ed98>] (ret_from_fork+0x14/0x3c) r4:00000000 r3:00000000
Hi Ben, On Monday 17 March 2014 13:36:33 Ben Dooks wrote: [snip] > From pm_runtime.h: > > static inline int pm_runtime_get_sync(struct device *dev) > { > return __pm_runtime_resume(dev, RPM_GET_PUT); > } > > static inline int pm_request_resume(struct device *dev) > { > return __pm_runtime_resume(dev, RPM_ASYNC); > } > > So it looks like pm_runtime_resume() does not protect against the > possibility that something else may re-suspend the device. Correct. > I have yet to ascertain how this ends up happening with device probe, it > seems to be very dependant on the code. We might be doing something wrong in the driver from a runtime PM point of view that leads to the device being suspended. I'd like to catch that instead of hiding it by a pm_runtime_get_sync() call. If it turns out that we're not doing anything wrong then replacing pm_runtime_resume() with pm_runtime_get_sync() would of course be fine.
On Mon, Mar 17, 2014 at 2:38 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] > (sh_eth_get_stats+0x228/0x23c) > r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 > [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] (dev_get_stats+0x5c/0x84) > r4:eec7bc30 r3:c023ec68 > [<c028c138>] (dev_get_stats) from [<c029cb08>] (rtnl_fill_ifinfo+0x410/0x8dc) > r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 > [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) > r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 > r4:00000000 > [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] (register_netdevice+0x3dc/0x428) > r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 > [<c02918f4>] (register_netdevice) from [<c0291d38>] (register_netdev+0x1c/0x2c) > r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 > [<c0291d1c>] (register_netdev) from [<c0240528>] (sh_eth_drv_probe+0x994/0xb7c) > r4:ee428000 r3:00000040 > [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] (platform_drv_probe+0x20/0x50) > r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 > r4:eed42810 sh_eth_get_stats() calls pm_runtime_get_sync(&mdp->pdev->dev); ... pm_runtime_put_sync(&mdp->pdev->dev); so adding the extra pm_runtime_get_sync() in ->probe() prevents the pm_runtime_put_sync() from putting the hardware to sleep. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/03/14 13:38, Geert Uytterhoeven wrote: > On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote: >> I have yet to ascertain how this ends up happening with device probe, >> it seems to be very dependant on the code. > > Adding a WARN() in cpg_mstp_clock_endisable(): > > [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>] > (cpg_mstp_clock_disable+0x14/0 > x18) > r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00 > r4:eec23a00 > [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>] > (__clk_disable+0x68/0x78) > [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c) > r4:60000193 r3:00000002 > [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78) > r5:eed41500 r4:eed414c0 > [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c) > r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac > [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84) > r5:eed42810 r4:eec7a000 > [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4) > r6:00000000 r5:00000000 r4:eed42810 r3:eed42810 > [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c) > r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000 > r4:eed42810 > [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c) > r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870 > [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] > (sh_eth_get_stats+0x228/0x23c) > r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 > [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] (dev_get_stats+0x5c/0x84) > r4:eec7bc30 r3:c023ec68 > [<c028c138>] (dev_get_stats) from [<c029cb08>] (rtnl_fill_ifinfo+0x410/0x8dc) > r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 > [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) > r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 > r4:00000000 > [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] (register_netdevice+0x3dc/0x428) > r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 > [<c02918f4>] (register_netdevice) from [<c0291d38>] (register_netdev+0x1c/0x2c) > r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 > [<c0291d1c>] (register_netdev) from [<c0240528>] (sh_eth_drv_probe+0x994/0xb7c) > r4:ee428000 r3:00000040 > [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] (platform_drv_probe+0x20/0x50) > r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 > r4:eed42810 > [<c01f3e4c>] (platform_drv_probe) from [<c01f2428>] > (driver_probe_device+0xc8/0x1f8) > r5:eed42844 r4:eed42810 > [<c01f2360>] (driver_probe_device) from [<c01f25c0>] (__driver_attach+0x68/0x8c) > r7:00000000 r6:c047d830 r5:eed42844 r4:eed42810 > [<c01f2558>] (__driver_attach) from [<c01f0cd4>] (bus_for_each_dev+0x5c/0x94) > r6:c01f2558 r5:eec7be48 r4:c047d830 r3:c01f2558 > [<c01f0c78>] (bus_for_each_dev) from [<c01f1f84>] (driver_attach+0x20/0x28) > r7:00000000 r6:c047a148 r5:ee461480 r4:c047d830 > [<c01f1f64>] (driver_attach) from [<c01f1b98>] (bus_add_driver+0xb4/0x1c8) > [<c01f1ae4>] (bus_add_driver) from [<c01f2c98>] (driver_register+0xa4/0xe8) > r7:c0484780 r6:00000000 r5:c0454c18 r4:c047d830 > [<c01f2bf4>] (driver_register) from [<c01f3854>] > (__platform_driver_register+0x50/0x64) > r5:c0454c18 r4:c0442acc > [<c01f3804>] (__platform_driver_register) from [<c0442ae4>] > (sh_eth_driver_init+0x18/0x20) > [<c0442acc>] (sh_eth_driver_init) from [<c0009668>] (do_one_initcall+0x98/0x13c) > [<c00095d0>] (do_one_initcall) from [<c042bc60>] > (kernel_init_freeable+0x100/0x1cc) > r9:c045c544 r8:00000065 r7:c0484780 r6:c0454bf8 r5:c0454c18 r4:00000006 > [<c042bb60>] (kernel_init_freeable) from [<c032afa8>] (kernel_init+0x10/0xec) > r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c032af98 > r4:00000000 > [<c032af98>] (kernel_init) from [<c000ed98>] (ret_from_fork+0x14/0x3c) > r4:00000000 r3:00000000 this explains it, the call to stats causes a get_sync/put_sync which puts the device into a state where it /could/ be suspended and thus does get suspended in this case from the pm code. I'm not /sure/ why the pm_runtime code is not protecting against running this when a device probe is in progress, but it seems the best thing is to ensure that we always do a get/put sync in the driver to ensure we have a reference during probe.
Hi Ben, On Monday 17 March 2014 13:44:02 Ben Dooks wrote: > On 17/03/14 13:38, Geert Uytterhoeven wrote: > > On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks wrote: > >> I have yet to ascertain how this ends up happening with device probe, > >> it seems to be very dependant on the code. > > > > Adding a WARN() in cpg_mstp_clock_endisable(): > > > > [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>] > > (cpg_mstp_clock_disable+0x14/0x18) > > r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00 > > r4:eec23a00 > > [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>] > > (__clk_disable+0x68/0x78) > > [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c) > > r4:60000193 r3:00000002 > > [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78) > > r5:eed41500 r4:eed414c0 > > [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c) > > r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac > > [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84) > > r5:eed42810 r4:eec7a000 > > [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4) > > r6:00000000 r5:00000000 r4:eed42810 r3:eed42810 > > [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c) > > r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000 > > r4:eed42810 > > [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c) > > r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870 > > [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] > > (sh_eth_get_stats+0x228/0x23c) > > r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 > > [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] > > (dev_get_stats+0x5c/0x84) > > r4:eec7bc30 r3:c023ec68 > > [<c028c138>] (dev_get_stats) from [<c029cb08>] > > (rtnl_fill_ifinfo+0x410/0x8dc) > > r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 > > [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) > > r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 > > r4:00000000 > > [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] > > (register_netdevice+0x3dc/0x428)> > > r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 > > [<c02918f4>] (register_netdevice) from [<c0291d38>] > > (register_netdev+0x1c/0x2c)> > > r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 > > [<c0291d1c>] (register_netdev) from [<c0240528>] > > (sh_eth_drv_probe+0x994/0xb7c) > > r4:ee428000 r3:00000040 > > [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] > > (platform_drv_probe+0x20/0x50) > > r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 > > r4:eed42810 [snip] > this explains it, the call to stats causes a get_sync/put_sync which > puts the device into a state where it /could/ be suspended and thus > does get suspended in this case from the pm code. > > I'm not /sure/ why the pm_runtime code is not protecting against > running this when a device probe is in progress, but it seems the > best thing is to ensure that we always do a get/put sync in the > driver to ensure we have a reference during probe. Wouldn't it be better to register the MDIO bus before registering the network device ? It looks like the current order is prone to race conditions. Another potential issue, does the network layer guarantee that the device will always be opened by an .ndo_open() call before performing any MDIO operation ? sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth registers, could we be missing runtime PM calls in those call paths ?
On 17/03/14 13:54, Laurent Pinchart wrote: > Hi Ben, > > On Monday 17 March 2014 13:44:02 Ben Dooks wrote: >> On 17/03/14 13:38, Geert Uytterhoeven wrote: >>> On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks wrote: >>>> I have yet to ascertain how this ends up happening with device probe, >>>> it seems to be very dependant on the code. >>> >>> Adding a WARN() in cpg_mstp_clock_endisable(): >>> >>> [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>] >>> (cpg_mstp_clock_disable+0x14/0x18) >>> r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00 >>> r4:eec23a00 >>> [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>] >>> (__clk_disable+0x68/0x78) >>> [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c) >>> r4:60000193 r3:00000002 >>> [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78) >>> r5:eed41500 r4:eed414c0 >>> [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c) >>> r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac >>> [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84) >>> r5:eed42810 r4:eec7a000 >>> [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4) >>> r6:00000000 r5:00000000 r4:eed42810 r3:eed42810 >>> [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c) >>> r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000 >>> r4:eed42810 >>> [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c) >>> r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870 >>> [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] >>> (sh_eth_get_stats+0x228/0x23c) >>> r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 >>> [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] >>> (dev_get_stats+0x5c/0x84) >>> r4:eec7bc30 r3:c023ec68 >>> [<c028c138>] (dev_get_stats) from [<c029cb08>] >>> (rtnl_fill_ifinfo+0x410/0x8dc) >>> r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 >>> [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) >>> r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 >>> r4:00000000 >>> [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] >>> (register_netdevice+0x3dc/0x428)> >>> r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 >>> [<c02918f4>] (register_netdevice) from [<c0291d38>] >>> (register_netdev+0x1c/0x2c)> >>> r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 >>> [<c0291d1c>] (register_netdev) from [<c0240528>] >>> (sh_eth_drv_probe+0x994/0xb7c) >>> r4:ee428000 r3:00000040 >>> [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] >>> (platform_drv_probe+0x20/0x50) >>> r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 >>> r4:eed42810 > > [snip] > >> this explains it, the call to stats causes a get_sync/put_sync which >> puts the device into a state where it /could/ be suspended and thus >> does get suspended in this case from the pm code. >> >> I'm not /sure/ why the pm_runtime code is not protecting against >> running this when a device probe is in progress, but it seems the >> best thing is to ensure that we always do a get/put sync in the >> driver to ensure we have a reference during probe. > > Wouldn't it be better to register the MDIO bus before registering the network > device ? It looks like the current order is prone to race conditions. Not sure, requires input? > Another potential issue, does the network layer guarantee that the device will > always be opened by an .ndo_open() call before performing any MDIO operation ? > sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth registers, could we be > missing runtime PM calls in those call paths ? I'm not sure if there is any guarantee, I don't think the PHY driver does any sort of open on probe. I do have a patch to ensure that the MDIO operations are wrapped with pm_runtime_{get,put}_sync() calls as the probe of the PHY often fails without it.
(adding Sergei) On Mon, Mar 17, 2014 at 2:54 PM, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > On Monday 17 March 2014 13:44:02 Ben Dooks wrote: >> On 17/03/14 13:38, Geert Uytterhoeven wrote: >> > On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks wrote: >> >> I have yet to ascertain how this ends up happening with device probe, >> >> it seems to be very dependant on the code. >> > >> > Adding a WARN() in cpg_mstp_clock_endisable(): >> > >> > [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>] >> > (cpg_mstp_clock_disable+0x14/0x18) >> > r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00 >> > r4:eec23a00 >> > [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>] >> > (__clk_disable+0x68/0x78) >> > [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c) >> > r4:60000193 r3:00000002 >> > [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78) >> > r5:eed41500 r4:eed414c0 >> > [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c) >> > r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac >> > [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84) >> > r5:eed42810 r4:eec7a000 >> > [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4) >> > r6:00000000 r5:00000000 r4:eed42810 r3:eed42810 >> > [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c) >> > r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000 >> > r4:eed42810 >> > [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c) >> > r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870 >> > [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] >> > (sh_eth_get_stats+0x228/0x23c) >> > r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 >> > [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] >> > (dev_get_stats+0x5c/0x84) >> > r4:eec7bc30 r3:c023ec68 >> > [<c028c138>] (dev_get_stats) from [<c029cb08>] >> > (rtnl_fill_ifinfo+0x410/0x8dc) >> > r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 >> > [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) >> > r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 >> > r4:00000000 >> > [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] >> > (register_netdevice+0x3dc/0x428)> >> > r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 >> > [<c02918f4>] (register_netdevice) from [<c0291d38>] >> > (register_netdev+0x1c/0x2c)> >> > r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 >> > [<c0291d1c>] (register_netdev) from [<c0240528>] >> > (sh_eth_drv_probe+0x994/0xb7c) >> > r4:ee428000 r3:00000040 >> > [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] >> > (platform_drv_probe+0x20/0x50) >> > r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 >> > r4:eed42810 > > [snip] > >> this explains it, the call to stats causes a get_sync/put_sync which >> puts the device into a state where it /could/ be suspended and thus >> does get suspended in this case from the pm code. >> >> I'm not /sure/ why the pm_runtime code is not protecting against >> running this when a device probe is in progress, but it seems the >> best thing is to ensure that we always do a get/put sync in the >> driver to ensure we have a reference during probe. > > Wouldn't it be better to register the MDIO bus before registering the network > device ? It looks like the current order is prone to race conditions. Indeed, typically register_netdev() is the last call of the .probe() function. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello. On 17-03-2014 18:02, Geert Uytterhoeven wrote: > (adding Sergei) [...] >>>>> I have yet to ascertain how this ends up happening with device probe, >>>>> it seems to be very dependant on the code. >>>> Adding a WARN() in cpg_mstp_clock_endisable(): >>>> [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>] >>>> (cpg_mstp_clock_disable+0x14/0x18) >>>> r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00 >>>> r4:eec23a00 >>>> [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>] >>>> (__clk_disable+0x68/0x78) >>>> [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c) >>>> r4:60000193 r3:00000002 >>>> [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78) >>>> r5:eed41500 r4:eed414c0 >>>> [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c) >>>> r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac >>>> [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84) >>>> r5:eed42810 r4:eec7a000 >>>> [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4) >>>> r6:00000000 r5:00000000 r4:eed42810 r3:eed42810 >>>> [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c) >>>> r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000 >>>> r4:eed42810 >>>> [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c) >>>> r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870 >>>> [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>] >>>> (sh_eth_get_stats+0x228/0x23c) >>>> r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000 >>>> [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] >>>> (dev_get_stats+0x5c/0x84) >>>> r4:eec7bc30 r3:c023ec68 >>>> [<c028c138>] (dev_get_stats) from [<c029cb08>] >>>> (rtnl_fill_ifinfo+0x410/0x8dc) >>>> r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684 >>>> [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4) >>>> r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0 >>>> r4:00000000 >>>> [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] >>>> (register_netdevice+0x3dc/0x428)> >>>> r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000 >>>> [<c02918f4>] (register_netdevice) from [<c0291d38>] >>>> (register_netdev+0x1c/0x2c)> >>>> r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000 >>>> [<c0291d1c>] (register_netdev) from [<c0240528>] >>>> (sh_eth_drv_probe+0x994/0xb7c) >>>> r4:ee428000 r3:00000040 >>>> [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] >>>> (platform_drv_probe+0x20/0x50) >>>> r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830 >>>> r4:eed42810 >> [snip] >>> this explains it, the call to stats causes a get_sync/put_sync which >>> puts the device into a state where it /could/ be suspended and thus >>> does get suspended in this case from the pm code. >>> I'm not /sure/ why the pm_runtime code is not protecting against >>> running this when a device probe is in progress, but it seems the >>> best thing is to ensure that we always do a get/put sync in the >>> driver to ensure we have a reference during probe. >> Wouldn't it be better to register the MDIO bus before registering the network >> device ? It looks like the current order is prone to race conditions. > Indeed, typically register_netdev() is the last call of the .probe() function. I'll see what can be done. I've been looking at the probe() method recently... > Gr{oetje,eeting}s, > Geert WBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ben and Sergei, On Wednesday 19 March 2014 00:17:43 Sergei Shtylyov wrote: > On 03/17/2014 05:01 PM, Ben Dooks wrote: > >>>>> I have yet to ascertain how this ends up happening with device probe, > >>>>> it seems to be very dependant on the code. > >>>> > >>>> Adding a WARN() in cpg_mstp_clock_endisable(): > [...] > > >>> this explains it, the call to stats causes a get_sync/put_sync which > >>> puts the device into a state where it /could/ be suspended and thus > >>> does get suspended in this case from the pm code. > >>> > >>> I'm not /sure/ why the pm_runtime code is not protecting against > >>> running this when a device probe is in progress, but it seems the > >>> best thing is to ensure that we always do a get/put sync in the > >>> driver to ensure we have a reference during probe. > >> > >> Wouldn't it be better to register the MDIO bus before registering the > >> network device ? It looks like the current order is prone to race > >> conditions. > > > > Not sure, requires input? > > People say that the driver should be ready to receive the ndo_open() method > call even before register_netdev() returns. I have prepared the patch to > probe MDIO before calling register_netdev() now, need to sanity test it. Thank you. > >> Another potential issue, does the network layer guarantee that the device > >> will always be opened by an .ndo_open() call before performing any MDIO > >> operation ? sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth > >> registers, could we be missing runtime PM calls in those call paths ? > > > > I'm not sure if there is any guarantee, I don't think the PHY driver > > does any sort of open on probe. I do have a patch to ensure that the > > MDIO operations are wrapped with pm_runtime_{get,put}_sync() calls as > > the probe of the PHY often fails without it. > > Respin this patch please (addressing my comments), so that DaveM could take > it. Before wrapping MDIO operations in runtime PM calls, could we check whether that's really needed ? Moving PHY registration before network device registration will fix PHY probe problems, we might not need any other change.
Hello. On 03/17/2014 05:01 PM, Ben Dooks wrote: >>>>> I have yet to ascertain how this ends up happening with device probe, >>>>> it seems to be very dependant on the code. >>>> Adding a WARN() in cpg_mstp_clock_endisable(): [...] >>> this explains it, the call to stats causes a get_sync/put_sync which >>> puts the device into a state where it /could/ be suspended and thus >>> does get suspended in this case from the pm code. >>> I'm not /sure/ why the pm_runtime code is not protecting against >>> running this when a device probe is in progress, but it seems the >>> best thing is to ensure that we always do a get/put sync in the >>> driver to ensure we have a reference during probe. >> Wouldn't it be better to register the MDIO bus before registering the network >> device ? It looks like the current order is prone to race conditions. > Not sure, requires input? People say that the driver should be ready to receive the ndo_open() method call even before register_netdev() returns. I have prepared the patch to probe MDIO before calling register_netdev() now, need to sanity test it. >> Another potential issue, does the network layer guarantee that the device will >> always be opened by an .ndo_open() call before performing any MDIO operation ? >> sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth registers, could we be >> missing runtime PM calls in those call paths ? > I'm not sure if there is any guarantee, I don't think the PHY driver > does any sort of open on probe. I do have a patch to ensure that the > MDIO operations are wrapped with pm_runtime_{get,put}_sync() calls as > the probe of the PHY often fails without it. Respin this patch please (addressing my comments), so that DaveM could take it. WBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/03/14 14:41, Laurent Pinchart wrote: > Hi Ben, > > On Monday 17 March 2014 13:36:33 Ben Dooks wrote: > > [snip] > >> From pm_runtime.h: >> >> static inline int pm_runtime_get_sync(struct device *dev) >> { >> return __pm_runtime_resume(dev, RPM_GET_PUT); >> } >> >> static inline int pm_request_resume(struct device *dev) >> { >> return __pm_runtime_resume(dev, RPM_ASYNC); >> } >> >> So it looks like pm_runtime_resume() does not protect against the >> possibility that something else may re-suspend the device. > > Correct. > >> I have yet to ascertain how this ends up happening with device probe, it >> seems to be very dependant on the code. > > We might be doing something wrong in the driver from a runtime PM point of > view that leads to the device being suspended. I'd like to catch that instead > of hiding it by a pm_runtime_get_sync() call. If it turns out that we're not > doing anything wrong then replacing pm_runtime_resume() with > pm_runtime_get_sync() would of course be fine. My view is the pm_runtime)_resume() is just nasty, as it makes no guarantees that the device cannot be re-suspended.
Hi Ben, On Wednesday 19 March 2014 09:19:44 Ben Dooks wrote: > On 17/03/14 14:41, Laurent Pinchart wrote: > > On Monday 17 March 2014 13:36:33 Ben Dooks wrote: > > > > [snip] > > > >> From pm_runtime.h: > >> static inline int pm_runtime_get_sync(struct device *dev) > >> { > >> return __pm_runtime_resume(dev, RPM_GET_PUT); > >> } > >> > >> static inline int pm_request_resume(struct device *dev) > >> { > >> return __pm_runtime_resume(dev, RPM_ASYNC); > >> } > >> > >> So it looks like pm_runtime_resume() does not protect against the > >> possibility that something else may re-suspend the device. > > > > Correct. > > > >> I have yet to ascertain how this ends up happening with device probe, it > >> seems to be very dependant on the code. > > > > We might be doing something wrong in the driver from a runtime PM point of > > view that leads to the device being suspended. I'd like to catch that > > instead of hiding it by a pm_runtime_get_sync() call. If it turns out > > that we're not doing anything wrong then replacing pm_runtime_resume() > > with pm_runtime_get_sync() would of course be fine. > > My view is the pm_runtime)_resume() is just nasty, as it makes no guarantees > that the device cannot be re-suspended. My point is that requiring such a guarantee at probe time might be a sign that something else is wrong (which was the case with the sh-eth driver). I'm not saying it should never be done of course.
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device *pdev) mdp->pdev = pdev; pm_runtime_enable(&pdev->dev); pm_runtime_resume(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); if (pdev->dev.of_node) pd = sh_eth_parse_dt(&pdev->dev); @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device *pdev) pr_info("Base address at 0x%x, %pM, IRQ %d.\n", (u32)ndev->base_addr, ndev->dev_addr, ndev->irq); + pm_runtime_put_sync(&pdev->dev); platform_set_drvdata(pdev, ndev); return ret;
It seems the pm_rumtime work queue is causing the device to be suspended during initialisation, thus the initialisation may not be able to access registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm system does not suspend it during the probe() call. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> --- drivers/net/ethernet/renesas/sh_eth.c | 2 ++ 1 file changed, 2 insertions(+)