diff mbox

sh_eth: ensure pm_runtime cannot suspend the device during init

Message ID 1394823605-31883-1-git-send-email-ben.dooks@codethink.co.uk (mailing list archive)
State Awaiting Upstream
Headers show

Commit Message

Ben Dooks March 14, 2014, 7 p.m. UTC
It seems the pm_rumtime work queue is causing the device to be suspended
during initialisation, thus the initialisation may not be able to access
registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
to ensure that the pm system does not suspend it during the probe() call.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 drivers/net/ethernet/renesas/sh_eth.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Ben Dooks March 14, 2014, 7:01 p.m. UTC | #1
On 14/03/14 19:00, Ben Dooks wrote:
> It seems the pm_rumtime work queue is causing the device to be suspended
> during initialisation, thus the initialisation may not be able to access
> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
> to ensure that the pm system does not suspend it during the probe() call.

If this work's for Geert then I'll look into updating the header and
sending it to the netdev list.
Ben Dooks March 14, 2014, 7:19 p.m. UTC | #2
On 14/03/14 20:06, Sergei Shtylyov wrote:
> Hello.
>
> On 03/14/2014 10:01 PM, Ben Dooks wrote:
>
>>> It seems the pm_rumtime work queue is causing the device to be suspended
>>> during initialisation, thus the initialisation may not be able to access
>>> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
>>> to ensure that the pm system does not suspend it during the probe()
>>> call.
>
>> If this work's for Geert then I'll look into updating the header and
>> sending it to the netdev list.
>
>     I keep wondering why I'm seeing no issues in my setup... I don't
> have drivers/sh/ patch and have CONFIG_PM_RUNTIME=y. Seeing no issues on
> multiplatform kernels on neither Lager nor Koelsch (however, other
> platforms are not enabled).

Without drivers/sh there is no specification to the pm code to do
anything with the platform or OF devices. This means the clocks are
not being auto-managed, which in turn means the clock is probably
being left on from the boot-loader and thus the unit works correctly.

Unfortunately, without drivers/sh patched a number of other drivers
fail to work. This means we need it or something similar in the kernel.

I will spend some time over the weekend moving the code out of the
drivers/sh to somewhere else so that this does not occur again.
Sergei Shtylyov March 14, 2014, 8:06 p.m. UTC | #3
Hello.

On 03/14/2014 10:01 PM, Ben Dooks wrote:

>> It seems the pm_rumtime work queue is causing the device to be suspended
>> during initialisation, thus the initialisation may not be able to access
>> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
>> to ensure that the pm system does not suspend it during the probe() call.

> If this work's for Geert then I'll look into updating the header and
> sending it to the netdev list.

    I keep wondering why I'm seeing no issues in my setup... I don't have 
drivers/sh/ patch and have CONFIG_PM_RUNTIME=y. Seeing no issues on 
multiplatform kernels on neither Lager nor Koelsch (however, other platforms 
are not enabled).

WBR, Sergei


--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Shtylyov March 14, 2014, 10:13 p.m. UTC | #4
Hello.

On 03/14/2014 10:19 PM, Ben Dooks wrote:

>>>> It seems the pm_rumtime work queue is causing the device to be suspended
>>>> during initialisation, thus the initialisation may not be able to access
>>>> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
>>>> to ensure that the pm system does not suspend it during the probe()
>>>> call.

>>> If this work's for Geert then I'll look into updating the header and
>>> sending it to the netdev list.

>>     I keep wondering why I'm seeing no issues in my setup... I don't
>> have drivers/sh/ patch and have CONFIG_PM_RUNTIME=y. Seeing no issues on
>> multiplatform kernels on neither Lager nor Koelsch (however, other
>> platforms are not enabled).

> Without drivers/sh there is no specification to the pm code to do
> anything with the platform or OF devices. This means the clocks are
> not being auto-managed, which in turn means the clock is probably
> being left on from the boot-loader and thus the unit works correctly.

    Yeah, I TFTP-boot over Ether from U-Boot. I wonder how others' setup is 
different from mine.

> Unfortunately, without drivers/sh patched a number of other drivers
> fail to work. This means we need it or something similar in the kernel.

    I wonder how drivers/sh/ came to existence at all. :-) Although there's 
also drivers/parisc/...

> I will spend some time over the weekend moving the code out of the
> drivers/sh to somewhere else so that this does not occur again.

    Thanks for your work. I just wasn't aware of these issues when I ported 
Ether to DT -- it Just Worked (TM).

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurent Pinchart March 15, 2014, 11:19 a.m. UTC | #5
Hi Ben,

Thank you for the patch.

On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
> It seems the pm_rumtime work queue is causing the device to be suspended
> during initialisation,

Have you investigated through which call path(s) this happens ? If device can 
be runtime suspended during their probe function despite calling 
pm_runtime_resume() then I don't see the point of that function at all.

> thus the initialisation may not be able to access registers properly. Use
> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm system
> does not suspend it during the probe() call.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
>  drivers/net/ethernet/renesas/sh_eth.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device
> *pdev) mdp->pdev = pdev;
>  	pm_runtime_enable(&pdev->dev);
>  	pm_runtime_resume(&pdev->dev);
> +	pm_runtime_get_sync(&pdev->dev);

I believe pm_runtime_resume() isn't needed anymore if we call 
pm_runtime_get_sync().

>  	if (pdev->dev.of_node)
>  		pd = sh_eth_parse_dt(&pdev->dev);
> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device
> *pdev) pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
>  		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> 
> +	pm_runtime_put_sync(&pdev->dev);
>  	platform_set_drvdata(pdev, ndev);
> 
>  	return ret;
Geert Uytterhoeven March 17, 2014, 9:22 a.m. UTC | #6
Hi Ben,

On Fri, Mar 14, 2014 at 8:00 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> It seems the pm_rumtime work queue is causing the device to be suspended
> during initialisation, thus the initialisation may not be able to access
> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
> to ensure that the pm system does not suspend it during the probe() call.
>
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>

Thanks, this fixes the "imprecise external abort" on Koelsch.

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Dooks March 17, 2014, 9:41 a.m. UTC | #7
On 17/03/14 09:22, Geert Uytterhoeven wrote:
> Hi Ben,
>
> On Fri, Mar 14, 2014 at 8:00 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
>> It seems the pm_rumtime work queue is causing the device to be suspended
>> during initialisation, thus the initialisation may not be able to access
>> registers properly. Use pm_runtime_get_sync() and pm_runtime_put_sync()
>> to ensure that the pm system does not suspend it during the probe() call.
>>
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>
> Thanks, this fixes the "imprecise external abort" on Koelsch.
>
> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Thanks, will consider sending this to the netdev list later.
Ben Dooks March 17, 2014, 11:20 a.m. UTC | #8
On 15/03/14 11:19, Laurent Pinchart wrote:
> Hi Ben,
>
> Thank you for the patch.
>
> On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
>> It seems the pm_rumtime work queue is causing the device to be suspended
>> during initialisation,
>
> Have you investigated through which call path(s) this happens ? If device can
> be runtime suspended during their probe function despite calling
> pm_runtime_resume() then I don't see the point of that function at all.
>
>> thus the initialisation may not be able to access registers properly. Use
>> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm system
>> does not suspend it during the probe() call.
>>
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> ---
>>   drivers/net/ethernet/renesas/sh_eth.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device
>> *pdev) mdp->pdev = pdev;
>>   	pm_runtime_enable(&pdev->dev);
>>   	pm_runtime_resume(&pdev->dev);
>> +	pm_runtime_get_sync(&pdev->dev);
>
> I believe pm_runtime_resume() isn't needed anymore if we call
> pm_runtime_get_sync().
>
>>   	if (pdev->dev.of_node)
>>   		pd = sh_eth_parse_dt(&pdev->dev);
>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device
>> *pdev) pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
>>   		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
>>
>> +	pm_runtime_put_sync(&pdev->dev);
>>   	platform_set_drvdata(pdev, ndev);
>>
>>   	return ret;

I will look at removing the pm_runtime_resume call as well in this
patch.
Laurent Pinchart March 17, 2014, 11:35 a.m. UTC | #9
Hi Ben,

On Monday 17 March 2014 11:20:32 Ben Dooks wrote:
> On 15/03/14 11:19, Laurent Pinchart wrote:
> > On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
> >> It seems the pm_rumtime work queue is causing the device to be suspended
> >> during initialisation,
> > 
> > Have you investigated through which call path(s) this happens ? If device
> > can be runtime suspended during their probe function despite calling
> > pm_runtime_resume() then I don't see the point of that function at all.
> > 
> >> thus the initialisation may not be able to access registers properly. Use
> >> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm
> >> system does not suspend it during the probe() call.
> >> 
> >> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >> ---
> >> 
> >>   drivers/net/ethernet/renesas/sh_eth.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >> 
> >> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
> >> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
> >> --- a/drivers/net/ethernet/renesas/sh_eth.c
> >> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> >> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device
> >> *pdev)
> >>   	mdp->pdev = pdev;
> >>   	pm_runtime_enable(&pdev->dev);
> >>   	pm_runtime_resume(&pdev->dev);
> >> +	pm_runtime_get_sync(&pdev->dev);
> > 
> > I believe pm_runtime_resume() isn't needed anymore if we call
> > pm_runtime_get_sync().
> > 
> >>   	if (pdev->dev.of_node)
> >>   		pd = sh_eth_parse_dt(&pdev->dev);
> >> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device
> >> *pdev)
> >>   	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
> >>   		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> >> 
> >> +	pm_runtime_put_sync(&pdev->dev);
> >>   	platform_set_drvdata(pdev, ndev);
> >>   	
> >>   	return ret;
> 
> I will look at removing the pm_runtime_resume call as well in this patch.

Thank you. It would also be nice if you could find out what causes the PM 
runtime work queue to suspend the device. pm_runtime_resume() is documented as 
being the function to call at probe time. I'd like to know whether the problem 
comes from the PM runtime core itself, in which case the documentation should 
be updated or the PM runtime core should be fixed, or from operations 
performed by the sh-eth driver, in which case we might need a different fix in 
the driver.
Ben Dooks March 17, 2014, 11:40 a.m. UTC | #10
On 17/03/14 11:35, Laurent Pinchart wrote:
> Hi Ben,
>
> On Monday 17 March 2014 11:20:32 Ben Dooks wrote:
>> On 15/03/14 11:19, Laurent Pinchart wrote:
>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
>>>> It seems the pm_rumtime work queue is causing the device to be suspended
>>>> during initialisation,
>>>
>>> Have you investigated through which call path(s) this happens ? If device
>>> can be runtime suspended during their probe function despite calling
>>> pm_runtime_resume() then I don't see the point of that function at all.
>>>
>>>> thus the initialisation may not be able to access registers properly. Use
>>>> pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the pm
>>>> system does not suspend it during the probe() call.
>>>>
>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>> ---
>>>>
>>>>    drivers/net/ethernet/renesas/sh_eth.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct platform_device
>>>> *pdev)
>>>>    	mdp->pdev = pdev;
>>>>    	pm_runtime_enable(&pdev->dev);
>>>>    	pm_runtime_resume(&pdev->dev);
>>>> +	pm_runtime_get_sync(&pdev->dev);
>>>
>>> I believe pm_runtime_resume() isn't needed anymore if we call
>>> pm_runtime_get_sync().
>>>
>>>>    	if (pdev->dev.of_node)
>>>>    		pd = sh_eth_parse_dt(&pdev->dev);
>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct platform_device
>>>> *pdev)
>>>>    	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
>>>>    		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
>>>>
>>>> +	pm_runtime_put_sync(&pdev->dev);
>>>>    	platform_set_drvdata(pdev, ndev);
>>>>    	
>>>>    	return ret;
>>
>> I will look at removing the pm_runtime_resume call as well in this patch.
>
> Thank you. It would also be nice if you could find out what causes the PM
> runtime work queue to suspend the device. pm_runtime_resume() is documented as
> being the function to call at probe time. I'd like to know whether the problem
> comes from the PM runtime core itself, in which case the documentation should
> be updated or the PM runtime core should be fixed, or from operations
> performed by the sh-eth driver, in which case we might need a different fix in
> the driver.

I'm not entirely sure, pm_runtime_resume() seems like something to call
during a resume operation. It probably does not increment device use
count, and thus the pm worker thread is allowed to re-suspend the device
when it runs.

pm_runtime_get_sync() is the right thing to do as we are dealing with
access device registers during the probe and thus should ensure the
device does not get shutdown until the probe has finished.
Laurent Pinchart March 17, 2014, 11:53 a.m. UTC | #11
Hi Ben,

On Monday 17 March 2014 11:40:11 Ben Dooks wrote:
> On 17/03/14 11:35, Laurent Pinchart wrote:
> > On Monday 17 March 2014 11:20:32 Ben Dooks wrote:
> >> On 15/03/14 11:19, Laurent Pinchart wrote:
> >>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
> >>>> It seems the pm_rumtime work queue is causing the device to be
> >>>> suspended during initialisation,
> >>> 
> >>> Have you investigated through which call path(s) this happens ? If
> >>> device can be runtime suspended during their probe function despite
> >>> calling pm_runtime_resume() then I don't see the point of that function
> >>> at all.
> >>> 
> >>>> thus the initialisation may not be able to access registers properly.
> >>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the
> >>>> pm system does not suspend it during the probe() call.
> >>>> 
> >>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>> ---
> >>>> 
> >>>>    drivers/net/ethernet/renesas/sh_eth.c | 2 ++
> >>>>    1 file changed, 2 insertions(+)
> >>>> 
> >>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
> >>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
> >>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
> >>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> >>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct
> >>>> platform_device *pdev)
> >>>>    	mdp->pdev = pdev;
> >>>>    	pm_runtime_enable(&pdev->dev);
> >>>>    	pm_runtime_resume(&pdev->dev);
> >>>> +   	pm_runtime_get_sync(&pdev->dev);
> >>> 
> >>> I believe pm_runtime_resume() isn't needed anymore if we call
> >>> pm_runtime_get_sync().
> >>> 
> >>>>    	if (pdev->dev.of_node)
> >>>>    		pd = sh_eth_parse_dt(&pdev->dev);
> >>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct
> >>>> platform_device *pdev)
> >>>>    	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
> >>>>    		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> >>>> 
> >>>> +   	pm_runtime_put_sync(&pdev->dev);
> >>>>    	platform_set_drvdata(pdev, ndev);
> >>>>    	
> >>>>    	return ret;
> >> 
> >> I will look at removing the pm_runtime_resume call as well in this patch.
> > 
> > Thank you. It would also be nice if you could find out what causes the PM
> > runtime work queue to suspend the device. pm_runtime_resume() is
> > documented as being the function to call at probe time. I'd like to know
> > whether the problem comes from the PM runtime core itself, in which case
> > the documentation should be updated or the PM runtime core should be
> > fixed, or from operations performed by the sh-eth driver, in which case
> > we might need a different fix in the driver.
> 
> I'm not entirely sure, pm_runtime_resume() seems like something to call
> during a resume operation. It probably does not increment device use
> count, and thus the pm worker thread is allowed to re-suspend the device
> when it runs.

Yes, but why does it do so ? The runtime PM work queue doesn't seem to go 
through devices to try and suspend them, it seems to only process delayed 
work. What queues the work that makes runtime PM suspend the device ?

> pm_runtime_get_sync() is the right thing to do as we are dealing with
> access device registers during the probe and thus should ensure the
> device does not get shutdown until the probe has finished.

pm_runtime_resume() is documented as being the function to call in the probe 
handler. I'd like to get a clarification on this from the runtime PM 
developers. If the documentation is wrong it needs to be fixed.
Ben Dooks March 17, 2014, 12:56 p.m. UTC | #12
On 17/03/14 11:53, Laurent Pinchart wrote:
> Hi Ben,
>
> On Monday 17 March 2014 11:40:11 Ben Dooks wrote:
>> On 17/03/14 11:35, Laurent Pinchart wrote:
>>> On Monday 17 March 2014 11:20:32 Ben Dooks wrote:
>>>> On 15/03/14 11:19, Laurent Pinchart wrote:
>>>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
>>>>>> It seems the pm_rumtime work queue is causing the device to be
>>>>>> suspended during initialisation,
>>>>>
>>>>> Have you investigated through which call path(s) this happens ? If
>>>>> device can be runtime suspended during their probe function despite
>>>>> calling pm_runtime_resume() then I don't see the point of that function
>>>>> at all.
>>>>>
>>>>>> thus the initialisation may not be able to access registers properly.
>>>>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that the
>>>>>> pm system does not suspend it during the probe() call.
>>>>>>
>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>> ---
>>>>>>
>>>>>>     drivers/net/ethernet/renesas/sh_eth.c | 2 ++
>>>>>>     1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
>>>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
>>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>>>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct
>>>>>> platform_device *pdev)
>>>>>>     	mdp->pdev = pdev;
>>>>>>     	pm_runtime_enable(&pdev->dev);
>>>>>>     	pm_runtime_resume(&pdev->dev);
>>>>>> +   	pm_runtime_get_sync(&pdev->dev);
>>>>>
>>>>> I believe pm_runtime_resume() isn't needed anymore if we call
>>>>> pm_runtime_get_sync().
>>>>>
>>>>>>     	if (pdev->dev.of_node)
>>>>>>     		pd = sh_eth_parse_dt(&pdev->dev);
>>>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct
>>>>>> platform_device *pdev)
>>>>>>     	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
>>>>>>     		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
>>>>>>
>>>>>> +   	pm_runtime_put_sync(&pdev->dev);
>>>>>>     	platform_set_drvdata(pdev, ndev);
>>>>>>     	
>>>>>>     	return ret;
>>>>
>>>> I will look at removing the pm_runtime_resume call as well in this patch.
>>>
>>> Thank you. It would also be nice if you could find out what causes the PM
>>> runtime work queue to suspend the device. pm_runtime_resume() is
>>> documented as being the function to call at probe time. I'd like to know
>>> whether the problem comes from the PM runtime core itself, in which case
>>> the documentation should be updated or the PM runtime core should be
>>> fixed, or from operations performed by the sh-eth driver, in which case
>>> we might need a different fix in the driver.
>>
>> I'm not entirely sure, pm_runtime_resume() seems like something to call
>> during a resume operation. It probably does not increment device use
>> count, and thus the pm worker thread is allowed to re-suspend the device
>> when it runs.
>
> Yes, but why does it do so ? The runtime PM work queue doesn't seem to go
> through devices to try and suspend them, it seems to only process delayed
> work. What queues the work that makes runtime PM suspend the device ?
>
>> pm_runtime_get_sync() is the right thing to do as we are dealing with
>> access device registers during the probe and thus should ensure the
>> device does not get shutdown until the probe has finished.
>
> pm_runtime_resume() is documented as being the function to call in the probe
> handler. I'd like to get a clarification on this from the runtime PM
> developers. If the documentation is wrong it needs to be fixed.

Hmm, wonder if it works for things that take a long time to run,
or for things that create and probe sub-devices such as the phy?

I annotated the clk-mstp.c driver to test for removal of the eth
clock and the culprit in this case was a pm thread running
before the device finished probing and suspending. I do not have
the exact debug output as I deleted it after looking in to it.
Laurent Pinchart March 17, 2014, 1:27 p.m. UTC | #13
Hi Ben,

On Monday 17 March 2014 12:56:00 Ben Dooks wrote:
> On 17/03/14 11:53, Laurent Pinchart wrote:
> > On Monday 17 March 2014 11:40:11 Ben Dooks wrote:
> >> On 17/03/14 11:35, Laurent Pinchart wrote:
> >>> On Monday 17 March 2014 11:20:32 Ben Dooks wrote:
> >>>> On 15/03/14 11:19, Laurent Pinchart wrote:
> >>>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
> >>>>>> It seems the pm_rumtime work queue is causing the device to be
> >>>>>> suspended during initialisation,
> >>>>> 
> >>>>> Have you investigated through which call path(s) this happens ? If
> >>>>> device can be runtime suspended during their probe function despite
> >>>>> calling pm_runtime_resume() then I don't see the point of that
> >>>>> function at all.
> >>>>> 
> >>>>>> thus the initialisation may not be able to access registers properly.
> >>>>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that
> >>>>>> the pm system does not suspend it during the probe() call.
> >>>>>> 
> >>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>>> ---
> >>>>>> 
> >>>>>>     drivers/net/ethernet/renesas/sh_eth.c | 2 ++
> >>>>>>     1 file changed, 2 insertions(+)
> >>>>>> 
> >>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
> >>>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
> >>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
> >>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> >>>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct
> >>>>>> platform_device *pdev)
> >>>>>>     	mdp->pdev = pdev;
> >>>>>>     	pm_runtime_enable(&pdev->dev);
> >>>>>>     	pm_runtime_resume(&pdev->dev);
> >>>>>> +   	pm_runtime_get_sync(&pdev->dev);
> >>>>> 
> >>>>> I believe pm_runtime_resume() isn't needed anymore if we call
> >>>>> pm_runtime_get_sync().
> >>>>> 
> >>>>>>     	if (pdev->dev.of_node)
> >>>>>>     		pd = sh_eth_parse_dt(&pdev->dev);
> >>>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct
> >>>>>> platform_device *pdev)
> >>>>>>     	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
> >>>>>>     		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> >>>>>> 
> >>>>>> +   	pm_runtime_put_sync(&pdev->dev);
> >>>>>>     	platform_set_drvdata(pdev, ndev);
> >>>>>>     	
> >>>>>>     	return ret;
> >>>> 
> >>>> I will look at removing the pm_runtime_resume call as well in this
> >>>> patch.
> >>> 
> >>> Thank you. It would also be nice if you could find out what causes the
> >>> PM runtime work queue to suspend the device. pm_runtime_resume() is
> >>> documented as being the function to call at probe time. I'd like to know
> >>> whether the problem comes from the PM runtime core itself, in which case
> >>> the documentation should be updated or the PM runtime core should be
> >>> fixed, or from operations performed by the sh-eth driver, in which case
> >>> we might need a different fix in the driver.
> >> 
> >> I'm not entirely sure, pm_runtime_resume() seems like something to call
> >> during a resume operation. It probably does not increment device use
> >> count, and thus the pm worker thread is allowed to re-suspend the device
> >> when it runs.
> > 
> > Yes, but why does it do so ? The runtime PM work queue doesn't seem to go
> > through devices to try and suspend them, it seems to only process delayed
> > work. What queues the work that makes runtime PM suspend the device ?
> > 
> >> pm_runtime_get_sync() is the right thing to do as we are dealing with
> >> access device registers during the probe and thus should ensure the
> >> device does not get shutdown until the probe has finished.
> > 
> > pm_runtime_resume() is documented as being the function to call in the
> > probe handler. I'd like to get a clarification on this from the runtime
> > PM developers. If the documentation is wrong it needs to be fixed.
> 
> Hmm, wonder if it works for things that take a long time to run, or for
> things that create and probe sub-devices such as the phy?

That's exactly what I'd like to know :-)

> I annotated the clk-mstp.c driver to test for removal of the eth clock and
> the culprit in this case was a pm thread running before the device finished
> probing and suspending. I do not have the exact debug output as I deleted it
> after looking in to it.

Would you be able to perform more tests ? I'd like to get to the bottom of 
this problem and find out whether the best solution would be to fix the 
runtime PM core, the runtime PM documentation or the driver (or a combination 
of them).
Ben Dooks March 17, 2014, 1:36 p.m. UTC | #14
On 17/03/14 13:27, Laurent Pinchart wrote:
> Hi Ben,
>
> On Monday 17 March 2014 12:56:00 Ben Dooks wrote:
>> On 17/03/14 11:53, Laurent Pinchart wrote:
>>> On Monday 17 March 2014 11:40:11 Ben Dooks wrote:
>>>> On 17/03/14 11:35, Laurent Pinchart wrote:
>>>>> On Monday 17 March 2014 11:20:32 Ben Dooks wrote:
>>>>>> On 15/03/14 11:19, Laurent Pinchart wrote:
>>>>>>> On Friday 14 March 2014 19:00:05 Ben Dooks wrote:
>>>>>>>> It seems the pm_rumtime work queue is causing the device to be
>>>>>>>> suspended during initialisation,
>>>>>>>
>>>>>>> Have you investigated through which call path(s) this happens ? If
>>>>>>> device can be runtime suspended during their probe function despite
>>>>>>> calling pm_runtime_resume() then I don't see the point of that
>>>>>>> function at all.
>>>>>>>
>>>>>>>> thus the initialisation may not be able to access registers properly.
>>>>>>>> Use pm_runtime_get_sync() and pm_runtime_put_sync() to ensure that
>>>>>>>> the pm system does not suspend it during the probe() call.
>>>>>>>>
>>>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>> ---
>>>>>>>>
>>>>>>>>      drivers/net/ethernet/renesas/sh_eth.c | 2 ++
>>>>>>>>      1 file changed, 2 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
>>>>>>>> b/drivers/net/ethernet/renesas/sh_eth.c index 4f76b5e..88aac2c 100644
>>>>>>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>>>>>>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>>>>>>>> @@ -2870,6 +2870,7 @@ static int sh_eth_drv_probe(struct
>>>>>>>> platform_device *pdev)
>>>>>>>>      	mdp->pdev = pdev;
>>>>>>>>      	pm_runtime_enable(&pdev->dev);
>>>>>>>>      	pm_runtime_resume(&pdev->dev);
>>>>>>>> +   	pm_runtime_get_sync(&pdev->dev);
>>>>>>>
>>>>>>> I believe pm_runtime_resume() isn't needed anymore if we call
>>>>>>> pm_runtime_get_sync().
>>>>>>>
>>>>>>>>      	if (pdev->dev.of_node)
>>>>>>>>      		pd = sh_eth_parse_dt(&pdev->dev);
>>>>>>>> @@ -2961,6 +2962,7 @@ static int sh_eth_drv_probe(struct
>>>>>>>> platform_device *pdev)
>>>>>>>>      	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
>>>>>>>>      		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
>>>>>>>>
>>>>>>>> +   	pm_runtime_put_sync(&pdev->dev);
>>>>>>>>      	platform_set_drvdata(pdev, ndev);
>>>>>>>>      	
>>>>>>>>      	return ret;
>>>>>>
>>>>>> I will look at removing the pm_runtime_resume call as well in this
>>>>>> patch.
>>>>>
>>>>> Thank you. It would also be nice if you could find out what causes the
>>>>> PM runtime work queue to suspend the device. pm_runtime_resume() is
>>>>> documented as being the function to call at probe time. I'd like to know
>>>>> whether the problem comes from the PM runtime core itself, in which case
>>>>> the documentation should be updated or the PM runtime core should be
>>>>> fixed, or from operations performed by the sh-eth driver, in which case
>>>>> we might need a different fix in the driver.
>>>>
>>>> I'm not entirely sure, pm_runtime_resume() seems like something to call
>>>> during a resume operation. It probably does not increment device use
>>>> count, and thus the pm worker thread is allowed to re-suspend the device
>>>> when it runs.
>>>
>>> Yes, but why does it do so ? The runtime PM work queue doesn't seem to go
>>> through devices to try and suspend them, it seems to only process delayed
>>> work. What queues the work that makes runtime PM suspend the device ?
>>>
>>>> pm_runtime_get_sync() is the right thing to do as we are dealing with
>>>> access device registers during the probe and thus should ensure the
>>>> device does not get shutdown until the probe has finished.
>>>
>>> pm_runtime_resume() is documented as being the function to call in the
>>> probe handler. I'd like to get a clarification on this from the runtime
>>> PM developers. If the documentation is wrong it needs to be fixed.
>>
>> Hmm, wonder if it works for things that take a long time to run, or for
>> things that create and probe sub-devices such as the phy?
>
> That's exactly what I'd like to know :-)
>
>> I annotated the clk-mstp.c driver to test for removal of the eth clock and
>> the culprit in this case was a pm thread running before the device finished
>> probing and suspending. I do not have the exact debug output as I deleted it
>> after looking in to it.
>
> Would you be able to perform more tests ? I'd like to get to the bottom of
> this problem and find out whether the best solution would be to fix the
> runtime PM core, the runtime PM documentation or the driver (or a combination
> of them).

 From pm_runtime.h:

static inline int pm_runtime_get_sync(struct device *dev)
{
	return __pm_runtime_resume(dev, RPM_GET_PUT);
}

static inline int pm_request_resume(struct device *dev)
{
	return __pm_runtime_resume(dev, RPM_ASYNC);
}

So it looks like pm_runtime_resume() does not protect against
the possibility that something else may re-suspend the device.

I have yet to ascertain how this ends up happening with device probe,
it seems to be very dependant on the code.
Geert Uytterhoeven March 17, 2014, 1:38 p.m. UTC | #15
On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
> I have yet to ascertain how this ends up happening with device probe,
> it seems to be very dependant on the code.

Adding a WARN() in cpg_mstp_clock_endisable():

[<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>]
(cpg_mstp_clock_disable+0x14/0
x18)
 r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00
 r4:eec23a00
[<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>]
(__clk_disable+0x68/0x78)
[<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c)
 r4:60000193 r3:00000002
[<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78)
 r5:eed41500 r4:eed414c0
[<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c)
 r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac
[<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84)
 r5:eed42810 r4:eec7a000
[<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4)
 r6:00000000 r5:00000000 r4:eed42810 r3:eed42810
[<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c)
 r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000
 r4:eed42810
[<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c)
 r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870
[<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
(sh_eth_get_stats+0x228/0x23c)
 r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
[<c023ec68>] (sh_eth_get_stats) from [<c028c194>] (dev_get_stats+0x5c/0x84)
 r4:eec7bc30 r3:c023ec68
[<c028c138>] (dev_get_stats) from [<c029cb08>] (rtnl_fill_ifinfo+0x410/0x8dc)
 r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
[<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
 r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
 r4:00000000
[<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] (register_netdevice+0x3dc/0x428)
 r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
[<c02918f4>] (register_netdevice) from [<c0291d38>] (register_netdev+0x1c/0x2c)
 r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
[<c0291d1c>] (register_netdev) from [<c0240528>] (sh_eth_drv_probe+0x994/0xb7c)
 r4:ee428000 r3:00000040
[<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] (platform_drv_probe+0x20/0x50)
 r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
 r4:eed42810
[<c01f3e4c>] (platform_drv_probe) from [<c01f2428>]
(driver_probe_device+0xc8/0x1f8)
 r5:eed42844 r4:eed42810
[<c01f2360>] (driver_probe_device) from [<c01f25c0>] (__driver_attach+0x68/0x8c)
 r7:00000000 r6:c047d830 r5:eed42844 r4:eed42810
[<c01f2558>] (__driver_attach) from [<c01f0cd4>] (bus_for_each_dev+0x5c/0x94)
 r6:c01f2558 r5:eec7be48 r4:c047d830 r3:c01f2558
[<c01f0c78>] (bus_for_each_dev) from [<c01f1f84>] (driver_attach+0x20/0x28)
 r7:00000000 r6:c047a148 r5:ee461480 r4:c047d830
[<c01f1f64>] (driver_attach) from [<c01f1b98>] (bus_add_driver+0xb4/0x1c8)
[<c01f1ae4>] (bus_add_driver) from [<c01f2c98>] (driver_register+0xa4/0xe8)
 r7:c0484780 r6:00000000 r5:c0454c18 r4:c047d830
[<c01f2bf4>] (driver_register) from [<c01f3854>]
(__platform_driver_register+0x50/0x64)
 r5:c0454c18 r4:c0442acc
[<c01f3804>] (__platform_driver_register) from [<c0442ae4>]
(sh_eth_driver_init+0x18/0x20)
[<c0442acc>] (sh_eth_driver_init) from [<c0009668>] (do_one_initcall+0x98/0x13c)
[<c00095d0>] (do_one_initcall) from [<c042bc60>]
(kernel_init_freeable+0x100/0x1cc)
 r9:c045c544 r8:00000065 r7:c0484780 r6:c0454bf8 r5:c0454c18 r4:00000006
[<c042bb60>] (kernel_init_freeable) from [<c032afa8>] (kernel_init+0x10/0xec)
 r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c032af98
 r4:00000000
[<c032af98>] (kernel_init) from [<c000ed98>] (ret_from_fork+0x14/0x3c)
 r4:00000000 r3:00000000
Laurent Pinchart March 17, 2014, 1:41 p.m. UTC | #16
Hi Ben,

On Monday 17 March 2014 13:36:33 Ben Dooks wrote:

[snip]

>  From pm_runtime.h:
> 
> static inline int pm_runtime_get_sync(struct device *dev)
> {
> 	return __pm_runtime_resume(dev, RPM_GET_PUT);
> }
> 
> static inline int pm_request_resume(struct device *dev)
> {
> 	return __pm_runtime_resume(dev, RPM_ASYNC);
> }
> 
> So it looks like pm_runtime_resume() does not protect against the
> possibility that something else may re-suspend the device.

Correct.

> I have yet to ascertain how this ends up happening with device probe, it
> seems to be very dependant on the code.

We might be doing something wrong in the driver from a runtime PM point of 
view that leads to the device being suspended. I'd like to catch that instead 
of hiding it by a pm_runtime_get_sync() call. If it turns out that we're not 
doing anything wrong then replacing pm_runtime_resume() with 
pm_runtime_get_sync() would of course be fine.
Geert Uytterhoeven March 17, 2014, 1:43 p.m. UTC | #17
On Mon, Mar 17, 2014 at 2:38 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
> (sh_eth_get_stats+0x228/0x23c)
>  r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
> [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] (dev_get_stats+0x5c/0x84)
>  r4:eec7bc30 r3:c023ec68
> [<c028c138>] (dev_get_stats) from [<c029cb08>] (rtnl_fill_ifinfo+0x410/0x8dc)
>  r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
> [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
>  r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
>  r4:00000000
> [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] (register_netdevice+0x3dc/0x428)
>  r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
> [<c02918f4>] (register_netdevice) from [<c0291d38>] (register_netdev+0x1c/0x2c)
>  r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
> [<c0291d1c>] (register_netdev) from [<c0240528>] (sh_eth_drv_probe+0x994/0xb7c)
>  r4:ee428000 r3:00000040
> [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] (platform_drv_probe+0x20/0x50)
>  r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
>  r4:eed42810

sh_eth_get_stats() calls

    pm_runtime_get_sync(&mdp->pdev->dev);
    ...
    pm_runtime_put_sync(&mdp->pdev->dev);

so adding the extra pm_runtime_get_sync() in ->probe() prevents the
pm_runtime_put_sync() from putting the hardware to sleep.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Dooks March 17, 2014, 1:44 p.m. UTC | #18
On 17/03/14 13:38, Geert Uytterhoeven wrote:
> On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks <ben.dooks@codethink.co.uk> wrote:
>> I have yet to ascertain how this ends up happening with device probe,
>> it seems to be very dependant on the code.
>
> Adding a WARN() in cpg_mstp_clock_endisable():
>
> [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>]
> (cpg_mstp_clock_disable+0x14/0
> x18)
>   r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00
>   r4:eec23a00
> [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>]
> (__clk_disable+0x68/0x78)
> [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c)
>   r4:60000193 r3:00000002
> [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78)
>   r5:eed41500 r4:eed414c0
> [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c)
>   r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac
> [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84)
>   r5:eed42810 r4:eec7a000
> [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4)
>   r6:00000000 r5:00000000 r4:eed42810 r3:eed42810
> [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c)
>   r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000
>   r4:eed42810
> [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c)
>   r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870
> [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
> (sh_eth_get_stats+0x228/0x23c)
>   r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
> [<c023ec68>] (sh_eth_get_stats) from [<c028c194>] (dev_get_stats+0x5c/0x84)
>   r4:eec7bc30 r3:c023ec68
> [<c028c138>] (dev_get_stats) from [<c029cb08>] (rtnl_fill_ifinfo+0x410/0x8dc)
>   r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
> [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
>   r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
>   r4:00000000
> [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>] (register_netdevice+0x3dc/0x428)
>   r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
> [<c02918f4>] (register_netdevice) from [<c0291d38>] (register_netdev+0x1c/0x2c)
>   r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
> [<c0291d1c>] (register_netdev) from [<c0240528>] (sh_eth_drv_probe+0x994/0xb7c)
>   r4:ee428000 r3:00000040
> [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>] (platform_drv_probe+0x20/0x50)
>   r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
>   r4:eed42810
> [<c01f3e4c>] (platform_drv_probe) from [<c01f2428>]
> (driver_probe_device+0xc8/0x1f8)
>   r5:eed42844 r4:eed42810
> [<c01f2360>] (driver_probe_device) from [<c01f25c0>] (__driver_attach+0x68/0x8c)
>   r7:00000000 r6:c047d830 r5:eed42844 r4:eed42810
> [<c01f2558>] (__driver_attach) from [<c01f0cd4>] (bus_for_each_dev+0x5c/0x94)
>   r6:c01f2558 r5:eec7be48 r4:c047d830 r3:c01f2558
> [<c01f0c78>] (bus_for_each_dev) from [<c01f1f84>] (driver_attach+0x20/0x28)
>   r7:00000000 r6:c047a148 r5:ee461480 r4:c047d830
> [<c01f1f64>] (driver_attach) from [<c01f1b98>] (bus_add_driver+0xb4/0x1c8)
> [<c01f1ae4>] (bus_add_driver) from [<c01f2c98>] (driver_register+0xa4/0xe8)
>   r7:c0484780 r6:00000000 r5:c0454c18 r4:c047d830
> [<c01f2bf4>] (driver_register) from [<c01f3854>]
> (__platform_driver_register+0x50/0x64)
>   r5:c0454c18 r4:c0442acc
> [<c01f3804>] (__platform_driver_register) from [<c0442ae4>]
> (sh_eth_driver_init+0x18/0x20)
> [<c0442acc>] (sh_eth_driver_init) from [<c0009668>] (do_one_initcall+0x98/0x13c)
> [<c00095d0>] (do_one_initcall) from [<c042bc60>]
> (kernel_init_freeable+0x100/0x1cc)
>   r9:c045c544 r8:00000065 r7:c0484780 r6:c0454bf8 r5:c0454c18 r4:00000006
> [<c042bb60>] (kernel_init_freeable) from [<c032afa8>] (kernel_init+0x10/0xec)
>   r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c032af98
>   r4:00000000
> [<c032af98>] (kernel_init) from [<c000ed98>] (ret_from_fork+0x14/0x3c)
>   r4:00000000 r3:00000000

this explains it, the call to stats causes a get_sync/put_sync which
puts the device into a state where it /could/ be suspended and thus
does get suspended in this case from the pm code.

I'm not /sure/ why the pm_runtime code is not protecting against
running this when a device probe is in progress, but it seems the
best thing is to ensure that we always do a get/put sync in the
driver to ensure we have a reference during probe.
Laurent Pinchart March 17, 2014, 1:54 p.m. UTC | #19
Hi Ben,

On Monday 17 March 2014 13:44:02 Ben Dooks wrote:
> On 17/03/14 13:38, Geert Uytterhoeven wrote:
> > On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks wrote:
> >> I have yet to ascertain how this ends up happening with device probe,
> >> it seems to be very dependant on the code.
> > 
> > Adding a WARN() in cpg_mstp_clock_endisable():
> > 
> > [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>]
> > (cpg_mstp_clock_disable+0x14/0x18)
> >   r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00
> >   r4:eec23a00
> > [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>]
> > (__clk_disable+0x68/0x78)
> > [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c)
> >   r4:60000193 r3:00000002
> > [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78)
> >   r5:eed41500 r4:eed414c0
> > [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c)
> >   r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac
> > [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84)
> >   r5:eed42810 r4:eec7a000
> > [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4)
> >   r6:00000000 r5:00000000 r4:eed42810 r3:eed42810
> > [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c)
> >   r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000
> >   r4:eed42810
> > [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c)
> >   r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870
> > [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
> > (sh_eth_get_stats+0x228/0x23c)
> >   r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
> > [<c023ec68>] (sh_eth_get_stats) from [<c028c194>]
> > (dev_get_stats+0x5c/0x84)
> >   r4:eec7bc30 r3:c023ec68
> > [<c028c138>] (dev_get_stats) from [<c029cb08>]
> > (rtnl_fill_ifinfo+0x410/0x8dc)
> >   r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
> > [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
> >   r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
> >   r4:00000000
> > [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>]
> > (register_netdevice+0x3dc/0x428)> 
> >   r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
> > [<c02918f4>] (register_netdevice) from [<c0291d38>]
> > (register_netdev+0x1c/0x2c)> 
> >   r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
> > [<c0291d1c>] (register_netdev) from [<c0240528>]
> > (sh_eth_drv_probe+0x994/0xb7c)
> >   r4:ee428000 r3:00000040
> > [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>]
> > (platform_drv_probe+0x20/0x50)
> >   r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
> >   r4:eed42810

[snip]

> this explains it, the call to stats causes a get_sync/put_sync which
> puts the device into a state where it /could/ be suspended and thus
> does get suspended in this case from the pm code.
> 
> I'm not /sure/ why the pm_runtime code is not protecting against
> running this when a device probe is in progress, but it seems the
> best thing is to ensure that we always do a get/put sync in the
> driver to ensure we have a reference during probe.

Wouldn't it be better to register the MDIO bus before registering the network 
device ? It looks like the current order is prone to race conditions.

Another potential issue, does the network layer guarantee that the device will 
always be opened by an .ndo_open() call before performing any MDIO operation ? 
sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth registers, could we be 
missing runtime PM calls in those call paths ?
Ben Dooks March 17, 2014, 2:01 p.m. UTC | #20
On 17/03/14 13:54, Laurent Pinchart wrote:
> Hi Ben,
>
> On Monday 17 March 2014 13:44:02 Ben Dooks wrote:
>> On 17/03/14 13:38, Geert Uytterhoeven wrote:
>>> On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks wrote:
>>>> I have yet to ascertain how this ends up happening with device probe,
>>>> it seems to be very dependant on the code.
>>>
>>> Adding a WARN() in cpg_mstp_clock_endisable():
>>>
>>> [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>]
>>> (cpg_mstp_clock_disable+0x14/0x18)
>>>    r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00
>>>    r4:eec23a00
>>> [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>]
>>> (__clk_disable+0x68/0x78)
>>> [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c)
>>>    r4:60000193 r3:00000002
>>> [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78)
>>>    r5:eed41500 r4:eed414c0
>>> [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c)
>>>    r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac
>>> [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84)
>>>    r5:eed42810 r4:eec7a000
>>> [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4)
>>>    r6:00000000 r5:00000000 r4:eed42810 r3:eed42810
>>> [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c)
>>>    r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000
>>>    r4:eed42810
>>> [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c)
>>>    r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870
>>> [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
>>> (sh_eth_get_stats+0x228/0x23c)
>>>    r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
>>> [<c023ec68>] (sh_eth_get_stats) from [<c028c194>]
>>> (dev_get_stats+0x5c/0x84)
>>>    r4:eec7bc30 r3:c023ec68
>>> [<c028c138>] (dev_get_stats) from [<c029cb08>]
>>> (rtnl_fill_ifinfo+0x410/0x8dc)
>>>    r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
>>> [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
>>>    r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
>>>    r4:00000000
>>> [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>]
>>> (register_netdevice+0x3dc/0x428)>
>>>    r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
>>> [<c02918f4>] (register_netdevice) from [<c0291d38>]
>>> (register_netdev+0x1c/0x2c)>
>>>    r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
>>> [<c0291d1c>] (register_netdev) from [<c0240528>]
>>> (sh_eth_drv_probe+0x994/0xb7c)
>>>    r4:ee428000 r3:00000040
>>> [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>]
>>> (platform_drv_probe+0x20/0x50)
>>>    r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
>>>    r4:eed42810
>
> [snip]
>
>> this explains it, the call to stats causes a get_sync/put_sync which
>> puts the device into a state where it /could/ be suspended and thus
>> does get suspended in this case from the pm code.
>>
>> I'm not /sure/ why the pm_runtime code is not protecting against
>> running this when a device probe is in progress, but it seems the
>> best thing is to ensure that we always do a get/put sync in the
>> driver to ensure we have a reference during probe.
>
> Wouldn't it be better to register the MDIO bus before registering the network
> device ? It looks like the current order is prone to race conditions.

Not sure, requires input?

> Another potential issue, does the network layer guarantee that the device will
> always be opened by an .ndo_open() call before performing any MDIO operation ?
> sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth registers, could we be
> missing runtime PM calls in those call paths ?

I'm not sure if there is any guarantee, I don't think the PHY driver
does any sort of open on probe. I do have a patch to ensure that the
MDIO operations are wrapped with pm_runtime_{get,put}_sync() calls as
the probe of the PHY often fails without it.
Geert Uytterhoeven March 17, 2014, 2:02 p.m. UTC | #21
(adding Sergei)

On Mon, Mar 17, 2014 at 2:54 PM, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Monday 17 March 2014 13:44:02 Ben Dooks wrote:
>> On 17/03/14 13:38, Geert Uytterhoeven wrote:
>> > On Mon, Mar 17, 2014 at 2:36 PM, Ben Dooks wrote:
>> >> I have yet to ascertain how this ends up happening with device probe,
>> >> it seems to be very dependant on the code.
>> >
>> > Adding a WARN() in cpg_mstp_clock_endisable():
>> >
>> > [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>]
>> > (cpg_mstp_clock_disable+0x14/0x18)
>> >   r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00
>> >   r4:eec23a00
>> > [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>]
>> > (__clk_disable+0x68/0x78)
>> > [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c)
>> >   r4:60000193 r3:00000002
>> > [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78)
>> >   r5:eed41500 r4:eed414c0
>> > [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c)
>> >   r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac
>> > [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84)
>> >   r5:eed42810 r4:eec7a000
>> > [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4)
>> >   r6:00000000 r5:00000000 r4:eed42810 r3:eed42810
>> > [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c)
>> >   r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000
>> >   r4:eed42810
>> > [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c)
>> >   r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870
>> > [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
>> > (sh_eth_get_stats+0x228/0x23c)
>> >   r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
>> > [<c023ec68>] (sh_eth_get_stats) from [<c028c194>]
>> > (dev_get_stats+0x5c/0x84)
>> >   r4:eec7bc30 r3:c023ec68
>> > [<c028c138>] (dev_get_stats) from [<c029cb08>]
>> > (rtnl_fill_ifinfo+0x410/0x8dc)
>> >   r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
>> > [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
>> >   r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
>> >   r4:00000000
>> > [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>]
>> > (register_netdevice+0x3dc/0x428)>
>> >   r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
>> > [<c02918f4>] (register_netdevice) from [<c0291d38>]
>> > (register_netdev+0x1c/0x2c)>
>> >   r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
>> > [<c0291d1c>] (register_netdev) from [<c0240528>]
>> > (sh_eth_drv_probe+0x994/0xb7c)
>> >   r4:ee428000 r3:00000040
>> > [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>]
>> > (platform_drv_probe+0x20/0x50)
>> >   r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
>> >   r4:eed42810
>
> [snip]
>
>> this explains it, the call to stats causes a get_sync/put_sync which
>> puts the device into a state where it /could/ be suspended and thus
>> does get suspended in this case from the pm code.
>>
>> I'm not /sure/ why the pm_runtime code is not protecting against
>> running this when a device probe is in progress, but it seems the
>> best thing is to ensure that we always do a get/put sync in the
>> driver to ensure we have a reference during probe.
>
> Wouldn't it be better to register the MDIO bus before registering the network
> device ? It looks like the current order is prone to race conditions.

Indeed, typically register_netdev() is the last call of the .probe() function.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Shtylyov March 17, 2014, 2:15 p.m. UTC | #22
Hello.

On 17-03-2014 18:02, Geert Uytterhoeven wrote:

> (adding Sergei)

[...]
>>>>> I have yet to ascertain how this ends up happening with device probe,
>>>>> it seems to be very dependant on the code.

>>>> Adding a WARN() in cpg_mstp_clock_endisable():

>>>> [<c0278c9c>] (cpg_mstp_clock_endisable) from [<c0278e10>]
>>>> (cpg_mstp_clock_disable+0x14/0x18)
>>>>    r10:eec7a000 r9:00000003 r8:eed41508 r7:00000001 r6:a0000113 r5:eec23a00
>>>>    r4:eec23a00
>>>> [<c0278dfc>] (cpg_mstp_clock_disable) from [<c0275bf4>]
>>>> (__clk_disable+0x68/0x78)
>>>> [<c0275b8c>] (__clk_disable) from [<c0275cf4>] (clk_disable+0x20/0x2c)
>>>>    r4:60000193 r3:00000002
>>>> [<c0275cd4>] (clk_disable) from [<c01fd000>] (pm_clk_suspend+0x54/0x78)
>>>>    r5:eed41500 r4:eed414c0
>>>> [<c01fcfac>] (pm_clk_suspend) from [<c01f87f8>] (__rpm_callback+0x4c/0x7c)
>>>>    r8:0000000c r7:00000004 r6:00000000 r5:eed42810 r4:eed42810 r3:c01fcfac
>>>> [<c01f87ac>] (__rpm_callback) from [<c01f8878>] (rpm_callback+0x50/0x84)
>>>>    r5:eed42810 r4:eec7a000
>>>> [<c01f8828>] (rpm_callback) from [<c01f8e54>] (rpm_suspend+0x214/0x3e4)
>>>>    r6:00000000 r5:00000000 r4:eed42810 r3:eed42810
>>>> [<c01f8c40>] (rpm_suspend) from [<c01f9224>] (rpm_idle+0x184/0x19c)
>>>>    r10:00000010 r9:00000000 r8:ee4710bc r7:00000004 r6:00000004 r5:00000000
>>>>    r4:eed42810
>>>> [<c01f90a0>] (rpm_idle) from [<c01f9328>] (__pm_runtime_idle+0x64/0x7c)
>>>>    r6:60000113 r5:eed42870 r4:eed42810 r3:eed42870
>>>> [<c01f92c4>] (__pm_runtime_idle) from [<c023ee90>]
>>>> (sh_eth_get_stats+0x228/0x23c)
>>>>    r7:ee471000 r6:c0356bd4 r5:ee428000 r4:ee428000
>>>> [<c023ec68>] (sh_eth_get_stats) from [<c028c194>]
>>>> (dev_get_stats+0x5c/0x84)
>>>>    r4:eec7bc30 r3:c023ec68
>>>> [<c028c138>] (dev_get_stats) from [<c029cb08>]
>>>> (rtnl_fill_ifinfo+0x410/0x8dc)
>>>>    r6:00000000 r5:ee428000 r4:eed76ec0 r3:00000684
>>>> [<c029c6f8>] (rtnl_fill_ifinfo) from [<c029d9f4>] (rtmsg_ifinfo+0x6c/0xd4)
>>>>    r10:00000010 r9:ee428510 r8:ffffffff r7:eed76ec0 r6:ee428000 r5:000000d0
>>>>    r4:00000000
>>>> [<c029d988>] (rtmsg_ifinfo) from [<c0291cd0>]
>>>> (register_netdevice+0x3dc/0x428)>
>>>>    r10:00000000 r8:eed42810 r7:ee428010 r6:ee428000 r5:c047f600 r4:00000000
>>>> [<c02918f4>] (register_netdevice) from [<c0291d38>]
>>>> (register_netdev+0x1c/0x2c)>
>>>>    r7:eed42800 r6:ef203978 r5:ee46e790 r4:ee428000
>>>> [<c0291d1c>] (register_netdev) from [<c0240528>]
>>>> (sh_eth_drv_probe+0x994/0xb7c)
>>>>    r4:ee428000 r3:00000040
>>>> [<c023fb94>] (sh_eth_drv_probe) from [<c01f3e6c>]
>>>> (platform_drv_probe+0x20/0x50)
>>>>    r10:c042b548 r9:c045c544 r8:00000065 r7:c047d830 r6:00000000 r5:c047d830
>>>>    r4:eed42810

>> [snip]

>>> this explains it, the call to stats causes a get_sync/put_sync which
>>> puts the device into a state where it /could/ be suspended and thus
>>> does get suspended in this case from the pm code.

>>> I'm not /sure/ why the pm_runtime code is not protecting against
>>> running this when a device probe is in progress, but it seems the
>>> best thing is to ensure that we always do a get/put sync in the
>>> driver to ensure we have a reference during probe.

>> Wouldn't it be better to register the MDIO bus before registering the network
>> device ? It looks like the current order is prone to race conditions.

> Indeed, typically register_netdev() is the last call of the .probe() function.

    I'll see what can be done. I've been looking at the probe() method recently...

> Gr{oetje,eeting}s,

>                          Geert

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laurent Pinchart March 18, 2014, 8:45 p.m. UTC | #23
Hi Ben and Sergei,

On Wednesday 19 March 2014 00:17:43 Sergei Shtylyov wrote:
> On 03/17/2014 05:01 PM, Ben Dooks wrote:
> >>>>> I have yet to ascertain how this ends up happening with device probe,
> >>>>> it seems to be very dependant on the code.
> >>>> 
> >>>> Adding a WARN() in cpg_mstp_clock_endisable():
> [...]
> 
> >>> this explains it, the call to stats causes a get_sync/put_sync which
> >>> puts the device into a state where it /could/ be suspended and thus
> >>> does get suspended in this case from the pm code.
> >>> 
> >>> I'm not /sure/ why the pm_runtime code is not protecting against
> >>> running this when a device probe is in progress, but it seems the
> >>> best thing is to ensure that we always do a get/put sync in the
> >>> driver to ensure we have a reference during probe.
> >> 
> >> Wouldn't it be better to register the MDIO bus before registering the
> >> network device ? It looks like the current order is prone to race
> >> conditions.
> >
> > Not sure, requires input?
> 
> People say that the driver should be ready to receive the ndo_open() method
> call even before register_netdev() returns. I have prepared the patch to
> probe MDIO before calling register_netdev() now, need to sanity test it.

Thank you.

> >> Another potential issue, does the network layer guarantee that the device
> >> will always be opened by an .ndo_open() call before performing any MDIO
> >> operation ? sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth
> >> registers, could we be missing runtime PM calls in those call paths ?
> > 
> > I'm not sure if there is any guarantee, I don't think the PHY driver
> > does any sort of open on probe. I do have a patch to ensure that the
> > MDIO operations are wrapped with pm_runtime_{get,put}_sync() calls as
> > the probe of the PHY often fails without it.
> 
> Respin this patch please (addressing my comments), so that DaveM could take
> it.

Before wrapping MDIO operations in runtime PM calls, could we check whether 
that's really needed ? Moving PHY registration before network device 
registration will fix PHY probe problems, we might not need any other change.
Sergei Shtylyov March 18, 2014, 9:17 p.m. UTC | #24
Hello.

On 03/17/2014 05:01 PM, Ben Dooks wrote:

>>>>> I have yet to ascertain how this ends up happening with device probe,
>>>>> it seems to be very dependant on the code.

>>>> Adding a WARN() in cpg_mstp_clock_endisable():

[...]

>>> this explains it, the call to stats causes a get_sync/put_sync which
>>> puts the device into a state where it /could/ be suspended and thus
>>> does get suspended in this case from the pm code.

>>> I'm not /sure/ why the pm_runtime code is not protecting against
>>> running this when a device probe is in progress, but it seems the
>>> best thing is to ensure that we always do a get/put sync in the
>>> driver to ensure we have a reference during probe.

>> Wouldn't it be better to register the MDIO bus before registering the network
>> device ? It looks like the current order is prone to race conditions.

> Not sure, requires input?

    People say that the driver should be ready to receive the ndo_open() 
method call even before register_netdev() returns. I have prepared the patch 
to probe MDIO before calling register_netdev() now, need to sanity test it.

>> Another potential issue, does the network layer guarantee that the device will
>> always be opened by an .ndo_open() call before performing any MDIO operation ?
>> sh_mmd_ctrl, sh_set_mdio and sh_get_mdio access sh-eth registers, could we be
>> missing runtime PM calls in those call paths ?

> I'm not sure if there is any guarantee, I don't think the PHY driver
> does any sort of open on probe. I do have a patch to ensure that the
> MDIO operations are wrapped with pm_runtime_{get,put}_sync() calls as
> the probe of the PHY often fails without it.

    Respin this patch please (addressing my comments), so that DaveM could 
take it.

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Dooks March 19, 2014, 8:19 a.m. UTC | #25
On 17/03/14 14:41, Laurent Pinchart wrote:
> Hi Ben,
>
> On Monday 17 March 2014 13:36:33 Ben Dooks wrote:
>
> [snip]
>
>>   From pm_runtime.h:
>>
>> static inline int pm_runtime_get_sync(struct device *dev)
>> {
>> 	return __pm_runtime_resume(dev, RPM_GET_PUT);
>> }
>>
>> static inline int pm_request_resume(struct device *dev)
>> {
>> 	return __pm_runtime_resume(dev, RPM_ASYNC);
>> }
>>
>> So it looks like pm_runtime_resume() does not protect against the
>> possibility that something else may re-suspend the device.
>
> Correct.
>
>> I have yet to ascertain how this ends up happening with device probe, it
>> seems to be very dependant on the code.
>
> We might be doing something wrong in the driver from a runtime PM point of
> view that leads to the device being suspended. I'd like to catch that instead
> of hiding it by a pm_runtime_get_sync() call. If it turns out that we're not
> doing anything wrong then replacing pm_runtime_resume() with
> pm_runtime_get_sync() would of course be fine.

My view is the pm_runtime)_resume() is just nasty, as it makes no
guarantees that the device cannot be re-suspended.
Laurent Pinchart March 19, 2014, 10:17 a.m. UTC | #26
Hi Ben,

On Wednesday 19 March 2014 09:19:44 Ben Dooks wrote:
> On 17/03/14 14:41, Laurent Pinchart wrote:
> > On Monday 17 March 2014 13:36:33 Ben Dooks wrote:
> > 
> > [snip]
> > 
> >>   From pm_runtime.h:
> >> static inline int pm_runtime_get_sync(struct device *dev)
> >> {
> >> 	return __pm_runtime_resume(dev, RPM_GET_PUT);
> >> }
> >> 
> >> static inline int pm_request_resume(struct device *dev)
> >> {
> >> 	return __pm_runtime_resume(dev, RPM_ASYNC);
> >> }
> >> 
> >> So it looks like pm_runtime_resume() does not protect against the
> >> possibility that something else may re-suspend the device.
> > 
> > Correct.
> > 
> >> I have yet to ascertain how this ends up happening with device probe, it
> >> seems to be very dependant on the code.
> > 
> > We might be doing something wrong in the driver from a runtime PM point of
> > view that leads to the device being suspended. I'd like to catch that
> > instead of hiding it by a pm_runtime_get_sync() call. If it turns out
> > that we're not doing anything wrong then replacing pm_runtime_resume()
> > with pm_runtime_get_sync() would of course be fine.
> 
> My view is the pm_runtime)_resume() is just nasty, as it makes no guarantees
> that the device cannot be re-suspended.

My point is that requiring such a guarantee at probe time might be a sign that 
something else is wrong (which was the case with the sh-eth driver). I'm not 
saying it should never be done of course.
diff mbox

Patch

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 4f76b5e..88aac2c 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2870,6 +2870,7 @@  static int sh_eth_drv_probe(struct platform_device *pdev)
 	mdp->pdev = pdev;
 	pm_runtime_enable(&pdev->dev);
 	pm_runtime_resume(&pdev->dev);
+	pm_runtime_get_sync(&pdev->dev);
 
 	if (pdev->dev.of_node)
 		pd = sh_eth_parse_dt(&pdev->dev);
@@ -2961,6 +2962,7 @@  static int sh_eth_drv_probe(struct platform_device *pdev)
 	pr_info("Base address at 0x%x, %pM, IRQ %d.\n",
 		(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
 
+	pm_runtime_put_sync(&pdev->dev);
 	platform_set_drvdata(pdev, ndev);
 
 	return ret;