spi: tegra20-slink: Fix runtime PM imbalance on error

Message ID	20200521074946.21799-1-dinghao.liu@zju.edu.cn (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=zGdX=7D=vger.kernel.org=linux-spi-owner@kernel.org> From: Dinghao Liu <dinghao.liu@zju.edu.cn> To: dinghao.liu@zju.edu.cn, kjlu@umn.edu Cc: Laxman Dewangan <ldewangan@nvidia.com>, Mark Brown <broonie@kernel.org>, Thierry Reding <thierry.reding@gmail.com>, Jonathan Hunter <jonathanh@nvidia.com>, linux-spi@vger.kernel.org, linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] spi: tegra20-slink: Fix runtime PM imbalance on error Date: Thu, 21 May 2020 15:49:46 +0800 Message-Id: <20200521074946.21799-1-dinghao.liu@zju.edu.cn> Sender: linux-spi-owner@vger.kernel.org Precedence: bulk
Series	spi: tegra20-slink: Fix runtime PM imbalance on error \| expand spi: tegra20-slink: Fix runtime PM imbalance on error

Dinghao Liu May 21, 2020, 7:49 a.m. UTC

pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.

Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
---
 drivers/spi/spi-tegra20-slink.c | 1 +
 1 file changed, 1 insertion(+)

Andy Shevchenko May 21, 2020, 8:04 a.m. UTC | #1

On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <dinghao.liu@zju.edu.cn> wrote:
>
> pm_runtime_get_sync() increments the runtime PM usage counter even
> when it returns an error code. Thus a pairing decrement is needed on
> the error handling path to keep the counter balanced.

...

>         ret = pm_runtime_get_sync(&pdev->dev);
>         if (ret < 0) {
>                 dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);

> +               pm_runtime_put(&pdev->dev);

For all your patches, please, double check what you are proposing.

Here, I believe, the correct one will be _put_noidle().

AFAIU you are not supposed to actually suspend the device in case of error.
But I might be mistaken, thus see above.

>                 goto exit_pm_disable;
>         }

Andy Shevchenko May 21, 2020, 8:06 a.m. UTC | #2

On Thu, May 21, 2020 at 11:04 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <dinghao.liu@zju.edu.cn> wrote:

Any I have coccinelle script for this, I can share with you.

Dinghao Liu May 21, 2020, 8:24 a.m. UTC | #3

Hi Andy,

Thank you for your advice! I will fix the problem in the next edition
of patch. The coccinelle script will be very helpful and I'm looking 
forward to it.

Regards,
Dinghao 

&quot;Andy Shevchenko&quot; &lt;andy.shevchenko@gmail.com&gt;写道：
> On Thu, May 21, 2020 at 11:04 AM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> > On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <dinghao.liu@zju.edu.cn> wrote:
> 
> Any I have coccinelle script for this, I can share with you.
> 
> -- 
> With Best Regards,
> Andy Shevchenko

Jon Hunter May 21, 2020, 8:38 a.m. UTC | #4

On 21/05/2020 09:04, Andy Shevchenko wrote:
> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <dinghao.liu@zju.edu.cn> wrote:
>>
>> pm_runtime_get_sync() increments the runtime PM usage counter even
>> when it returns an error code. Thus a pairing decrement is needed on
>> the error handling path to keep the counter balanced.
> 
> ...
> 
>>         ret = pm_runtime_get_sync(&pdev->dev);
>>         if (ret < 0) {
>>                 dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
> 
>> +               pm_runtime_put(&pdev->dev);
> 
> For all your patches, please, double check what you are proposing.
> 
> Here, I believe, the correct one will be _put_noidle().
> 
> AFAIU you are not supposed to actually suspend the device in case of error.
> But I might be mistaken, thus see above.
> 
>>                 goto exit_pm_disable;
>>         }


Is there any reason why this is not handled in pm_runtime_get itself?
Jon

Jon Hunter May 21, 2020, 8:46 a.m. UTC | #5

On 21/05/2020 09:38, Jon Hunter wrote:
> 
> On 21/05/2020 09:04, Andy Shevchenko wrote:
>> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <dinghao.liu@zju.edu.cn> wrote:
>>>
>>> pm_runtime_get_sync() increments the runtime PM usage counter even
>>> when it returns an error code. Thus a pairing decrement is needed on
>>> the error handling path to keep the counter balanced.
>>
>> ...
>>
>>>         ret = pm_runtime_get_sync(&pdev->dev);
>>>         if (ret < 0) {
>>>                 dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
>>
>>> +               pm_runtime_put(&pdev->dev);
>>
>> For all your patches, please, double check what you are proposing.
>>
>> Here, I believe, the correct one will be _put_noidle().
>>
>> AFAIU you are not supposed to actually suspend the device in case of error.
>> But I might be mistaken, thus see above.
>>
>>>                 goto exit_pm_disable;
>>>         }
> 
> 
> Is there any reason why this is not handled in pm_runtime_get itself?

Ah I see a response from Rafael here:
https://lkml.org/lkml/2020/5/20/1100

OK so this is intentional and needs to be fixed.

Jon

Dinghao Liu May 22, 2020, 7:45 a.m. UTC | #6

Hi Andy,

Thank you for your advice!

Your suggestion is to use pm_runtime_put_noidle(), right? 
The only difference between pm_runtime_put() and this function
is that pm_runtime_put() will run an extra pm_request_idle().

I checked this patched function again and found there is a
pm_runtime_put() in the normal branch of pm_runtime_get_sync().
Does this mean the original program logic need to execute idle
callback?

According to runtime PM's doc, the pm_runtime_get_sync() call
paired with a pm_runtime_put() call will be appropriate to ensure
that the device is not put back to sleep during the probe. Therefore
I think pm_runtime_put() is more appropriate here. Do you have 
more detailed suggestion for why we should use _put_noidle()?

Regards,
Dinghao 
> On Thu, May 21, 2020 at 10:50 AM Dinghao Liu <dinghao.liu@zju.edu.cn> wrote:
> >
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > when it returns an error code. Thus a pairing decrement is needed on
> > the error handling path to keep the counter balanced.
> 
> ...
> 
> >         ret = pm_runtime_get_sync(&pdev->dev);
> >         if (ret < 0) {
> >                 dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
> 
> > +               pm_runtime_put(&pdev->dev);
> 
> For all your patches, please, double check what you are proposing.
> 
> Here, I believe, the correct one will be _put_noidle().
> 
> AFAIU you are not supposed to actually suspend the device in case of error.
> But I might be mistaken, thus see above.
> 
> >                 goto exit_pm_disable;
> >         }
> 
> -- 
> With Best Regards,
> Andy Shevchenko

Andy Shevchenko May 22, 2020, 3:20 p.m. UTC | #7

On Fri, May 22, 2020 at 10:46 AM <dinghao.liu@zju.edu.cn> wrote:
>
> Hi Andy,
>
> Thank you for your advice!

You are welcome, but please, stop top-posting.

> Your suggestion is to use pm_runtime_put_noidle(), right?
> The only difference between pm_runtime_put() and this function
> is that pm_runtime_put() will run an extra pm_request_idle().
>
> I checked this patched function again and found there is a
> pm_runtime_put() in the normal branch of pm_runtime_get_sync().
> Does this mean the original program logic need to execute idle
> callback?
>
> According to runtime PM's doc, the pm_runtime_get_sync() call
> paired with a pm_runtime_put() call will be appropriate to ensure
> that the device is not put back to sleep during the probe.

Correct.

> Therefore
> I think pm_runtime_put() is more appropriate here.

How come to wrong conclusion? We are considering error path. What does
documentation say about this?

> Do you have
> more detailed suggestion for why we should use _put_noidle()?

Because in error case there is no need to go through all code patch to
be sure that the device is idling. Moreover, consider below case

CPU1: ...somewhere in the code...
pm_runtime_get() // with success!
...see below...
pm_runtime_put()

CPU2: ...on parallel thread...
ret = pm_runtime_get_sync() // failed!
if (ret)
  pm_runtime_put() // oi vei, we put device into sleep

So, there is a potential issue.

> > > pm_runtime_get_sync() increments the runtime PM usage counter even
> > > when it returns an error code. Thus a pairing decrement is needed on
> > > the error handling path to keep the counter balanced.
> >
> > ...
> >
> > >         ret = pm_runtime_get_sync(&pdev->dev);
> > >         if (ret < 0) {
> > >                 dev_err(&pdev->dev, "pm runtime get failed, e = %d\n", ret);
> >
> > > +               pm_runtime_put(&pdev->dev);
> >
> > For all your patches, please, double check what you are proposing.
> >
> > Here, I believe, the correct one will be _put_noidle().
> >
> > AFAIU you are not supposed to actually suspend the device in case of error.
> > But I might be mistaken, thus see above.
> >
> > >                 goto exit_pm_disable;
> > >         }

Andy Shevchenko May 22, 2020, 3:22 p.m. UTC | #8

On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Fri, May 22, 2020 at 10:46 AM <dinghao.liu@zju.edu.cn> wrote:

...

> Moreover, consider below case
>
> CPU1: ...somewhere in the code...
> pm_runtime_get() // with success!
> ...see below...
> pm_runtime_put()
>
> CPU2: ...on parallel thread...
> ret = pm_runtime_get_sync() // failed!
> if (ret)
>   pm_runtime_put() // oi vei, we put device into sleep
>
> So, there is a potential issue.

...and even if it's impossible (no bugs in runtime PM core, etc) the
code with put() looks suspicious.

Dinghao Liu May 23, 2020, 11:32 a.m. UTC | #9

> On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> > On Fri, May 22, 2020 at 10:46 AM <dinghao.liu@zju.edu.cn> wrote:
> 
> ...
> 
> > Moreover, consider below case
> >
> > CPU1: ...somewhere in the code...
> > pm_runtime_get() // with success!
> > ...see below...
> > pm_runtime_put()
> >
> > CPU2: ...on parallel thread...
> > ret = pm_runtime_get_sync() // failed!
> > if (ret)
> >   pm_runtime_put() // oi vei, we put device into sleep
> >
> > So, there is a potential issue.
> 
> ...and even if it's impossible (no bugs in runtime PM core, etc) the
> code with put() looks suspicious.
> 

I may understand what you are worried about. Do you mean that
executing pm_runtime_put() will influence other threads (e.g.,
one parallel thread can put the device into sleep while other
threads are using this device)?

I think this will never happen. Because in this case the PM usage
counter cannot be decreased to zero if there are still some threads
using this device. Otherwise, pm_runtime_put() should never be
used in the case of multithreading, which is strange since this
API is used widely. 

I also checked many other implementation of probe in drivers.
It seems that using pm_runtime_put() is ok. If I misunderstood
your opinion, please point it out, thanks.

Regards,
Dinghao

> -- 
> With Best Regards,
> Andy Shevchenko

Andy Shevchenko May 23, 2020, 11:52 a.m. UTC | #10

On Sat, May 23, 2020 at 2:32 PM <dinghao.liu@zju.edu.cn> wrote:
>
> > On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > > On Fri, May 22, 2020 at 10:46 AM <dinghao.liu@zju.edu.cn> wrote:

...

> I also checked many other implementation of probe in drivers.
> It seems that using pm_runtime_put() is ok.

In *error path* or normal path?

> If I misunderstood
> your opinion, please point it out, thanks.

Bottom line is (for the *error path* case):
pm_runtime_put_noidle() has no side effects
pm_runtime_put() (potentially) might have side effects.

You should choose one which is clearer about what it does.

Dinghao Liu May 23, 2020, 12:04 p.m. UTC | #11

> On Sat, May 23, 2020 at 2:32 PM <dinghao.liu@zju.edu.cn> wrote:
> >
> > > On Fri, May 22, 2020 at 6:20 PM Andy Shevchenko
> > > <andy.shevchenko@gmail.com> wrote:
> > > > On Fri, May 22, 2020 at 10:46 AM <dinghao.liu@zju.edu.cn> wrote:
> 
> ...
> 
> > I also checked many other implementation of probe in drivers.
> > It seems that using pm_runtime_put() is ok.
> 
> In *error path* or normal path?
> 

Error path (e.g., sysc_probe, exynos_trng_probe, 
map_rng_probe, ti_eqep_probe).

> > If I misunderstood
> > your opinion, please point it out, thanks.
> 
> Bottom line is (for the *error path* case):
> pm_runtime_put_noidle() has no side effects
> pm_runtime_put() (potentially) might have side effects.
> 
> You should choose one which is clearer about what it does.
> 
> -- 
> With Best Regards,
> Andy Shevchenko

Agree, for this bug using _noidle() is clearer. I will 
send a new path to fix this.

Regards,
Dinghao

spi: tegra20-slink: Fix runtime PM imbalance on error

Commit Message

Comments

Patch