diff mbox series

[v3,1/2] PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter

Message ID 20201110092933.3342784-2-zhangqilong3@huawei.com (mailing list archive)
State Not Applicable, archived
Headers show
Series Fix usage counter leak by adding a general sync ops | expand

Commit Message

Zhang Qilong Nov. 10, 2020, 9:29 a.m. UTC
In many case, we need to check return value of pm_runtime_get_sync, but
it brings a trouble to the usage counter processing. Many callers forget
to decrease the usage counter when it failed, which could resulted in
reference leak. It has been discussed a lot[0][1]. So we add a function
to deal with the usage counter for better coding.

[0]https://lkml.org/lkml/2020/6/14/88
[1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
---
 include/linux/pm_runtime.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Jakub Kicinski Nov. 13, 2020, 7:43 p.m. UTC | #1
On Tue, 10 Nov 2020 17:29:32 +0800 Zhang Qilong wrote:
> In many case, we need to check return value of pm_runtime_get_sync, but
> it brings a trouble to the usage counter processing. Many callers forget
> to decrease the usage counter when it failed, which could resulted in
> reference leak. It has been discussed a lot[0][1]. So we add a function
> to deal with the usage counter for better coding.
> 
> [0]https://lkml.org/lkml/2020/6/14/88
> [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>

FWIW I'm expecting to apply this series to net-next once we get an ack
from the power management side.
Rafael J. Wysocki Nov. 16, 2020, 12:49 p.m. UTC | #2
On Tue, Nov 10, 2020 at 10:25 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
>
> In many case, we need to check return value of pm_runtime_get_sync, but
> it brings a trouble to the usage counter processing. Many callers forget
> to decrease the usage counter when it failed, which could resulted in
> reference leak. It has been discussed a lot[0][1]. So we add a function
> to deal with the usage counter for better coding.
>
> [0]https://lkml.org/lkml/2020/6/14/88
> [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>

Acked-by: Rafael J. Wysocki  <rafael.j.wysocki@intel.com>

> ---
>  include/linux/pm_runtime.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index 4b708f4e8eed..b492ae00cc90 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
>         return __pm_runtime_resume(dev, RPM_GET_PUT);
>  }
>
> +/**
> + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> + * @dev: Target device.
> + *
> + * Resume @dev synchronously and if that is successful, increment its runtime
> + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> + * incremented or a negative error code otherwise.
> + */
> +static inline int pm_runtime_resume_and_get(struct device *dev)
> +{
> +       int ret;
> +
> +       ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> +       if (ret < 0) {
> +               pm_runtime_put_noidle(dev);
> +               return ret;
> +       }
> +
> +       return 0;
> +}
> +
>  /**
>   * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0.
>   * @dev: Target device.
> --
> 2.25.4
>
Geert Uytterhoeven Nov. 27, 2020, 10:15 a.m. UTC | #3
Hi Zhang,

On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
> In many case, we need to check return value of pm_runtime_get_sync, but
> it brings a trouble to the usage counter processing. Many callers forget
> to decrease the usage counter when it failed, which could resulted in
> reference leak. It has been discussed a lot[0][1]. So we add a function
> to deal with the usage counter for better coding.
>
> [0]https://lkml.org/lkml/2020/6/14/88
> [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>

Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM:
runtime: Add pm_runtime_resume_and_get to deal with usage counter") in
v5.10-rc5.

> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
>         return __pm_runtime_resume(dev, RPM_GET_PUT);
>  }
>
> +/**
> + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> + * @dev: Target device.
> + *
> + * Resume @dev synchronously and if that is successful, increment its runtime
> + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> + * incremented or a negative error code otherwise.
> + */
> +static inline int pm_runtime_resume_and_get(struct device *dev)

Perhaps this function should be called pm_runtime_resume_and_get_sync(),
to make it clear it does a synchronous get?

I had to look into the implementation to verify that a change like

-       ret = pm_runtime_get_sync(&pdev->dev);
+       ret = pm_runtime_resume_and_get(&pdev->dev);

in the follow-up patches is actually a valid change, maintaining
synchronous operation. Oh, pm_runtime_resume() is synchronous, too...

While I agree the old pm_runtime_get_sync() had confusing semantics, we
should go to great lengths to avoid making the same mistake while fixing
it.

> +{
> +       int ret;
> +
> +       ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> +       if (ret < 0) {
> +               pm_runtime_put_noidle(dev);
> +               return ret;
> +       }
> +
> +       return 0;
> +}
> +
>  /**
>   * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0.
>   * @dev: Target device.

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Rafael J. Wysocki Nov. 30, 2020, 4:37 p.m. UTC | #4
On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Zhang,
>
> On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
> > In many case, we need to check return value of pm_runtime_get_sync, but
> > it brings a trouble to the usage counter processing. Many callers forget
> > to decrease the usage counter when it failed, which could resulted in
> > reference leak. It has been discussed a lot[0][1]. So we add a function
> > to deal with the usage counter for better coding.
> >
> > [0]https://lkml.org/lkml/2020/6/14/88
> > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
>
> Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM:
> runtime: Add pm_runtime_resume_and_get to deal with usage counter") in
> v5.10-rc5.
>
> > --- a/include/linux/pm_runtime.h
> > +++ b/include/linux/pm_runtime.h
> > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
> >         return __pm_runtime_resume(dev, RPM_GET_PUT);
> >  }
> >
> > +/**
> > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> > + * @dev: Target device.
> > + *
> > + * Resume @dev synchronously and if that is successful, increment its runtime
> > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> > + * incremented or a negative error code otherwise.
> > + */
> > +static inline int pm_runtime_resume_and_get(struct device *dev)
>
> Perhaps this function should be called pm_runtime_resume_and_get_sync(),

No, really.

I might consider calling it pm_runtime_acquire(), and adding a
matching _release() as a pm_runtime_get() synonym for that matter, but
not the above.

> to make it clear it does a synchronous get?
>
> I had to look into the implementation to verify that a change like

I'm not sure why, because the kerneldoc is unambiguous AFAICS.

>
> -       ret = pm_runtime_get_sync(&pdev->dev);
> +       ret = pm_runtime_resume_and_get(&pdev->dev);
>
> in the follow-up patches is actually a valid change, maintaining
> synchronous operation. Oh, pm_runtime_resume() is synchronous, too...

Yes, it is.
Laurent Pinchart Nov. 30, 2020, 5:35 p.m. UTC | #5
On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote:
> > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
> > > In many case, we need to check return value of pm_runtime_get_sync, but
> > > it brings a trouble to the usage counter processing. Many callers forget
> > > to decrease the usage counter when it failed, which could resulted in
> > > reference leak. It has been discussed a lot[0][1]. So we add a function
> > > to deal with the usage counter for better coding.
> > >
> > > [0]https://lkml.org/lkml/2020/6/14/88
> > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> >
> > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM:
> > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in
> > v5.10-rc5.
> >
> > > --- a/include/linux/pm_runtime.h
> > > +++ b/include/linux/pm_runtime.h
> > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
> > >         return __pm_runtime_resume(dev, RPM_GET_PUT);
> > >  }
> > >
> > > +/**
> > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> > > + * @dev: Target device.
> > > + *
> > > + * Resume @dev synchronously and if that is successful, increment its runtime
> > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> > > + * incremented or a negative error code otherwise.
> > > + */
> > > +static inline int pm_runtime_resume_and_get(struct device *dev)
> >
> > Perhaps this function should be called pm_runtime_resume_and_get_sync(),
> 
> No, really.
> 
> I might consider calling it pm_runtime_acquire(), and adding a
> matching _release() as a pm_runtime_get() synonym for that matter, but
> not the above.

pm_runtime_acquire() seems better to me too. Would pm_runtime_release()
would be an alias for pm_runtime_put() ?

We would also likely need a pm_runtime_release_autosuspend() too then.
But on that topic, I was wondering, is there a reason we can't select
autosuspend behaviour automatically when autosuspend is enabled ?

> > to make it clear it does a synchronous get?
> >
> > I had to look into the implementation to verify that a change like
> 
> I'm not sure why, because the kerneldoc is unambiguous AFAICS.
> 
> >
> > -       ret = pm_runtime_get_sync(&pdev->dev);
> > +       ret = pm_runtime_resume_and_get(&pdev->dev);
> >
> > in the follow-up patches is actually a valid change, maintaining
> > synchronous operation. Oh, pm_runtime_resume() is synchronous, too...
> 
> Yes, it is.
Rafael J. Wysocki Nov. 30, 2020, 5:55 p.m. UTC | #6
On Mon, Nov 30, 2020 at 6:35 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote:
> > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
> > > > In many case, we need to check return value of pm_runtime_get_sync, but
> > > > it brings a trouble to the usage counter processing. Many callers forget
> > > > to decrease the usage counter when it failed, which could resulted in
> > > > reference leak. It has been discussed a lot[0][1]. So we add a function
> > > > to deal with the usage counter for better coding.
> > > >
> > > > [0]https://lkml.org/lkml/2020/6/14/88
> > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> > >
> > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM:
> > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in
> > > v5.10-rc5.
> > >
> > > > --- a/include/linux/pm_runtime.h
> > > > +++ b/include/linux/pm_runtime.h
> > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
> > > >         return __pm_runtime_resume(dev, RPM_GET_PUT);
> > > >  }
> > > >
> > > > +/**
> > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> > > > + * @dev: Target device.
> > > > + *
> > > > + * Resume @dev synchronously and if that is successful, increment its runtime
> > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> > > > + * incremented or a negative error code otherwise.
> > > > + */
> > > > +static inline int pm_runtime_resume_and_get(struct device *dev)
> > >
> > > Perhaps this function should be called pm_runtime_resume_and_get_sync(),
> >
> > No, really.
> >
> > I might consider calling it pm_runtime_acquire(), and adding a
> > matching _release() as a pm_runtime_get() synonym for that matter, but
> > not the above.
>
> pm_runtime_acquire() seems better to me too. Would pm_runtime_release()
> would be an alias for pm_runtime_put() ?

Yes.  This covers all of the use cases relevant for drivers AFAICS.

> We would also likely need a pm_runtime_release_autosuspend() too then.

Why would we?

> But on that topic, I was wondering, is there a reason we can't select
> autosuspend behaviour automatically when autosuspend is enabled ?

That is the case already.

pm_runtime_put() will autosuspend if enabled and the usage counter is
0, as long as ->runtime_idle() returns 0 (or is absent).

pm_runtime_put_autosuspend() is an optimization allowing
->runtime_idle() to be skipped entirely, but I'm wondering how many
users really need that.
Laurent Pinchart Nov. 30, 2020, 6:50 p.m. UTC | #7
Hi Rafael,

On Mon, Nov 30, 2020 at 06:55:57PM +0100, Rafael J. Wysocki wrote:
> On Mon, Nov 30, 2020 at 6:35 PM Laurent Pinchart wrote:
> > On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote:
> > > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote:
> > > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
> > > > > In many case, we need to check return value of pm_runtime_get_sync, but
> > > > > it brings a trouble to the usage counter processing. Many callers forget
> > > > > to decrease the usage counter when it failed, which could resulted in
> > > > > reference leak. It has been discussed a lot[0][1]. So we add a function
> > > > > to deal with the usage counter for better coding.
> > > > >
> > > > > [0]https://lkml.org/lkml/2020/6/14/88
> > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> > > >
> > > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM:
> > > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in
> > > > v5.10-rc5.
> > > >
> > > > > --- a/include/linux/pm_runtime.h
> > > > > +++ b/include/linux/pm_runtime.h
> > > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
> > > > >         return __pm_runtime_resume(dev, RPM_GET_PUT);
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> > > > > + * @dev: Target device.
> > > > > + *
> > > > > + * Resume @dev synchronously and if that is successful, increment its runtime
> > > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> > > > > + * incremented or a negative error code otherwise.
> > > > > + */
> > > > > +static inline int pm_runtime_resume_and_get(struct device *dev)
> > > >
> > > > Perhaps this function should be called pm_runtime_resume_and_get_sync(),
> > >
> > > No, really.
> > >
> > > I might consider calling it pm_runtime_acquire(), and adding a
> > > matching _release() as a pm_runtime_get() synonym for that matter, but
> > > not the above.
> >
> > pm_runtime_acquire() seems better to me too. Would pm_runtime_release()
> > would be an alias for pm_runtime_put() ?
> 
> Yes.  This covers all of the use cases relevant for drivers AFAICS.
> 
> > We would also likely need a pm_runtime_release_autosuspend() too then.
> 
> Why would we?
> 
> > But on that topic, I was wondering, is there a reason we can't select
> > autosuspend behaviour automatically when autosuspend is enabled ?
> 
> That is the case already.
> 
> pm_runtime_put() will autosuspend if enabled and the usage counter is
> 0, as long as ->runtime_idle() returns 0 (or is absent).
> 
> pm_runtime_put_autosuspend() is an optimization allowing
> ->runtime_idle() to be skipped entirely, but I'm wondering how many
> users really need that.

Ah, I didn't know that, that's good to know. We then don't need
pm_runtime_release_autosuspend() (unless the optimization really makes a
big difference).

Should I write new drievr code with pm_runtime_put() instead of
pm_runtime_put_autosuspend() ? I haven't found clear guidelines on this
in the documentation.
Rafael J. Wysocki Nov. 30, 2020, 7:12 p.m. UTC | #8
On Mon, Nov 30, 2020 at 7:50 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Rafael,
>
> On Mon, Nov 30, 2020 at 06:55:57PM +0100, Rafael J. Wysocki wrote:
> > On Mon, Nov 30, 2020 at 6:35 PM Laurent Pinchart wrote:
> > > On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote:
> > > > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote:
> > > > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
> > > > > > In many case, we need to check return value of pm_runtime_get_sync, but
> > > > > > it brings a trouble to the usage counter processing. Many callers forget
> > > > > > to decrease the usage counter when it failed, which could resulted in
> > > > > > reference leak. It has been discussed a lot[0][1]. So we add a function
> > > > > > to deal with the usage counter for better coding.
> > > > > >
> > > > > > [0]https://lkml.org/lkml/2020/6/14/88
> > > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
> > > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> > > > >
> > > > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM:
> > > > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in
> > > > > v5.10-rc5.
> > > > >
> > > > > > --- a/include/linux/pm_runtime.h
> > > > > > +++ b/include/linux/pm_runtime.h
> > > > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev)
> > > > > >         return __pm_runtime_resume(dev, RPM_GET_PUT);
> > > > > >  }
> > > > > >
> > > > > > +/**
> > > > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
> > > > > > + * @dev: Target device.
> > > > > > + *
> > > > > > + * Resume @dev synchronously and if that is successful, increment its runtime
> > > > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
> > > > > > + * incremented or a negative error code otherwise.
> > > > > > + */
> > > > > > +static inline int pm_runtime_resume_and_get(struct device *dev)
> > > > >
> > > > > Perhaps this function should be called pm_runtime_resume_and_get_sync(),
> > > >
> > > > No, really.
> > > >
> > > > I might consider calling it pm_runtime_acquire(), and adding a
> > > > matching _release() as a pm_runtime_get() synonym for that matter, but
> > > > not the above.
> > >
> > > pm_runtime_acquire() seems better to me too. Would pm_runtime_release()
> > > would be an alias for pm_runtime_put() ?
> >
> > Yes.  This covers all of the use cases relevant for drivers AFAICS.
> >
> > > We would also likely need a pm_runtime_release_autosuspend() too then.
> >
> > Why would we?
> >
> > > But on that topic, I was wondering, is there a reason we can't select
> > > autosuspend behaviour automatically when autosuspend is enabled ?
> >
> > That is the case already.
> >
> > pm_runtime_put() will autosuspend if enabled and the usage counter is
> > 0, as long as ->runtime_idle() returns 0 (or is absent).
> >
> > pm_runtime_put_autosuspend() is an optimization allowing
> > ->runtime_idle() to be skipped entirely, but I'm wondering how many
> > users really need that.
>
> Ah, I didn't know that, that's good to know. We then don't need
> pm_runtime_release_autosuspend() (unless the optimization really makes a
> big difference).
>
> Should I write new drievr code with pm_runtime_put() instead of
> pm_runtime_put_autosuspend() ?

If you don't have ->runtime_idle() in the driver (and in the bus type
generally speaking, but none of them provide it IIRC),
pm_runtime_put() is basically equivalent to
pm_runtime_put_autosuspend() AFAICS, except for some extra checks done
by the former.

Otherwise it all depends on what the ->runtime_idle() callback does,
but it is hard to imagine a practical use case when the difference
would be really meaningful.

> I haven't found clear guidelines on this in the documentation.

Yes, that's one of the items I need to take care of.
diff mbox series

Patch

diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 4b708f4e8eed..b492ae00cc90 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -386,6 +386,27 @@  static inline int pm_runtime_get_sync(struct device *dev)
 	return __pm_runtime_resume(dev, RPM_GET_PUT);
 }
 
+/**
+ * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it.
+ * @dev: Target device.
+ *
+ * Resume @dev synchronously and if that is successful, increment its runtime
+ * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been
+ * incremented or a negative error code otherwise.
+ */
+static inline int pm_runtime_resume_and_get(struct device *dev)
+{
+	int ret;
+
+	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
+	if (ret < 0) {
+		pm_runtime_put_noidle(dev);
+		return ret;
+	}
+
+	return 0;
+}
+
 /**
  * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0.
  * @dev: Target device.