Message ID | 20201110092933.3342784-2-zhangqilong3@huawei.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | Fix usage counter leak by adding a general sync ops | expand |
On Tue, 10 Nov 2020 17:29:32 +0800 Zhang Qilong wrote: > In many case, we need to check return value of pm_runtime_get_sync, but > it brings a trouble to the usage counter processing. Many callers forget > to decrease the usage counter when it failed, which could resulted in > reference leak. It has been discussed a lot[0][1]. So we add a function > to deal with the usage counter for better coding. > > [0]https://lkml.org/lkml/2020/6/14/88 > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> FWIW I'm expecting to apply this series to net-next once we get an ack from the power management side.
On Tue, Nov 10, 2020 at 10:25 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > > In many case, we need to check return value of pm_runtime_get_sync, but > it brings a trouble to the usage counter processing. Many callers forget > to decrease the usage counter when it failed, which could resulted in > reference leak. It has been discussed a lot[0][1]. So we add a function > to deal with the usage counter for better coding. > > [0]https://lkml.org/lkml/2020/6/14/88 > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > include/linux/pm_runtime.h | 21 +++++++++++++++++++++ > 1 file changed, 21 insertions(+) > > diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h > index 4b708f4e8eed..b492ae00cc90 100644 > --- a/include/linux/pm_runtime.h > +++ b/include/linux/pm_runtime.h > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > return __pm_runtime_resume(dev, RPM_GET_PUT); > } > > +/** > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > + * @dev: Target device. > + * > + * Resume @dev synchronously and if that is successful, increment its runtime > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > + * incremented or a negative error code otherwise. > + */ > +static inline int pm_runtime_resume_and_get(struct device *dev) > +{ > + int ret; > + > + ret = __pm_runtime_resume(dev, RPM_GET_PUT); > + if (ret < 0) { > + pm_runtime_put_noidle(dev); > + return ret; > + } > + > + return 0; > +} > + > /** > * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0. > * @dev: Target device. > -- > 2.25.4 >
Hi Zhang, On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > In many case, we need to check return value of pm_runtime_get_sync, but > it brings a trouble to the usage counter processing. Many callers forget > to decrease the usage counter when it failed, which could resulted in > reference leak. It has been discussed a lot[0][1]. So we add a function > to deal with the usage counter for better coding. > > [0]https://lkml.org/lkml/2020/6/14/88 > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter") in v5.10-rc5. > --- a/include/linux/pm_runtime.h > +++ b/include/linux/pm_runtime.h > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > return __pm_runtime_resume(dev, RPM_GET_PUT); > } > > +/** > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > + * @dev: Target device. > + * > + * Resume @dev synchronously and if that is successful, increment its runtime > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > + * incremented or a negative error code otherwise. > + */ > +static inline int pm_runtime_resume_and_get(struct device *dev) Perhaps this function should be called pm_runtime_resume_and_get_sync(), to make it clear it does a synchronous get? I had to look into the implementation to verify that a change like - ret = pm_runtime_get_sync(&pdev->dev); + ret = pm_runtime_resume_and_get(&pdev->dev); in the follow-up patches is actually a valid change, maintaining synchronous operation. Oh, pm_runtime_resume() is synchronous, too... While I agree the old pm_runtime_get_sync() had confusing semantics, we should go to great lengths to avoid making the same mistake while fixing it. > +{ > + int ret; > + > + ret = __pm_runtime_resume(dev, RPM_GET_PUT); > + if (ret < 0) { > + pm_runtime_put_noidle(dev); > + return ret; > + } > + > + return 0; > +} > + > /** > * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0. > * @dev: Target device. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Zhang, > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > > In many case, we need to check return value of pm_runtime_get_sync, but > > it brings a trouble to the usage counter processing. Many callers forget > > to decrease the usage counter when it failed, which could resulted in > > reference leak. It has been discussed a lot[0][1]. So we add a function > > to deal with the usage counter for better coding. > > > > [0]https://lkml.org/lkml/2020/6/14/88 > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM: > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in > v5.10-rc5. > > > --- a/include/linux/pm_runtime.h > > +++ b/include/linux/pm_runtime.h > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > > return __pm_runtime_resume(dev, RPM_GET_PUT); > > } > > > > +/** > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > > + * @dev: Target device. > > + * > > + * Resume @dev synchronously and if that is successful, increment its runtime > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > > + * incremented or a negative error code otherwise. > > + */ > > +static inline int pm_runtime_resume_and_get(struct device *dev) > > Perhaps this function should be called pm_runtime_resume_and_get_sync(), No, really. I might consider calling it pm_runtime_acquire(), and adding a matching _release() as a pm_runtime_get() synonym for that matter, but not the above. > to make it clear it does a synchronous get? > > I had to look into the implementation to verify that a change like I'm not sure why, because the kerneldoc is unambiguous AFAICS. > > - ret = pm_runtime_get_sync(&pdev->dev); > + ret = pm_runtime_resume_and_get(&pdev->dev); > > in the follow-up patches is actually a valid change, maintaining > synchronous operation. Oh, pm_runtime_resume() is synchronous, too... Yes, it is.
On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote: > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote: > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > > > In many case, we need to check return value of pm_runtime_get_sync, but > > > it brings a trouble to the usage counter processing. Many callers forget > > > to decrease the usage counter when it failed, which could resulted in > > > reference leak. It has been discussed a lot[0][1]. So we add a function > > > to deal with the usage counter for better coding. > > > > > > [0]https://lkml.org/lkml/2020/6/14/88 > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> > > > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM: > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in > > v5.10-rc5. > > > > > --- a/include/linux/pm_runtime.h > > > +++ b/include/linux/pm_runtime.h > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > > > return __pm_runtime_resume(dev, RPM_GET_PUT); > > > } > > > > > > +/** > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > > > + * @dev: Target device. > > > + * > > > + * Resume @dev synchronously and if that is successful, increment its runtime > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > > > + * incremented or a negative error code otherwise. > > > + */ > > > +static inline int pm_runtime_resume_and_get(struct device *dev) > > > > Perhaps this function should be called pm_runtime_resume_and_get_sync(), > > No, really. > > I might consider calling it pm_runtime_acquire(), and adding a > matching _release() as a pm_runtime_get() synonym for that matter, but > not the above. pm_runtime_acquire() seems better to me too. Would pm_runtime_release() would be an alias for pm_runtime_put() ? We would also likely need a pm_runtime_release_autosuspend() too then. But on that topic, I was wondering, is there a reason we can't select autosuspend behaviour automatically when autosuspend is enabled ? > > to make it clear it does a synchronous get? > > > > I had to look into the implementation to verify that a change like > > I'm not sure why, because the kerneldoc is unambiguous AFAICS. > > > > > - ret = pm_runtime_get_sync(&pdev->dev); > > + ret = pm_runtime_resume_and_get(&pdev->dev); > > > > in the follow-up patches is actually a valid change, maintaining > > synchronous operation. Oh, pm_runtime_resume() is synchronous, too... > > Yes, it is.
On Mon, Nov 30, 2020 at 6:35 PM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote: > > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote: > > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > > > > In many case, we need to check return value of pm_runtime_get_sync, but > > > > it brings a trouble to the usage counter processing. Many callers forget > > > > to decrease the usage counter when it failed, which could resulted in > > > > reference leak. It has been discussed a lot[0][1]. So we add a function > > > > to deal with the usage counter for better coding. > > > > > > > > [0]https://lkml.org/lkml/2020/6/14/88 > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> > > > > > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM: > > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in > > > v5.10-rc5. > > > > > > > --- a/include/linux/pm_runtime.h > > > > +++ b/include/linux/pm_runtime.h > > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > > > > return __pm_runtime_resume(dev, RPM_GET_PUT); > > > > } > > > > > > > > +/** > > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > > > > + * @dev: Target device. > > > > + * > > > > + * Resume @dev synchronously and if that is successful, increment its runtime > > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > > > > + * incremented or a negative error code otherwise. > > > > + */ > > > > +static inline int pm_runtime_resume_and_get(struct device *dev) > > > > > > Perhaps this function should be called pm_runtime_resume_and_get_sync(), > > > > No, really. > > > > I might consider calling it pm_runtime_acquire(), and adding a > > matching _release() as a pm_runtime_get() synonym for that matter, but > > not the above. > > pm_runtime_acquire() seems better to me too. Would pm_runtime_release() > would be an alias for pm_runtime_put() ? Yes. This covers all of the use cases relevant for drivers AFAICS. > We would also likely need a pm_runtime_release_autosuspend() too then. Why would we? > But on that topic, I was wondering, is there a reason we can't select > autosuspend behaviour automatically when autosuspend is enabled ? That is the case already. pm_runtime_put() will autosuspend if enabled and the usage counter is 0, as long as ->runtime_idle() returns 0 (or is absent). pm_runtime_put_autosuspend() is an optimization allowing ->runtime_idle() to be skipped entirely, but I'm wondering how many users really need that.
Hi Rafael, On Mon, Nov 30, 2020 at 06:55:57PM +0100, Rafael J. Wysocki wrote: > On Mon, Nov 30, 2020 at 6:35 PM Laurent Pinchart wrote: > > On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote: > > > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote: > > > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > > > > > In many case, we need to check return value of pm_runtime_get_sync, but > > > > > it brings a trouble to the usage counter processing. Many callers forget > > > > > to decrease the usage counter when it failed, which could resulted in > > > > > reference leak. It has been discussed a lot[0][1]. So we add a function > > > > > to deal with the usage counter for better coding. > > > > > > > > > > [0]https://lkml.org/lkml/2020/6/14/88 > > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> > > > > > > > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM: > > > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in > > > > v5.10-rc5. > > > > > > > > > --- a/include/linux/pm_runtime.h > > > > > +++ b/include/linux/pm_runtime.h > > > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > > > > > return __pm_runtime_resume(dev, RPM_GET_PUT); > > > > > } > > > > > > > > > > +/** > > > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > > > > > + * @dev: Target device. > > > > > + * > > > > > + * Resume @dev synchronously and if that is successful, increment its runtime > > > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > > > > > + * incremented or a negative error code otherwise. > > > > > + */ > > > > > +static inline int pm_runtime_resume_and_get(struct device *dev) > > > > > > > > Perhaps this function should be called pm_runtime_resume_and_get_sync(), > > > > > > No, really. > > > > > > I might consider calling it pm_runtime_acquire(), and adding a > > > matching _release() as a pm_runtime_get() synonym for that matter, but > > > not the above. > > > > pm_runtime_acquire() seems better to me too. Would pm_runtime_release() > > would be an alias for pm_runtime_put() ? > > Yes. This covers all of the use cases relevant for drivers AFAICS. > > > We would also likely need a pm_runtime_release_autosuspend() too then. > > Why would we? > > > But on that topic, I was wondering, is there a reason we can't select > > autosuspend behaviour automatically when autosuspend is enabled ? > > That is the case already. > > pm_runtime_put() will autosuspend if enabled and the usage counter is > 0, as long as ->runtime_idle() returns 0 (or is absent). > > pm_runtime_put_autosuspend() is an optimization allowing > ->runtime_idle() to be skipped entirely, but I'm wondering how many > users really need that. Ah, I didn't know that, that's good to know. We then don't need pm_runtime_release_autosuspend() (unless the optimization really makes a big difference). Should I write new drievr code with pm_runtime_put() instead of pm_runtime_put_autosuspend() ? I haven't found clear guidelines on this in the documentation.
On Mon, Nov 30, 2020 at 7:50 PM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Rafael, > > On Mon, Nov 30, 2020 at 06:55:57PM +0100, Rafael J. Wysocki wrote: > > On Mon, Nov 30, 2020 at 6:35 PM Laurent Pinchart wrote: > > > On Mon, Nov 30, 2020 at 05:37:52PM +0100, Rafael J. Wysocki wrote: > > > > On Fri, Nov 27, 2020 at 11:16 AM Geert Uytterhoeven wrote: > > > > > On Tue, Nov 10, 2020 at 10:29 AM Zhang Qilong <zhangqilong3@huawei.com> wrote: > > > > > > In many case, we need to check return value of pm_runtime_get_sync, but > > > > > > it brings a trouble to the usage counter processing. Many callers forget > > > > > > to decrease the usage counter when it failed, which could resulted in > > > > > > reference leak. It has been discussed a lot[0][1]. So we add a function > > > > > > to deal with the usage counter for better coding. > > > > > > > > > > > > [0]https://lkml.org/lkml/2020/6/14/88 > > > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 > > > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> > > > > > > > > > > Thanks for your patch, which is now commit dd8088d5a8969dc2 ("PM: > > > > > runtime: Add pm_runtime_resume_and_get to deal with usage counter") in > > > > > v5.10-rc5. > > > > > > > > > > > --- a/include/linux/pm_runtime.h > > > > > > +++ b/include/linux/pm_runtime.h > > > > > > @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) > > > > > > return __pm_runtime_resume(dev, RPM_GET_PUT); > > > > > > } > > > > > > > > > > > > +/** > > > > > > + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. > > > > > > + * @dev: Target device. > > > > > > + * > > > > > > + * Resume @dev synchronously and if that is successful, increment its runtime > > > > > > + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been > > > > > > + * incremented or a negative error code otherwise. > > > > > > + */ > > > > > > +static inline int pm_runtime_resume_and_get(struct device *dev) > > > > > > > > > > Perhaps this function should be called pm_runtime_resume_and_get_sync(), > > > > > > > > No, really. > > > > > > > > I might consider calling it pm_runtime_acquire(), and adding a > > > > matching _release() as a pm_runtime_get() synonym for that matter, but > > > > not the above. > > > > > > pm_runtime_acquire() seems better to me too. Would pm_runtime_release() > > > would be an alias for pm_runtime_put() ? > > > > Yes. This covers all of the use cases relevant for drivers AFAICS. > > > > > We would also likely need a pm_runtime_release_autosuspend() too then. > > > > Why would we? > > > > > But on that topic, I was wondering, is there a reason we can't select > > > autosuspend behaviour automatically when autosuspend is enabled ? > > > > That is the case already. > > > > pm_runtime_put() will autosuspend if enabled and the usage counter is > > 0, as long as ->runtime_idle() returns 0 (or is absent). > > > > pm_runtime_put_autosuspend() is an optimization allowing > > ->runtime_idle() to be skipped entirely, but I'm wondering how many > > users really need that. > > Ah, I didn't know that, that's good to know. We then don't need > pm_runtime_release_autosuspend() (unless the optimization really makes a > big difference). > > Should I write new drievr code with pm_runtime_put() instead of > pm_runtime_put_autosuspend() ? If you don't have ->runtime_idle() in the driver (and in the bus type generally speaking, but none of them provide it IIRC), pm_runtime_put() is basically equivalent to pm_runtime_put_autosuspend() AFAICS, except for some extra checks done by the former. Otherwise it all depends on what the ->runtime_idle() callback does, but it is hard to imagine a practical use case when the difference would be really meaningful. > I haven't found clear guidelines on this in the documentation. Yes, that's one of the items I need to take care of.
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 4b708f4e8eed..b492ae00cc90 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -386,6 +386,27 @@ static inline int pm_runtime_get_sync(struct device *dev) return __pm_runtime_resume(dev, RPM_GET_PUT); } +/** + * pm_runtime_resume_and_get - Bump up usage counter of a device and resume it. + * @dev: Target device. + * + * Resume @dev synchronously and if that is successful, increment its runtime + * PM usage counter. Return 0 if the runtime PM usage counter of @dev has been + * incremented or a negative error code otherwise. + */ +static inline int pm_runtime_resume_and_get(struct device *dev) +{ + int ret; + + ret = __pm_runtime_resume(dev, RPM_GET_PUT); + if (ret < 0) { + pm_runtime_put_noidle(dev); + return ret; + } + + return 0; +} + /** * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0. * @dev: Target device.
In many case, we need to check return value of pm_runtime_get_sync, but it brings a trouble to the usage counter processing. Many callers forget to decrease the usage counter when it failed, which could resulted in reference leak. It has been discussed a lot[0][1]. So we add a function to deal with the usage counter for better coding. [0]https://lkml.org/lkml/2020/6/14/88 [1]https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139 Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> --- include/linux/pm_runtime.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)