mbox series

[v4,00/79] Address some issues with PM runtime at media subsystem

Message ID cover.1619621413.git.mchehab+huawei@kernel.org (mailing list archive)
Headers show
Series Address some issues with PM runtime at media subsystem | expand

Message

Mauro Carvalho Chehab April 28, 2021, 2:51 p.m. UTC
During the review of the patches from unm.edu, one of the patterns
I noticed is the amount of patches trying to fix pm_runtime_get_sync()
calls.

After analyzing the feedback from version 1 of this series, I noticed
a few other weird behaviors at the PM runtime resume code. So, this
series start addressing some bugs and issues at the current code.
Then, it gets rid of pm_runtime_get_sync() at the media subsystem
(with 2 exceptions).

It should be noticed that
Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
added a new method to does a pm_runtime get, which increments
the usage count only on success.

The rationale of getting rid of pm_runtime_get_sync() is:

1. despite its name, this is actually a PM runtime resume call,
   but some developers didn't seem to realize that, as I got this
   pattern on some drivers:

        pm_runtime_get_sync(&client->dev);
        pm_runtime_disable(&client->dev);
        pm_runtime_set_suspended(&client->dev);
        pm_runtime_put_noidle(&client->dev);

   It makes no sense to resume PM just to suspend it again ;-)

2. Usual *_get() methods only increment their use count on success,
   but pm_runtime_get_sync() increments it unconditionally. Due to
   that, several drivers were mistakenly not calling
   pm_runtime_put_noidle() when it fails;

3. The name of the new variant is a lot clearer:
	pm_runtime_resume_and_get()
    As its same clearly says that this is a PM runtime resume function,
    that also increments the usage counter on success;

4. Consistency: we did similar changes subsystem wide with
   for instance strlcpy() and strcpy() that got replaced by
   strscpy(). Having all drivers using the same known-to-be-safe
   methods is a good thing;

5. Prevent newer drivers to copy-and-paste a code that it would
   be easier to break if they don't truly understand what's behind
   the scenes.

This series replace places  pm_runtime_get_sync(), by calling
pm_runtime_resume_and_get() instead.

This should help to avoid future mistakes like that, as people
tend to use the existing drivers as examples for newer ones.

compile-tested only.

Patches 1 to 7 fix some issues that already exists at the current
PM runtime code;

patches 8 to 20 fix some usage_count problems that still exists
at the media subsystem;

patches 21 to 78 repaces pm_runtime_get_sync() by 
pm_runtime_resume_and_get();

Patch 79 (and a hunk on patch 78) documents the two exceptions
where pm_runtime_get_sync() will still be used for now.

---

v4:
    - Added a couple of additional fixes at existing PM runtime code;
    - Some patches are now more conservative in order to avoid causing
     regressions.
v3:
    - fix a compilation error;
v2:
    - addressed pointed issues and fixed a few other PM issues.


Mauro Carvalho Chehab (79):
  media: venus: fix PM runtime logic at venus_sys_error_handler()
  media: s6p_cec: decrement usage count if disabled
  media: i2c: ccs-core: return the right error code at suspend
  media: i2c: ov7740: don't resume at remove time
  media: i2c: video-i2c: don't resume at remove time
  media: i2c: imx334: fix the pm runtime get logic
  media: exynos-gsc: don't resume at remove time
  media: atmel: properly get pm_runtime
  media: marvel-ccic: fix some issues when getting pm_runtime
  media: mdk-mdp: fix pm_runtime_get_sync() usage count
  media: rcar_fdp1: fix pm_runtime_get_sync() usage count
  media: renesas-ceu: Properly check for PM errors
  media: s5p: fix pm_runtime_get_sync() usage count
  media: am437x: fix pm_runtime_get_sync() usage count
  media: sh_vou: fix pm_runtime_get_sync() usage count
  media: mtk-vcodec: fix pm_runtime_get_sync() usage count
  media: s5p-jpeg: fix pm_runtime_get_sync() usage count
  media: sti/delta: fix pm_runtime_get_sync() usage count
  media: sunxi: fix pm_runtime_get_sync() usage count
  staging: media: rkvdec: fix pm_runtime_get_sync() usage count
  staging: media: atomisp: use pm_runtime_resume_and_get()
  staging: media: imx7-mipi-csis: use pm_runtime_resume_and_get()
  staging: media: ipu3: use pm_runtime_resume_and_get()
  staging: media: cedrus_video: use pm_runtime_resume_and_get()
  staging: media: tegra-vde: use pm_runtime_resume_and_get()
  staging: media: tegra-video: use pm_runtime_resume_and_get()
  media: i2c: ak7375: use pm_runtime_resume_and_get()
  media: i2c: ccs-core: use pm_runtime_resume_and_get()
  media: i2c: dw9714: use pm_runtime_resume_and_get()
  media: i2c: dw9768: use pm_runtime_resume_and_get()
  media: i2c: dw9807-vcm: use pm_runtime_resume_and_get()
  media: i2c: hi556: use pm_runtime_resume_and_get()
  media: i2c: imx214: use pm_runtime_resume_and_get()
  media: i2c: imx219: use pm_runtime_resume_and_get()
  media: i2c: imx258: use pm_runtime_resume_and_get()
  media: i2c: imx274: use pm_runtime_resume_and_get()
  media: i2c: imx290: use pm_runtime_resume_and_get()
  media: i2c: imx319: use pm_runtime_resume_and_get()
  media: i2c: imx355: use pm_runtime_resume_and_get()
  media: i2c: mt9m001: use pm_runtime_resume_and_get()
  media: i2c: ov02a10: use pm_runtime_resume_and_get()
  media: i2c: ov13858: use pm_runtime_resume_and_get()
  media: i2c: ov2659: use pm_runtime_resume_and_get()
  media: i2c: ov2685: use pm_runtime_resume_and_get()
  media: i2c: ov2740: use pm_runtime_resume_and_get()
  media: i2c: ov5647: use pm_runtime_resume_and_get()
  media: i2c: ov5648: use pm_runtime_resume_and_get()
  media: i2c: ov5670: use pm_runtime_resume_and_get()
  media: i2c: ov5675: use pm_runtime_resume_and_get()
  media: i2c: ov5695: use pm_runtime_resume_and_get()
  media: i2c: ov7740: use pm_runtime_resume_and_get()
  media: i2c: ov8856: use pm_runtime_resume_and_get()
  media: i2c: ov8865: use pm_runtime_resume_and_get()
  media: i2c: ov9734: use pm_runtime_resume_and_get()
  media: i2c: tvp5150: use pm_runtime_resume_and_get()
  media: i2c: video-i2c: use pm_runtime_resume_and_get()
  media: rockchip/rga: use pm_runtime_resume_and_get()
  media: sti/hva: use pm_runtime_resume_and_get()
  media: sti/bdisp: use pm_runtime_resume_and_get()
  media: ipu3: use pm_runtime_resume_and_get()
  media: coda: use pm_runtime_resume_and_get()
  media: exynos4-is: use pm_runtime_resume_and_get()
  media: exynos-gsc: use pm_runtime_resume_and_get()
  media: mtk-jpeg: use pm_runtime_resume_and_get()
  media: camss: use pm_runtime_resume_and_get()
  media: venus: use pm_runtime_resume_and_get()
  media: venus: vdec: use pm_runtime_resume_and_get()
  media: venus: venc: use pm_runtime_resume_and_get()
  media: rcar-fcp: use pm_runtime_resume_and_get()
  media: rkisp1: use pm_runtime_resume_and_get()
  media: s3c-camif: use pm_runtime_resume_and_get()
  media: s5p-mfc: use pm_runtime_resume_and_get()
  media: stm32: use pm_runtime_resume_and_get()
  media: sunxi: use pm_runtime_resume_and_get()
  media: ti-vpe: use pm_runtime_resume_and_get()
  media: vsp1: use pm_runtime_resume_and_get()
  media: rcar-vin: use pm_runtime_resume_and_get()
  media: hantro: use pm_runtime_resume_and_get()
  media: hantro: do a PM resume earlier

 drivers/media/cec/platform/s5p/s5p_cec.c      |  7 +++--
 drivers/media/i2c/ak7375.c                    | 10 +------
 drivers/media/i2c/ccs/ccs-core.c              | 18 +++++-------
 drivers/media/i2c/dw9714.c                    | 10 +------
 drivers/media/i2c/dw9768.c                    | 10 +------
 drivers/media/i2c/dw9807-vcm.c                | 10 +------
 drivers/media/i2c/hi556.c                     |  3 +-
 drivers/media/i2c/imx214.c                    |  6 ++--
 drivers/media/i2c/imx219.c                    |  6 ++--
 drivers/media/i2c/imx258.c                    |  6 ++--
 drivers/media/i2c/imx274.c                    |  3 +-
 drivers/media/i2c/imx290.c                    |  6 ++--
 drivers/media/i2c/imx319.c                    |  6 ++--
 drivers/media/i2c/imx334.c                    |  7 +++--
 drivers/media/i2c/imx355.c                    |  6 ++--
 drivers/media/i2c/mt9m001.c                   |  9 ++++--
 drivers/media/i2c/ov02a10.c                   |  6 ++--
 drivers/media/i2c/ov13858.c                   |  6 ++--
 drivers/media/i2c/ov2659.c                    |  6 ++--
 drivers/media/i2c/ov2685.c                    |  7 ++---
 drivers/media/i2c/ov2740.c                    |  6 ++--
 drivers/media/i2c/ov5647.c                    |  9 +++---
 drivers/media/i2c/ov5648.c                    |  6 ++--
 drivers/media/i2c/ov5670.c                    |  6 ++--
 drivers/media/i2c/ov5675.c                    |  3 +-
 drivers/media/i2c/ov5695.c                    |  6 ++--
 drivers/media/i2c/ov7740.c                    |  8 ++---
 drivers/media/i2c/ov8856.c                    |  3 +-
 drivers/media/i2c/ov8865.c                    |  6 ++--
 drivers/media/i2c/ov9734.c                    |  3 +-
 drivers/media/i2c/tvp5150.c                   | 16 ++--------
 drivers/media/i2c/video-i2c.c                 | 14 +++------
 drivers/media/pci/intel/ipu3/ipu3-cio2-main.c |  3 +-
 drivers/media/platform/am437x/am437x-vpfe.c   | 22 ++++++++++----
 drivers/media/platform/atmel/atmel-isc-base.c | 27 ++++++++++++-----
 drivers/media/platform/atmel/atmel-isi.c      | 19 +++++++++---
 drivers/media/platform/coda/coda-common.c     |  7 +++--
 drivers/media/platform/exynos-gsc/gsc-core.c  | 11 +++----
 drivers/media/platform/exynos-gsc/gsc-m2m.c   |  2 +-
 .../media/platform/exynos4-is/fimc-capture.c  |  6 ++--
 drivers/media/platform/exynos4-is/fimc-is.c   |  4 +--
 .../platform/exynos4-is/fimc-isp-video.c      |  3 +-
 drivers/media/platform/exynos4-is/fimc-isp.c  |  7 ++---
 drivers/media/platform/exynos4-is/fimc-lite.c |  5 ++--
 drivers/media/platform/exynos4-is/fimc-m2m.c  |  2 +-
 drivers/media/platform/exynos4-is/media-dev.c |  8 ++---
 drivers/media/platform/exynos4-is/mipi-csis.c |  8 ++---
 .../media/platform/marvell-ccic/mcam-core.c   |  9 ++++--
 .../media/platform/mtk-jpeg/mtk_jpeg_core.c   |  4 +--
 drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c  |  6 ++--
 .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   |  4 +--
 .../media/platform/qcom/camss/camss-csid.c    |  6 ++--
 .../media/platform/qcom/camss/camss-csiphy.c  |  6 ++--
 .../media/platform/qcom/camss/camss-ispif.c   |  6 ++--
 drivers/media/platform/qcom/camss/camss-vfe.c |  5 ++--
 drivers/media/platform/qcom/venus/core.c      | 28 +++++++++++-------
 .../media/platform/qcom/venus/pm_helpers.c    | 10 +++----
 drivers/media/platform/qcom/venus/vdec.c      |  4 +--
 drivers/media/platform/qcom/venus/venc.c      |  5 ++--
 drivers/media/platform/rcar-fcp.c             |  6 ++--
 drivers/media/platform/rcar-vin/rcar-csi2.c   | 15 ++++++++--
 drivers/media/platform/rcar-vin/rcar-dma.c    |  6 ++--
 drivers/media/platform/rcar-vin/rcar-v4l2.c   |  6 ++--
 drivers/media/platform/rcar_fdp1.c            | 12 ++++++--
 drivers/media/platform/renesas-ceu.c          |  4 +--
 drivers/media/platform/rockchip/rga/rga-buf.c |  3 +-
 drivers/media/platform/rockchip/rga/rga.c     |  4 ++-
 .../platform/rockchip/rkisp1/rkisp1-capture.c |  3 +-
 .../media/platform/s3c-camif/camif-capture.c  |  2 +-
 drivers/media/platform/s3c-camif/camif-core.c |  5 ++--
 drivers/media/platform/s5p-jpeg/jpeg-core.c   |  2 +-
 drivers/media/platform/s5p-mfc/s5p_mfc_pm.c   |  6 ++--
 drivers/media/platform/sh_vou.c               |  6 +++-
 drivers/media/platform/sti/bdisp/bdisp-v4l2.c |  7 +++--
 drivers/media/platform/sti/delta/delta-v4l2.c |  4 +--
 drivers/media/platform/sti/hva/hva-hw.c       | 17 ++++++-----
 drivers/media/platform/stm32/stm32-dcmi.c     |  5 ++--
 .../platform/sunxi/sun4i-csi/sun4i_v4l2.c     |  6 ++--
 .../sunxi/sun8i-rotate/sun8i_rotate.c         |  2 +-
 drivers/media/platform/ti-vpe/cal-video.c     |  4 ++-
 drivers/media/platform/ti-vpe/cal.c           |  8 +++--
 drivers/media/platform/ti-vpe/vpe.c           |  4 +--
 drivers/media/platform/vsp1/vsp1_drv.c        |  6 ++--
 .../staging/media/atomisp/pci/atomisp_fops.c  |  6 ++--
 drivers/staging/media/hantro/hantro_drv.c     | 29 ++++++++++++-------
 drivers/staging/media/imx/imx7-mipi-csis.c    |  7 ++---
 drivers/staging/media/ipu3/ipu3.c             |  3 +-
 drivers/staging/media/rkvdec/rkvdec.c         |  2 +-
 .../staging/media/sunxi/cedrus/cedrus_video.c |  6 ++--
 drivers/staging/media/tegra-vde/vde.c         | 19 ++++++++++--
 drivers/staging/media/tegra-video/csi.c       |  3 +-
 drivers/staging/media/tegra-video/vi.c        |  3 +-
 92 files changed, 335 insertions(+), 347 deletions(-)

Comments

Johan Hovold April 28, 2021, 3:50 p.m. UTC | #1
On Wed, Apr 28, 2021 at 04:51:21PM +0200, Mauro Carvalho Chehab wrote:
> During the review of the patches from unm.edu, one of the patterns
> I noticed is the amount of patches trying to fix pm_runtime_get_sync()
> calls.
> 
> After analyzing the feedback from version 1 of this series, I noticed
> a few other weird behaviors at the PM runtime resume code. So, this
> series start addressing some bugs and issues at the current code.
> Then, it gets rid of pm_runtime_get_sync() at the media subsystem
> (with 2 exceptions).
> 
> It should be noticed that
> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
> added a new method to does a pm_runtime get, which increments
> the usage count only on success.
> 
> The rationale of getting rid of pm_runtime_get_sync() is:
> 
> 1. despite its name, this is actually a PM runtime resume call,
>    but some developers didn't seem to realize that, as I got this
>    pattern on some drivers:
> 
>         pm_runtime_get_sync(&client->dev);
>         pm_runtime_disable(&client->dev);
>         pm_runtime_set_suspended(&client->dev);
>         pm_runtime_put_noidle(&client->dev);
> 
>    It makes no sense to resume PM just to suspend it again ;-)

This is perfectly alright. Take a look at ov7740_remove() for example:

	pm_runtime_get_sync(&client->dev);
	pm_runtime_disable(&client->dev);
	pm_runtime_set_suspended(&client->dev);
	pm_runtime_put_noidle(&client->dev);
	
	ov7740_set_power(ov7740, 0);

There's an explicit power-off after balancing the PM count and this will
work regardless of the power state when entering this function.

So this has nothing to do with pm_runtime_get_sync() per se.

> 2. Usual *_get() methods only increment their use count on success,
>    but pm_runtime_get_sync() increments it unconditionally. Due to
>    that, several drivers were mistakenly not calling
>    pm_runtime_put_noidle() when it fails;

Sure, but pm_runtime_get_async() also works this way. You just won't be
notified if the async resume fails.

> 3. The name of the new variant is a lot clearer:
> 	pm_runtime_resume_and_get()
>     As its same clearly says that this is a PM runtime resume function,
>     that also increments the usage counter on success;

It also introduced an inconsistency in the API and does not pair as well
with the pm_runtime_put variants.

> 4. Consistency: we did similar changes subsystem wide with
>    for instance strlcpy() and strcpy() that got replaced by
>    strscpy(). Having all drivers using the same known-to-be-safe
>    methods is a good thing;

It's not known to be safe; there are ways to get also this interface
wrong as for example this series has shown.

> 5. Prevent newer drivers to copy-and-paste a code that it would
>    be easier to break if they don't truly understand what's behind
>    the scenes.

Cargo-cult programming always runs that risk.

> This series replace places  pm_runtime_get_sync(), by calling
> pm_runtime_resume_and_get() instead.
> 
> This should help to avoid future mistakes like that, as people
> tend to use the existing drivers as examples for newer ones.

The only valid point about and use for pm_runtime_resume_and_get() is to
avoid leaking a PM usage count reference in the unlikely case that
resume fails (something which hardly any driver implements recovery
from anyway).

It's a convenience wrapper that saves you from writing one extra line in
some cases (depending on how you implement runtime-pm support) and not a
silver bullet against bugs.
 
> compile-tested only.
> 
> Patches 1 to 7 fix some issues that already exists at the current
> PM runtime code;
> 
> patches 8 to 20 fix some usage_count problems that still exists
> at the media subsystem;
> 
> patches 21 to 78 repaces pm_runtime_get_sync() by 
> pm_runtime_resume_and_get();
> 
> Patch 79 (and a hunk on patch 78) documents the two exceptions
> where pm_runtime_get_sync() will still be used for now.
> 
> ---
> 
> v4:
>     - Added a couple of additional fixes at existing PM runtime code;
>     - Some patches are now more conservative in order to avoid causing
>      regressions.
> v3:
>     - fix a compilation error;
> v2:
>     - addressed pointed issues and fixed a few other PM issues.

This really doesn't say much more than "changed stuff" so kinda hard to
track if review feedback has been taken into account for example.

Johan
Mauro Carvalho Chehab April 29, 2021, 10:18 a.m. UTC | #2
Em Wed, 28 Apr 2021 17:50:08 +0200
Johan Hovold <johan@kernel.org> escreveu:

> On Wed, Apr 28, 2021 at 04:51:21PM +0200, Mauro Carvalho Chehab wrote:

> > 1. despite its name, this is actually a PM runtime resume call,
> >    but some developers didn't seem to realize that, as I got this
> >    pattern on some drivers:
> > 
> >         pm_runtime_get_sync(&client->dev);
> >         pm_runtime_disable(&client->dev);
> >         pm_runtime_set_suspended(&client->dev);
> >         pm_runtime_put_noidle(&client->dev);
> > 
> >    It makes no sense to resume PM just to suspend it again ;-)  
> 
> This is perfectly alright. Take a look at ov7740_remove() for example:
> 
> 	pm_runtime_get_sync(&client->dev);
> 	pm_runtime_disable(&client->dev);
> 	pm_runtime_set_suspended(&client->dev);
> 	pm_runtime_put_noidle(&client->dev);
> 	
> 	ov7740_set_power(ov7740, 0);
> 
> There's an explicit power-off after balancing the PM count and this will
> work regardless of the power state when entering this function.

Ok, but this should equally work:

 	pm_runtime_disable(&client->dev);
 	pm_runtime_set_suspended(&client->dev);
 	
 	ov7740_set_power(ov7740, 0);

as there's no additional cleanup made on this particular driver
between pm_runtime_get_sync() and pm_runtime_put_noidle().

> So this has nothing to do with pm_runtime_get_sync() per se.

Yes, but some patches on this series are cleaning up the driver release
logic.

> 
> > 2. Usual *_get() methods only increment their use count on success,
> >    but pm_runtime_get_sync() increments it unconditionally. Due to
> >    that, several drivers were mistakenly not calling
> >    pm_runtime_put_noidle() when it fails;  
> 
> Sure, but pm_runtime_get_async() also works this way. You just won't be
> notified if the async resume fails.

Granted, it makes sense along the pm_runtime kAPI.

It is inconsistent with the behavior of kobject_get*() and other
*_get*() methods that are based or inspired on it, as, on those, the
operations are atomic: either everything succeeds and it doesn't return
an error, or the usage counter is not incremented and the object
state doesn't change after the call.

> > 3. The name of the new variant is a lot clearer:
> > 	pm_runtime_resume_and_get()
> >     As its same clearly says that this is a PM runtime resume function,
> >     that also increments the usage counter on success;  
> 
> It also introduced an inconsistency in the API and does not pair as well
> with the pm_runtime_put variants.

Agreed. A name that would be more consistent with PM runtime would
probably be:

	pm_runtime_resume_if_get()

as there are already:

	pm_runtime_get_if_in_use()
	pm_runtime_get_if_active()

But any such discussions are out of the scope of this patchset ;-)

> 
> > 4. Consistency: we did similar changes subsystem wide with
> >    for instance strlcpy() and strcpy() that got replaced by
> >    strscpy(). Having all drivers using the same known-to-be-safe
> >    methods is a good thing;  
> 
> It's not known to be safe; there are ways to get also this interface
> wrong as for example this series has shown.

Very true. Yet, it is a lot simpler to use functions that won't change
the state of the objects when returning an error, as this is by far
the most common pattern within the Kernel.

Human brains are trained to identify certain patterns. When there's
something using a similar pattern, but with a different behavior, 
our brains are more subject to fail identifying problems.

> > 5. Prevent newer drivers to copy-and-paste a code that it would
> >    be easier to break if they don't truly understand what's behind
> >    the scenes.  
> 
> Cargo-cult programming always runs that risk.

True.

> > This series replace places  pm_runtime_get_sync(), by calling
> > pm_runtime_resume_and_get() instead.
> > 
> > This should help to avoid future mistakes like that, as people
> > tend to use the existing drivers as examples for newer ones.  
> 
> The only valid point about and use for pm_runtime_resume_and_get() is to
> avoid leaking a PM usage count reference in the unlikely case that
> resume fails (something which hardly any driver implements recovery
> from anyway).
> 
> It's a convenience wrapper that saves you from writing one extra line in
> some cases (depending on how you implement runtime-pm support) and not a
> silver bullet against bugs.
>  
> > compile-tested only.
> > 
> > Patches 1 to 7 fix some issues that already exists at the current
> > PM runtime code;
> > 
> > patches 8 to 20 fix some usage_count problems that still exists
> > at the media subsystem;
> > 
> > patches 21 to 78 repaces pm_runtime_get_sync() by 
> > pm_runtime_resume_and_get();
> > 
> > Patch 79 (and a hunk on patch 78) documents the two exceptions
> > where pm_runtime_get_sync() will still be used for now.
> > 
> > ---
> > 
> > v4:
> >     - Added a couple of additional fixes at existing PM runtime code;
> >     - Some patches are now more conservative in order to avoid causing
> >      regressions.
> > v3:
> >     - fix a compilation error;
> > v2:
> >     - addressed pointed issues and fixed a few other PM issues.  
> 
> This really doesn't say much more than "changed stuff" so kinda hard to
> track if review feedback has been taken into account for example.

I addressed all review feedback I got (as far as I'm aware), and added
all received reviewed-by/acked-by.

Yeah, I could have written a more comprehensive changes description
there.

Thanks,
Mauro
Dmitry Osipenko April 29, 2021, 12:33 p.m. UTC | #3
29.04.2021 13:18, Mauro Carvalho Chehab пишет:
>> This is perfectly alright. Take a look at ov7740_remove() for example:
>>
>> 	pm_runtime_get_sync(&client->dev);
>> 	pm_runtime_disable(&client->dev);
>> 	pm_runtime_set_suspended(&client->dev);
>> 	pm_runtime_put_noidle(&client->dev);
>> 	
>> 	ov7740_set_power(ov7740, 0);
>>
>> There's an explicit power-off after balancing the PM count and this will
>> work regardless of the power state when entering this function.
> Ok, but this should equally work:
> 
>  	pm_runtime_disable(&client->dev);
>  	pm_runtime_set_suspended(&client->dev);
>  	
>  	ov7740_set_power(ov7740, 0);
> 
> as there's no additional cleanup made on this particular driver
> between pm_runtime_get_sync() and pm_runtime_put_noidle().
> 

The pm_runtime_get_sync() turns hardware ON by invoking
ov7740_set_power(ov7740, 1), and thus, the ON->OFF is kept balanced in
both RPM-available and RPM-unavailable cases. The RPM state of device
should be reset after driver removal.

It doesn't look like any additional cleanups are needed by that ov7740
driver. The driver removal is opposite to the probe, hence it should be
correct as-is.
Johan Hovold April 29, 2021, 3:17 p.m. UTC | #4
On Thu, Apr 29, 2021 at 12:18:16PM +0200, Mauro Carvalho Chehab wrote:
> Em Wed, 28 Apr 2021 17:50:08 +0200
> Johan Hovold <johan@kernel.org> escreveu:
> 
> > On Wed, Apr 28, 2021 at 04:51:21PM +0200, Mauro Carvalho Chehab wrote:
> 
> > > 1. despite its name, this is actually a PM runtime resume call,
> > >    but some developers didn't seem to realize that, as I got this
> > >    pattern on some drivers:
> > > 
> > >         pm_runtime_get_sync(&client->dev);
> > >         pm_runtime_disable(&client->dev);
> > >         pm_runtime_set_suspended(&client->dev);
> > >         pm_runtime_put_noidle(&client->dev);
> > > 
> > >    It makes no sense to resume PM just to suspend it again ;-)  
> > 
> > This is perfectly alright. Take a look at ov7740_remove() for example:
> > 
> > 	pm_runtime_get_sync(&client->dev);
> > 	pm_runtime_disable(&client->dev);
> > 	pm_runtime_set_suspended(&client->dev);
> > 	pm_runtime_put_noidle(&client->dev);
> > 	
> > 	ov7740_set_power(ov7740, 0);
> > 
> > There's an explicit power-off after balancing the PM count and this will
> > work regardless of the power state when entering this function.
> 
> Ok, but this should equally work:
> 
>  	pm_runtime_disable(&client->dev);
>  	pm_runtime_set_suspended(&client->dev);
>  	
>  	ov7740_set_power(ov7740, 0);
> 
> as there's no additional cleanup made on this particular driver
> between pm_runtime_get_sync() and pm_runtime_put_noidle().

No, that would break the driver as I pointed out to you yesterday:

	https://lore.kernel.org/r/YImG1klSPkFSaS3a@hovoldconsulting.com

If the device is already suspended when remove is called then you'll
end up with an unbalanced call to ov7740_set_power() that will try to
disable an already disabled clock.

> > So this has nothing to do with pm_runtime_get_sync() per se.
> 
> Yes, but some patches on this series are cleaning up the driver release
> logic.

You mentioned this example as an argument against using
pm_runtime_get_sync(), which I don't think makes sense.

> > > 2. Usual *_get() methods only increment their use count on success,
> > >    but pm_runtime_get_sync() increments it unconditionally. Due to
> > >    that, several drivers were mistakenly not calling
> > >    pm_runtime_put_noidle() when it fails;  
> > 
> > Sure, but pm_runtime_get_async() also works this way. You just won't be
> > notified if the async resume fails.
> 
> Granted, it makes sense along the pm_runtime kAPI.
> 
> It is inconsistent with the behavior of kobject_get*() and other
> *_get*() methods that are based or inspired on it, as, on those, the
> operations are atomic: either everything succeeds and it doesn't return
> an error, or the usage counter is not incremented and the object
> state doesn't change after the call.

Right, and I'm aware that some people have overlooked this. But its not
the end of the world since hardly any driver can handle resume failures
properly anyway. 

This is mostly just an exercise to shut up static checkers.

> > > 3. The name of the new variant is a lot clearer:
> > > 	pm_runtime_resume_and_get()
> > >     As its same clearly says that this is a PM runtime resume function,
> > >     that also increments the usage counter on success;  
> > 
> > It also introduced an inconsistency in the API and does not pair as well
> > with the pm_runtime_put variants.
> 
> Agreed. A name that would be more consistent with PM runtime would
> probably be:
> 
> 	pm_runtime_resume_if_get()

Naw, since the get part always succeeds.

It should start with pm_runtime_get, but pm_runtime_get_sync() is
unfortunately taken.

> as there are already:
> 
> 	pm_runtime_get_if_in_use()
> 	pm_runtime_get_if_active()
> 
> But any such discussions are out of the scope of this patchset ;-)

Right.

> > > 4. Consistency: we did similar changes subsystem wide with
> > >    for instance strlcpy() and strcpy() that got replaced by
> > >    strscpy(). Having all drivers using the same known-to-be-safe
> > >    methods is a good thing;  
> > 
> > It's not known to be safe; there are ways to get also this interface
> > wrong as for example this series has shown.
> 
> Very true. Yet, it is a lot simpler to use functions that won't change
> the state of the objects when returning an error, as this is by far
> the most common pattern within the Kernel.

A resume failure does change the state (and needs to be recovered from),
but I get what you're saying.

> Human brains are trained to identify certain patterns. When there's
> something using a similar pattern, but with a different behavior, 
> our brains are more subject to fail identifying problems.

Sure. But I'm not sure that having two interfaces with different
semantics to do the job is doing us any favours here. But again, that
discussion has already been had.

And I realise that this is partly also your motive here (even if the old
interface isn't going to go away).

> > > compile-tested only.
> > > Patches 1 to 7 fix some issues that already exists at the current
> > > PM runtime code;
> > > 
> > > patches 8 to 20 fix some usage_count problems that still exists
> > > at the media subsystem;
> > > 
> > > patches 21 to 78 repaces pm_runtime_get_sync() by 
> > > pm_runtime_resume_and_get();
> > > 
> > > Patch 79 (and a hunk on patch 78) documents the two exceptions
> > > where pm_runtime_get_sync() will still be used for now.

80 patches in one series (posted to lkml) is a bit excessive. Perhaps
you can break it up in a fixes part and one or more cleanups parts?

Johan