Message ID | 20250215130849.227812-1-claudiu.beznea.uj@bp.renesas.com |
---|---|
State | Under Review |
Delegated to: | Geert Uytterhoeven |
Series | driver core: platform: Use devres group to free driver probe resources |
On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
> clocks are managed through PM domains. These PM domains, registered on
> behalf of the clock controller driver, are configured with
> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
> clocks are enabled/disabled using runtime PM APIs. The power domains may
> also have power_on/power_off support implemented. After the device PM
> domain is powered off, any CPU accesses to these domains lead to system
> aborts.
>
> During probe, devices are attached to the PM domain controlling their
> clocks and power. Similarly, during removal, devices are detached from the
> PM domain.
>
> The detachment call stack is as follows:
>
> device_driver_detach() ->
> device_release_driver_internal() ->
> __device_release_driver() ->
> device_remove() ->
> platform_remove() ->
> dev_pm_domain_detach()
>
> During driver unbind, after the device is detached from its PM domain,
> the device_unbind_cleanup() function is called, which subsequently invokes
> devres_release_all(). This function handles devres resource cleanup.
>
> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the
> cleanup process triggers the action or reset function for disabling runtime
> PM. This function is pm_runtime_disable_action(), which leads to the
> following call stack of interest when called:
>
> pm_runtime_disable_action() ->
> pm_runtime_dont_use_autosuspend() ->
> __pm_runtime_use_autosuspend() ->
> update_autosuspend() ->
> rpm_idle()
>
> The rpm_idle() function attempts to resume the device at runtime. However,
> at the point it is called, the device is no longer part of a PM domain
> (which manages clocks and power states). If the driver implements its own
> runtime PM APIs for specific functionalities - such as the rzg2l_adc
> driver - while also relying on the power domain subsystem for power
> management, rpm_idle() will invoke the driver's runtime PM API. However,
> since the device is no longer part of a PM domain at this point, the PM
> domain's runtime PM APIs will not be called. This leads to system aborts on
> Renesas SoCs.
>
> Another identified case is when a subsystem performs various cleanups
> using device_unbind_cleanup(), calling driver-specific APIs in the process.
> A known example is the thermal subsystem, which may call driver-specific
> APIs to disable the thermal device. The relevant call stack in this case
> is:
>
> device_driver_detach() ->
> device_release_driver_internal() ->
> device_unbind_cleanup() ->
> devres_release_all() ->
> devm_thermal_of_zone_release() ->
> thermal_zone_device_disable() ->
> thermal_zone_device_set_mode() ->
> struct thermal_zone_device_ops::change_mode()
>
> At the moment the driver-specific change_mode() API is called, the device
> is no longer part of its PM domain. Accessing its registers without proper
> power management leads to system aborts.
>
> Open a devres group before calling the driver probe, and close it
> immediately after the driver remove function is called and before
> dev_pm_domain_detach(). This ensures that driver-specific devm actions or
> reset functions are executed immediately after the driver remove function
> completes. Additionally, it prevents driver-specific runtime PM APIs from
> being called when the device is no longer part of its power domain.
>
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> ---
>
> Hi,
>
> Although Ulf gave his green light for the approaches on both IIO [1],
> [2] and thermal subsystems [3], Jonathan considered unacceptable the
> approaches in [1], [2] as he considered it may lead to difficult-to-
> maintain code and code open to subtle bugs (due to the potential of
> mixing devres and non-devres calls). He pointed out a similar approach
> that was done for the I2C bus [4], [5].
>
> As the discussions in [1], [2] stopped w/o a clear conclusion, this
> patch tries to revive it by proposing a similar approach that was done
> for the I2C bus.
>
> Please let me know your input.

I'm with Jonathan here, the devres stuff is getting crazy here and you
have drivers mixing them and side effects happening and lots of
confusion. Your change here is only going to make it even more
confusing, and shouldn't actually solve it for other busses (i.e. what
about iio devices NOT on the platform bus?)

Why can't your individual driver handle this instead?

thanks,

greg k-h
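The ordering problem described in the patch can be reproduced with a small userspace model. This is not kernel code: `struct model_dev` and every `model_*` function are hypothetical stand-ins, where a boolean plays the role of PM domain membership and a counter records register accesses that would abort on the real SoC. The sketch assumes devres entries are released newest first, as devres_release_all() does, and follows the unbind order from the call stacks in the patch description: driver remove, then dev_pm_domain_detach(), then device_unbind_cleanup().

```c
/*
 * Userspace model of platform device unbind ordering. NOT kernel code:
 * struct model_dev and all model_* helpers are hypothetical stand-ins.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_RES 8

struct model_dev;
typedef void (*model_action_fn)(struct model_dev *);

struct model_dev {
	bool pm_attached;                   /* device is in its PM domain */
	int aborts;                         /* register accesses while detached */
	model_action_fn res[MODEL_MAX_RES]; /* devres entries, oldest first */
	size_t nres;
};

/* Stand-in for devm_add_action(): queue a cleanup callback. */
static void model_devm_add_action(struct model_dev *d, model_action_fn fn)
{
	assert(d->nres < MODEL_MAX_RES);
	d->res[d->nres++] = fn;
}

/* Stand-in for devres_release_all(): run callbacks newest first. */
static void model_devres_release_all(struct model_dev *d)
{
	while (d->nres > 0)
		d->res[--d->nres](d);
}

/*
 * Cleanup that touches device registers, standing in for the runtime PM
 * path reached from pm_runtime_disable_action() or a thermal change_mode().
 */
static void model_touch_registers(struct model_dev *d)
{
	if (!d->pm_attached)
		d->aborts++;                /* a system abort on the real SoC */
}

/* Probe: attach to the PM domain, then register a devm action. */
static void model_probe(struct model_dev *d)
{
	d->pm_attached = true;              /* dev_pm_domain_attach() */
	model_devm_add_action(d, model_touch_registers);
}

/* Unbind in the order described in the patch. */
static void model_unbind(struct model_dev *d)
{
	/* drv->remove() runs first; it touches nothing in this model */
	d->pm_attached = false;             /* platform_remove(): dev_pm_domain_detach() */
	model_devres_release_all(d);        /* device_unbind_cleanup() */
}
```

Calling model_probe() and then model_unbind() on a zero-initialised `struct model_dev` leaves `aborts` at 1: in the model, the devm action runs only after the device has left its PM domain.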
Hi, Greg,

On 15.02.2025 15:25, Greg KH wrote:
> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>
>> [...]
>>
>> Please let me know your input.
>
> I'm with Jonathan here, the devres stuff is getting crazy here and you
> have drivers mixing them and side effects happening and lots of
> confusion. Your change here is only going to make it even more
> confusing, and shouldn't actually solve it for other busses (i.e. what
> about iio devices NOT on the platform bus?)

You're right, other busses will still have this problem.

>
> Why can't your individual driver handle this instead?

Initially I tried it at the driver level by using the non-devres PM runtime
enable API, but that wasn't considered OK by all parties.

I hadn't thought about having devres_open_group()/devres_close_group() in
the driver itself, but it should work.

Thank you,
Claudiu

>
> thanks,
>
> greg k-h
Hi, Daniel, Jonathan,

On 15.02.2025 15:51, Claudiu Beznea wrote:
> Hi, Greg,
>
> On 15.02.2025 15:25, Greg KH wrote:
>> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote:
>>> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>>>
>>> [...]
>>
>> [...]
>>
>> Why can't your individual driver handle this instead?
>
> Initially I tried it at the driver level by using the non-devres PM runtime
> enable API, but that wasn't considered OK by all parties.
>
> I hadn't thought about having devres_open_group()/devres_close_group() in
> the driver itself, but it should work.

Are you OK with having devres_open_group()/devres_close_group() in the
currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the
proposed drivers/thermal/renesas/rzg3s_thermal.c [1])?

Thank you,
Claudiu

[1] https://lore.kernel.org/all/20250103163805.1775705-5-claudiu.beznea.uj@bp.renesas.com

>
> Thank you,
> Claudiu
>
>>
>> thanks,
>>
>> greg k-h
>
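The driver-level alternative discussed here, a devres group owned by the driver itself, can be sketched with the same kind of userspace model. Assumptions: only one group per device is modelled, represented by the index of its first entry; `model_devres_open_group()` and `model_devres_release_group()` only loosely mirror devres_open_group() and devres_release_group(); the probe/remove pair is hypothetical and does not reflect the actual rzg2l_adc or rzg3s_thermal code. The point illustrated is that the group is released at the end of the driver's own remove callback, while the device is still in its PM domain, and that entries registered outside the group are left for the final cleanup.

```c
/*
 * Userspace model of a driver-owned devres group. NOT kernel code: all
 * model_* names are hypothetical; one group per device is assumed.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_RES 8

struct model_dev;
typedef void (*model_action_fn)(struct model_dev *);

struct model_dev {
	bool pm_attached;
	int aborts;                         /* register accesses while detached */
	int late;                           /* actions run by the final cleanup */
	model_action_fn res[MODEL_MAX_RES];
	size_t nres;
	size_t group_start;                 /* first entry of the group */
	bool group_open;
};

static void model_devm_add_action(struct model_dev *d, model_action_fn fn)
{
	assert(d->nres < MODEL_MAX_RES);
	d->res[d->nres++] = fn;
}

/* Release entries at index >= start, newest first. */
static void model_release_from(struct model_dev *d, size_t start)
{
	while (d->nres > start)
		d->res[--d->nres](d);
}

/* Loosely mirrors devres_open_group(): remember where the group begins. */
static void model_devres_open_group(struct model_dev *d)
{
	d->group_start = d->nres;
	d->group_open = true;
}

/* Loosely mirrors devres_release_group(): free only the group's entries. */
static void model_devres_release_group(struct model_dev *d)
{
	assert(d->group_open);
	model_release_from(d, d->group_start);
	d->group_open = false;
}

static void model_touch_registers(struct model_dev *d)
{
	if (!d->pm_attached)
		d->aborts++;
}

/* An entry registered outside the driver's group (e.g. by core code). */
static void model_count_late(struct model_dev *d)
{
	d->late++;
}

/* Hypothetical driver probe: its devm-managed resources go in the group. */
static void model_driver_probe(struct model_dev *d)
{
	model_devres_open_group(d);
	model_devm_add_action(d, model_touch_registers);
}

/* Hypothetical driver remove: release the group as the last step. */
static void model_driver_remove(struct model_dev *d)
{
	model_devres_release_group(d);      /* still in the PM domain here */
}

/* Unchanged core unbind order: remove, detach, then release the rest. */
static void model_unbind(struct model_dev *d)
{
	model_driver_remove(d);
	d->pm_attached = false;             /* dev_pm_domain_detach() */
	model_release_from(d, 0);           /* devres_release_all() */
}
```

Two details of the real API that the model leaves out: a driver would normally also call devres_close_group() at the end of probe, so later devres entries are not swept into its group; and if the driver's private data is itself allocated with devm_kzalloc() inside the group, releasing the group frees it, so the release has to be the last statement of remove().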
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6f2a33722c52..1b64c4a44263 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1401,9 +1401,15 @@ static int platform_probe(struct device *_dev)
 		goto out;
 
 	if (drv->probe) {
+		dev->devres_group_id = devres_open_group(&dev->dev, NULL, GFP_KERNEL);
+		if (!dev->devres_group_id)
+			return -ENOMEM;
+
 		ret = drv->probe(dev);
-		if (ret)
+		if (ret) {
+			devres_close_group(&dev->dev, dev->devres_group_id);
 			dev_pm_domain_detach(_dev, true);
+		}
 	}
 
 out:
@@ -1422,6 +1428,8 @@ static void platform_remove(struct device *_dev)
 	if (drv->remove)
 		drv->remove(dev);
 
+	if (dev->devres_group_id)
+		devres_release_group(&dev->dev, dev->devres_group_id);
 	dev_pm_domain_detach(_dev, true);
 }
 
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 074754c23d33..e842ad243bef 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -40,6 +40,9 @@ struct platform_device {
 	/* MFD cell pointer */
 	struct mfd_cell *mfd_cell;
 
+	/* ID of the probe devres group. */
+	void *devres_group_id;
+
 	/* arch specific additions */
 	struct pdev_archdata archdata;
 };
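The bus-level change in the diff can be modelled the same way to compare its two paths. This is a hedged userspace sketch, not the kernel implementation: `model_platform_probe()` and `model_platform_remove()` are hypothetical counterparts of platform_probe() and platform_remove(), and the `release_on_error` flag is an assumption added only to compare two error-path choices. In the diff, a failing probe closes the group before dev_pm_domain_detach(); closing does not free anything, so the group's entries are left for device_unbind_cleanup(), which really_probe() runs afterwards. In the model, that path keeps the original ordering, whereas releasing the group on error does not.

```c
/*
 * Userspace model of the patched platform_probe()/platform_remove().
 * NOT kernel code: every model_* name and release_on_error are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_RES 8

struct model_pdev;
typedef void (*model_action_fn)(struct model_pdev *);
typedef int (*model_probe_fn)(struct model_pdev *);

struct model_pdev {
	bool pm_attached;
	int aborts;                         /* register accesses while detached */
	model_action_fn res[MODEL_MAX_RES];
	size_t nres;
	bool has_group;                     /* devres_group_id != NULL */
	size_t group_start;
};

static void model_devm_add_action(struct model_pdev *d, model_action_fn fn)
{
	assert(d->nres < MODEL_MAX_RES);
	d->res[d->nres++] = fn;
}

/* Release entries at index >= start, newest first. */
static void model_release_from(struct model_pdev *d, size_t start)
{
	while (d->nres > start)
		d->res[--d->nres](d);
}

static void model_touch_registers(struct model_pdev *d)
{
	if (!d->pm_attached)
		d->aborts++;
}

/* Hypothetical driver probes. */
static int model_good_probe(struct model_pdev *d)
{
	model_devm_add_action(d, model_touch_registers);
	return 0;
}

static int model_failing_probe(struct model_pdev *d)
{
	model_devm_add_action(d, model_touch_registers);
	return -ENOMEM;                     /* fails after a devm registration */
}

static int model_platform_probe(struct model_pdev *d, model_probe_fn probe,
				bool release_on_error)
{
	int ret;

	d->pm_attached = true;              /* dev_pm_domain_attach() */
	d->group_start = d->nres;           /* devres_open_group() */
	d->has_group = true;

	ret = probe(d);
	if (ret) {
		if (release_on_error)
			model_release_from(d, d->group_start);
		/* else: devres_close_group() only, entries stay queued */
		d->pm_attached = false;     /* dev_pm_domain_detach() */
	}
	return ret;
}

static void model_platform_remove(struct model_pdev *d)
{
	/* drv->remove() would run here */
	if (d->has_group)
		model_release_from(d, d->group_start); /* devres_release_group() */
	d->pm_attached = false;             /* dev_pm_domain_detach() */
}
```

In each case the caller finishes with `model_release_from(d, 0)` to stand in for device_unbind_cleanup(). The successful probe/remove path records no aborts either way, which matches the intent of the patch; only the error path differs between closing and releasing the group.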