Message ID | 1610068562-4410-1-git-send-email-collinsd@codeaurora.org |
---|---|
State | Accepted |
Commit | eaa7995c529b54d68d97a30f6344cc6ca2f214a7 |
Series | regulator: core: avoid regulator_resolve_supply() race condition |
On Thu, 7 Jan 2021 17:16:02 -0800, David Collins wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered). The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
>
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next

Thanks!

[1/1] regulator: core: avoid regulator_resolve_supply() race condition
      commit: eaa7995c529b54d68d97a30f6344cc6ca2f214a7

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix); however, if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree; please engage with people reporting problems and
send follow-up patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes, they
should be sent as incremental updates against current git; existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Hi,

On 08.01.2021 02:16, David Collins wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered). The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
>
> This procedure can encounter a race condition if two concurrent
> tasks call regulator_register() near to each other on separate CPUs
> and one of the regulators has rdev->supply_name specified. There
> is currently nothing guaranteeing atomicity between the rdev->supply
> check and set steps. Thus, both tasks can observe rdev->supply==NULL
> in their regulator_resolve_supply() calls. This then results in
> both creating a struct regulator for the supply. One ends up
> actually stored in rdev->supply and the other is lost (though still
> present in the supply's consumer_list).
>
> Here is a kernel log snippet showing the issue:
>
> [ 12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
> [ 12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
> [ 12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
> '17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
> already present!
>
> Avoid this race condition by holding the rdev->mutex lock inside
> of regulator_resolve_supply() while checking and setting
> rdev->supply.
>
> Signed-off-by: David Collins <collinsd@codeaurora.org>

This patch landed in linux next-20210112 as commit eaa7995c529b
("regulator: core: avoid regulator_resolve_supply() race condition").
I found that it triggers the following lockdep warning during the DWC3
driver registration on some Exynos-based boards (this log is from the
Samsung Exynos5420-based Peach-Pit board):

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x4c/0x2b0

but task is already holding lock:
df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x44/0x318

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (regulator_ww_class_mutex){+.+.}-{3:3}:
       ww_mutex_lock+0x48/0x88
       regulator_lock_recursive+0x84/0x1f4
       regulator_lock_dependent+0x184/0x2b0
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x18c/0x1dc
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x18c/0x1dc
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
       lock_acquire+0x2e4/0x5dc
       __mutex_lock+0xa4/0xb60
       mutex_lock_nested+0x1c/0x24
       regulator_lock_dependent+0x4c/0x2b0
       regulator_enable+0x30/0xe4
       regulator_resolve_supply+0x1cc/0x318
       regulator_register_resolve_supply+0x14/0x78
       class_for_each_device+0x68/0xe8
       regulator_register+0xa2c/0xc9c
       devm_regulator_register+0x40/0x70
       tps65090_regulator_probe+0x150/0x648
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c4/0x7e8
       platform_device_add+0x120/0x25c
       mfd_add_devices+0x580/0x60c
       tps65090_i2c_probe+0xb8/0x184
       i2c_device_probe+0x234/0x2a4
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c4/0x7e8
       i2c_new_client_device+0x15c/0x27c
       of_i2c_register_devices+0x114/0x184
       i2c_register_adapter+0x1d8/0x6dc
       ec_i2c_probe+0xc8/0x124
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c4/0x7e8
       of_platform_device_create_pdata+0x90/0xc8
       of_platform_bus_create+0x1a0/0x4ec
       of_platform_populate+0x88/0x120
       devm_of_platform_populate+0x40/0x80
       cros_ec_register+0x174/0x308
       cros_ec_spi_probe+0x16c/0x1ec
       spi_probe+0x88/0xac
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x18c/0x1dc
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

other info that might help us debug this:

Chain exists of:
  regulator_list_mutex --> regulator_ww_class_acquire --> regulator_ww_class_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(regulator_ww_class_mutex);
                               lock(regulator_ww_class_acquire);
                               lock(regulator_ww_class_mutex);
  lock(regulator_list_mutex);

 *** DEADLOCK ***

5 locks held by swapper/0/1:
 #0: dfb6e4c8 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x60
 #1: c1fedcd8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #2: df53a4e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #3: df5224d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #4: df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x44/0x318

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1-00008-geaa7995c529b #10095
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b38ffc>] (dump_stack+0xa4/0xc4)
[<c0b38ffc>] (dump_stack) from [<c0193458>] (check_noncircular+0x14c/0x164)
[<c0193458>] (check_noncircular) from [<c0196b90>] (__lock_acquire+0x1830/0x31cc)
[<c0196b90>] (__lock_acquire) from [<c01991e4>] (lock_acquire+0x2e4/0x5dc)
[<c01991e4>] (lock_acquire) from [<c0b4043c>] (__mutex_lock+0xa4/0xb60)
[<c0b4043c>] (__mutex_lock) from [<c0b40f14>] (mutex_lock_nested+0x1c/0x24)
[<c0b40f14>] (mutex_lock_nested) from [<c05ccd94>] (regulator_lock_dependent+0x4c/0x2b0)
[<c05ccd94>] (regulator_lock_dependent) from [<c05d220c>] (regulator_enable+0x30/0xe4)
[<c05d220c>] (regulator_enable) from [<c05d248c>] (regulator_resolve_supply+0x1cc/0x318)
[<c05d248c>] (regulator_resolve_supply) from [<c05d2974>] (regulator_register_resolve_supply+0x14/0x78)
[<c05d2974>] (regulator_register_resolve_supply) from [<c06a3000>] (class_for_each_device+0x68/0xe8)
[<c06a3000>] (class_for_each_device) from [<c05d3e20>] (regulator_register+0xa2c/0xc9c)
[<c05d3e20>] (regulator_register) from [<c05d5c70>] (devm_regulator_register+0x40/0x70)
[<c05d5c70>] (devm_regulator_register) from [<c05dea58>] (tps65090_regulator_probe+0x150/0x648)
[<c05dea58>] (tps65090_regulator_probe) from [<c06a3fe8>] (platform_probe+0x80/0xc0)
[<c06a3fe8>] (platform_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] (bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] (__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] (bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c06a3bac>] (platform_device_add+0x120/0x25c)
[<c06a3bac>] (platform_device_add) from [<c06d5c7c>] (mfd_add_devices+0x580/0x60c)
[<c06d5c7c>] (mfd_add_devices) from [<c06d80e8>] (tps65090_i2c_probe+0xb8/0x184)
[<c06d80e8>] (tps65090_i2c_probe) from [<c0822520>] (i2c_device_probe+0x234/0x2a4)
[<c0822520>] (i2c_device_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] (bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] (__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] (bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c0824aec>] (i2c_new_client_device+0x15c/0x27c)
[<c0824aec>] (i2c_new_client_device) from [<c08285e0>] (of_i2c_register_devices+0x114/0x184)
[<c08285e0>] (of_i2c_register_devices) from [<c08254b8>] (i2c_register_adapter+0x1d8/0x6dc)
[<c08254b8>] (i2c_register_adapter) from [<c082dd1c>] (ec_i2c_probe+0xc8/0x124)
[<c082dd1c>] (ec_i2c_probe) from [<c06a3fe8>] (platform_probe+0x80/0xc0)
[<c06a3fe8>] (platform_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] (bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] (__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] (bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c08b140c>] (of_platform_device_create_pdata+0x90/0xc8)
[<c08b140c>] (of_platform_device_create_pdata) from [<c08b15f0>] (of_platform_bus_create+0x1a0/0x4ec)
[<c08b15f0>] (of_platform_bus_create) from [<c08b1af0>] (of_platform_populate+0x88/0x120)
[<c08b1af0>] (of_platform_populate) from [<c08b1bdc>] (devm_of_platform_populate+0x40/0x80)
[<c08b1bdc>] (devm_of_platform_populate) from [<c08b72fc>] (cros_ec_register+0x174/0x308)
[<c08b72fc>] (cros_ec_register) from [<c08b868c>] (cros_ec_spi_probe+0x16c/0x1ec)
[<c08b868c>] (cros_ec_spi_probe) from [<c071b2f4>] (spi_probe+0x88/0xac)
[<c071b2f4>] (spi_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c06a19c4>] (device_driver_attach+0x58/0x60)
[<c06a19c4>] (device_driver_attach) from [<c06a1ac8>] (__driver_attach+0xfc/0x160)
[<c06a1ac8>] (__driver_attach) from [<c069f0cc>] (bus_for_each_dev+0x6c/0xb8)
[<c069f0cc>] (bus_for_each_dev) from [<c06a0204>] (bus_add_driver+0x170/0x20c)
[<c06a0204>] (bus_add_driver) from [<c06a2968>] (driver_register+0x78/0x10c)
[<c06a2968>] (driver_register) from [<c0102428>] (do_one_initcall+0x88/0x438)
[<c0102428>] (do_one_initcall) from [<c1101104>] (kernel_init_freeable+0x18c/0x1dc)
[<c1101104>] (kernel_init_freeable) from [<c0b3c65c>] (kernel_init+0x8/0x118)
[<c0b3c65c>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xc1ce3fb0 to 0xc1ce3ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

I haven't analyzed yet whether this warning is a real issue or just a
false positive. If you have any hints or comments, let me know.
> ---
>  drivers/regulator/core.c | 39 ++++++++++++++++++++++++++++-----------
>  1 file changed, 28 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index fee9241..3ae5ccd 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1813,23 +1813,34 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  {
>  	struct regulator_dev *r;
>  	struct device *dev = rdev->dev.parent;
> -	int ret;
> +	int ret = 0;
>
>  	/* No supply to resolve? */
>  	if (!rdev->supply_name)
>  		return 0;
>
> -	/* Supply already resolved? */
> +	/* Supply already resolved? (fast-path without locking contention) */
>  	if (rdev->supply)
>  		return 0;
>
> +	/*
> +	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> +	 * between rdev->supply null check and setting rdev->supply in
> +	 * set_supply() from concurrent tasks.
> +	 */
> +	regulator_lock(rdev);
> +
> +	/* Supply just resolved by a concurrent task? */
> +	if (rdev->supply)
> +		goto out;
> +
>  	r = regulator_dev_lookup(dev, rdev->supply_name);
>  	if (IS_ERR(r)) {
>  		ret = PTR_ERR(r);
>
>  		/* Did the lookup explicitly defer for us? */
>  		if (ret == -EPROBE_DEFER)
> -			return ret;
> +			goto out;
>
>  		if (have_full_constraints()) {
>  			r = dummy_regulator_rdev;
> @@ -1837,15 +1848,18 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  		} else {
>  			dev_err(dev, "Failed to resolve %s-supply for %s\n",
>  				rdev->supply_name, rdev->desc->name);
> -			return -EPROBE_DEFER;
> +			ret = -EPROBE_DEFER;
> +			goto out;
>  		}
>  	}
>
>  	if (r == rdev) {
>  		dev_err(dev, "Supply for %s (%s) resolved to itself\n",
>  			rdev->desc->name, rdev->supply_name);
> -		if (!have_full_constraints())
> -			return -EINVAL;
> +		if (!have_full_constraints()) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  		r = dummy_regulator_rdev;
>  		get_device(&r->dev);
>  	}
> @@ -1859,7 +1873,8 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  	if (r->dev.parent && r->dev.parent != rdev->dev.parent) {
>  		if (!device_is_bound(r->dev.parent)) {
>  			put_device(&r->dev);
> -			return -EPROBE_DEFER;
> +			ret = -EPROBE_DEFER;
> +			goto out;
>  		}
>  	}
>
> @@ -1867,13 +1882,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  	ret = regulator_resolve_supply(r);
>  	if (ret < 0) {
>  		put_device(&r->dev);
> -		return ret;
> +		goto out;
>  	}
>
>  	ret = set_supply(rdev, r);
>  	if (ret < 0) {
>  		put_device(&r->dev);
> -		return ret;
> +		goto out;
>  	}
>
>  	/*
> @@ -1886,11 +1901,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  		if (ret < 0) {
>  			_regulator_put(rdev->supply);
>  			rdev->supply = NULL;
> -			return ret;
> +			goto out;
>  		}
>  	}
>
> -	return 0;
> +out:
> +	regulator_unlock(rdev);
> +	return ret;
>  }
>
>  /* Internal regulator request function */

Best regards
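The check-then-set race described in David's commit message, and the patch's recheck of rdev->supply with the lock held, can be sketched in userspace C. This is a minimal illustration assuming POSIX threads and C11 atomics; struct rdev, struct supply, create_supply() and run_race() are hypothetical stand-ins, not the kernel's regulator API. The unlocked fast-path read uses an acquire load so that the sketch itself is free of data races, whereas the kernel code simply tests rdev->supply.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct regulator_dev and its supply. */
struct supply {
	int id;
};

struct rdev {
	pthread_mutex_t lock;            /* plays the role of rdev->mutex */
	_Atomic(struct supply *) supply; /* plays the role of rdev->supply */
};

#define MAX_SUPPLIES 16
static struct supply supplies[MAX_SUPPLIES];
static atomic_int nr_created; /* supply handles created so far */

/* Hypothetical analogue of creating a struct regulator for the supply. */
static struct supply *create_supply(void)
{
	int n = atomic_fetch_add(&nr_created, 1);

	assert(n < MAX_SUPPLIES);
	supplies[n].id = n;
	return &supplies[n];
}

static void resolve_supply(struct rdev *r)
{
	/* Fast path: already resolved, no lock needed. */
	if (atomic_load_explicit(&r->supply, memory_order_acquire))
		return;

	pthread_mutex_lock(&r->lock);
	/* Recheck under the lock: a concurrent task may have just resolved it. */
	if (!atomic_load_explicit(&r->supply, memory_order_relaxed))
		atomic_store_explicit(&r->supply, create_supply(),
				      memory_order_release);
	pthread_mutex_unlock(&r->lock);
}

static void *worker(void *arg)
{
	resolve_supply(arg);
	return NULL;
}

/* Runs nthreads concurrent resolvers; returns how many supplies were created. */
static int run_race(int nthreads)
{
	struct rdev r = { .supply = NULL };
	pthread_t tids[8];

	assert(nthreads > 0 && nthreads <= 8);
	pthread_mutex_init(&r.lock, NULL);
	atomic_store(&nr_created, 0);
	/* Error handling for pthread_create()/pthread_join() omitted for brevity. */
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, &r);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&r.lock);
	return atomic_load(&nr_created);
}
```

With the recheck in place, run_race() should report a single supply however many threads race. Without it, two threads could both see a NULL supply and each create one, which matches the duplicated "supplied by pm8350_s5_level" lines in the quoted log.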
Hi,

On Wed, 13 Jan 2021 at 03:21, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> Hi,
>
<trim>
>
> This patch landed in linux next-20210112 as commit eaa7995c529b
> ("regulator: core: avoid regulator_resolve_supply() race condition"). I
> found that it triggers the following lockdep warning during the DWC3
> driver registration on some Exynos-based boards (this log is from the
> Samsung Exynos5420-based Peach-Pit board):
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
> regulator_lock_dependent+0x4c/0x2b0
>
> but task is already holding lock:
> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
> regulator_resolve_supply+0x44/0x318

LKFT testing also found this lockdep warning on arm64 (hi6220-hikey)
while booting.

[ 0.635532] WARNING: possible recursive locking detected
[ 0.635558] 5.11.0-rc3-next-20210118 #1 Not tainted
[ 0.635585] --------------------------------------------
[ 0.635611] swapper/0/1 is trying to acquire lock:
[ 0.635636] ffff000000a13158 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive+0x9c/0x1e8
[ 0.635721]
[ 0.635721] but task is already holding lock:
[ 0.635749] ffff000000a13958 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x70/0x2f0
[ 0.635817]
[ 0.635817] other info that might help us debug this:
[ 0.635847]  Possible unsafe locking scenario:
[ 0.635847]
[ 0.635875]        CPU0
[ 0.635892]        ----
[ 0.635909]   lock(regulator_ww_class_mutex);
[ 0.635942]   lock(regulator_ww_class_mutex);
[ 0.635974]
[ 0.635974]  *** DEADLOCK ***
[ 0.635974]
[ 0.636002]  May be due to missing lock nesting notation
[ 0.636002]
[ 0.636033] 4 locks held by swapper/0/1:
[ 0.636057]  #0: ffff000000a02988 (&dev->mutex){....}-{3:3}, at: __device_driver_lock+0x38/0x70
[ 0.636131]  #1: ffff000000a13958 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x70/0x2f0
[ 0.636205]  #2: ffff800012b102c0 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x5c/0x290
[ 0.636280]  #3: ffff8000137e3918 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_enable+0x40/0xe0
[ 0.636352]
[ 0.636352] stack backtrace:
[ 0.636378] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210118 #1
[ 0.636415] Hardware name: HiKey Development Board (DT)
[ 0.636443] Call trace:
[ 0.636460]  dump_backtrace+0x0/0x1f0
[ 0.636490]  show_stack+0x2c/0x80
[ 0.636516]  dump_stack+0xf8/0x160
[ 0.636543]  __lock_acquire+0xa3c/0x1718
[ 0.636571]  lock_acquire+0x3d8/0x4f0
[ 0.636596]  __ww_mutex_lock.constprop.14+0xbc/0xf68
[ 0.636628]  ww_mutex_lock+0x6c/0x3e8
[ 0.636653]  regulator_lock_recursive+0x9c/0x1e8
[ 0.636683]  regulator_lock_dependent+0x198/0x290
[ 0.636713]  regulator_enable+0x40/0xe0
[ 0.636739]  regulator_resolve_supply+0x1e8/0x2f0
[ 0.636767]  regulator_register_resolve_supply+0x24/0x80
[ 0.636797]  class_for_each_device+0x78/0xf8
[ 0.636825]  regulator_register+0x840/0xbb0
[ 0.636851]  devm_regulator_register+0x50/0xa8
[ 0.636879]  reg_fixed_voltage_probe+0x224/0x410
[ 0.636908]  platform_probe+0x6c/0xd8
[ 0.636932]  really_probe+0x2b8/0x520
[ 0.636960]  driver_probe_device+0xf4/0x168
[ 0.636988]  device_driver_attach+0x74/0x98
[ 0.637014]  __driver_attach+0xc4/0x178
[ 0.637039]  bus_for_each_dev+0x84/0xd8
[ 0.637066]  driver_attach+0x30/0x40
[ 0.637092]  bus_add_driver+0x170/0x258
[ 0.637119]  driver_register+0x64/0x118
[ 0.637144]  __platform_driver_register+0x34/0x40
[ 0.637172]  regulator_fixed_voltage_init+0x20/0x28
[ 0.637205]  do_one_initcall+0x94/0x4a0
[ 0.637231]  kernel_init_freeable+0x2f0/0x344
[ 0.637261]  kernel_init+0x18/0x120

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Full boot log here:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210118/testrun/3771538/suite/linux-log-parser/test/check-kernel-warning-2159912/log

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git describe: next-20210112
  kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/hikey/lkft/linux-next/935/config
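Both reports come from lockdep, which records the order in which lock classes are taken. It warns when a new acquisition would close a cycle (the Exynos report) or would take a class that is already held without a nesting annotation (the HiKey report). The toy model below imitates that bookkeeping as a hypothetical sketch: the enum values mirror the class names in the reports, and acquire_while_holding() is an invented helper. Real lockdep also tracks the full held-lock stack, subclasses and ww_mutex acquire contexts, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes, named after those in the reports above. */
enum lock_class { LIST_MUTEX, WW_ACQUIRE, WW_MUTEX, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	"regulator_list_mutex",
	"regulator_ww_class_acquire",
	"regulator_ww_class_mutex",
};

/* after[a][b] is true once class b has been taken while class a was held. */
static bool after[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (after[from][next] && !seen[next] && reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record taking class 'next' while class 'held' is held.
 * Returns 0 if the order is fine, 1 for recursive locking of one class,
 * and 2 if the new edge would close a cycle (the edge is then not recorded).
 */
static int acquire_while_holding(int held, int next)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES && next >= 0 && next < NR_CLASSES);
	if (held == next) {
		printf("possible recursive locking: %s\n", class_name[next]);
		return 1;
	}
	if (reaches(next, held, seen)) {
		printf("possible circular dependency: %s -> %s\n",
		       class_name[held], class_name[next]);
		return 2;
	}
	after[held][next] = true;
	return 0;
}
```

Replaying the chain from the first report (list mutex, then acquire context, then ww mutex) and then taking regulator_list_mutex while holding regulator_ww_class_mutex is flagged as a cycle, which corresponds to the "Chain exists of" line. Taking regulator_ww_class_mutex twice is flagged as recursive.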
On Tue, Jan 12, 2021 at 10:34:19PM +0100, Marek Szyprowski wrote:

> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
> regulator_lock_dependent+0x4c/0x2b0

If you're sending backtraces or other enormous reports like this, please
run them through addr2line first so that things are a bit more legible.

> but task is already holding lock:
> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
> regulator_resolve_supply+0x44/0x318
>
> which lock already depends on the new lock.

Does this help (completely untested):

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ae5ccd9277d..7d1422b00974 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (rdev->supply)
 		return 0;
 
-	/*
-	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
-	 * between rdev->supply null check and setting rdev->supply in
-	 * set_supply() from concurrent tasks.
-	 */
-	regulator_lock(rdev);
-
-	/* Supply just resolved by a concurrent task? */
-	if (rdev->supply)
-		goto out;
-
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
@@ -1885,10 +1874,23 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		goto out;
 	}
 
+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply) {
+		put_device(&r->dev);
+		goto out_rdev_lock;
+	}
+
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		goto out;
+		goto out_rdev_lock;
 	}
 
 	/*
@@ -1901,12 +1903,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			goto out;
+			goto out_rdev_lock;
 		}
 	}
 
-out:
+out_rdev_lock:
 	regulator_unlock(rdev);
+out:
 	return ret;
 }
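Mark's first proposal moves regulator_lock() below the lookup and relies on the common kernel two-label unwind idiom: failures detected before the lock is taken jump to out, while failures after it jump to out_rdev_lock, which unlocks and falls through to out. A minimal userspace sketch of that idiom follows, assuming POSIX threads; struct dev, dev_lock(), dev_unlock(), resolve() and the fail_at switch are all hypothetical, and the label is called out_unlock here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device; lock_held lets a caller observe the lock state. */
struct dev {
	pthread_mutex_t lock;
	bool lock_held;
	void *supply;
};

static void dev_lock(struct dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->lock_held = true;
}

static void dev_unlock(struct dev *d)
{
	d->lock_held = false;
	pthread_mutex_unlock(&d->lock);
}

/*
 * fail_at simulates an error: 1 = before the lock is taken (like a failed
 * lookup), 2 = after it (like set_supply() failing), 0 = no error.
 */
static int resolve(struct dev *d, void *candidate, int fail_at)
{
	int ret = 0;

	if (fail_at == 1) {
		ret = -1;
		goto out;		/* lock not taken yet: skip the unlock */
	}

	dev_lock(d);
	if (d->supply)			/* resolved concurrently: nothing to do */
		goto out_unlock;
	if (fail_at == 2) {
		ret = -2;
		goto out_unlock;	/* lock held: must be released */
	}
	d->supply = candidate;

out_unlock:
	dev_unlock(d);
out:
	return ret;
}
```

In this proposal the regulator_enable() call for an already-enabled regulator still sits before out_rdev_lock, that is, it runs with the rdev lock held, which is consistent with Marek's reply that the warning persisted.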
Hi Mark,

On 18.01.2021 21:49, Mark Brown wrote:
> On Tue, Jan 12, 2021 at 10:34:19PM +0100, Marek Szyprowski wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
>> ------------------------------------------------------
>> swapper/0/1 is trying to acquire lock:
>> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
>> regulator_lock_dependent+0x4c/0x2b0
> If you're sending backtraces or other enormous reports like this, please
> run them through addr2line first so that things are a bit more legible.

Well, I had little time to look into that issue, so I just copy-pasted
the kernel log in the hope that it would be useful. The trace is really
long, but the function call stack is IMHO readable. If you need more
details about any specific trace, just ask. I don't know of a good way
to process raw kernel logs with addr2line while keeping things readable.

>> but task is already holding lock:
>> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
>> regulator_resolve_supply+0x44/0x318
>>
>> which lock already depends on the new lock.
> Does this help (completely untested):

Sadly nope. I get the same warning:

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc3-next-20210118-00005-g56a65ff7ca8b #10162 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c12e1e40 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x4c/0x2b4

but task is already holding lock:
df4fe8c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x98/0x320

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (regulator_ww_class_mutex){+.+.}-{3:3}:
       ww_mutex_lock+0x48/0x88
       regulator_lock_recursive+0x84/0x1f4
       regulator_lock_dependent+0x188/0x2b4
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x190/0x1e0
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x190/0x1e0
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
       lock_acquire+0x314/0x5d0
       __mutex_lock+0xa4/0xb60
       mutex_lock_nested+0x1c/0x24
       regulator_lock_dependent+0x4c/0x2b4
       regulator_enable+0x30/0xe4
       regulator_resolve_supply+0x1d0/0x320
       regulator_register_resolve_supply+0x14/0x78
       class_for_each_device+0x68/0xe8
       regulator_register+0xa30/0xca0
       devm_regulator_register+0x40/0x70
       tps65090_regulator_probe+0x150/0x648
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c8/0x7ec
       platform_device_add+0x120/0x25c
       mfd_add_devices+0x580/0x60c
       tps65090_i2c_probe+0xb8/0x184
       i2c_device_probe+0x234/0x2a4
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c8/0x7ec
       i2c_new_client_device+0x15c/0x27c
       of_i2c_register_devices+0x114/0x184
       i2c_register_adapter+0x1d8/0x6dc
       ec_i2c_probe+0xc8/0x124
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c8/0x7ec
       of_platform_device_create_pdata+0x90/0xc8
       of_platform_bus_create+0x1a0/0x4ec
       of_platform_populate+0x88/0x120
       devm_of_platform_populate+0x40/0x80
       cros_ec_register+0x174/0x308
       cros_ec_spi_probe+0x16c/0x1ec
       spi_probe+0x88/0xac
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x190/0x1e0
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

other info that might help us debug this:

Chain exists of:
  regulator_list_mutex --> regulator_ww_class_acquire --> regulator_ww_class_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(regulator_ww_class_mutex);
                               lock(regulator_ww_class_acquire);
                               lock(regulator_ww_class_mutex);
  lock(regulator_list_mutex);

 *** DEADLOCK ***

5 locks held by swapper/0/1:
 #0: dfbef0c8 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x60
 #1: df4f84d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #2: df4f98e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #3: df509cd8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #4: df4fe8c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x98/0x320

stack backtrace:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210118-00005-g56a65ff7ca8b #10162
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b443c0>] (dump_stack+0xa4/0xc4)
[<c0b443c0>] (dump_stack) from [<c01932e0>] (check_noncircular+0x14c/0x164)
[<c01932e0>] (check_noncircular) from [<c0196a08>] (__lock_acquire+0x181c/0x3204)
[<c0196a08>] (__lock_acquire) from [<c01990cc>] (lock_acquire+0x314/0x5d0)
[<c01990cc>] (lock_acquire) from [<c0b4bd54>] (__mutex_lock+0xa4/0xb60)
[<c0b4bd54>] (__mutex_lock) from [<c0b4c82c>] (mutex_lock_nested+0x1c/0x24)
[<c0b4c82c>] (mutex_lock_nested) from [<c05d4544>] (regulator_lock_dependent+0x4c/0x2b4)
[<c05d4544>] (regulator_lock_dependent) from [<c05d99c0>] (regulator_enable+0x30/0xe4)
[<c05d99c0>] (regulator_enable) from [<c05d9c44>] (regulator_resolve_supply+0x1d0/0x320)
[<c05d9c44>] (regulator_resolve_supply) from [<c05da130>] (regulator_register_resolve_supply+0x14/0x78)
[<c05da130>] (regulator_register_resolve_supply) from [<c06aba80>] (class_for_each_device+0x68/0xe8)
[<c06aba80>] (class_for_each_device) from [<c05db5e0>] (regulator_register+0xa30/0xca0)
[<c05db5e0>] (regulator_register) from [<c05dd430>] (devm_regulator_register+0x40/0x70)
[<c05dd430>] (devm_regulator_register) from [<c05e6218>] (tps65090_regulator_probe+0x150/0x648)
[<c05e6218>] (tps65090_regulator_probe) from [<c06aca70>] (platform_probe+0x80/0xc0)
[<c06aca70>] (platform_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] (bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] (__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] (bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c06ac634>] (platform_device_add+0x120/0x25c)
[<c06ac634>] (platform_device_add) from [<c06de87c>] (mfd_add_devices+0x580/0x60c)
[<c06de87c>] (mfd_add_devices) from [<c06e0ce8>] (tps65090_i2c_probe+0xb8/0x184)
[<c06e0ce8>] (tps65090_i2c_probe) from [<c082d2b8>] (i2c_device_probe+0x234/0x2a4)
[<c082d2b8>] (i2c_device_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] (bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] (__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] (bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c082f884>] (i2c_new_client_device+0x15c/0x27c)
[<c082f884>] (i2c_new_client_device) from [<c08332dc>] (of_i2c_register_devices+0x114/0x184)
[<c08332dc>] (of_i2c_register_devices) from [<c0830250>] (i2c_register_adapter+0x1d8/0x6dc)
[<c0830250>] (i2c_register_adapter) from [<c0838a1c>] (ec_i2c_probe+0xc8/0x124)
[<c0838a1c>] (ec_i2c_probe) from [<c06aca70>] (platform_probe+0x80/0xc0)
[<c06aca70>] (platform_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] (bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] (__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] (bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c08bba20>] (of_platform_device_create_pdata+0x90/0xc8)
[<c08bba20>] (of_platform_device_create_pdata) from [<c08bbc04>] (of_platform_bus_create+0x1a0/0x4ec)
[<c08bbc04>] (of_platform_bus_create) from [<c08bc104>] (of_platform_populate+0x88/0x120)
[<c08bc104>] (of_platform_populate) from [<c08bc1f0>] (devm_of_platform_populate+0x40/0x80)
[<c08bc1f0>] (devm_of_platform_populate) from [<c08c1910>] (cros_ec_register+0x174/0x308)
[<c08c1910>] (cros_ec_register) from [<c08c2ca0>] (cros_ec_spi_probe+0x16c/0x1ec)
[<c08c2ca0>] (cros_ec_spi_probe) from [<c07240fc>] (spi_probe+0x88/0xac)
[<c07240fc>] (spi_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06aa444>] (device_driver_attach+0x58/0x60)
[<c06aa444>] (device_driver_attach) from [<c06aa548>] (__driver_attach+0xfc/0x160)
[<c06aa548>] (__driver_attach) from [<c06a7b4c>] (bus_for_each_dev+0x6c/0xb8)
[<c06a7b4c>] (bus_for_each_dev) from [<c06a8c84>] (bus_add_driver+0x170/0x20c)
[<c06a8c84>] (bus_add_driver) from [<c06ab3e8>] (driver_register+0x78/0x10c)
[<c06ab3e8>] (driver_register) from [<c0102428>] (do_one_initcall+0x88/0x438)
[<c0102428>] (do_one_initcall) from [<c11010d4>] (kernel_init_freeable+0x190/0x1e0)
[<c11010d4>] (kernel_init_freeable) from [<c0b47db0>] (kernel_init+0x8/0x118)
[<c0b47db0>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xc1ce3fb0 to 0xc1ce3ff8)

Best regards
On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
> On 18.01.2021 21:49, Mark Brown wrote:
> > Does this help (completely untested):
> Sadly nope. I get the same warning:

Try this instead:

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ae5ccd9277d..31503776dbd7 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (rdev->supply)
 		return 0;
 
-	/*
-	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
-	 * between rdev->supply null check and setting rdev->supply in
-	 * set_supply() from concurrent tasks.
-	 */
-	regulator_lock(rdev);
-
-	/* Supply just resolved by a concurrent task? */
-	if (rdev->supply)
-		goto out;
-
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
@@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		goto out;
 	}
 
+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply) {
+		regulator_unlock(rdev);
+		put_device(&r->dev);
+		return ret;
+	}
+
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
+		regulator_unlock(rdev);
 		put_device(&r->dev);
-		goto out;
+		return ret;
 	}
 
+	regulator_unlock(rdev);
+
 	/*
 	 * In set_machine_constraints() we may have turned this regulator on
 	 * but we couldn't propagate to the supply if it hadn't been resolved
@@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			goto out;
+			goto out_rdev_lock;
 		}
 	}
 
 out:
-	regulator_unlock(rdev);
 	return ret;
 }
 
Hi Mark,

On 21.01.2021 16:44, Mark Brown wrote:
> On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
>> On 18.01.2021 21:49, Mark Brown wrote:
>>> Does this help (completely untested):
>> Sadly nope. I get the same warning:
> Try this instead:
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 3ae5ccd9277d..31503776dbd7 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  	if (rdev->supply)
>  		return 0;
>
> -	/*
> -	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> -	 * between rdev->supply null check and setting rdev->supply in
> -	 * set_supply() from concurrent tasks.
> -	 */
> -	regulator_lock(rdev);
> -
> -	/* Supply just resolved by a concurrent task? */
> -	if (rdev->supply)
> -		goto out;
> -
>  	r = regulator_dev_lookup(dev, rdev->supply_name);
>  	if (IS_ERR(r)) {
>  		ret = PTR_ERR(r);
> @@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  		goto out;
>  	}
>
> +	/*
> +	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> +	 * between rdev->supply null check and setting rdev->supply in
> +	 * set_supply() from concurrent tasks.
> +	 */
> +	regulator_lock(rdev);
> +
> +	/* Supply just resolved by a concurrent task? */
> +	if (rdev->supply) {
> +		regulator_unlock(rdev);
> +		put_device(&r->dev);
> +		return ret;
> +	}
> +
>  	ret = set_supply(rdev, r);
>  	if (ret < 0) {
> +		regulator_unlock(rdev);
>  		put_device(&r->dev);
> -		goto out;
> +		return ret;
>  	}
>
> +	regulator_unlock(rdev);
> +
>  	/*
>  	 * In set_machine_constraints() we may have turned this regulator on
>  	 * but we couldn't propagate to the supply if it hadn't been resolved
> @@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>  		if (ret < 0) {
>  			_regulator_put(rdev->supply);
>  			rdev->supply = NULL;
> -			goto out;
> +			goto out_rdev_lock;

drivers/regulator/core.c:1910:4: error: label ‘out_rdev_lock’ used but not defined

>  		}
>  	}
>
>  out:
> -	regulator_unlock(rdev);
>  	return ret;
>  }
>

It looks like it finally fixes the locking issue, with the above goto removed completely to fix the build. Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
Hello Mark,

On 1/21/21 12:30 PM, Marek Szyprowski wrote:
> Hi Mark,
>
> On 21.01.2021 16:44, Mark Brown wrote:
>> On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
>>> On 18.01.2021 21:49, Mark Brown wrote:
>>>> Does this help (completely untested):
>>> Sadly nope. I get the same warning:
>> Try this instead:
>>
>> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
>> index 3ae5ccd9277d..31503776dbd7 100644
>> --- a/drivers/regulator/core.c
>> +++ b/drivers/regulator/core.c
>> @@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>>  	if (rdev->supply)
>>  		return 0;
>>
>> -	/*
>> -	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
>> -	 * between rdev->supply null check and setting rdev->supply in
>> -	 * set_supply() from concurrent tasks.
>> -	 */
>> -	regulator_lock(rdev);
>> -
>> -	/* Supply just resolved by a concurrent task? */
>> -	if (rdev->supply)
>> -		goto out;
>> -
>>  	r = regulator_dev_lookup(dev, rdev->supply_name);
>>  	if (IS_ERR(r)) {
>>  		ret = PTR_ERR(r);
>> @@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>>  		goto out;
>>  	}
>>
>> +	/*
>> +	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
>> +	 * between rdev->supply null check and setting rdev->supply in
>> +	 * set_supply() from concurrent tasks.
>> +	 */
>> +	regulator_lock(rdev);
>> +
>> +	/* Supply just resolved by a concurrent task? */
>> +	if (rdev->supply) {
>> +		regulator_unlock(rdev);
>> +		put_device(&r->dev);
>> +		return ret;
>> +	}
>> +
>>  	ret = set_supply(rdev, r);
>>  	if (ret < 0) {
>> +		regulator_unlock(rdev);
>>  		put_device(&r->dev);
>> -		goto out;
>> +		return ret;
>>  	}
>>
>> +	regulator_unlock(rdev);
>> +
>>  	/*
>>  	 * In set_machine_constraints() we may have turned this regulator on
>>  	 * but we couldn't propagate to the supply if it hadn't been resolved
>> @@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>>  		if (ret < 0) {
>>  			_regulator_put(rdev->supply);
>>  			rdev->supply = NULL;
>> -			goto out;
>> +			goto out_rdev_lock;
>
> drivers/regulator/core.c:1910:4: error: label ‘out_rdev_lock’ used but
> not defined
>
>>  		}
>>  	}
>>
>>  out:
>> -	regulator_unlock(rdev);
>>  	return ret;
>>  }
>>
>
> It looks like it finally fixes the locking issue, with the above goto
> removed completely to fix the build. Feel free to add:
>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Thank you for making this fix. I'm sorry that I missed the potential
deadlock issue resulting from the regulator_enable() call inside
regulator_resolve_supply() with rdev->mutex locked. Your fix avoids
deadlock while still ensuring that there isn't a set supply race
condition.

Take care,
David
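Mark's revision keeps the recheck but narrows the critical section: regulator_dev_lookup() and the later regulator_enable() on the supply both run without rdev->mutex held, the lock only covers the recheck and set_supply(), and a task that loses the race drops its extra device reference with put_device(). A minimal userspace model of that shape, assuming pthread mutexes in place of the kernel's regulator_lock() and an atomic counter in place of get_device()/put_device(); every model_* name is hypothetical, not kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical supply: its own lock plus a reference count. */
struct model_supply {
	pthread_mutex_t lock;   /* taken by model_supply_enable() */
	atomic_int refs;        /* plays get_device()/put_device() */
	int enable_count;       /* protected by lock */
};

/* Hypothetical consumer-side regulator. */
struct model_rdev {
	pthread_mutex_t lock;                   /* role of rdev->mutex */
	_Atomic(struct model_supply *) supply;  /* role of rdev->supply */
	bool needs_enable;                      /* role of rdev->use_count != 0 */
	struct model_supply *provider;          /* what the lookup returns */
};

static struct model_supply *model_lookup(struct model_rdev *rdev)
{
	atomic_fetch_add(&rdev->provider->refs, 1);   /* take a reference */
	return rdev->provider;
}

static void model_put(struct model_supply *s)
{
	atomic_fetch_sub(&s->refs, 1);                /* drop a reference */
}

static void model_supply_enable(struct model_supply *s)
{
	pthread_mutex_lock(&s->lock);
	s->enable_count++;
	pthread_mutex_unlock(&s->lock);
}

static void model_resolve(struct model_rdev *rdev)
{
	struct model_supply *r;

	if (atomic_load(&rdev->supply))
		return;

	/* Lookup happens without rdev->lock held. */
	r = model_lookup(rdev);

	pthread_mutex_lock(&rdev->lock);
	if (atomic_load(&rdev->supply)) {
		/* Lost the race: release only our own extra reference. */
		pthread_mutex_unlock(&rdev->lock);
		model_put(r);
		return;
	}
	atomic_store(&rdev->supply, r);
	pthread_mutex_unlock(&rdev->lock);

	/* Calling into the supply happens after rdev->lock is dropped. */
	if (rdev->needs_enable)
		model_supply_enable(r);
}

static void *model_worker(void *arg)
{
	model_resolve(arg);
	return NULL;
}

/* Race up to 64 resolvers; report the supply's final refs and enables. */
static void model_race(int nthreads, int *refs, int *enables)
{
	struct model_supply s = { .enable_count = 0 };
	struct model_rdev rdev = { .needs_enable = true, .provider = &s };
	pthread_t tids[64];
	int started = 0;

	if (nthreads > 64)
		nthreads = 64;
	pthread_mutex_init(&s.lock, NULL);
	atomic_init(&s.refs, 0);
	pthread_mutex_init(&rdev.lock, NULL);
	atomic_init(&rdev.supply, NULL);

	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, model_worker, &rdev) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	*refs = atomic_load(&s.refs);
	*enables = s.enable_count;
	pthread_mutex_destroy(&rdev.lock);
	pthread_mutex_destroy(&s.lock);
}
```

Whichever thread wins keeps the only reference and performs the only enable; losers undo their own lookup. In this toy, holding rdev->lock across model_supply_enable() would be harmless, but the thread attributes the real potential deadlock to calling regulator_enable() while rdev->mutex is held, which is exactly what this ordering avoids.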
Hello:

This patch was applied to qcom/linux.git (refs/heads/for-next):

On Thu, 7 Jan 2021 17:16:02 -0800 you wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered). The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
>
> [...]

Here is the summary with links:
  - regulator: core: avoid regulator_resolve_supply() race condition
    https://git.kernel.org/qcom/c/eaa7995c529b

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index fee9241..3ae5ccd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1813,23 +1813,34 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 {
 	struct regulator_dev *r;
 	struct device *dev = rdev->dev.parent;
-	int ret;
+	int ret = 0;
 
 	/* No supply to resolve? */
 	if (!rdev->supply_name)
 		return 0;
 
-	/* Supply already resolved? */
+	/* Supply already resolved? (fast-path without locking contention) */
 	if (rdev->supply)
 		return 0;
 
+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply)
+		goto out;
+
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
 
 		/* Did the lookup explicitly defer for us? */
 		if (ret == -EPROBE_DEFER)
-			return ret;
+			goto out;
 
 		if (have_full_constraints()) {
 			r = dummy_regulator_rdev;
@@ -1837,15 +1848,18 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		} else {
 			dev_err(dev, "Failed to resolve %s-supply for %s\n",
 				rdev->supply_name, rdev->desc->name);
-			return -EPROBE_DEFER;
+			ret = -EPROBE_DEFER;
+			goto out;
 		}
 	}
 
 	if (r == rdev) {
 		dev_err(dev, "Supply for %s (%s) resolved to itself\n",
 			rdev->desc->name, rdev->supply_name);
-		if (!have_full_constraints())
-			return -EINVAL;
+		if (!have_full_constraints()) {
+			ret = -EINVAL;
+			goto out;
+		}
 		r = dummy_regulator_rdev;
 		get_device(&r->dev);
 	}
@@ -1859,7 +1873,8 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (r->dev.parent && r->dev.parent != rdev->dev.parent) {
 		if (!device_is_bound(r->dev.parent)) {
 			put_device(&r->dev);
-			return -EPROBE_DEFER;
+			ret = -EPROBE_DEFER;
+			goto out;
 		}
 	}
 
@@ -1867,13 +1882,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	ret = regulator_resolve_supply(r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		return ret;
+		goto out;
 	}
 
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		return ret;
+		goto out;
 	}
 
 	/*
@@ -1886,11 +1901,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			return ret;
+			goto out;
 		}
 	}
 
-	return 0;
+out:
+	regulator_unlock(rdev);
+	return ret;
 }
 
 /* Internal regulator request function */
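Most of the diff above is mechanical: once regulator_lock(rdev) is taken near the top of regulator_resolve_supply(), every bare return has to become "ret = ...; goto out;" so that the single out: label always calls regulator_unlock(rdev). A minimal sketch of that single-exit idiom, assuming a counting stub in place of the real lock so a test can check it stays balanced; demo_lock(), demo_unlock() and demo_resolve() are hypothetical, and -EAGAIN stands in for the kernel-only -EPROBE_DEFER:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for regulator_lock()/regulator_unlock(): it only
 * records the nesting depth so a test can check the lock is balanced.
 */
static int demo_lock_depth;

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/*
 * Every failure path stores its error code in ret and jumps to the single
 * "out" label, so no early return can skip the unlock.
 * fail_at selects which (hypothetical) step fails; 0 means none fails.
 */
static int demo_resolve(int fail_at)
{
	int ret = 0;

	demo_lock();

	if (fail_at == 1) {           /* e.g. the lookup asked us to defer */
		ret = -EAGAIN;
		goto out;
	}

	if (fail_at == 2) {           /* e.g. the supply resolved to itself */
		ret = -EINVAL;
		goto out;
	}

	if (fail_at == 3) {           /* e.g. set_supply() failed */
		ret = -ENOMEM;
		goto out;
	}

out:
	demo_unlock();
	return ret;
}
```

The flip side of this idiom is that everything between the lock and the out: label runs with the lock held, including any call made into the supply on the success path.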
The final step in regulator_register() is to call
regulator_resolve_supply() for each registered regulator
(including the one in the process of being registered). The
regulator_resolve_supply() function first checks if rdev->supply
is NULL, then it performs various steps to try to find the supply.
If successful, rdev->supply is set inside of set_supply().

This procedure can encounter a race condition if two concurrent
tasks call regulator_register() near to each other on separate CPUs
and one of the regulators has rdev->supply_name specified. There
is currently nothing guaranteeing atomicity between the rdev->supply
check and set steps. Thus, both tasks can observe rdev->supply==NULL
in their regulator_resolve_supply() calls. This then results in
both creating a struct regulator for the supply. One ends up
actually stored in rdev->supply and the other is lost (though still
present in the supply's consumer_list).

Here is a kernel log snippet showing the issue:

[   12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[   12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[   12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
	'17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
	already present!

Avoid this race condition by holding the rdev->mutex lock inside
of regulator_resolve_supply() while checking and setting
rdev->supply.

Signed-off-by: David Collins <collinsd@codeaurora.org>
---
 drivers/regulator/core.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)
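The race described above is a check-then-act bug: two tasks both observe rdev->supply == NULL before either one stores a value. The patch closes it with a double-checked pattern, an unlocked fast-path check followed by a recheck under rdev->mutex before publishing. A minimal userspace sketch of that pattern, assuming a pthread mutex in place of regulator_lock() and a C11 atomic pointer for rdev->supply (so the unlocked fast-path read is well defined in portable C); all demo_* names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the struct regulator consumer handle. */
struct demo_handle {
	int id;
};

/* Hypothetical stand-in for struct regulator_dev: only what the race needs. */
struct demo_rdev {
	pthread_mutex_t lock;                  /* role of rdev->mutex  */
	_Atomic(struct demo_handle *) supply;  /* role of rdev->supply */
	atomic_int handles_created;            /* counts allocations   */
};

static int demo_resolve_supply(struct demo_rdev *rdev)
{
	/* Fast path: already resolved, skip the lock entirely. */
	if (atomic_load(&rdev->supply))
		return 0;

	pthread_mutex_lock(&rdev->lock);

	/* Recheck under the lock: a concurrent caller may have set it. */
	if (!atomic_load(&rdev->supply)) {
		struct demo_handle *h = malloc(sizeof(*h));

		if (!h) {
			pthread_mutex_unlock(&rdev->lock);
			return -1;
		}
		h->id = atomic_fetch_add(&rdev->handles_created, 1);
		atomic_store(&rdev->supply, h);
	}

	pthread_mutex_unlock(&rdev->lock);
	return 0;
}

static void *demo_worker(void *arg)
{
	demo_resolve_supply(arg);
	return NULL;
}

/* Start up to 64 concurrent resolvers; return how many handles were made. */
static int demo_race(int nthreads)
{
	struct demo_rdev rdev;
	pthread_t tids[64];
	int started = 0, created;

	if (nthreads > 64)
		nthreads = 64;
	pthread_mutex_init(&rdev.lock, NULL);
	atomic_init(&rdev.supply, NULL);
	atomic_init(&rdev.handles_created, 0);

	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, demo_worker, &rdev) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	created = atomic_load(&rdev.handles_created);
	free(atomic_load(&rdev.supply));
	pthread_mutex_destroy(&rdev.lock);
	return created;
}
```

With the recheck in place, demo_race() reports exactly one handle however many threads race. Removing the second check under the lock can reintroduce duplicate handles, the userspace analogue of the doubled "supplied by" lines and the debugfs warning in the log above.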