diff mbox series

regulator: core: avoid regulator_resolve_supply() race condition

Message ID 1610068562-4410-1-git-send-email-collinsd@codeaurora.org (mailing list archive)
State Accepted
Commit eaa7995c529b54d68d97a30f6344cc6ca2f214a7
Headers show
Series regulator: core: avoid regulator_resolve_supply() race condition | expand

Commit Message

David Collins Jan. 8, 2021, 1:16 a.m. UTC
The final step in regulator_register() is to call
regulator_resolve_supply() for each registered regulator
(including the one in the process of being registered).  The
regulator_resolve_supply() function first checks if rdev->supply
is NULL, then it performs various steps to try to find the supply.
If successful, rdev->supply is set inside of set_supply().

This procedure can encounter a race condition if two concurrent
tasks call regulator_register() near to each other on separate CPUs
and one of the regulators has rdev->supply_name specified.  There
is currently nothing guaranteeing atomicity between the rdev->supply
check and set steps.  Thus, both tasks can observe rdev->supply==NULL
in their regulator_resolve_supply() calls.  This then results in
both creating a struct regulator for the supply.  One ends up
actually stored in rdev->supply and the other is lost (though still
present in the supply's consumer_list).

Here is a kernel log snippet showing the issue:

[   12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[   12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[   12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
               '17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
               already present!

Avoid this race condition by holding the rdev->mutex lock inside
of regulator_resolve_supply() while checking and setting
rdev->supply.

Signed-off-by: David Collins <collinsd@codeaurora.org>
---
 drivers/regulator/core.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

Comments

Mark Brown Jan. 11, 2021, 4:28 p.m. UTC | #1
On Thu, 7 Jan 2021 17:16:02 -0800, David Collins wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered).  The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next

Thanks!

[1/1] regulator: core: avoid regulator_resolve_supply() race condition
      commit: eaa7995c529b54d68d97a30f6344cc6ca2f214a7

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Marek Szyprowski Jan. 12, 2021, 9:34 p.m. UTC | #2
Hi,

On 08.01.2021 02:16, David Collins wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered).  The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
>
> This procedure can encounter a race condition if two concurrent
> tasks call regulator_register() near to each other on separate CPUs
> and one of the regulators has rdev->supply_name specified.  There
> is currently nothing guaranteeing atomicity between the rdev->supply
> check and set steps.  Thus, both tasks can observe rdev->supply==NULL
> in their regulator_resolve_supply() calls.  This then results in
> both creating a struct regulator for the supply.  One ends up
> actually stored in rdev->supply and the other is lost (though still
> present in the supply's consumer_list).
>
> Here is a kernel log snippet showing the issue:
>
> [   12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
> [   12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
> [   12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
>                 '17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
>                 already present!
>
> Avoid this race condition by holding the rdev->mutex lock inside
> of regulator_resolve_supply() while checking and setting
> rdev->supply.
>
> Signed-off-by: David Collins <collinsd@codeaurora.org>

This patch landed in linux next-20210112 as commit eaa7995c529b 
("regulator: core: avoid regulator_resolve_supply() race condition"). I 
found that it triggers a following lockdep warning during the DWC3 
driver registration on some Exynos based boards (this log is from 
Samsung Exynos5420-based Peach-Pit board):

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at: 
regulator_lock_dependent+0x4c/0x2b0

but task is already holding lock:
df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: 
regulator_resolve_supply+0x44/0x318

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (regulator_ww_class_mutex){+.+.}-{3:3}:
        ww_mutex_lock+0x48/0x88
        regulator_lock_recursive+0x84/0x1f4
        regulator_lock_dependent+0x184/0x2b0
        regulator_enable+0x30/0xe4
        dwc3_exynos_probe+0x17c/0x2c0
        platform_probe+0x80/0xc0
        really_probe+0x1c4/0x4e4
        driver_probe_device+0x78/0x1d8
        device_driver_attach+0x58/0x60
        __driver_attach+0xfc/0x160
        bus_for_each_dev+0x6c/0xb8
        bus_add_driver+0x170/0x20c
        driver_register+0x78/0x10c
        do_one_initcall+0x88/0x438
        kernel_init_freeable+0x18c/0x1dc
        kernel_init+0x8/0x118
        ret_from_fork+0x14/0x38
        0x0

-> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
        regulator_enable+0x30/0xe4
        dwc3_exynos_probe+0x17c/0x2c0
        platform_probe+0x80/0xc0
        really_probe+0x1c4/0x4e4
        driver_probe_device+0x78/0x1d8
        device_driver_attach+0x58/0x60
        __driver_attach+0xfc/0x160
        bus_for_each_dev+0x6c/0xb8
        bus_add_driver+0x170/0x20c
        driver_register+0x78/0x10c
        do_one_initcall+0x88/0x438
        kernel_init_freeable+0x18c/0x1dc
        kernel_init+0x8/0x118
        ret_from_fork+0x14/0x38
        0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
        lock_acquire+0x2e4/0x5dc
        __mutex_lock+0xa4/0xb60
        mutex_lock_nested+0x1c/0x24
        regulator_lock_dependent+0x4c/0x2b0
        regulator_enable+0x30/0xe4
        regulator_resolve_supply+0x1cc/0x318
        regulator_register_resolve_supply+0x14/0x78
        class_for_each_device+0x68/0xe8
        regulator_register+0xa2c/0xc9c
        devm_regulator_register+0x40/0x70
        tps65090_regulator_probe+0x150/0x648
        platform_probe+0x80/0xc0
        really_probe+0x1c4/0x4e4
        driver_probe_device+0x78/0x1d8
        bus_for_each_drv+0x78/0xbc
        __device_attach+0xe8/0x180
        bus_probe_device+0x88/0x90
        device_add+0x4c4/0x7e8
        platform_device_add+0x120/0x25c
        mfd_add_devices+0x580/0x60c
        tps65090_i2c_probe+0xb8/0x184
        i2c_device_probe+0x234/0x2a4
        really_probe+0x1c4/0x4e4
        driver_probe_device+0x78/0x1d8
        bus_for_each_drv+0x78/0xbc
        __device_attach+0xe8/0x180
        bus_probe_device+0x88/0x90
        device_add+0x4c4/0x7e8
        i2c_new_client_device+0x15c/0x27c
        of_i2c_register_devices+0x114/0x184
        i2c_register_adapter+0x1d8/0x6dc
        ec_i2c_probe+0xc8/0x124
        platform_probe+0x80/0xc0
        really_probe+0x1c4/0x4e4
        driver_probe_device+0x78/0x1d8
        bus_for_each_drv+0x78/0xbc
        __device_attach+0xe8/0x180
        bus_probe_device+0x88/0x90
        device_add+0x4c4/0x7e8
        of_platform_device_create_pdata+0x90/0xc8
        of_platform_bus_create+0x1a0/0x4ec
        of_platform_populate+0x88/0x120
        devm_of_platform_populate+0x40/0x80
        cros_ec_register+0x174/0x308
        cros_ec_spi_probe+0x16c/0x1ec
        spi_probe+0x88/0xac
        really_probe+0x1c4/0x4e4
        driver_probe_device+0x78/0x1d8
        device_driver_attach+0x58/0x60
        __driver_attach+0xfc/0x160
        bus_for_each_dev+0x6c/0xb8
        bus_add_driver+0x170/0x20c
        driver_register+0x78/0x10c
        do_one_initcall+0x88/0x438
        kernel_init_freeable+0x18c/0x1dc
        kernel_init+0x8/0x118
        ret_from_fork+0x14/0x38
        0x0

other info that might help us debug this:

Chain exists of:
   regulator_list_mutex --> regulator_ww_class_acquire --> 
regulator_ww_class_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(regulator_ww_class_mutex);
                                lock(regulator_ww_class_acquire);
                                lock(regulator_ww_class_mutex);
   lock(regulator_list_mutex);

  *** DEADLOCK ***

5 locks held by swapper/0/1:
  #0: dfb6e4c8 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x60
  #1: c1fedcd8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
  #2: df53a4e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
  #3: df5224d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
  #4: df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: 
regulator_resolve_supply+0x44/0x318

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1-00008-geaa7995c529b 
#10095
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b38ffc>] (dump_stack+0xa4/0xc4)
[<c0b38ffc>] (dump_stack) from [<c0193458>] (check_noncircular+0x14c/0x164)
[<c0193458>] (check_noncircular) from [<c0196b90>] 
(__lock_acquire+0x1830/0x31cc)
[<c0196b90>] (__lock_acquire) from [<c01991e4>] (lock_acquire+0x2e4/0x5dc)
[<c01991e4>] (lock_acquire) from [<c0b4043c>] (__mutex_lock+0xa4/0xb60)
[<c0b4043c>] (__mutex_lock) from [<c0b40f14>] (mutex_lock_nested+0x1c/0x24)
[<c0b40f14>] (mutex_lock_nested) from [<c05ccd94>] 
(regulator_lock_dependent+0x4c/0x2b0)
[<c05ccd94>] (regulator_lock_dependent) from [<c05d220c>] 
(regulator_enable+0x30/0xe4)
[<c05d220c>] (regulator_enable) from [<c05d248c>] 
(regulator_resolve_supply+0x1cc/0x318)
[<c05d248c>] (regulator_resolve_supply) from [<c05d2974>] 
(regulator_register_resolve_supply+0x14/0x78)
[<c05d2974>] (regulator_register_resolve_supply) from [<c06a3000>] 
(class_for_each_device+0x68/0xe8)
[<c06a3000>] (class_for_each_device) from [<c05d3e20>] 
(regulator_register+0xa2c/0xc9c)
[<c05d3e20>] (regulator_register) from [<c05d5c70>] 
(devm_regulator_register+0x40/0x70)
[<c05d5c70>] (devm_regulator_register) from [<c05dea58>] 
(tps65090_regulator_probe+0x150/0x648)
[<c05dea58>] (tps65090_regulator_probe) from [<c06a3fe8>] 
(platform_probe+0x80/0xc0)
[<c06a3fe8>] (platform_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] 
(driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] 
(bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] 
(__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] 
(bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c06a3bac>] 
(platform_device_add+0x120/0x25c)
[<c06a3bac>] (platform_device_add) from [<c06d5c7c>] 
(mfd_add_devices+0x580/0x60c)
[<c06d5c7c>] (mfd_add_devices) from [<c06d80e8>] 
(tps65090_i2c_probe+0xb8/0x184)
[<c06d80e8>] (tps65090_i2c_probe) from [<c0822520>] 
(i2c_device_probe+0x234/0x2a4)
[<c0822520>] (i2c_device_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] 
(driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] 
(bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] 
(__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] 
(bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c0824aec>] 
(i2c_new_client_device+0x15c/0x27c)
[<c0824aec>] (i2c_new_client_device) from [<c08285e0>] 
(of_i2c_register_devices+0x114/0x184)
[<c08285e0>] (of_i2c_register_devices) from [<c08254b8>] 
(i2c_register_adapter+0x1d8/0x6dc)
[<c08254b8>] (i2c_register_adapter) from [<c082dd1c>] 
(ec_i2c_probe+0xc8/0x124)
[<c082dd1c>] (ec_i2c_probe) from [<c06a3fe8>] (platform_probe+0x80/0xc0)
[<c06a3fe8>] (platform_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] 
(driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] 
(bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] 
(__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] 
(bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c08b140c>] 
(of_platform_device_create_pdata+0x90/0xc8)
[<c08b140c>] (of_platform_device_create_pdata) from [<c08b15f0>] 
(of_platform_bus_create+0x1a0/0x4ec)
[<c08b15f0>] (of_platform_bus_create) from [<c08b1af0>] 
(of_platform_populate+0x88/0x120)
[<c08b1af0>] (of_platform_populate) from [<c08b1bdc>] 
(devm_of_platform_populate+0x40/0x80)
[<c08b1bdc>] (devm_of_platform_populate) from [<c08b72fc>] 
(cros_ec_register+0x174/0x308)
[<c08b72fc>] (cros_ec_register) from [<c08b868c>] 
(cros_ec_spi_probe+0x16c/0x1ec)
[<c08b868c>] (cros_ec_spi_probe) from [<c071b2f4>] (spi_probe+0x88/0xac)
[<c071b2f4>] (spi_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] 
(driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c06a19c4>] 
(device_driver_attach+0x58/0x60)
[<c06a19c4>] (device_driver_attach) from [<c06a1ac8>] 
(__driver_attach+0xfc/0x160)
[<c06a1ac8>] (__driver_attach) from [<c069f0cc>] 
(bus_for_each_dev+0x6c/0xb8)
[<c069f0cc>] (bus_for_each_dev) from [<c06a0204>] 
(bus_add_driver+0x170/0x20c)
[<c06a0204>] (bus_add_driver) from [<c06a2968>] (driver_register+0x78/0x10c)
[<c06a2968>] (driver_register) from [<c0102428>] 
(do_one_initcall+0x88/0x438)
[<c0102428>] (do_one_initcall) from [<c1101104>] 
(kernel_init_freeable+0x18c/0x1dc)
[<c1101104>] (kernel_init_freeable) from [<c0b3c65c>] 
(kernel_init+0x8/0x118)
[<c0b3c65c>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xc1ce3fb0 to 0xc1ce3ff8)
3fa0:                                     00000000 00000000 00000000 
00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

I didn't analyze it yet if this warning is really an issue or just a 
false positive. If you have any hints or comments let me know.

> ---
>   drivers/regulator/core.c | 39 ++++++++++++++++++++++++++++-----------
>   1 file changed, 28 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index fee9241..3ae5ccd 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1813,23 +1813,34 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   {
>   	struct regulator_dev *r;
>   	struct device *dev = rdev->dev.parent;
> -	int ret;
> +	int ret = 0;
>   
>   	/* No supply to resolve? */
>   	if (!rdev->supply_name)
>   		return 0;
>   
> -	/* Supply already resolved? */
> +	/* Supply already resolved? (fast-path without locking contention) */
>   	if (rdev->supply)
>   		return 0;
>   
> +	/*
> +	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> +	 * between rdev->supply null check and setting rdev->supply in
> +	 * set_supply() from concurrent tasks.
> +	 */
> +	regulator_lock(rdev);
> +
> +	/* Supply just resolved by a concurrent task? */
> +	if (rdev->supply)
> +		goto out;
> +
>   	r = regulator_dev_lookup(dev, rdev->supply_name);
>   	if (IS_ERR(r)) {
>   		ret = PTR_ERR(r);
>   
>   		/* Did the lookup explicitly defer for us? */
>   		if (ret == -EPROBE_DEFER)
> -			return ret;
> +			goto out;
>   
>   		if (have_full_constraints()) {
>   			r = dummy_regulator_rdev;
> @@ -1837,15 +1848,18 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   		} else {
>   			dev_err(dev, "Failed to resolve %s-supply for %s\n",
>   				rdev->supply_name, rdev->desc->name);
> -			return -EPROBE_DEFER;
> +			ret = -EPROBE_DEFER;
> +			goto out;
>   		}
>   	}
>   
>   	if (r == rdev) {
>   		dev_err(dev, "Supply for %s (%s) resolved to itself\n",
>   			rdev->desc->name, rdev->supply_name);
> -		if (!have_full_constraints())
> -			return -EINVAL;
> +		if (!have_full_constraints()) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>   		r = dummy_regulator_rdev;
>   		get_device(&r->dev);
>   	}
> @@ -1859,7 +1873,8 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   	if (r->dev.parent && r->dev.parent != rdev->dev.parent) {
>   		if (!device_is_bound(r->dev.parent)) {
>   			put_device(&r->dev);
> -			return -EPROBE_DEFER;
> +			ret = -EPROBE_DEFER;
> +			goto out;
>   		}
>   	}
>   
> @@ -1867,13 +1882,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   	ret = regulator_resolve_supply(r);
>   	if (ret < 0) {
>   		put_device(&r->dev);
> -		return ret;
> +		goto out;
>   	}
>   
>   	ret = set_supply(rdev, r);
>   	if (ret < 0) {
>   		put_device(&r->dev);
> -		return ret;
> +		goto out;
>   	}
>   
>   	/*
> @@ -1886,11 +1901,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   		if (ret < 0) {
>   			_regulator_put(rdev->supply);
>   			rdev->supply = NULL;
> -			return ret;
> +			goto out;
>   		}
>   	}
>   
> -	return 0;
> +out:
> +	regulator_unlock(rdev);
> +	return ret;
>   }
>   
>   /* Internal regulator request function */

Best regards
Naresh Kamboju Jan. 18, 2021, 11:41 a.m. UTC | #3
Hi,

On Wed, 13 Jan 2021 at 03:21, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>
> Hi,
>

<trim>

>
> This patch landed in linux next-20210112 as commit eaa7995c529b
> ("regulator: core: avoid regulator_resolve_supply() race condition"). I
> found that it triggers a following lockdep warning during the DWC3
> driver registration on some Exynos based boards (this log is from
> Samsung Exynos5420-based Peach-Pit board):
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
> regulator_lock_dependent+0x4c/0x2b0
>
> but task is already holding lock:
> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
> regulator_resolve_supply+0x44/0x318

LKFT testing also found this lockdep warning on
arm64 - hi6220-hikey while booting.

[    0.635532] WARNING: possible recursive locking detected
[    0.635558] 5.11.0-rc3-next-20210118 #1 Not tainted
[    0.635585] --------------------------------------------
[    0.635611] swapper/0/1 is trying to acquire lock:
[    0.635636] ffff000000a13158
(regulator_ww_class_mutex){+.+.}-{3:3}, at:
regulator_lock_recursive+0x9c/0x1e8
[    0.635721]
[    0.635721] but task is already holding lock:
[    0.635749] ffff000000a13958
(regulator_ww_class_mutex){+.+.}-{3:3}, at:
regulator_resolve_supply+0x70/0x2f0
[    0.635817]
[    0.635817] other info that might help us debug this:
[    0.635847]  Possible unsafe locking scenario:
[    0.635847]
[    0.635875]        CPU0
[    0.635892]        ----
[    0.635909]   lock(regulator_ww_class_mutex);
[    0.635942]   lock(regulator_ww_class_mutex);
[    0.635974]
[    0.635974]  *** DEADLOCK ***
[    0.635974]
[    0.636002]  May be due to missing lock nesting notation
[    0.636002]
[    0.636033] 4 locks held by swapper/0/1:
[    0.636057]  #0: ffff000000a02988 (&dev->mutex){....}-{3:3}, at:
__device_driver_lock+0x38/0x70
[    0.636131]  #1: ffff000000a13958
(regulator_ww_class_mutex){+.+.}-{3:3}, at:
regulator_resolve_supply+0x70/0x2f0
[    0.636205]  #2: ffff800012b102c0
(regulator_list_mutex){+.+.}-{3:3}, at:
regulator_lock_dependent+0x5c/0x290
[    0.636280]  #3: ffff8000137e3918
(regulator_ww_class_acquire){+.+.}-{0:0}, at:
regulator_enable+0x40/0xe0
[    0.636352]
[    0.636352] stack backtrace:
[    0.636378] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.11.0-rc3-next-20210118 #1
[    0.636415] Hardware name: HiKey Development Board (DT)
[    0.636443] Call trace:
[    0.636460]  dump_backtrace+0x0/0x1f0
[    0.636490]  show_stack+0x2c/0x80
[    0.636516]  dump_stack+0xf8/0x160
[    0.636543]  __lock_acquire+0xa3c/0x1718
[    0.636571]  lock_acquire+0x3d8/0x4f0
[    0.636596]  __ww_mutex_lock.constprop.14+0xbc/0xf68
[    0.636628]  ww_mutex_lock+0x6c/0x3e8
[    0.636653]  regulator_lock_recursive+0x9c/0x1e8
[    0.636683]  regulator_lock_dependent+0x198/0x290
[    0.636713]  regulator_enable+0x40/0xe0
[    0.636739]  regulator_resolve_supply+0x1e8/0x2f0
[    0.636767]  regulator_register_resolve_supply+0x24/0x80
[    0.636797]  class_for_each_device+0x78/0xf8
[    0.636825]  regulator_register+0x840/0xbb0
[    0.636851]  devm_regulator_register+0x50/0xa8
[    0.636879]  reg_fixed_voltage_probe+0x224/0x410
[    0.636908]  platform_probe+0x6c/0xd8
[    0.636932]  really_probe+0x2b8/0x520
[    0.636960]  driver_probe_device+0xf4/0x168
[    0.636988]  device_driver_attach+0x74/0x98
[    0.637014]  __driver_attach+0xc4/0x178
[    0.637039]  bus_for_each_dev+0x84/0xd8
[    0.637066]  driver_attach+0x30/0x40
[    0.637092]  bus_add_driver+0x170/0x258
[    0.637119]  driver_register+0x64/0x118
[    0.637144]  __platform_driver_register+0x34/0x40
[    0.637172]  regulator_fixed_voltage_init+0x20/0x28
[    0.637205]  do_one_initcall+0x94/0x4a0
[    0.637231]  kernel_init_freeable+0x2f0/0x344
[    0.637261]  kernel_init+0x18/0x120

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Full boot log here:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210118/testrun/3771538/suite/linux-log-parser/test/check-kernel-warning-2159912/log

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git describe: next-20210112
  kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/hikey/lkft/linux-next/935/config
Mark Brown Jan. 18, 2021, 8:49 p.m. UTC | #4
On Tue, Jan 12, 2021 at 10:34:19PM +0100, Marek Szyprowski wrote:

> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at: 
> regulator_lock_dependent+0x4c/0x2b0

If you're sending backtraces or other enormous reports like this please
run them through addr2line first so that things are a bit more leigible.

> but task is already holding lock:
> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: 
> regulator_resolve_supply+0x44/0x318
> 
> which lock already depends on the new lock.

Does this help (completely untested):

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ae5ccd9277d..7d1422b00974 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (rdev->supply)
 		return 0;
 
-	/*
-	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
-	 * between rdev->supply null check and setting rdev->supply in
-	 * set_supply() from concurrent tasks.
-	 */
-	regulator_lock(rdev);
-
-	/* Supply just resolved by a concurrent task? */
-	if (rdev->supply)
-		goto out;
-
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
@@ -1885,10 +1874,23 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		goto out;
 	}
 
+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply) {
+		put_device(&r->dev);
+		goto out_rdev_lock;
+	}
+
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		goto out;
+		goto out_rdev_lock;
 	}
 
 	/*
@@ -1901,12 +1903,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			goto out;
+			goto out_rdev_lock;
 		}
 	}
 
-out:
+out_rdev_lock:
 	regulator_unlock(rdev);
+out:
 	return ret;
 }
Marek Szyprowski Jan. 21, 2021, 9:41 a.m. UTC | #5
Hi Mark,

On 18.01.2021 21:49, Mark Brown wrote:
> On Tue, Jan 12, 2021 at 10:34:19PM +0100, Marek Szyprowski wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
>> ------------------------------------------------------
>> swapper/0/1 is trying to acquire lock:
>> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
>> regulator_lock_dependent+0x4c/0x2b0
> If you're sending backtraces or other enormous reports like this please
> run them through addr2line first so that things are a bit more leigible.

Well, I had a little time to process that issue, so I just copy-pasted 
the kernel log with the hope it will be useful. The trace is really 
long, but the function call stack is imho readable.

If you need more details about any specific trace, just ask. I don't 
know any good method of processing the raw kernel logs with addr2line 
and keeping things readable.

>> but task is already holding lock:
>> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
>> regulator_resolve_supply+0x44/0x318
>>
>> which lock already depends on the new lock.
> Does this help (completely untested):

Sadly nope. I get same warning:

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc3-next-20210118-00005-g56a65ff7ca8b #10162 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c12e1e40 (regulator_list_mutex){+.+.}-{3:3}, at: 
regulator_lock_dependent+0x4c/0x2b4

but task is already holding lock:
df4fe8c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: 
regulator_resolve_supply+0x98/0x320

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (regulator_ww_class_mutex){+.+.}-{3:3}:
        ww_mutex_lock+0x48/0x88
        regulator_lock_recursive+0x84/0x1f4
        regulator_lock_dependent+0x188/0x2b4
        regulator_enable+0x30/0xe4
        dwc3_exynos_probe+0x17c/0x2c0
        platform_probe+0x80/0xc0
        really_probe+0x1d4/0x4ec
        driver_probe_device+0x78/0x1d8
        device_driver_attach+0x58/0x60
        __driver_attach+0xfc/0x160
        bus_for_each_dev+0x6c/0xb8
        bus_add_driver+0x170/0x20c
        driver_register+0x78/0x10c
        do_one_initcall+0x88/0x438
        kernel_init_freeable+0x190/0x1e0
        kernel_init+0x8/0x118
        ret_from_fork+0x14/0x38
        0x0

-> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
        regulator_enable+0x30/0xe4
        dwc3_exynos_probe+0x17c/0x2c0
        platform_probe+0x80/0xc0
        really_probe+0x1d4/0x4ec
        driver_probe_device+0x78/0x1d8
        device_driver_attach+0x58/0x60
        __driver_attach+0xfc/0x160
        bus_for_each_dev+0x6c/0xb8
        bus_add_driver+0x170/0x20c
        driver_register+0x78/0x10c
        do_one_initcall+0x88/0x438
        kernel_init_freeable+0x190/0x1e0
        kernel_init+0x8/0x118
        ret_from_fork+0x14/0x38
        0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
        lock_acquire+0x314/0x5d0
        __mutex_lock+0xa4/0xb60
        mutex_lock_nested+0x1c/0x24
        regulator_lock_dependent+0x4c/0x2b4
        regulator_enable+0x30/0xe4
        regulator_resolve_supply+0x1d0/0x320
        regulator_register_resolve_supply+0x14/0x78
        class_for_each_device+0x68/0xe8
        regulator_register+0xa30/0xca0
        devm_regulator_register+0x40/0x70
        tps65090_regulator_probe+0x150/0x648
        platform_probe+0x80/0xc0
        really_probe+0x1d4/0x4ec
        driver_probe_device+0x78/0x1d8
        bus_for_each_drv+0x78/0xbc
        __device_attach+0xe8/0x180
        bus_probe_device+0x88/0x90
        device_add+0x4c8/0x7ec
        platform_device_add+0x120/0x25c
        mfd_add_devices+0x580/0x60c
        tps65090_i2c_probe+0xb8/0x184
        i2c_device_probe+0x234/0x2a4
        really_probe+0x1d4/0x4ec
        driver_probe_device+0x78/0x1d8
        bus_for_each_drv+0x78/0xbc
        __device_attach+0xe8/0x180
        bus_probe_device+0x88/0x90
        device_add+0x4c8/0x7ec
        i2c_new_client_device+0x15c/0x27c
        of_i2c_register_devices+0x114/0x184
        i2c_register_adapter+0x1d8/0x6dc
        ec_i2c_probe+0xc8/0x124
        platform_probe+0x80/0xc0
        really_probe+0x1d4/0x4ec
        driver_probe_device+0x78/0x1d8
        bus_for_each_drv+0x78/0xbc
        __device_attach+0xe8/0x180
        bus_probe_device+0x88/0x90
        device_add+0x4c8/0x7ec
        of_platform_device_create_pdata+0x90/0xc8
        of_platform_bus_create+0x1a0/0x4ec
        of_platform_populate+0x88/0x120
        devm_of_platform_populate+0x40/0x80
        cros_ec_register+0x174/0x308
        cros_ec_spi_probe+0x16c/0x1ec
        spi_probe+0x88/0xac
        really_probe+0x1d4/0x4ec
        driver_probe_device+0x78/0x1d8
        device_driver_attach+0x58/0x60
        __driver_attach+0xfc/0x160
        bus_for_each_dev+0x6c/0xb8
        bus_add_driver+0x170/0x20c
        driver_register+0x78/0x10c
        do_one_initcall+0x88/0x438
        kernel_init_freeable+0x190/0x1e0
        kernel_init+0x8/0x118
        ret_from_fork+0x14/0x38
        0x0

other info that might help us debug this:

Chain exists of:
   regulator_list_mutex --> regulator_ww_class_acquire --> 
regulator_ww_class_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(regulator_ww_class_mutex);
                                lock(regulator_ww_class_acquire);
                                lock(regulator_ww_class_mutex);
   lock(regulator_list_mutex);

  *** DEADLOCK ***

5 locks held by swapper/0/1:
  #0: dfbef0c8 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x60
  #1: df4f84d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
  #2: df4f98e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
  #3: df509cd8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
  #4: df4fe8c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: 
regulator_resolve_supply+0x98/0x320

stack backtrace:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 
5.11.0-rc3-next-20210118-00005-g56a65ff7ca8b #10162
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b443c0>] (dump_stack+0xa4/0xc4)
[<c0b443c0>] (dump_stack) from [<c01932e0>] (check_noncircular+0x14c/0x164)
[<c01932e0>] (check_noncircular) from [<c0196a08>] 
(__lock_acquire+0x181c/0x3204)
[<c0196a08>] (__lock_acquire) from [<c01990cc>] (lock_acquire+0x314/0x5d0)
[<c01990cc>] (lock_acquire) from [<c0b4bd54>] (__mutex_lock+0xa4/0xb60)
[<c0b4bd54>] (__mutex_lock) from [<c0b4c82c>] (mutex_lock_nested+0x1c/0x24)
[<c0b4c82c>] (mutex_lock_nested) from [<c05d4544>] 
(regulator_lock_dependent+0x4c/0x2b4)
[<c05d4544>] (regulator_lock_dependent) from [<c05d99c0>] 
(regulator_enable+0x30/0xe4)
[<c05d99c0>] (regulator_enable) from [<c05d9c44>] 
(regulator_resolve_supply+0x1d0/0x320)
[<c05d9c44>] (regulator_resolve_supply) from [<c05da130>] 
(regulator_register_resolve_supply+0x14/0x78)
[<c05da130>] (regulator_register_resolve_supply) from [<c06aba80>] 
(class_for_each_device+0x68/0xe8)
[<c06aba80>] (class_for_each_device) from [<c05db5e0>] 
(regulator_register+0xa30/0xca0)
[<c05db5e0>] (regulator_register) from [<c05dd430>] 
(devm_regulator_register+0x40/0x70)
[<c05dd430>] (devm_regulator_register) from [<c05e6218>] 
(tps65090_regulator_probe+0x150/0x648)
[<c05e6218>] (tps65090_regulator_probe) from [<c06aca70>] 
(platform_probe+0x80/0xc0)
[<c06aca70>] (platform_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] 
(driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] 
(bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] 
(__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] 
(bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c06ac634>] 
(platform_device_add+0x120/0x25c)
[<c06ac634>] (platform_device_add) from [<c06de87c>] 
(mfd_add_devices+0x580/0x60c)
[<c06de87c>] (mfd_add_devices) from [<c06e0ce8>] 
(tps65090_i2c_probe+0xb8/0x184)
[<c06e0ce8>] (tps65090_i2c_probe) from [<c082d2b8>] 
(i2c_device_probe+0x234/0x2a4)
[<c082d2b8>] (i2c_device_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] 
(driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] 
(bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] 
(__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] 
(bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c082f884>] 
(i2c_new_client_device+0x15c/0x27c)
[<c082f884>] (i2c_new_client_device) from [<c08332dc>] 
(of_i2c_register_devices+0x114/0x184)
[<c08332dc>] (of_i2c_register_devices) from [<c0830250>] 
(i2c_register_adapter+0x1d8/0x6dc)
[<c0830250>] (i2c_register_adapter) from [<c0838a1c>] 
(ec_i2c_probe+0xc8/0x124)
[<c0838a1c>] (ec_i2c_probe) from [<c06aca70>] (platform_probe+0x80/0xc0)
[<c06aca70>] (platform_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] 
(driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] 
(bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] 
(__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] 
(bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c08bba20>] 
(of_platform_device_create_pdata+0x90/0xc8)
[<c08bba20>] (of_platform_device_create_pdata) from [<c08bbc04>] 
(of_platform_bus_create+0x1a0/0x4ec)
[<c08bbc04>] (of_platform_bus_create) from [<c08bc104>] 
(of_platform_populate+0x88/0x120)
[<c08bc104>] (of_platform_populate) from [<c08bc1f0>] 
(devm_of_platform_populate+0x40/0x80)
[<c08bc1f0>] (devm_of_platform_populate) from [<c08c1910>] 
(cros_ec_register+0x174/0x308)
[<c08c1910>] (cros_ec_register) from [<c08c2ca0>] 
(cros_ec_spi_probe+0x16c/0x1ec)
[<c08c2ca0>] (cros_ec_spi_probe) from [<c07240fc>] (spi_probe+0x88/0xac)
[<c07240fc>] (spi_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] 
(driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06aa444>] 
(device_driver_attach+0x58/0x60)
[<c06aa444>] (device_driver_attach) from [<c06aa548>] 
(__driver_attach+0xfc/0x160)
[<c06aa548>] (__driver_attach) from [<c06a7b4c>] 
(bus_for_each_dev+0x6c/0xb8)
[<c06a7b4c>] (bus_for_each_dev) from [<c06a8c84>] 
(bus_add_driver+0x170/0x20c)
[<c06a8c84>] (bus_add_driver) from [<c06ab3e8>] (driver_register+0x78/0x10c)
[<c06ab3e8>] (driver_register) from [<c0102428>] 
(do_one_initcall+0x88/0x438)
[<c0102428>] (do_one_initcall) from [<c11010d4>] 
(kernel_init_freeable+0x190/0x1e0)
[<c11010d4>] (kernel_init_freeable) from [<c0b47db0>] 
(kernel_init+0x8/0x118)
[<c0b47db0>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xc1ce3fb0 to 0xc1ce3ff8)

Best regards
Mark Brown Jan. 21, 2021, 3:44 p.m. UTC | #6
On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
> On 18.01.2021 21:49, Mark Brown wrote:

> > Does this help (completely untested):

> Sadly nope. I get same warning:

Try this instead:

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ae5ccd9277d..31503776dbd7 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (rdev->supply)
 		return 0;
 
-	/*
-	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
-	 * between rdev->supply null check and setting rdev->supply in
-	 * set_supply() from concurrent tasks.
-	 */
-	regulator_lock(rdev);
-
-	/* Supply just resolved by a concurrent task? */
-	if (rdev->supply)
-		goto out;
-
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
@@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		goto out;
 	}
 
+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply) {
+		regulator_unlock(rdev);
+		put_device(&r->dev);
+		return ret;
+	}
+
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
+		regulator_unlock(rdev);
 		put_device(&r->dev);
-		goto out;
+		return ret;
 	}
 
+	regulator_unlock(rdev);
+
 	/*
 	 * In set_machine_constraints() we may have turned this regulator on
 	 * but we couldn't propagate to the supply if it hadn't been resolved
@@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			goto out;
+			goto out_rdev_lock;
 		}
 	}
 
 out:
-	regulator_unlock(rdev);
 	return ret;
 }
Marek Szyprowski Jan. 21, 2021, 8:30 p.m. UTC | #7
Hi Mark,

On 21.01.2021 16:44, Mark Brown wrote:
> On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
>> On 18.01.2021 21:49, Mark Brown wrote:
>>> Does this help (completely untested):
>> Sadly nope. I get same warning:
> Try this instead:
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 3ae5ccd9277d..31503776dbd7 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   	if (rdev->supply)
>   		return 0;
>   
> -	/*
> -	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> -	 * between rdev->supply null check and setting rdev->supply in
> -	 * set_supply() from concurrent tasks.
> -	 */
> -	regulator_lock(rdev);
> -
> -	/* Supply just resolved by a concurrent task? */
> -	if (rdev->supply)
> -		goto out;
> -
>   	r = regulator_dev_lookup(dev, rdev->supply_name);
>   	if (IS_ERR(r)) {
>   		ret = PTR_ERR(r);
> @@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   		goto out;
>   	}
>   
> +	/*
> +	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> +	 * between rdev->supply null check and setting rdev->supply in
> +	 * set_supply() from concurrent tasks.
> +	 */
> +	regulator_lock(rdev);
> +
> +	/* Supply just resolved by a concurrent task? */
> +	if (rdev->supply) {
> +		regulator_unlock(rdev);
> +		put_device(&r->dev);
> +		return ret;
> +	}
> +
>   	ret = set_supply(rdev, r);
>   	if (ret < 0) {
> +		regulator_unlock(rdev);
>   		put_device(&r->dev);
> -		goto out;
> +		return ret;
>   	}
>   
> +	regulator_unlock(rdev);
> +
>   	/*
>   	 * In set_machine_constraints() we may have turned this regulator on
>   	 * but we couldn't propagate to the supply if it hadn't been resolved
> @@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>   		if (ret < 0) {
>   			_regulator_put(rdev->supply);
>   			rdev->supply = NULL;
> -			goto out;
> +			goto out_rdev_lock;

drivers/regulator/core.c:1910:4: error: label ‘out_rdev_lock’ used but 
not defined

>   		}
>   	}
>   
>   out:
> -	regulator_unlock(rdev);
>   	return ret;
>   }
>   

It looks that it finally fixes the locking issue, with the above goto 
removed completely to fix build. Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
David Collins Jan. 23, 2021, 1:54 a.m. UTC | #8
Hello Mark,

On 1/21/21 12:30 PM, Marek Szyprowski wrote:
> Hi Mark,
> 
> On 21.01.2021 16:44, Mark Brown wrote:
>> On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
>>> On 18.01.2021 21:49, Mark Brown wrote:
>>>> Does this help (completely untested):
>>> Sadly nope. I get same warning:
>> Try this instead:
>>
>> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
>> index 3ae5ccd9277d..31503776dbd7 100644
>> --- a/drivers/regulator/core.c
>> +++ b/drivers/regulator/core.c
>> @@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>>    	if (rdev->supply)
>>    		return 0;
>>    
>> -	/*
>> -	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
>> -	 * between rdev->supply null check and setting rdev->supply in
>> -	 * set_supply() from concurrent tasks.
>> -	 */
>> -	regulator_lock(rdev);
>> -
>> -	/* Supply just resolved by a concurrent task? */
>> -	if (rdev->supply)
>> -		goto out;
>> -
>>    	r = regulator_dev_lookup(dev, rdev->supply_name);
>>    	if (IS_ERR(r)) {
>>    		ret = PTR_ERR(r);
>> @@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>>    		goto out;
>>    	}
>>    
>> +	/*
>> +	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
>> +	 * between rdev->supply null check and setting rdev->supply in
>> +	 * set_supply() from concurrent tasks.
>> +	 */
>> +	regulator_lock(rdev);
>> +
>> +	/* Supply just resolved by a concurrent task? */
>> +	if (rdev->supply) {
>> +		regulator_unlock(rdev);
>> +		put_device(&r->dev);
>> +		return ret;
>> +	}
>> +
>>    	ret = set_supply(rdev, r);
>>    	if (ret < 0) {
>> +		regulator_unlock(rdev);
>>    		put_device(&r->dev);
>> -		goto out;
>> +		return ret;
>>    	}
>>    
>> +	regulator_unlock(rdev);
>> +
>>    	/*
>>    	 * In set_machine_constraints() we may have turned this regulator on
>>    	 * but we couldn't propagate to the supply if it hadn't been resolved
>> @@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>>    		if (ret < 0) {
>>    			_regulator_put(rdev->supply);
>>    			rdev->supply = NULL;
>> -			goto out;
>> +			goto out_rdev_lock;
> 
> drivers/regulator/core.c:1910:4: error: label ‘out_rdev_lock’ used but
> not defined
> 
>>    		}
>>    	}
>>    
>>    out:
>> -	regulator_unlock(rdev);
>>    	return ret;
>>    }
>>    
> 
> It looks that it finally fixes the locking issue, with the above goto
> removed completely to fix build. Feel free to add:
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Thank you for making this fix.  I'm sorry that I missed the potential 
deadlock issue resulting from the regulator_enable() call inside 
regulator_resolve_supply() with rdev->mutex locked.  Your fix avoids 
deadlock while still ensuring that the there isn't a set supply race 
condition.

Take care,
David
patchwork-bot+linux-arm-msm@kernel.org March 1, 2021, 7:59 p.m. UTC | #9
Hello:

This patch was applied to qcom/linux.git (refs/heads/for-next):

On Thu,  7 Jan 2021 17:16:02 -0800 you wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered).  The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
> 
> [...]

Here is the summary with links:
  - regulator: core: avoid regulator_resolve_supply() race condition
    https://git.kernel.org/qcom/c/eaa7995c529b

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff mbox series

Patch

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index fee9241..3ae5ccd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1813,23 +1813,34 @@  static int regulator_resolve_supply(struct regulator_dev *rdev)
 {
 	struct regulator_dev *r;
 	struct device *dev = rdev->dev.parent;
-	int ret;
+	int ret = 0;
 
 	/* No supply to resolve? */
 	if (!rdev->supply_name)
 		return 0;
 
-	/* Supply already resolved? */
+	/* Supply already resolved? (fast-path without locking contention) */
 	if (rdev->supply)
 		return 0;
 
+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply)
+		goto out;
+
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
 
 		/* Did the lookup explicitly defer for us? */
 		if (ret == -EPROBE_DEFER)
-			return ret;
+			goto out;
 
 		if (have_full_constraints()) {
 			r = dummy_regulator_rdev;
@@ -1837,15 +1848,18 @@  static int regulator_resolve_supply(struct regulator_dev *rdev)
 		} else {
 			dev_err(dev, "Failed to resolve %s-supply for %s\n",
 				rdev->supply_name, rdev->desc->name);
-			return -EPROBE_DEFER;
+			ret = -EPROBE_DEFER;
+			goto out;
 		}
 	}
 
 	if (r == rdev) {
 		dev_err(dev, "Supply for %s (%s) resolved to itself\n",
 			rdev->desc->name, rdev->supply_name);
-		if (!have_full_constraints())
-			return -EINVAL;
+		if (!have_full_constraints()) {
+			ret = -EINVAL;
+			goto out;
+		}
 		r = dummy_regulator_rdev;
 		get_device(&r->dev);
 	}
@@ -1859,7 +1873,8 @@  static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (r->dev.parent && r->dev.parent != rdev->dev.parent) {
 		if (!device_is_bound(r->dev.parent)) {
 			put_device(&r->dev);
-			return -EPROBE_DEFER;
+			ret = -EPROBE_DEFER;
+			goto out;
 		}
 	}
 
@@ -1867,13 +1882,13 @@  static int regulator_resolve_supply(struct regulator_dev *rdev)
 	ret = regulator_resolve_supply(r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		return ret;
+		goto out;
 	}
 
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		return ret;
+		goto out;
 	}
 
 	/*
@@ -1886,11 +1901,13 @@  static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			return ret;
+			goto out;
 		}
 	}
 
-	return 0;
+out:
+	regulator_unlock(rdev);
+	return ret;
 }
 
 /* Internal regulator request function */