Message ID | 20180306163035.GE13586@sirena.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Mark, On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote: > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c > index e685f8b94acf..2c5b20a97f51 100644 > --- a/drivers/regulator/core.c > +++ b/drivers/regulator/core.c > @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev) > { > int i; > > - for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++) > + for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++) > mutex_lock_nested(&rdev->mutex, i); With this change the log is a bit different, but still get a kernel hang: https://pastebin.com/eF08TnuT Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Fabio Estevam <festevam@gmail.com> [180306 16:57]: > Hi Mark, > > On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote: > > > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c > > index e685f8b94acf..2c5b20a97f51 100644 > > --- a/drivers/regulator/core.c > > +++ b/drivers/regulator/core.c > > @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev) > > { > > int i; > > > > - for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++) > > + for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++) > > mutex_lock_nested(&rdev->mutex, i); > > With this change the log is a bit different, but still get a kernel hang: > https://pastebin.com/eF08TnuT That patch does not seem to change anything for me. The errors I posted yesterday for mmc0 only happen on duovero, other omap variants just fail silently with no cards showing. Regards, Tony -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 06, 2018 at 01:56:42PM -0300, Fabio Estevam wrote: > On Tue, Mar 6, 2018 at 1:30 PM, Mark Brown <broonie@kernel.org> wrote: > > - for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++) > > + for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++) > > mutex_lock_nested(&rdev->mutex, i); > With this change the log is a bit different, but still get a kernel hang: > https://pastebin.com/eF08TnuT Yeah, that's what I expected. What we really need to figure out I think is what exactly is taking the lock that we end up colliding with, I don't seem to be able to reproduce so I'm having to just go on staring at the code here.
Hi Mark, On Tue, Mar 6, 2018 at 4:12 PM, Mark Brown <broonie@kernel.org> wrote: > Yeah, that's what I expected. What we really need to figure out I think > is what exactly is taking the lock that we end up colliding with, I > don't seem to be able to reproduce so I'm having to just go on staring > at the code here. Any debug options I should turn on in order to help us to understand more on the lock issue? -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 06, 2018 at 05:06:48PM -0300, Fabio Estevam wrote: > On Tue, Mar 6, 2018 at 4:12 PM, Mark Brown <broonie@kernel.org> wrote: > > Yeah, that's what I expected. What we really need to figure out I think > > is what exactly is taking the lock that we end up colliding with, I > > don't seem to be able to reproduce so I'm having to just go on staring > > at the code here. > Any debug options I should turn on in order to help us to understand > more on the lock issue? No, unfortunately, apart from the generic kernel ones (like lockdep). I'll try to have a poke at adding some tomorrow, though I'm also considering just dropping the series for now.
On Tue, Mar 6, 2018 at 6:33 PM, Mark Brown <broonie@kernel.org> wrote: > No, unfortunately, apart from the generic kernel ones (like lockdep). > I'll try to have a poke at adding some tomorrow, though I'm also > considering just dropping the series for now. Ok, lockdep is already enabled by default in the defconfig I am using (imx_v6_v7_defconfig). Yesterday I spent a few hours trying to understand this locking issue, but didn't get any success on this task. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi all, sorry it took me so long to answer. On 03/06/2018 05:30 PM, Mark Brown wrote: > On Mon, Mar 05, 2018 at 08:22:26PM -0300, Fabio Estevam wrote: >> On Mon, Mar 5, 2018 at 8:12 PM, Tony Lindgren <tony@atomide.com> wrote: > >>> Looks like with next-20180305 there's a regulator regression >>> where mmc0 won't show any cards or produces errors: > >>> mmcblk0: error -110 requesting status >>> mmc1: new high speed SDIO card at address 0001 >>> mmcblk0: error -110 requesting status >>> mmcblk0: recovery failed! >>> print_req_error: I/O error, dev mmcblk0, sector 0 >>> Buffer I/O error on dev mmcblk0, logical block 0, async page read >>> mmcblk0: error -110 requesting status >>> mmcblk0: recovery failed! > > No other error messages? That seems like there's something going on > that's very different to what Fabio was reporting... I'm guessing some > voltage application didn't go through but it's hard to tell with so > little data. dra7 does seem to have what Fabio had though so there's > definitely some effect on the OMAP platforms. > >> I have also seen regulator issues due to this series: >> https://lkml.org/lkml/2018/3/5/731 > > Looking at your stuff I'm having trouble figuring out what's going on - > we're getting double locking of a parent regulator during enable > according to your backtraces but it's not clear to me what took that > lock already. regulator_enable() walks the supplies before it takes > the lock on the regulator it's immediately being called on, not holding > any locks on supplies while enabling. regulator_balance_voltage() then > tries to lock the supplies again but lockdep says the lock is already > held by regulator_enable(). It's also weird that this doesn't seem to > be showing up on other boards in kernelci, the regulator setup on those > i.MX boards looks to be quite simple so I'd expect a much wider impact. > I'm trying to figure out what is so special about these boards. The only strange thing, that I haven't noticed at first, is that all regulators share a common supply - dummy regulator. It is defined in anatop_regulator.c. > I'm wondering if your case is more pain from mutex_lock_nested(), both > regulator_lock_coupled() and regulator_lock_supply() will end up using > indexes starting at 0 for the locking classes. That doesn't smell right > though, but in case my straw clutching works: > > If we can't figure it out I'll just drop the series but I'd prefer to at > least understand what's going on. > I have been struggling to reproduce the issue on my exynos boards, but all I have achieved is getting the same lockdep warning, but everything else works fine. I think it was a false positive caused by using the same indices in lock_coupled() and lock_supply(). > diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c > index e685f8b94acf..2c5b20a97f51 100644 > --- a/drivers/regulator/core.c > +++ b/drivers/regulator/core.c > @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev) > { > int i; > > - for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++) > + for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++) > mutex_lock_nested(&rdev->mutex, i); > } > > Best regards, Maciej Purski -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote: > I'm trying to figure out what is so special about these boards. The only > strange thing, that I haven't noticed at first, is that all regulators share > a common supply - dummy regulator. It is defined in anatop_regulator.c. No, that's a regulator framework thing - the regulator framework will use the dummy regulator as a supply when there's nothing described in the DT so long as the client doesn't explicitly tell it that the supply might be optional.
On 03/07/2018 03:10 PM, Mark Brown wrote: > On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote: > >> I'm trying to figure out what is so special about these boards. The only >> strange thing, that I haven't noticed at first, is that all regulators share >> a common supply - dummy regulator. It is defined in anatop_regulator.c. > > No, that's a regulator framework thing - the regulator framework will > use the dummy regulator as a supply when there's nothing described in > the DT so long as the client doesn't explicitly tell it that the supply > might be optional. > Ok, thanks for explanation. I think I have found a possibly dangerous scenario, but I can't see this situation possible in Fabio's case. Assume, that we have a chain of supplies, consisting of at least 3. Say: A->B->C. When we're setting voltage on A, we lock it, call balance_voltage(), lock suppliers and call set_voltage_rdev(). So we have regulators A, B, C locked. Then set_voltage_rdev() is trying to set voltage of its supply by calling set_voltage_unlocked(). Now we're on the regulator B. Set_voltage_unlocked() calls balance_voltage(), which again locks its supplies, if they exist. B's supply is C, so we end up with having a deadlock on regulator C. Tony and Fabio, do you find this scenario possible on your boards? Best regards Maciej Purski -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 06, 2018 at 07:10:01PM -0300, Fabio Estevam wrote: > Yesterday I spent a few hours trying to understand this locking issue, > but didn't get any success on this task. Right, so I'm thinking that if it's taking this much effort to think about the problem we should just drop the patch set for now. I've moved the code to a branch test/coupled in my git so it's not getting in people's way in -next. Maciej, I suspect that you might be able to reproduce this if you set supply names in the regulator definitons for the PMICs on your boards. I think that's the difference here.
* Maciej Purski <m.purski@samsung.com> [180307 14:38]: > > On 03/07/2018 03:10 PM, Mark Brown wrote: > > On Wed, Mar 07, 2018 at 01:57:12PM +0100, Maciej Purski wrote: > > > > > I'm trying to figure out what is so special about these boards. The only > > > strange thing, that I haven't noticed at first, is that all regulators share > > > a common supply - dummy regulator. It is defined in anatop_regulator.c. > > > > No, that's a regulator framework thing - the regulator framework will > > use the dummy regulator as a supply when there's nothing described in > > the DT so long as the client doesn't explicitly tell it that the supply > > might be optional. > > > > Ok, thanks for explanation. I think I have found a possibly dangerous > scenario, but I can't see this situation possible in Fabio's case. > > Assume, that we have a chain of supplies, consisting of at least 3. Say: A->B->C. > > When we're setting voltage on A, we lock it, call balance_voltage(), lock > suppliers and call set_voltage_rdev(). So we have regulators A, B, C locked. > Then set_voltage_rdev() is trying to set voltage of its supply by calling > set_voltage_unlocked(). > > Now we're on the regulator B. Set_voltage_unlocked() calls > balance_voltage(), which again locks its supplies, if they exist. B's supply > is C, so we end up with having a deadlock on regulator C. > > Tony and Fabio, do you find this scenario possible on your boards? For mmc0 at least typically the supply is on the PMIC for omap variants and it's controlled over i2c or spi bus like most other regulators. Then boards some have additional GPIO controlled regulators. So yeah some kind of locking issue between two or more regulators on i2c or spi bus could be the reason. Regards, Tony -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index e685f8b94acf..2c5b20a97f51 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -159,7 +159,7 @@ static void regulator_lock_supply(struct regulator_dev *rdev) { int i; - for (i = 0; rdev; rdev = rdev_get_supply(rdev), i++) + for (i = 1000; rdev; rdev = rdev_get_supply(rdev), i++) mutex_lock_nested(&rdev->mutex, i); }