Message ID | 20230705144255.115299-2-chris.obbard@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | Disable HS400 for eMMC on Radxa ROCK 4 SBCs |
On Wed Jul 5, 2023 at 4:42 PM CEST, Christopher Obbard wrote:
> There is some instability with some eMMC modules on ROCK Pi 4 SBCs running
> in HS400 mode. This ends up resulting in some block errors after a while
> or after a "heavy" operation utilising the eMMC (e.g. resizing a
> filesystem). An example of these errors is as follows:
>
> [ 289.171014] mmc1: running CQE recovery
> [ 290.048972] mmc1: running CQE recovery
> [ 290.054834] mmc1: running CQE recovery
> [ 290.060817] mmc1: running CQE recovery
> [ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
> [ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
> [ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
> [ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
> [ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
> [ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
> [ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
> [ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
> [ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
> [ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
> [ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
> [ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
>
> Disabling the Command Queue seems to stop the CQE recovery from running,
> but doesn't seem to improve the I/O errors. Until this can be investigated
> further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
> errors from occurring.
>
> While we are here, set the eMMC maximum clock frequency to 150MHz to
> follow the ROCK 4C+.
>
> Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
> Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
> ---
>
>  arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
> index 907071d4fe80..95efee311ece 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
> @@ -645,9 +645,9 @@ &saradc {
>  };
>
>  &sdhci {
> +	max-frequency = <150000000>;
>  	bus-width = <8>;
> -	mmc-hs400-1_8v;
> -	mmc-hs400-enhanced-strobe;
> +	mmc-hs200-1_8v;
>  	non-removable;
>  	status = "okay";
>  };

Works as advertised on a RockPi 4b v1.3 with kernel 6.1.37.

Tested-By: Folker Schwesinger <dev@folker-schwesinger.de>

Folker
On Wednesday, 5 July 2023 at 15:42 +0100, Christopher Obbard wrote:
> There is some instability with some eMMC modules on ROCK Pi 4 SBCs
> running in HS400 mode. This ends up resulting in some block errors
> after a while or after a "heavy" operation utilising the eMMC (e.g.
> resizing a filesystem). An example of these errors is as follows:

I did not report my finding to Linux upstream back then (because I was using a non-vanilla Linux kernel), but with my Armbian install I had bisected this issue to 06653ebc0ad2e0b7d799cd71a5c2933ed2fb7a66 as the first bad commit. I believe it was released in 5.10.60 (the first broken version to reach Armbian was 5.10.63, coming from a working 5.10.43). Since then, all the rk3399 boards I have checked have HS400 disabled (falling back to HS200, which is stable even with the above commits).

commit 06653ebc0ad2e0b7d799cd71a5c2933ed2fb7a66
Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Thu May 20 01:12:23 2021 +0300

    regulator: core: resolve supply for boot-on/always-on regulators

    commit 98e48cd9283dbac0e1445ee780889f10b3d1db6a upstream.

    For the boot-on/always-on regulators the set_machine_constrainst() is
    called before resolving rdev->supply. Thus the code would try to enable
    rdev before enabling supplying regulator. Enforce resolving supply
    regulator before enabling rdev.

    Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Link: https://lore.kernel.org/r/20210519221224.2868496-1-dmitry.baryshkov@linaro.org
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/regulator/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index f192bf19492ed..e20e77e4c159d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1425,6 +1425,12 @@ static int set_machine_constraints(struct regulator_dev *rdev)
 	 * and we have control then make sure it is enabled.
 	 */
 	if (rdev->constraints->always_on || rdev->constraints->boot_on) {
+		/* If we want to enable this regulator, make sure that we know
+		 * the supplying regulator.
+		 */
+		if (rdev->supply_name && !rdev->supply)
+			return -EPROBE_DEFER;
+
 		if (rdev->supply) {
 			ret = regulator_enable(rdev->supply);
 			if (ret < 0) {

My findings are here (this was on a Kobol Helios64 rk3399 board):
https://forum.armbian.com/topic/18855-upgrading-to-bullseye-troubleshooting-armbian-21081/?do=findComment&comment=128793

I told a user to try this fix (reverting commits 06653ebc0ad2e0b7d799cd71a5c2933ed2fb7a66 and aea6cb99703e17019e025aa71643b4d3e0a24413) on an Armbian kernel for a NanoPC-T4 as well, and it fixed the issue:
https://forum.armbian.com/topic/20002-nanopc-t4-new-kernel-2202-generates-issues-on-mmc2-and-makes-system-not-properly-working/?do=findComment&comment=138052

That was on a kernel above 5.16.8.

I had high expectations that the commit that fixed the double init would fix the issue for good, but sadly it did not. I believe the revert would have been the only fix required for 5.16 kernels, but nowadays a revert is not enough.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/regulator/core.c?id=8a866d527ac0441c0eb14a991fa11358b476b11d

    regulator: core: Resolve supply name earlier to prevent double-init

    Previously, an unresolved regulator supply reference upon calling
    regulator_register on an always-on or boot-on regulator caused
    set_machine_constraints to be called twice.

    This in turn may initialize the regulator twice, leading to voltage
    glitches that are timing-dependent. A simple, unrelated configuration
    change may be enough to hide this problem, only to be surfaced by
    chance.

    One such example is the SD-Card voltage regulator in a NanoPI R4S that
    would not initialize reliably unless the registration flow was just
    complex enough to allow the regulator to properly reset between calls.

    Fix this by re-arranging regulator_register, trying resolve the
    regulator's supply early enough that set_machine_constraints does not
    need to be called twice.

    Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
    Link: https://lore.kernel.org/r/20220818124646.6005-1-christian@kohlschutter.com
    Signed-off-by: Mark Brown <broonie@kernel.org>

The story behind this patch:
https://kohlschuetter.github.io/blog/posts/2022/10/28/linux-nanopi-r4s/

It should have worked, because this patch is basically a revert of commit aea6cb99703e17019e025aa71643b4d3e0a24413 ("regulator: resolve supply after creating regulator"), except that it keeps what I believe is now dead code: the second set_machine_constraints() call under "if (ret == -EPROBE_DEFER)" is of no use now that the regulator supply is resolved before the first set_machine_constraints() call in regulator_register(). The only code left from the 5.10.60 breakage is the -EPROBE_DEFER return in set_machine_constraints() when the regulator supply is not registered.

But even after removing this leftover and the new -EPROBE_DEFER that was added to set_machine_constraints() for "regulator that have no direct control", I cannot get rid of the filesystem corruption and errors with HS400 on 6.3.

I still have no clue why the eMMC regulator double init is fine on most SoCs but not on rk3399.

Cheers,
Alban
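To make the "double init" discussed above easier to follow, here is a minimal userspace sketch in C. It is a toy model and not kernel code: every name in it (`toy_regulator`, `toy_set_constraints`, `toy_resolve_supply`, `toy_register`, `TOY_EPROBE_DEFER`) is hypothetical, and the assumption that the hardware is touched before the deferral is detected is a simplification made only for illustration. It contrasts a registration flow that resolves the supply only after a deferral with one that resolves it before the first constraint call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

#define TOY_EPROBE_DEFER 517 /* stand-in for the kernel's EPROBE_DEFER */

struct toy_regulator {
	bool always_on;        /* models an always-on/boot-on constraint */
	bool has_supply_name;  /* a parent supply is named for this regulator */
	bool supply_resolved;  /* the parent supply has been looked up */
	int times_configured;  /* how often the "hardware" setup ran */
};

/*
 * Models the constraint-setting step. Assumption for illustration: the
 * hardware is configured first, and only afterwards does the function
 * notice that the supply is still unresolved and ask for a retry.
 */
static int toy_set_constraints(struct toy_regulator *r)
{
	assert(r != NULL);
	r->times_configured++;
	if (r->always_on && r->has_supply_name && !r->supply_resolved)
		return -TOY_EPROBE_DEFER;
	return 0;
}

static int toy_resolve_supply(struct toy_regulator *r)
{
	assert(r != NULL);
	r->supply_resolved = true;
	return 0;
}

/*
 * resolve_early == false: resolve the supply only after a deferral, then
 *                         call the constraint step a second time.
 * resolve_early == true:  resolve the supply before the first call, so the
 *                         retry branch is never taken.
 */
static int toy_register(struct toy_regulator *r, bool resolve_early)
{
	int ret;

	if (resolve_early && r->has_supply_name)
		toy_resolve_supply(r);

	ret = toy_set_constraints(r);
	if (ret == -TOY_EPROBE_DEFER) {
		ret = toy_resolve_supply(r);
		if (ret == 0)
			ret = toy_set_constraints(r);
	}
	return ret;
}
```

In this model, registering an always-on regulator with a named but unresolved supply runs the setup step twice with `resolve_early == false` and once with `resolve_early == true`. With early resolution the retry branch becomes unreachable, which corresponds to the "dead code" observation in the message above. The model says nothing about why HS400 still fails on rk3399.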
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
index 907071d4fe80..95efee311ece 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
@@ -645,9 +645,9 @@ &saradc {
 };

 &sdhci {
+	max-frequency = <150000000>;
 	bus-width = <8>;
-	mmc-hs400-1_8v;
-	mmc-hs400-enhanced-strobe;
+	mmc-hs200-1_8v;
 	non-removable;
 	status = "okay";
 };
There is some instability with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:

[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297

Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.

While we are here, set the eMMC maximum clock frequency to 150MHz to
follow the ROCK 4C+.

Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
---
 arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
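One way to confirm that a booted board really runs the eMMC in HS200 at the lower clock is to inspect the MMC host's `ios` debugfs file. The C sketch below parses such text. It rests on several assumptions that are not taken from this thread: that debugfs is mounted and the file lives at a path like `/sys/kernel/debug/mmc1/ios`, and that it prints `key:<whitespace>value` lines such as `clock: 150000000 Hz` and `timing spec: 9 (mmc HS200)`. The names `ios_summary`, `parse_ios` and `timing_is_hs400` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields of interest from the (assumed) ios text layout. */
struct ios_summary {
	unsigned long clock_hz; /* value of the "clock:" line, in Hz */
	char timing[64];        /* text inside the parentheses of "timing spec:" */
};

/*
 * Parse ios-style text held in memory. Returns true only if both the
 * "clock:" and the "timing spec:" lines were found. The line layout is an
 * assumption; adjust the format strings if your kernel prints it differently.
 */
static bool parse_ios(const char *text, struct ios_summary *out)
{
	bool have_clock = false, have_timing = false;
	const char *line = text;

	assert(text != NULL && out != NULL);

	while (line != NULL && *line != '\0') {
		int id; /* numeric timing id, parsed but not used further */

		if (!have_clock &&
		    sscanf(line, "clock: %lu Hz", &out->clock_hz) == 1)
			have_clock = true;
		else if (!have_timing &&
			 sscanf(line, "timing spec: %d (%63[^)])", &id, out->timing) == 2)
			have_timing = true;

		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return have_clock && have_timing;
}

/* True if the reported timing string mentions HS400 in any variant. */
static bool timing_is_hs400(const struct ios_summary *s)
{
	assert(s != NULL);
	return strstr(s->timing, "HS400") != NULL;
}
```

To use it, read the whole file into a NUL-terminated buffer and pass it to `parse_ios()`. With the device tree change applied, one would expect `clock_hz` to be at most 150000000 and `timing_is_hs400()` to return false, but that expectation should be checked on real hardware.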