Message ID | 20190708195613.205729-1-dianders@chromium.org (mailing list archive) |
---|---|
State | Not Applicable |
Series | mmc: dw_mmc: Fix occasional hang after tuning on eMMC |
On Tue, 9 Jul 2019 at 00:48, Douglas Anderson <dianders@chromium.org> wrote:
>
> In commit 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after
> response errors.") we fixed a tuning-induced hang that I saw when
> stress testing tuning on certain SD cards. I won't re-hash that whole
> commit, but the summary is that as a normal part of tuning you need to
> deal with transfer errors and there were cases where these transfer
> errors were putting my system into a bad state causing all future
> transfers to fail. That commit fixed handling of the transfer errors
> for me.
>
> In downstream Chrome OS my fix landed and had the same behavior for
> all SD/MMC commands. However, it looks like when the commit landed
> upstream we limited it to only SD tuning commands. Presumably this
> was to try to get around problems that Alim Akhtar reported on exynos
> [1].
>
> Unfortunately while stress testing reboots (and suspend/resume) on
> some rk3288-based Chromebooks I found the same problem on the eMMC on
> some of my Chromebooks (the ones with Hynix eMMC). Since the eMMC
> tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
> vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
> same situation.
>
> I'm hoping that whatever problems exynos was having in the past are
> somehow magically fixed now and we can make the behavior the same for
> all commands.
>
> [1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com
>
> Fixes: 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after response errors.")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Alim Akhtar <alim.akhtar@gmail.com>
> Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
> ---
> Marek (or anyone else using exynos): is it easy for you to test this
> and check if things are still broken when we land this patch? If so,
> I guess we could have a quirk to have different behavior for just
> Rockchip SoCs but I'd rather avoid that if possible.
>
> NOTE: I'm not hoping totally in vain here. It is possible that some
> of the CTO/DTO timers that landed could be the magic that would get
> exynos unstuck.

I have an eMMC module attached to an Odroid U3 (Exynos4412,
samsung,exynos4412-dw-mshc). What is the testing procedure? With your
patch it boots fine:
[    3.698637] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
[    3.703900] mmc1: new DDR MMC card at address 0001
[    3.728458] mmcblk1: mmc1:0001 008G92 7.28 GiB

Best regards,
Krzysztof
Hi,

On Tue, Jul 9, 2019 at 2:07 AM Krzysztof Kozlowski <krzk@kernel.org> wrote:
>
> I have an eMMC module attached to an Odroid U3 (Exynos4412,
> samsung,exynos4412-dw-mshc). What is the testing procedure? With your
> patch it boots fine:
> [    3.698637] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
> [    3.703900] mmc1: new DDR MMC card at address 0001
> [    3.728458] mmcblk1: mmc1:0001 008G92 7.28 GiB

To really test it, it'd be nice to see some HS200 eMMC cards enumerate
OK. Specifically, the patch adjusts the error handling, and the place
where that mostly happens is during tuning.

I'll also try to find some time today to check a peach_pit or a
peach_pi. I think I saw one in the pile near my desk so if it isn't
in too bad of a shape I can give mainline a shot on it.

-Doug
Hi,

Message from Doug Anderson <dianders@chromium.org> on Tue, 9 Jul 2019 at 18:38:
>
> To really test it, it'd be nice to see some HS200 eMMC cards enumerate
> OK. Specifically the patch adjusts the error handling and the place
> where that happens mostly is during tuning.
>
> I'll also try to find some time today to check a peach_pit or a
> peach_pi. I think I saw one in the pile near my desk so if it isn't
> in too bad of a shape I can give mainline a shot on it.
>

I did a normal boot on peach_pi [1] and odroidxu3 [2] with that patch
applied, and the eMMC attached on both was detected as

[    2.294798] mmc0: new HS400 MMC card at address 0001

I can do some stress tests tomorrow on those boards if that helps.

Cheers,
~ Enric

[1] https://storage.kernelci.org/chrome-platform/for-kernelci/ib-mfd-cros-v5.3-87-g0fe7e9d7d5a3/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-exynos5800-peach-pi.html
[2] https://storage.kernelci.org/chrome-platform/for-kernelci/ib-mfd-cros-v5.3-87-g0fe7e9d7d5a3/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-exynos5422-odroidxu3.html
Hi,

On Tue, Jul 9, 2019 at 9:38 AM Doug Anderson <dianders@chromium.org> wrote:
>
> To really test it, it'd be nice to see some HS200 eMMC cards enumerate
> OK. Specifically the patch adjusts the error handling and the place
> where that happens mostly is during tuning.
>
> I'll also try to find some time today to check a peach_pit or a
> peach_pi. I think I saw one in the pile near my desk so if it isn't
> in too bad of a shape I can give mainline a shot on it.

OK, I managed to get an exynos5800-peach-pi up and running. I put my
patch in place and am currently at 45 reboots and counting w/ no
problems.

NOTE: in my case I actually had to disable "hs400" mode on my peach-pi
but that's because the board I dug up was an early version of the
board that didn't have the strobe line connected. However, Alim's
earlier reports of problems were with hs200 anyway and hs200 still
executes the tuning plenty of times. His reports of problems also
said that he had problems after just a few boots. So I'll assert that
whatever problems were present 4 years ago have indeed gone away.

I'll leave rebooting happening overnight just in case, but otherwise
I'll assert that this is fine.

-Doug
Hi,

On Tue, Jul 9, 2019 at 3:02 PM Doug Anderson <dianders@chromium.org> wrote:
>
> OK, I managed to get an exynos5800-peach-pi up and running. I put my
> patch in place and am currently at 45 reboots and counting w/ no
> problems.

In case it helps, I made it through 2379 more reboots on my peach_pi
w/ no hangs. I'm putting the device back in mothball now. :-P

I didn't go back and try to reproduce the original problems so I guess
I can't assert with 100% authority that the original issue is gone,
but my testing combined with Enric's suggests that things are working
fine.

-Doug
On Mon, 8 Jul 2019 at 21:56, Douglas Anderson <dianders@chromium.org> wrote:
>
> In commit 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after
> response errors.") we fixed a tuning-induced hang that I saw when
> stress testing tuning on certain SD cards. I won't re-hash that whole
> commit, but the summary is that as a normal part of tuning you need to
> deal with transfer errors and there were cases where these transfer
> errors were putting my system into a bad state causing all future
> transfers to fail. That commit fixed handling of the transfer errors
> for me.
>
> In downstream Chrome OS my fix landed and had the same behavior for
> all SD/MMC commands. However, it looks like when the commit landed
> upstream we limited it to only SD tuning commands. Presumably this
> was to try to get around problems that Alim Akhtar reported on exynos
> [1].
>
> Unfortunately while stress testing reboots (and suspend/resume) on
> some rk3288-based Chromebooks I found the same problem on the eMMC on
> some of my Chromebooks (the ones with Hynix eMMC). Since the eMMC
> tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
> vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
> same situation.
>
> I'm hoping that whatever problems exynos was having in the past are
> somehow magically fixed now and we can make the behavior the same for
> all commands.
>
> [1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com
>
> Fixes: 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after response errors.")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Alim Akhtar <alim.akhtar@gmail.com>
> Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>

Applied for fixes, with a stable tag added. Thanks!

Kind regards
Uffe

> ---
> Marek (or anyone else using exynos): is it easy for you to test this
> and check if things are still broken when we land this patch? If so,
> I guess we could have a quirk to have different behavior for just
> Rockchip SoCs but I'd rather avoid that if possible.
>
> NOTE: I'm not hoping totally in vain here. It is possible that some
> of the CTO/DTO timers that landed could be the magic that would get
> exynos unstuck.
>
>  drivers/mmc/host/dw_mmc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index b53b6b7d4dd4..60c3a06e3469 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -2034,8 +2034,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
>                                  * delayed. Allowing the transfer to take place
>                                  * avoids races and keeps things simple.
>                                  */
> -                               if ((err != -ETIMEDOUT) &&
> -                                   (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
> +                               if (err != -ETIMEDOUT) {
>                                         state = STATE_SENDING_DATA;
>                                         continue;
>                                 }
> --
> 2.22.0.410.gd8fdbe21b5-goog
>
diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index b53b6b7d4dd4..60c3a06e3469 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2034,8 +2034,7 @@ static void dw_mci_tasklet_func(unsigned long priv)
 				 * delayed. Allowing the transfer to take place
 				 * avoids races and keeps things simple.
 				 */
-				if ((err != -ETIMEDOUT) &&
-				    (cmd->opcode == MMC_SEND_TUNING_BLOCK)) {
+				if (err != -ETIMEDOUT) {
 					state = STATE_SENDING_DATA;
 					continue;
 				}
In commit 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after
response errors.") we fixed a tuning-induced hang that I saw when
stress testing tuning on certain SD cards. I won't re-hash that whole
commit, but the summary is that as a normal part of tuning you need to
deal with transfer errors and there were cases where these transfer
errors were putting my system into a bad state causing all future
transfers to fail. That commit fixed handling of the transfer errors
for me.

In downstream Chrome OS my fix landed and had the same behavior for
all SD/MMC commands. However, it looks like when the commit landed
upstream we limited it to only SD tuning commands. Presumably this
was to try to get around problems that Alim Akhtar reported on exynos
[1].

Unfortunately while stress testing reboots (and suspend/resume) on
some rk3288-based Chromebooks I found the same problem on the eMMC on
some of my Chromebooks (the ones with Hynix eMMC). Since the eMMC
tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
same situation.

I'm hoping that whatever problems exynos was having in the past are
somehow magically fixed now and we can make the behavior the same for
all commands.

[1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com

Fixes: 46d179525a1f ("mmc: dw_mmc: Wait for data transfer after response errors.")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Alim Akhtar <alim.akhtar@gmail.com>
Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
---
Marek (or anyone else using exynos): is it easy for you to test this
and check if things are still broken when we land this patch? If so,
I guess we could have a quirk to have different behavior for just
Rockchip SoCs but I'd rather avoid that if possible.

NOTE: I'm not hoping totally in vain here. It is possible that some
of the CTO/DTO timers that landed could be the magic that would get
exynos unstuck.

 drivers/mmc/host/dw_mmc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
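
For readers following the thread, the effect of the one-line change in
dw_mci_tasklet_func() can be modelled outside the kernel. The sketch
below is an illustration only, not driver code. The two helper
functions are hypothetical names that isolate the `if` condition
before and after the patch. The opcode values are the ones defined in
the kernel's include/linux/mmc/mmc.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Opcode values as defined in the kernel's include/linux/mmc/mmc.h. */
#define MMC_SEND_TUNING_BLOCK        19  /* SD tuning command */
#define MMC_SEND_TUNING_BLOCK_HS200  21  /* eMMC HS200 tuning command */

/*
 * Hypothetical helper modelling the condition before the patch: the
 * state machine moved on to STATE_SENDING_DATA only for a non-timeout
 * error on the SD tuning command.
 */
bool wait_for_data_before_patch(int err, unsigned int opcode)
{
	return err != -ETIMEDOUT && opcode == MMC_SEND_TUNING_BLOCK;
}

/*
 * Hypothetical helper modelling the condition after the patch: any
 * non-timeout error takes that path, whatever the opcode.
 */
bool wait_for_data_after_patch(int err, unsigned int opcode)
{
	(void)opcode; /* the opcode no longer affects the decision */
	return err != -ETIMEDOUT;
}
```

With a non-timeout error such as -EILSEQ (chosen here purely as an
example value), the old condition is false for
MMC_SEND_TUNING_BLOCK_HS200, which matches the eMMC case described in
the commit message, while the new condition is true for both tuning
opcodes. A timeout (-ETIMEDOUT) is excluded in both versions.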