Message ID | 1541967839-2847-1-git-send-email-stefan.wahren@i2se.com (mailing list archive) |
---|---|
Series | mmc: Several fixes for bcm2835 driver |
On Sun, 11 Nov 2018 at 21:24, Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
> This patch series fixes several issues which have been discovered after
> submission.
>
> Changes in V2:
> - add my own signed-off-by to patches #1 and #2
>
> Michal Suchanek (1):
>   mmc: bcm2835: reset host on timeout
>
> Phil Elwell (1):
>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
>
> Stefan Wahren (5):
>   mmc: bcm2835: Release DMA channel on driver unload
>   mmc: bcm2835: Avoid possible races on data requests
>   mmc: bcm2835: Terminate timeout work synchronously
>   mmc: bcm2835: Refactor dma_map_sg handling
>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
>
>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 20 deletions(-)
>
> --
> 2.7.4
>

Applied for next, thanks!

Kind regards
Uffe

On Sun, 11 Nov 2018 21:23:52 +0100
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> This patch series fixes several issues which have been discovered after
> submission.
>
> Changes in V2:
> - add my own signed-off-by to patches #1 and #2
>
> Michal Suchanek (1):
>   mmc: bcm2835: reset host on timeout
>
> Phil Elwell (1):
>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
>
> Stefan Wahren (5):
>   mmc: bcm2835: Release DMA channel on driver unload
>   mmc: bcm2835: Avoid possible races on data requests
>   mmc: bcm2835: Terminate timeout work synchronously
>   mmc: bcm2835: Refactor dma_map_sg handling
>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
>
>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 20 deletions(-)
>

Hello,

thanks for the patches.

I tried to replace the bcm2835 sdhost driver in my 4.4 kernel with the
upstream driver + these updates but the 16GB orange EVO card still
locks up the mmc controller. It seems it locks up much less but is
certainly not solid.

Thanks

Michal

Hi Michal,

On 21.03.19 at 21:03, Michal Suchánek wrote:
> On Sun, 11 Nov 2018 21:23:52 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
>> This patch series fixes several issues which have been discovered after
>> submission.
>>
>> Changes in V2:
>> - add my own signed-off-by to patches #1 and #2
>>
>> Michal Suchanek (1):
>>   mmc: bcm2835: reset host on timeout
>>
>> Phil Elwell (1):
>>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
>>
>> Stefan Wahren (5):
>>   mmc: bcm2835: Release DMA channel on driver unload
>>   mmc: bcm2835: Avoid possible races on data requests
>>   mmc: bcm2835: Terminate timeout work synchronously
>>   mmc: bcm2835: Refactor dma_map_sg handling
>>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
>>
>>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
>>  1 file changed, 38 insertions(+), 20 deletions(-)
>>
> Hello,
>
> thanks for the patches.
>
> I tried to replace the bcm2835 sdhost driver in my 4.4 kernel with the
> upstream driver + these updates but the 16GB orange EVO card still
> locks up the mmc controller. It seems it locks up much less but is
> certainly not solid.

could you please retry with mainline kernel 5.0?

Maybe this is related:

http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-February/008542.html

>
> Thanks
>
> Michal

On Fri, 22 Mar 2019 15:45:13 +0100
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
>
> On 21.03.19 at 21:03, Michal Suchánek wrote:
> > On Sun, 11 Nov 2018 21:23:52 +0100
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >
> >> This patch series fixes several issues which have been discovered after
> >> submission.
> >>
> >> Changes in V2:
> >> - add my own signed-off-by to patches #1 and #2
> >>
> >> Michal Suchanek (1):
> >>   mmc: bcm2835: reset host on timeout
> >>
> >> Phil Elwell (1):
> >>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
> >>
> >> Stefan Wahren (5):
> >>   mmc: bcm2835: Release DMA channel on driver unload
> >>   mmc: bcm2835: Avoid possible races on data requests
> >>   mmc: bcm2835: Terminate timeout work synchronously
> >>   mmc: bcm2835: Refactor dma_map_sg handling
> >>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
> >>
> >>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
> >>  1 file changed, 38 insertions(+), 20 deletions(-)
> >>
> > Hello,
> >
> > thanks for the patches.
> >
> > I tried to replace the bcm2835 sdhost driver in my 4.4 kernel with the
> > upstream driver + these updates but the 16GB orange EVO card still
> > locks up the mmc controller. It seems it locks up much less but is
> > certainly not solid.
>
> could you please retry with mainline kernel 5.0?

I can try that. What I have is pretty much 5.0 anyway so I don't expect
much difference:

660fc733bd7436f4fa1a351376493e635514ed64 mmc: bcm2835: Add new driver for the sdhost controller.
bf3240bada0211b4a555d75f027181c8432b2d20 mmc: bcm2835: Fix possible NULL ptr dereference in
c00a231ba053a9b0be8d512957b99395b92bc620 mmc: bcm2835: fix potential null pointer dereferences
2c9e89a1d602c12a4f2bd4c7a57a3315247e3f21 mmc: bcm2835: constify mmc_host_ops structures
118032be389009b07ecb5a03ffe219a89d421def mmc: bcm2835: Don't overwrite max frequency unconditionally
f6000a4eb34e6462bc0dd39809c1bb99f9633269 mmc: bcm2835: reset host on timeout
07d405769afea5718529fc9e341f0b13b3189b6f mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
5eae252db3856e62c778832d4d59f6efc5b0aaf9 mmc: bcm2835: Release DMA channel on driver unload
af19b7ce76ba220f358c82b0a5e7d68909a23aa5 mmc: bcm2835: Avoid possible races on data requests
37fefadee8bb665ae337a15aa635dabff9f66ade mmc: bcm2835: Terminate timeout work synchronously
6dc6f2619017109e45550accc120f823fdc31c3e mmc: bcm2835: Refactor dma_map_sg handling
2f5da678351f0d504966fab113968202aa5713fb mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
8c9620b1cc9b69e82fa8d4081d646d0016b602e7 mmc: bcm2835: Fix DMA channel leak on probe error
e5c1e63c932379b89d7404d4e5fde1bf8abff951 mmc: bcm2835: Drop DMA channel error pointer check

What I had before was some previous draft of
660fc733bd7436f4fa1a351376493e635514ed64 +
f6000a4eb34e6462bc0dd39809c1bb99f9633269 which would eventually recover
on the error but the errors were more frequent with my card.

>
> Maybe this is related:
>
> http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-February/008542.html

I suspect that one of the locking fixes that went into mainline
recently prevents recovering from the error but I did not try
reverting them yet. It takes quite a while to reimage, boot different
kernel, run system update (which now invariably crashes/locks up with
any kernel).

Thanks

Michal

Hi Michal,

> Michal Suchánek <msuchanek@suse.de> wrote on 22 March 2019 at 17:06:
>
>
> On Fri, 22 Mar 2019 15:45:13 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
> > Hi Michal,
> >
> > On 21.03.19 at 21:03, Michal Suchánek wrote:
> >
> > could you please retry with mainline kernel 5.0?
>
> I can try that. What I have is pretty much 5.0 anyway so I don't expect
> much difference:
>

as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.

>
> I suspect that one of the locking fixes that went into mainline
> recently prevents recovering from the error but I did not try
> reverting them yet.

Would be nice if you can find this regression.

> It takes quite a while to reimage, boot different
> kernel, run system update (which now invariably crashes/locks up with
> any kernel).

That's why I prefer Raspbian during development, where I can simply replace the kernel / modules with a sd card reader on a PC:

https://gist.github.com/lategoodbye/c7317a42bf7f9c07f5a91baed8c68f75

This should reduce the round-trip, but this could accidentally hide the problem as well.

Thanks
Stefan

>
> Thanks
>
> Michal

On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
>
> > Michal Suchánek <msuchanek@suse.de> wrote on 22 March 2019 at 17:06:
> >
> >
> > On Fri, 22 Mar 2019 15:45:13 +0100
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >
> > > Hi Michal,
> > >
> > > On 21.03.19 at 21:03, Michal Suchánek wrote:
> > >
> > > could you please retry with mainline kernel 5.0?
> >
> > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > much difference:
> >
>
> as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
>
> >
> > I suspect that one of the locking fixes that went into mainline
> > recently prevents recovering from the error but I did not try
> > reverting them yet.
>
> Would be nice if you can find this regression.

Does not look that good.

# bad: [a48caea1745f30e87ab5a8dba5e365d0346aa600] mmc: bcm2835: Drop DMA channel error pointer check (bsc#983145).
# good: [c6b26547caa816608ea5c5717b29c78769a22972] mmc: bcm2835: reset host on timeout (bsc#983145, bsc#1070872).
git bisect start 'a48caea1745f30e87ab5a8dba5e365d0346aa600' 'c6b26547caa816608ea5c5717b29c78769a22972'
# good: [3eb1fe752f52865eff0f9b8edd95b61b6a9c1010] mmc: bcm2835: Terminate timeout work synchronously (bsc#983145, bsc#1070872).
git bisect good 3eb1fe752f52865eff0f9b8edd95b61b6a9c1010
# good: [50b4dd03bd11fbb647f0edbed8501290f6f9ea46] mmc: bcm2835: Properly handle dmaengine_prep_slave_sg (bsc#983145).
git bisect good 50b4dd03bd11fbb647f0edbed8501290f6f9ea46

On the good commits a few timeouts occur and are recovered.

This leaves

8c9620b1cc9b69e82fa8d4081d646d0016b602e7 mmc: bcm2835: Fix DMA channel leak on probe error
e5c1e63c932379b89d7404d4e5fde1bf8abff951 mmc: bcm2835: Drop DMA channel error pointer check

which are not overly suspect. The latter should be a no-op at least.

So maybe the indefinite lockup depends on card state or some other
factor. Also, when the system does recover, the recovery takes quite
long (like minutes), with the system pretty much completely
non-responsive. Not waiting long enough for the system to recover might
also be a factor.

>
> > It takes quite a while to reimage, boot different
> > kernel, run system update (which now invariably crashes/locks up with
> > any kernel).
>
> That's why I prefer Raspbian during development, where I can simply replace the kernel / modules with a sd card reader on a PC:
>
> https://gist.github.com/lategoodbye/c7317a42bf7f9c07f5a91baed8c68f75
>
> This should reduce the round-trip, but this could accidentally hide the problem as well.

I can replace the kernel easily but without going back to the obsolete
system state I lose the system upgrade as a test case.

Thanks

Michal

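The narrowing in the bisect log above can be sketched in a few lines of Python. This is only an illustration, not part of the thread: the helper name `remaining_suspects` is hypothetical, the history is assumed to be linear, and the commit order is assumed to match the upstream list quoted earlier in the thread. It also assumes every "good" verdict is reliable, which is questionable for an intermittent lockup.

```python
# Commit titles in series order (oldest first), as listed earlier in the thread.
# Assumption: the tree being bisected applies them in this same order.
SERIES = [
    "reset host on timeout",
    "Recover from MMC_SEND_EXT_CSD",
    "Release DMA channel on driver unload",
    "Avoid possible races on data requests",
    "Terminate timeout work synchronously",
    "Refactor dma_map_sg handling",
    "Properly handle dmaengine_prep_slave_sg",
    "Fix DMA channel leak on probe error",
    "Drop DMA channel error pointer check",
]


def remaining_suspects(series, good, bad):
    """Return the commits that could still have introduced the failure.

    Hypothetical helper: assumes a linear history in which every commit
    at or before a commit marked good is itself good.
    """
    last_good = max(series.index(title) for title in good)
    first_bad = series.index(bad)
    return series[last_good + 1 : first_bad + 1]


suspects = remaining_suspects(
    SERIES,
    good=[
        "reset host on timeout",
        "Terminate timeout work synchronously",
        "Properly handle dmaengine_prep_slave_sg",
    ],
    bad="Drop DMA channel error pointer check",
)
for title in suspects:
    print(title)
```

With the three good marks from the log, only the last two commits of the list remain. Because the lockup does not trigger on every run, a "good" mark here may simply mean the failure did not show up in that test, so the result is a hint and not proof.
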
Hi Michal,

> Michal Suchánek <msuchanek@suse.de> wrote on 28 March 2019 at 21:43:
>
>
> On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
> > Hi Michal,
> >
> > > Michal Suchánek <msuchanek@suse.de> wrote on 22 March 2019 at 17:06:
> > >
> > >
> > > On Fri, 22 Mar 2019 15:45:13 +0100
> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >
> > > > Hi Michal,
> > > >
> > > > On 21.03.19 at 21:03, Michal Suchánek wrote:
> > > >
> > > > could you please retry with mainline kernel 5.0?
> > >
> > > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > > much difference:
> > >
> >
> > as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> >
> > >
> > > I suspect that one of the locking fixes that went into mainline
> > > recently prevents recovering from the error but I did not try
> > > reverting them yet.
> >

could you please try this patch:

http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008627.html

Stefan

Hello,

On Sat, 30 Mar 2019 16:15:14 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
>
> > Michal Suchánek <msuchanek@suse.de> wrote on 28 March 2019 at 21:43:
> >
> >
> > On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >
> > > Hi Michal,
> > >
> > > > Michal Suchánek <msuchanek@suse.de> wrote on 22 March 2019 at 17:06:
> > > >
> > > >
> > > > On Fri, 22 Mar 2019 15:45:13 +0100
> > > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > >
> > > > > Hi Michal,
> > > > >
> > > > > On 21.03.19 at 21:03, Michal Suchánek wrote:
> > > > >
> > > > > could you please retry with mainline kernel 5.0?
> > > >
> > > > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > > > much difference:
> > > >
> > >
> > > as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> > >
> > > >
> > > > I suspect that one of the locking fixes that went into mainline
> > > > recently prevents recovering from the error but I did not try
> > > > reverting them yet.
> > >

So I suspect we have two different issues here:

mmc1: Card stuck in programming state! mmcblk0 card_busy_detect

This is an issue to which particular card models are more susceptible,
and the workaround is to reset the controller, which we already do. It
typically happens under some IO load. 90% of the time I get this when
doing a system upgrade when the system is installed on a particular
orange card.

The other, which is even harder to reproduce, is

sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.

It typically happens shortly after imaging the card for me.

>
> could you please try this patch:
>
> http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008627.html

I can try that, but the question is what symptom it is supposed to cure
and what the rationale for the change is.

Thanks

Michal

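To keep the two symptoms described above apart when going through captured console output, a small counting script can help. The following is a minimal sketch and not something posted in the thread: the symptom labels and the `classify` helper are made-up names, and the only strings taken from the thread are the two kernel messages themselves.

```python
import re
from collections import Counter

# The two message texts quoted in the thread; the dictionary keys are
# hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "stuck_in_programming": re.compile(r"Card stuck in programming state!"),
    "hw_interrupt_timeout": re.compile(r"timeout waiting for hardware interrupt"),
}


def classify(lines):
    """Count how often each known symptom appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines assembled from the messages quoted in the thread.
sample = [
    "mmc1: Card stuck in programming state! mmcblk0 card_busy_detect",
    "sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.",
    "[  674.750501] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.",
]
counts = classify(sample)
for name, n in sorted(counts.items()):
    print(name, n)
```

Feeding it a saved serial console log (for example `classify(open(path))`, with `path` being wherever the log was captured) would show which of the two issues a given test run actually hit. Messages truncated on the console will not match and are not counted.
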
On Sat, 30 Mar 2019 16:15:14 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
>
> > Michal Suchánek <msuchanek@suse.de> wrote on 28 March 2019 at 21:43:
> >
> >
> > On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >
> > > Hi Michal,
> > >
> > > > Michal Suchánek <msuchanek@suse.de> wrote on 22 March 2019 at 17:06:
> > > >
> > > >
> > > > On Fri, 22 Mar 2019 15:45:13 +0100
> > > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > >
> > > > > Hi Michal,
> > > > >
> > > > > On 21.03.19 at 21:03, Michal Suchánek wrote:
> > > > >
> > > > > could you please retry with mainline kernel 5.0?
> > > >
> > > > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > > > much difference:
> > > >
> > >
> > > as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> > >
> > > >
> > > > I suspect that one of the locking fixes that went into mainline
> > > > recently prevents recovering from the error but I did not try
> > > > reverting them yet.
> > >
> >
> could you please try this patch:
>
> http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008627.html
>
> Stefan

So I updated the dma driver to be able to apply this patch, applied it,
and the system locked up. It even stopped logging over the network -
only the serial console shows the message:

Linux version 4.4.176-1.g755499f-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) )
....
( 7/526) Installing: info2html-2.0-223.1.noarch .........................[done]
( 8/526) Installing: libX11-data-1.6.3-10.3.1.noarch ...............<89%>===[|][ 664.670582] sdhost-bcm2835 3f202000.sdhost: ti.
[ 674.750501] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.

So at least on the 4.4 kernel this does not help.

Thanks

Michal