[V2,0/7] mmc: Several fixes for bcm2835 driver

Message ID 1541967839-2847-1-git-send-email-stefan.wahren@i2se.com (mailing list archive)

Message

Stefan Wahren Nov. 11, 2018, 8:23 p.m. UTC
This patch series fixes several issues which have been discovered after
submission.

Changes in V2:
- add my own signed-off-by to patches #1 and #2

Michal Suchanek (1):
  mmc: bcm2835: reset host on timeout

Phil Elwell (1):
  mmc: bcm2835: Recover from MMC_SEND_EXT_CSD

Stefan Wahren (5):
  mmc: bcm2835: Release DMA channel on driver unload
  mmc: bcm2835: Avoid possible races on data requests
  mmc: bcm2835: Terminate timeout work synchronously
  mmc: bcm2835: Refactor dma_map_sg handling
  mmc: bcm2835: Properly handle dmaengine_prep_slave_sg

 drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 20 deletions(-)

Comments

Ulf Hansson Dec. 5, 2018, 2:23 p.m. UTC | #1
On Sun, 11 Nov 2018 at 21:24, Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
> This patch series fixes several issues which have been discovered after
> submission.
>
> Changes in V2:
> - add my own signed-off-by to patches #1 and #2
>
> Michal Suchanek (1):
>   mmc: bcm2835: reset host on timeout
>
> Phil Elwell (1):
>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
>
> Stefan Wahren (5):
>   mmc: bcm2835: Release DMA channel on driver unload
>   mmc: bcm2835: Avoid possible races on data requests
>   mmc: bcm2835: Terminate timeout work synchronously
>   mmc: bcm2835: Refactor dma_map_sg handling
>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
>
>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 20 deletions(-)
>
> --
> 2.7.4
>

Applied for next, thanks!

Kind regards
Uffe

Michal Suchanek March 21, 2019, 8:03 p.m. UTC | #2
On Sun, 11 Nov 2018 21:23:52 +0100
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> This patch series fixes several issues which have been discovered after
> submission.
> 
> Changes in V2:
> - add my own signed-off-by to patches #1 and #2
> 
> Michal Suchanek (1):
>   mmc: bcm2835: reset host on timeout
> 
> Phil Elwell (1):
>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
> 
> Stefan Wahren (5):
>   mmc: bcm2835: Release DMA channel on driver unload
>   mmc: bcm2835: Avoid possible races on data requests
>   mmc: bcm2835: Terminate timeout work synchronously
>   mmc: bcm2835: Refactor dma_map_sg handling
>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
> 
>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 20 deletions(-)
> 

Hello,

thanks for the patches.

I tried to replace the bcm2835 sdhost driver in my 4.4 kernel with the
upstream driver + these updates but the 16GB orange EVO card still
locks up the mmc controller. It seems it locks up much less but is
certainly not solid.

Thanks

Michal

Stefan Wahren March 22, 2019, 2:45 p.m. UTC | #3
Hi Michal,

Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> On Sun, 11 Nov 2018 21:23:52 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
>> This patch series fixes several issues which have been discovered after
>> submission.
>>
>> Changes in V2:
>> - add my own signed-off-by to patches #1 and #2
>>
>> Michal Suchanek (1):
>>   mmc: bcm2835: reset host on timeout
>>
>> Phil Elwell (1):
>>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
>>
>> Stefan Wahren (5):
>>   mmc: bcm2835: Release DMA channel on driver unload
>>   mmc: bcm2835: Avoid possible races on data requests
>>   mmc: bcm2835: Terminate timeout work synchronously
>>   mmc: bcm2835: Refactor dma_map_sg handling
>>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
>>
>>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
>>  1 file changed, 38 insertions(+), 20 deletions(-)
>>
> Hello,
>
> thanks for the patches.
>
> I tried to replace the bcm2835 sdhost driver in my 4.4 kernel with the
> upstream driver + these updates but the 16GB orange EVO card still
> locks up the mmc controller. It seems it locks up much less but is
> certainly not solid.

could you please retry with mainline kernel 5.0?

Maybe this is related:

http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-February/008542.html

>
> Thanks
>
> Michal

Michal Suchanek March 22, 2019, 4:06 p.m. UTC | #4
On Fri, 22 Mar 2019 15:45:13 +0100
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> > On Sun, 11 Nov 2018 21:23:52 +0100
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >  
> >> This patch series fixes several issues which have been discovered after
> >> submission.
> >>
> >> Changes in V2:
> >> - add my own signed-off-by to patches #1 and #2
> >>
> >> Michal Suchanek (1):
> >>   mmc: bcm2835: reset host on timeout
> >>
> >> Phil Elwell (1):
> >>   mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
> >>
> >> Stefan Wahren (5):
> >>   mmc: bcm2835: Release DMA channel on driver unload
> >>   mmc: bcm2835: Avoid possible races on data requests
> >>   mmc: bcm2835: Terminate timeout work synchronously
> >>   mmc: bcm2835: Refactor dma_map_sg handling
> >>   mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
> >>
> >>  drivers/mmc/host/bcm2835.c | 58 ++++++++++++++++++++++++++++++----------------
> >>  1 file changed, 38 insertions(+), 20 deletions(-)
> >>  
> > Hello,
> >
> > thanks for the patches.
> >
> > I tried to replace the bcm2835 sdhost driver in my 4.4 kernel with the
> > upstream driver + these updates but the 16GB orange EVO card still
> > locks up the mmc controller. It seems it locks up much less but is
> > certainly not solid.  
> 
> could you please retry with mainline kernel 5.0?

I can try that. What I have is pretty much 5.0 anyway so I don't expect
much difference:

660fc733bd7436f4fa1a351376493e635514ed64  mmc: bcm2835: Add new driver for the sdhost controller.
bf3240bada0211b4a555d75f027181c8432b2d20  mmc: bcm2835: Fix possible NULL ptr dereference in
c00a231ba053a9b0be8d512957b99395b92bc620  mmc: bcm2835: fix potential null pointer dereferences
2c9e89a1d602c12a4f2bd4c7a57a3315247e3f21  mmc: bcm2835: constify mmc_host_ops structures
118032be389009b07ecb5a03ffe219a89d421def  mmc: bcm2835: Don't overwrite max frequency unconditionally
f6000a4eb34e6462bc0dd39809c1bb99f9633269  mmc: bcm2835: reset host on timeout
07d405769afea5718529fc9e341f0b13b3189b6f  mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
5eae252db3856e62c778832d4d59f6efc5b0aaf9  mmc: bcm2835: Release DMA channel on driver unload
af19b7ce76ba220f358c82b0a5e7d68909a23aa5  mmc: bcm2835: Avoid possible races on data requests
37fefadee8bb665ae337a15aa635dabff9f66ade  mmc: bcm2835: Terminate timeout work synchronously
6dc6f2619017109e45550accc120f823fdc31c3e  mmc: bcm2835: Refactor dma_map_sg handling
2f5da678351f0d504966fab113968202aa5713fb  mmc: bcm2835: Properly handle dmaengine_prep_slave_sg
8c9620b1cc9b69e82fa8d4081d646d0016b602e7  mmc: bcm2835: Fix DMA channel leak on probe error
e5c1e63c932379b89d7404d4e5fde1bf8abff951  mmc: bcm2835: Drop DMA channel error pointer check

What I had before was some previous draft of
660fc733bd7436f4fa1a351376493e635514ed64 +
f6000a4eb34e6462bc0dd39809c1bb99f9633269 which would eventually recover
on the error but the errors were more frequent with my card.

> 
> Maybe this is related:
> 
> http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-February/008542.html

I suspect that one of the locking fixes that went into mainline
recently prevents recovering from the error but I did not try
reverting them yet. It takes quite a while to reimage, boot different
kernel, run system update (which now invariably crashes/locks up with
any kernel). 

Thanks

Michal

Stefan Wahren March 22, 2019, 5:10 p.m. UTC | #5
Hi Michal,

> Michal Suchánek <msuchanek@suse.de> hat am 22. März 2019 um 17:06 geschrieben:
> 
> 
> On Fri, 22 Mar 2019 15:45:13 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > Hi Michal,
> > 
> > Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> > 
> > could you please retry with mainline kernel 5.0?
> 
> I can try that. What I have is pretty much 5.0 anyway so I don't expect
> much difference:
> 

as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.

> 
> I suspect that one of the locking fixes that went into mainline
> recently prevents recovering from the error but I did not try
> reverting them yet.

Would be nice if you can find this regression.

> It takes quite a while to reimage, boot different
> kernel, run system update (which now invariably crashes/locks up with
> any kernel). 

That's why I prefer Raspbian during development, where I can simply replace the kernel / modules with an SD card reader on a PC:

https://gist.github.com/lategoodbye/c7317a42bf7f9c07f5a91baed8c68f75

This should reduce the round-trip, but it could accidentally hide the problem as well.

Thanks
Stefan

> 
> Thanks
> 
> Michal

Michal Suchanek March 28, 2019, 8:43 p.m. UTC | #6
On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> > Michal Suchánek <msuchanek@suse.de> hat am 22. März 2019 um 17:06 geschrieben:
> > 
> > 
> > On Fri, 22 Mar 2019 15:45:13 +0100
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >   
> > > Hi Michal,
> > > 
> > > Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> > > 
> > > could you please retry with mainline kernel 5.0?  
> > 
> > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > much difference:
> >   
> 
> as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> 
> > 
> > I suspect that one of the locking fixes that went into mainline
> > recently prevents recovering from the error but I did not try
> > reverting them yet.  
> 
> Would be nice if you can find this regression.

Does not look that good.

# bad: [a48caea1745f30e87ab5a8dba5e365d0346aa600] mmc: bcm2835: Drop DMA channel error pointer check (bsc#983145).
# good: [c6b26547caa816608ea5c5717b29c78769a22972] mmc: bcm2835: reset host on timeout (bsc#983145, bsc#1070872).
git bisect start 'a48caea1745f30e87ab5a8dba5e365d0346aa600' 'c6b26547caa816608ea5c5717b29c78769a22972'
# good: [3eb1fe752f52865eff0f9b8edd95b61b6a9c1010] mmc: bcm2835: Terminate timeout work synchronously (bsc#983145, bsc#1070872).
git bisect good 3eb1fe752f52865eff0f9b8edd95b61b6a9c1010
# good: [50b4dd03bd11fbb647f0edbed8501290f6f9ea46] mmc: bcm2835: Properly handle dmaengine_prep_slave_sg (bsc#983145).
git bisect good 50b4dd03bd11fbb647f0edbed8501290f6f9ea46

On the good commits a few timeouts occur and are recovered. This leaves 

8c9620b1cc9b69e82fa8d4081d646d0016b602e7  mmc: bcm2835: Fix DMA channel leak on probe error
e5c1e63c932379b89d7404d4e5fde1bf8abff951  mmc: bcm2835: Drop DMA channel error pointer check

which are not overly suspect. The latter should be a no-op at least. So
maybe the indefinite lockup depends on card state or some other factor.
Also, when the system does recover, the recovery takes quite long
(minutes), and the system is pretty much completely unresponsive
meanwhile. Not waiting long enough for the system to recover might
also be a factor.
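
The narrowing above can be reproduced mechanically. The following is a minimal Python sketch and not part of the original message. It assumes the backported commits sit in the same linear order as the upstream list quoted earlier in the thread, and it matches commits by subject because the downstream hashes differ from the upstream ones. The names SERIES, BISECT_LOG and remaining_suspects are made up for illustration.

```python
import re

# Upstream order of the driver commits, as listed earlier in this thread.
# Assumption: the 4.4 backport applies them in the same linear order.
SERIES = [
    "mmc: bcm2835: reset host on timeout",
    "mmc: bcm2835: Recover from MMC_SEND_EXT_CSD",
    "mmc: bcm2835: Release DMA channel on driver unload",
    "mmc: bcm2835: Avoid possible races on data requests",
    "mmc: bcm2835: Terminate timeout work synchronously",
    "mmc: bcm2835: Refactor dma_map_sg handling",
    "mmc: bcm2835: Properly handle dmaengine_prep_slave_sg",
    "mmc: bcm2835: Fix DMA channel leak on probe error",
    "mmc: bcm2835: Drop DMA channel error pointer check",
]

# Verdict comments copied from the bisect log above.
BISECT_LOG = """\
# bad: [a48caea1745f30e87ab5a8dba5e365d0346aa600] mmc: bcm2835: Drop DMA channel error pointer check (bsc#983145).
# good: [c6b26547caa816608ea5c5717b29c78769a22972] mmc: bcm2835: reset host on timeout (bsc#983145, bsc#1070872).
# good: [3eb1fe752f52865eff0f9b8edd95b61b6a9c1010] mmc: bcm2835: Terminate timeout work synchronously (bsc#983145, bsc#1070872).
# good: [50b4dd03bd11fbb647f0edbed8501290f6f9ea46] mmc: bcm2835: Properly handle dmaengine_prep_slave_sg (bsc#983145).
"""

# "# good: [<sha>] <subject>" or "# bad: [<sha>] <subject>"
MARK = re.compile(r"^# (good|bad): \[[0-9a-f]{40}\] (.*)$")
# Trailing downstream bug reference such as " (bsc#983145)."
SUFFIX = re.compile(r"\s*\(bsc#[^)]*\)\.?$")


def remaining_suspects(log_text, series):
    """Return the subjects after the newest good commit, up to the oldest bad one."""
    last_good = -1
    first_bad = len(series) - 1
    for line in log_text.splitlines():
        match = MARK.match(line)
        if not match:
            continue  # skip "git bisect ..." command lines and blanks
        verdict = match.group(1)
        subject = SUFFIX.sub("", match.group(2))
        index = series.index(subject)  # raises ValueError for unknown subjects
        if verdict == "good":
            last_good = max(last_good, index)
        else:
            first_bad = min(first_bad, index)
    return series[last_good + 1:first_bad + 1]


if __name__ == "__main__":
    for subject in remaining_suspects(BISECT_LOG, SERIES):
        print(subject)
```

Run against the log from this message, the sketch lists the same two DMA channel commits named above.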

> 
> > It takes quite a while to reimage, boot different
> > kernel, run system update (which now invariably crashes/locks up with
> > any kernel).   
> 
> That's why I prefer Raspbian during development, where I can simply replace the kernel / modules with an SD card reader on a PC:
> 
> https://gist.github.com/lategoodbye/c7317a42bf7f9c07f5a91baed8c68f75
> 
> This should reduce the round-trip, but it could accidentally hide the problem as well.

I can replace the kernel easily, but without going back to the obsolete
system state I lose the system upgrade as a test case.

Thanks 

Michal

Stefan Wahren March 30, 2019, 3:15 p.m. UTC | #7
Hi Michal,

> Michal Suchánek <msuchanek@suse.de> hat am 28. März 2019 um 21:43 geschrieben:
> 
> 
> On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > Hi Michal,
> > 
> > > Michal Suchánek <msuchanek@suse.de> hat am 22. März 2019 um 17:06 geschrieben:
> > > 
> > > 
> > > On Fri, 22 Mar 2019 15:45:13 +0100
> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >   
> > > > Hi Michal,
> > > > 
> > > > Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> > > > 
> > > > could you please retry with mainline kernel 5.0?  
> > > 
> > > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > > much difference:
> > >   
> > 
> > as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> > 
> > > 
> > > I suspect that one of the locking fixes that went into mainline
> > > recently prevents recovering from the error but I did not try
> > > reverting them yet.  
> > 

could you please try this patch:

http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008627.html

Stefan

Michal Suchanek April 2, 2019, 6:58 p.m. UTC | #8
Hello,

On Sat, 30 Mar 2019 16:15:14 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> > Michal Suchánek <msuchanek@suse.de> hat am 28. März 2019 um 21:43 geschrieben:
> > 
> > 
> > On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >   
> > > Hi Michal,
> > >   
> > > > Michal Suchánek <msuchanek@suse.de> hat am 22. März 2019 um 17:06 geschrieben:
> > > > 
> > > > 
> > > > On Fri, 22 Mar 2019 15:45:13 +0100
> > > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > >     
> > > > > Hi Michal,
> > > > > 
> > > > > Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> > > > > 
> > > > > could you please retry with mainline kernel 5.0?    
> > > > 
> > > > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > > > much difference:
> > > >     
> > > 
> > > as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> > >   
> > > > 
> > > > I suspect that one of the locking fixes that went into mainline
> > > > recently prevents recovering from the error but I did not try
> > > > reverting them yet.    
> > >   

So I suspect we have two different issues here:

mmc1: Card stuck in programming state! mmcblk0 card_busy_detect

this is an issue to which particular card models are more susceptible
and the workaround is to reset the controller, which we already do.
It typically happens under some IO load. 90% of the time I get this
when doing a system upgrade with the system installed on a particular
orange card.

The other which is even harder to reproduce is

sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.

It typically happens shortly after imaging the card for me.
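
To keep the two symptoms apart when going through captured console output, a small tally can help. This is only a sketch in Python and not part of the original message: the match strings are taken from the two messages quoted above, while the labels card_stuck and irq_timeout, the helper names, and the unrelated sample line are hypothetical.

```python
# Substrings taken from the two kernel messages quoted above.
# The labels on the left are made up for this sketch.
SIGNATURES = {
    "card_stuck": "Card stuck in programming state!",
    "irq_timeout": "timeout waiting for hardware interrupt",
}


def classify(line):
    """Return the label of the first matching signature, or None."""
    for label, needle in SIGNATURES.items():
        if needle in line:
            return label
    return None


def count_symptoms(lines):
    """Count how often each symptom appears in an iterable of log lines."""
    counts = {label: 0 for label in SIGNATURES}
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


SAMPLE = [
    "mmc1: Card stuck in programming state! mmcblk0 card_busy_detect",
    "sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.",
    "some unrelated line",  # hypothetical filler, not from the thread
]

if __name__ == "__main__":
    print(count_symptoms(SAMPLE))
```

Feeding it a saved serial console log line by line would show which of the two issues a given run actually hit.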

> 
> could you please try this patch:
> 
> http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008627.html

I can try that but the question is what symptom is it supposed to cure
and what is the rationale for the change.

Thanks

Michal

Michal Suchanek April 8, 2019, 2:55 p.m. UTC | #9
On Sat, 30 Mar 2019 16:15:14 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> > Michal Suchánek <msuchanek@suse.de> hat am 28. März 2019 um 21:43 geschrieben:
> > 
> > 
> > On Fri, 22 Mar 2019 18:10:11 +0100 (CET)
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >   
> > > Hi Michal,
> > >   
> > > > Michal Suchánek <msuchanek@suse.de> hat am 22. März 2019 um 17:06 geschrieben:
> > > > 
> > > > 
> > > > On Fri, 22 Mar 2019 15:45:13 +0100
> > > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > >     
> > > > > Hi Michal,
> > > > > 
> > > > > Am 21.03.19 um 21:03 schrieb Michal Suchánek:
> > > > > 
> > > > > could you please retry with mainline kernel 5.0?    
> > > > 
> > > > I can try that. What I have is pretty much 5.0 anyway so I don't expect
> > > > much difference:
> > > >     
> > > 
> > > as long as the issue lies in the sdhost driver code. There have also been a lot of fixes by Lukas Wunner to the DMA engine driver. I prefer a well-defined source base.
> > >   
> > > > 
> > > > I suspect that one of the locking fixes that went into mainline
> > > > recently prevents recovering from the error but I did not try
> > > > reverting them yet.    
> > >   
> 
> could you please try this patch:
> 
> http://lists.infradead.org/pipermail/linux-rpi-kernel/2019-March/008627.html
> 
> Stefan

So I updated the dma driver to be able to apply this patch, applied it,
and the system locked up. It even stopped logging over the network -
only the serial console shows the message:

Linux version 4.4.176-1.g755499f-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) )
....
(  7/526) Installing: info2html-2.0-223.1.noarch .........................[done]
(  8/526) Installing: libX11-data-1.6.3-10.3.1.noarch ...............<89%>===[|][  664.670582] sdhost-bcm2835 3f202000.sdhost: ti.
[  674.750501] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.

So at least on the 4.4 kernel this does not help.

Thanks

Michal