Message ID | 20190524111053.12228-1-masneyb@onstation.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | mmc: sdhci: queue work after sdhci_defer_done() | expand |
On 24/05/19 2:10 PM, Brian Masney wrote: > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that > moved from using a tasklet to a work queue. That patch also changed > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when > sdhci_defer_done() is true. Change it to queue work to the complete work > queue if sdhci_defer_done() is true so that the functionality is > equilivent to what was there when the finish_tasklet was present. This > corrects the WiFi breakage on the Nexus 5 phone. > > Signed-off-by: Brian Masney <masneyb@onstation.org> > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") > --- > See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for > details about how the WiFi is wired into sdhci on this platform. > > bisect log: > > git bisect start > # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux > git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb > # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 > git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd > # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux > git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf > # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block > git bisect bad 67a242223958d628f0ba33283668e3ddd192d057 > # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3 > # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q > git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc > # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core > git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38 > # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux > git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e > # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux > git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf > # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400 > git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d > # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant > git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18 > # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data' > git bisect good ade024f130f742725da9219624b01666f04bc4a6 > # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling > git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb > # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible > git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903 > # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume > git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b > # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet > git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612 > # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet > > drivers/mmc/host/sdhci.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c > index 97158344b862..3563c3bc57c9 100644 > --- a/drivers/mmc/host/sdhci.c > +++ b/drivers/mmc/host/sdhci.c > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) > continue; > > if (sdhci_defer_done(host, mrq)) { > - result = IRQ_WAKE_THREAD; > + queue_work(host->complete_wq, &host->complete_work); The IRQ thread has a lot less latency than the work queue, which is why it is done that way. I am not sure why you say this change is equivalent to what was there before, nor why it fixes your problem. Can you explain some more? > } else { > mrqs_done[i] = mrq; > host->mrqs_done[i] = NULL; >
On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote: > On 24/05/19 2:10 PM, Brian Masney wrote: > > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected > > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that > > moved from using a tasklet to a work queue. That patch also changed > > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when > > sdhci_defer_done() is true. Change it to queue work to the complete work > > queue if sdhci_defer_done() is true so that the functionality is > > equilivent to what was there when the finish_tasklet was present. This > > corrects the WiFi breakage on the Nexus 5 phone. > > > > Signed-off-by: Brian Masney <masneyb@onstation.org> > > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") > > --- > > See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for > > details about how the WiFi is wired into sdhci on this platform. > > > > bisect log: > > > > git bisect start > > # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux > > git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb > > # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 > > git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd > > # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux > > git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf > > # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block > > git bisect bad 67a242223958d628f0ba33283668e3ddd192d057 > > # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > > git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3 > > # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q > > git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc > > # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core > > git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38 > > # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux > > git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e > > # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux > > git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf > > # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400 > > git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d > > # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant > > git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18 > > # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data' > > git bisect good ade024f130f742725da9219624b01666f04bc4a6 > > # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling > > git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb > > # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible > > git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903 > > # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume > > git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b > > # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet > > git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612 > > # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet > > > > drivers/mmc/host/sdhci.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c > > index 97158344b862..3563c3bc57c9 100644 > > --- a/drivers/mmc/host/sdhci.c > > +++ b/drivers/mmc/host/sdhci.c > > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) > > continue; > > > > if (sdhci_defer_done(host, mrq)) { > > - result = IRQ_WAKE_THREAD; > > + queue_work(host->complete_wq, &host->complete_work); > > The IRQ thread has a lot less latency than the work queue, which is why it > is done that way. > > I am not sure why you say this change is equivalent to what was there > before, nor why it fixes your problem. > > Can you explain some more? What I meant by the equivalent change was that tasklet_schedule() used to be called in this situation rather than returning IRQ_WAKE_THREAD. I'm honestly not sure exactly what's going on yet. Without the patch I sent out, wlan0 is not detected on the phone. Perhaps there is a subtle race condition somewhere that is exposed with the reduction in latency since I assume tasklet_schedule() is more expensive than doing the processing in the bottom half? I'll do some more digging and see if I can find more information. I sent this out to get a discussion started. Brian
On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote: > On 24/05/19 2:10 PM, Brian Masney wrote: > > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected > > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that > > moved from using a tasklet to a work queue. That patch also changed > > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when > > sdhci_defer_done() is true. Change it to queue work to the complete work > > queue if sdhci_defer_done() is true so that the functionality is > > equilivent to what was there when the finish_tasklet was present. This > > corrects the WiFi breakage on the Nexus 5 phone. > > > > Signed-off-by: Brian Masney <masneyb@onstation.org> > > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") > > --- > > See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for > > details about how the WiFi is wired into sdhci on this platform. > > > > bisect log: > > > > git bisect start > > # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux > > git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb > > # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 > > git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd > > # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux > > git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf > > # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block > > git bisect bad 67a242223958d628f0ba33283668e3ddd192d057 > > # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > > git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3 > > # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q > > git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc > > # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core > > git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38 > > # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux > > git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e > > # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux > > git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf > > # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400 > > git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d > > # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant > > git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18 > > # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data' > > git bisect good ade024f130f742725da9219624b01666f04bc4a6 > > # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling > > git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb > > # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible > > git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903 > > # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume > > git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b > > # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet > > git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612 > > # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet > > > > drivers/mmc/host/sdhci.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c > > index 97158344b862..3563c3bc57c9 100644 > > --- a/drivers/mmc/host/sdhci.c > > +++ b/drivers/mmc/host/sdhci.c > > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) > > continue; > > > > if (sdhci_defer_done(host, mrq)) { > > - result = IRQ_WAKE_THREAD; > > + queue_work(host->complete_wq, &host->complete_work); > > The IRQ thread has a lot less latency than the work queue, which is why it > is done that way. > > I am not sure why you say this change is equivalent to what was there > before, nor why it fixes your problem. > > Can you explain some more? drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls sdio_claim_host() and it appears to never return. I'm not going to be able to look into this more today but I should have time this weekend to dig in more with ftrace. Brian
+ Broadcom wireless maintainers On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote: > On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote: > > On 24/05/19 2:10 PM, Brian Masney wrote: > > > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected > > > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that > > > moved from using a tasklet to a work queue. That patch also changed > > > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when > > > sdhci_defer_done() is true. Change it to queue work to the complete work > > > queue if sdhci_defer_done() is true so that the functionality is > > > equilivent to what was there when the finish_tasklet was present. This > > > corrects the WiFi breakage on the Nexus 5 phone. > > > > > > Signed-off-by: Brian Masney <masneyb@onstation.org> > > > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") > > > --- > > > [ ... ] > > > > > > drivers/mmc/host/sdhci.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c > > > index 97158344b862..3563c3bc57c9 100644 > > > --- a/drivers/mmc/host/sdhci.c > > > +++ b/drivers/mmc/host/sdhci.c > > > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) > > > continue; > > > > > > if (sdhci_defer_done(host, mrq)) { > > > - result = IRQ_WAKE_THREAD; > > > + queue_work(host->complete_wq, &host->complete_work); > > > > The IRQ thread has a lot less latency than the work queue, which is why it > > is done that way. > > > > I am not sure why you say this change is equivalent to what was there > > before, nor why it fixes your problem. > > > > Can you explain some more? > > [ ... ] > > drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls > sdio_claim_host() and it appears to never return. When the brcmfmac driver is loaded, the firmware is requested from disk, and that's when the deadlock occurs in 5.2rc1. Specifically: 1) brcmf_sdio_download_firmware() in drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls sdio_claim_host() 2) brcmf_sdio_firmware_callback() is called and brcmf_sdiod_ramrw() tries to claim the host, but has to wait since its already claimed in #1 and the deadlock occurs. I tried to release the host before the firmware is requested, however parts of brcmf_chip_set_active() needs the host to be claimed, and a similar deadlock occurs in brcmf_sdiod_ramrw() if I claim the host before calling brcmf_chip_set_active(). I started to look at moving the sdio_{claim,release}_host() calls out of brcmf_sdiod_ramrw() but there's a fair number of callers, so I'd like to get feedback about the best course of action here. Brian
On 5/26/2019 2:21 PM, Brian Masney wrote: > + Broadcom wireless maintainers > > On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote: >> On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote: >>> On 24/05/19 2:10 PM, Brian Masney wrote: >>>> WiFi stopped working on the LG Nexus 5 phone and the issue was bisected >>>> to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that >>>> moved from using a tasklet to a work queue. That patch also changed >>>> sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when >>>> sdhci_defer_done() is true. Change it to queue work to the complete work >>>> queue if sdhci_defer_done() is true so that the functionality is >>>> equilivent to what was there when the finish_tasklet was present. This >>>> corrects the WiFi breakage on the Nexus 5 phone. >>>> >>>> Signed-off-by: Brian Masney <masneyb@onstation.org> >>>> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") >>>> --- >>>> [ ... ] >>>> >>>> drivers/mmc/host/sdhci.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c >>>> index 97158344b862..3563c3bc57c9 100644 >>>> --- a/drivers/mmc/host/sdhci.c >>>> +++ b/drivers/mmc/host/sdhci.c >>>> @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) >>>> continue; >>>> >>>> if (sdhci_defer_done(host, mrq)) { >>>> - result = IRQ_WAKE_THREAD; >>>> + queue_work(host->complete_wq, &host->complete_work); >>> >>> The IRQ thread has a lot less latency than the work queue, which is why it >>> is done that way. >>> >>> I am not sure why you say this change is equivalent to what was there >>> before, nor why it fixes your problem. >>> >>> Can you explain some more? >> >> [ ... ] >> >> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls >> sdio_claim_host() and it appears to never return. > > When the brcmfmac driver is loaded, the firmware is requested from disk, > and that's when the deadlock occurs in 5.2rc1. Specifically: > > 1) brcmf_sdio_download_firmware() in > drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls > sdio_claim_host() > > 2) brcmf_sdio_firmware_callback() is called and brcmf_sdiod_ramrw() > tries to claim the host, but has to wait since its already claimed > in #1 and the deadlock occurs. This does not make any sense to me. brcmf_sdio_download_firmware() is called from brcmf_sdio_firmware_callback() so they are in the same context. So #2 is not waiting for #1, but something else I would say. Also #2 calls sdio_claim_host() after brcmf_sdio_download_firmware has completed so definitely not waiting for #1. > I tried to release the host before the firmware is requested, however > parts of brcmf_chip_set_active() needs the host to be claimed, and a > similar deadlock occurs in brcmf_sdiod_ramrw() if I claim the host > before calling brcmf_chip_set_active(). > > I started to look at moving the sdio_{claim,release}_host() calls out of > brcmf_sdiod_ramrw() but there's a fair number of callers, so I'd like to > get feedback about the best course of action here. Long ago Franky reworked the sdio critical sections requiring sdio claim/release and I am pretty sure they are correct. Could you try with lockdep kernel and see if that brings any more information. In the mean time I will update my dev branch to 5.2-rc1 and see if I can find any clues. Regards, Arend
On Sun, May 26, 2019 at 08:42:21PM +0200, Arend Van Spriel wrote: > On 5/26/2019 2:21 PM, Brian Masney wrote: > > + Broadcom wireless maintainers > > > > On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote: > > > On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote: > > > > On 24/05/19 2:10 PM, Brian Masney wrote: > > > > > WiFi stopped working on the LG Nexus 5 phone and the issue was bisected > > > > > to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that > > > > > moved from using a tasklet to a work queue. That patch also changed > > > > > sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when > > > > > sdhci_defer_done() is true. Change it to queue work to the complete work > > > > > queue if sdhci_defer_done() is true so that the functionality is > > > > > equilivent to what was there when the finish_tasklet was present. This > > > > > corrects the WiFi breakage on the Nexus 5 phone. > > > > > > > > > > Signed-off-by: Brian Masney <masneyb@onstation.org> > > > > > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") > > > > > --- > > > > > [ ... ] > > > > > > > > > > drivers/mmc/host/sdhci.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c > > > > > index 97158344b862..3563c3bc57c9 100644 > > > > > --- a/drivers/mmc/host/sdhci.c > > > > > +++ b/drivers/mmc/host/sdhci.c > > > > > @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) > > > > > continue; > > > > > if (sdhci_defer_done(host, mrq)) { > > > > > - result = IRQ_WAKE_THREAD; > > > > > + queue_work(host->complete_wq, &host->complete_work); > > > > > > > > The IRQ thread has a lot less latency than the work queue, which is why it > > > > is done that way. > > > > > > > > I am not sure why you say this change is equivalent to what was there > > > > before, nor why it fixes your problem. > > > > > > > > Can you explain some more? > > > > > > [ ... ] > > > > > > drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls > > > sdio_claim_host() and it appears to never return. > > > > When the brcmfmac driver is loaded, the firmware is requested from disk, > > and that's when the deadlock occurs in 5.2rc1. Specifically: > > > > 1) brcmf_sdio_download_firmware() in > > drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls > > sdio_claim_host() > > > > 2) brcmf_sdio_firmware_callback() is called and brcmf_sdiod_ramrw() > > tries to claim the host, but has to wait since its already claimed > > in #1 and the deadlock occurs. > > This does not make any sense to me. brcmf_sdio_download_firmware() is called > from brcmf_sdio_firmware_callback() so they are in the same context. So #2 > is not waiting for #1, but something else I would say. Also #2 calls > sdio_claim_host() after brcmf_sdio_download_firmware has completed so > definitely not waiting for #1. I attached a patch that shows how I was able to determine what had already claimed the host. It's messy; please don't judge me negatively for this. :) Anyways, sdio_claim_host() is mostly a wrapper for __mmc_claim_host() and there is a mmc_ctx structure that contains a task struct. This context can be NULL. I added a description field to the context structure and put the function name that claimed the host in there. The mmc_host structure already contained a 'claimer' member, so that made it easy. I see the following messages in dmesg that shows what has already claimed the host when loading the brcmfmac module in 5.2rc1: cfg80211: Loading compiled-in X.509 certificates for regulatory database cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7' platform regulatory.0: Direct firmware load for regulatory.db failed with error -2 cfg80211: failed to load regulatory.db brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4339-sdio for chip BCM4339/2 brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac4339-sdio.lge,hammerhead.txt failed with error -2 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5 at drivers/mmc/core/core.c:819 __mmc_claim_host+0x28c/0x2c0 Modules linked in: brcmfmac brcmutil cfg80211 dm_mod CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.2.0-rc1-00175-g9899510d2cd1-dirty #420 Hardware name: Generic DT based system Workqueue: events request_firmware_work_func [<c03122dc>] (unwind_backtrace) from [<c030d5dc>] (show_stack+0x10/0x14) [<c030d5dc>] (show_stack) from [<c0ac284c>] (dump_stack+0x78/0x8c) [<c0ac284c>] (dump_stack) from [<c0321438>] (__warn.part.3+0xb8/0xd4) [<c0321438>] (__warn.part.3) from [<c03215b0>] (warn_slowpath_null+0x44/0x4c) [<c03215b0>] (warn_slowpath_null) from [<c093e608>] (__mmc_claim_host+0x28c/0x2c0) [<c093e608>] (__mmc_claim_host) from [<bf115018>] (brcmf_sdiod_ramrw+0x9c/0x200 [brcmfmac]) [<bf115018>] (brcmf_sdiod_ramrw [brcmfmac]) from [<bf110508>] (brcmf_sdio_firmware_callback+0xe8/0x7b4 [brcmfmac]) [<bf110508>] (brcmf_sdio_firmware_callback [brcmfmac]) from [<bf108830>] (brcmf_fw_request_done+0xf0/0x110 [brcmfmac]) [<bf108830>] (brcmf_fw_request_done [brcmfmac]) from [<c081a4e8>] (request_firmware_work_func+0x4c/0x88) [<c081a4e8>] (request_firmware_work_func) from [<c033c260>] (process_one_work+0x1fc/0x564) [<c033c260>] (process_one_work) from [<c033ceb8>] (worker_thread+0x44/0x584) [<c033ceb8>] (worker_thread) from [<c034226c>] (kthread+0x148/0x150) [<c034226c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0xee8bdfb0 to 0xee8bdff8) dfa0: 00000000 00000000 00000000 00000000 dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ---[ end trace 4ab1b01efc876120 ]--- mmc_host mmc1: __mmc_claim_host: FIXME - before schedule() - descr=brcmf_sdiod_ramrw, claimer=brcmf_sdio_download_firmware The 'after schedule()' line is not shown and WiFi doesn't work. > > I tried to release the host before the firmware is requested, however > > parts of brcmf_chip_set_active() needs the host to be claimed, and a > > similar deadlock occurs in brcmf_sdiod_ramrw() if I claim the host > > before calling brcmf_chip_set_active(). > > > > I started to look at moving the sdio_{claim,release}_host() calls out of > > brcmf_sdiod_ramrw() but there's a fair number of callers, so I'd like to > > get feedback about the best course of action here. > > Long ago Franky reworked the sdio critical sections requiring sdio > claim/release and I am pretty sure they are correct. > > Could you try with lockdep kernel and see if that brings any more > information. In the mean time I will update my dev branch to 5.2-rc1 and see > if I can find any clues. My .config has CONFIG_LOCKDEP_SUPPORT enabled. I haven't used lockdep but my understanding is that it should print something in dmesg if a deadlock occurs. I assume it won't pick up cases like this where schedule() is called. Brian From 496c4deaa3787bf619baff58493142e11cd6757f Mon Sep 17 00:00:00 2001 From: Brian Masney <masneyb@onstation.org> Date: Sun, 26 May 2019 15:36:40 -0400 Subject: [PATCH] troubleshoot broadcom wireless lockup in 5.2rc1 Signed-off-by: Brian Masney <masneyb@onstation.org> --- drivers/mmc/core/core.c | 15 ++++ .../broadcom/brcm80211/brcmfmac/bcmsdh.c | 32 ++++--- .../broadcom/brcm80211/brcmfmac/sdio.c | 85 ++++++++++++------- include/linux/mmc/host.h | 1 + 4 files changed, 93 insertions(+), 40 deletions(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 6db36dc870b5..768acabf029b 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -814,7 +814,22 @@ int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx, if (stop || !host->claimed || mmc_ctx_matches(host, ctx, task)) break; spin_unlock_irqrestore(&host->lock, flags); + + if (ctx != NULL && ctx->descr != NULL) { + WARN_ON(1); + dev_info(&host->class_dev, "%s: FIXME - before schedule() - descr=%s, claimer=%s\n", + __func__, ctx->descr, + host->claimer != NULL ? host->claimer->descr : NULL); + } + schedule(); + + if (ctx != NULL && ctx->descr != NULL) { + dev_info(&host->class_dev, "%s: FIXME - after schedule() - descr=%s, claimer=%s\n", + __func__, ctx->descr, + host->claimer != NULL ? host->claimer->descr : NULL); + } + spin_lock_irqsave(&host->lock, flags); } set_current_state(TASK_RUNNING); diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c index 60aede5abb4d..aa947fcea736 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c @@ -47,6 +47,8 @@ #include "sdio.h" #include "core.h" #include "common.h" +#include "../../../../../mmc/core/host.h" +#include "../../../../../mmc/core/core.h" #define SDIOH_API_ACCESS_RETRY_LIMIT 2 @@ -59,6 +61,16 @@ #define BRCMF_DEFAULT_RXGLOM_SIZE 32 /* max rx frames in glom chain */ +static void sdio_claim_host_with_descr(struct sdio_func *func, + const char *descr) +{ + struct mmc_ctx mmc_ctx; + + mmc_ctx.task = NULL; + mmc_ctx.descr = descr; + __mmc_claim_host(func->card->host, &mmc_ctx, NULL); +} + struct brcmf_sdiod_freezer { atomic_t freezing; atomic_t thread_count; @@ -132,7 +144,7 @@ int brcmf_sdiod_intr_register(struct brcmf_sdio_dev *sdiodev) } sdiodev->irq_wake = true; - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); if (sdiodev->bus_if->chip == BRCM_CC_43362_CHIP_ID) { /* assign GPIO to SDIO core */ @@ -162,7 +174,7 @@ int brcmf_sdiod_intr_register(struct brcmf_sdio_dev *sdiodev) sdio_release_host(sdiodev->func1); } else { brcmf_dbg(SDIO, "Entering\n"); - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); sdio_claim_irq(sdiodev->func1, brcmf_sdiod_ib_irqhandler); sdio_claim_irq(sdiodev->func2, brcmf_sdiod_dummy_irqhandler); sdio_release_host(sdiodev->func1); @@ -183,7 +195,7 @@ void brcmf_sdiod_intr_unregister(struct brcmf_sdio_dev *sdiodev) struct brcmfmac_sdio_pd *pdata; pdata = &sdiodev->settings->bus.sdio; - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); brcmf_sdiod_func0_wb(sdiodev, SDIO_CCCR_BRCM_SEPINT, 0, NULL); brcmf_sdiod_func0_wb(sdiodev, SDIO_CCCR_IENx, 0, NULL); sdio_release_host(sdiodev->func1); @@ -199,7 +211,7 @@ void brcmf_sdiod_intr_unregister(struct brcmf_sdio_dev *sdiodev) } if (sdiodev->sd_irq_requested) { - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); sdio_release_irq(sdiodev->func2); sdio_release_irq(sdiodev->func1); sdio_release_host(sdiodev->func1); @@ -695,7 +707,7 @@ brcmf_sdiod_ramrw(struct brcmf_sdio_dev *sdiodev, bool write, u32 address, else dsize = size; - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); /* Do the transfer(s) */ while (size) { @@ -827,7 +839,7 @@ static int brcmf_sdiod_freezer_on(struct brcmf_sdio_dev *sdiodev) brcmf_sdio_trigger_dpc(sdiodev->bus); wait_event(sdiodev->freezer->thread_freeze, atomic_read(expect) == sdiodev->freezer->frozen_count); - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); res = brcmf_sdio_sleep(sdiodev->bus, true); sdio_release_host(sdiodev->func1); return res; @@ -835,7 +847,7 @@ static int brcmf_sdiod_freezer_on(struct brcmf_sdio_dev *sdiodev) static void brcmf_sdiod_freezer_off(struct brcmf_sdio_dev *sdiodev) { - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); brcmf_sdio_sleep(sdiodev->bus, false); sdio_release_host(sdiodev->func1); atomic_set(&sdiodev->freezer->freezing, 0); @@ -887,12 +899,12 @@ static int brcmf_sdiod_remove(struct brcmf_sdio_dev *sdiodev) brcmf_sdiod_freezer_detach(sdiodev); /* Disable Function 2 */ - sdio_claim_host(sdiodev->func2); + sdio_claim_host_with_descr(sdiodev->func2, __func__); sdio_disable_func(sdiodev->func2); sdio_release_host(sdiodev->func2); /* Disable Function 1 */ - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); sdio_disable_func(sdiodev->func1); sdio_release_host(sdiodev->func1); @@ -915,7 +927,7 @@ static int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev) { int ret = 0; - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); ret = sdio_set_block_size(sdiodev->func1, SDIO_FUNC1_BLOCKSIZE); if (ret) { diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 22b73da42822..31acdb347e59 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -45,6 +45,8 @@ #include "core.h" #include "common.h" #include "bcdc.h" +#include "../../../../../mmc/core/host.h" +#include "../../../../../mmc/core/core.h" #define DCMD_RESP_TIMEOUT msecs_to_jiffies(2500) #define CTL_DONE_TIMEOUT msecs_to_jiffies(2500) @@ -651,6 +653,16 @@ static const struct brcmf_firmware_mapping brcmf_sdio_fwnames[] = { BRCMF_FW_ENTRY(CY_CC_43012_CHIP_ID, 0xFFFFFFFF, 43012) }; +static void sdio_claim_host_with_descr(struct sdio_func *func, + const char *descr) +{ + struct mmc_ctx mmc_ctx; + + mmc_ctx.task = NULL; + mmc_ctx.descr = descr; + __mmc_claim_host(func->card->host, &mmc_ctx, NULL); +} + static void pkt_align(struct sk_buff *p, int len, int align) { uint datalign; @@ -995,7 +1007,7 @@ static int brcmf_sdio_readshared(struct brcmf_sdio *bus, struct sdpcm_shared_le sh_le; __le32 addr_le; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); brcmf_sdio_bus_sleep(bus, false, false); /* @@ -1583,7 +1595,7 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) * read directly into the chained packet, or allocate a large * packet and and copy into the chain. */ - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); errcode = brcmf_sdiod_recv_chain(bus->sdiodev, &bus->glom, dlen); sdio_release_host(bus->sdiodev->func1); @@ -1594,7 +1606,8 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) brcmf_err("glom read of %d bytes failed: %d\n", dlen, errcode); - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, true, false); bus->sdcnt.rxglomfail++; brcmf_sdio_free_glom(bus); @@ -1608,7 +1621,7 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) rd_new.seq_num = rxseq; rd_new.len = dlen; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); errcode = brcmf_sdio_hdparse(bus, pfirst->data, &rd_new, BRCMF_SDIO_FT_SUPER); sdio_release_host(bus->sdiodev->func1); @@ -1626,7 +1639,8 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) rd_new.len = pnext->len; rd_new.seq_num = rxseq++; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); errcode = brcmf_sdio_hdparse(bus, pnext->data, &rd_new, BRCMF_SDIO_FT_SUB); sdio_release_host(bus->sdiodev->func1); @@ -1638,7 +1652,8 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) if (errcode) { /* Terminate frame on error */ - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, true, false); bus->sdcnt.rxglomfail++; brcmf_sdio_free_glom(bus); @@ -1849,7 +1864,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) rd->len_left = rd->len; /* read header first for unknow frame length */ - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); if (!rd->len) { ret = brcmf_sdiod_recv_buf(bus->sdiodev, bus->rxhdr, BRCMF_FIRSTREAD); @@ -1916,7 +1931,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) brcmf_err("read %d bytes from channel %d failed: %d\n", rd->len, rd->channel, ret); brcmu_pkt_buf_free_skb(pkt); - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, true, RETRYCHAN(rd->channel)); sdio_release_host(bus->sdiodev->func1); @@ -1930,7 +1946,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) } else { memcpy(bus->rxhdr, pkt->data, SDPCM_HDRLEN); rd_new.seq_num = rd->seq_num; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new, BRCMF_SDIO_FT_NORMAL)) { rd->len = 0; @@ -1963,7 +1980,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) rd_new.seq_num); /* Force retry w/normal header read */ rd->len = 0; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, false, true); sdio_release_host(bus->sdiodev->func1); brcmu_pkt_buf_free_skb(pkt); @@ -1988,7 +2006,8 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) } else { brcmf_err("%s: glom superframe w/o " "descriptor!\n", __func__); - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, false, false); sdio_release_host(bus->sdiodev->func1); } @@ -2267,7 +2286,7 @@ static int brcmf_sdio_txpkt(struct brcmf_sdio *bus, struct sk_buff_head *pktq, if (ret) goto done; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); ret = brcmf_sdiod_send_pkt(bus->sdiodev, pktq); bus->sdcnt.f2txdata++; @@ -2330,7 +2349,8 @@ static uint brcmf_sdio_sendfromq(struct brcmf_sdio *bus, uint maxframes) /* In poll mode, need to check for other events */ if (!bus->intr) { /* Check device status, signal pending interrupt */ - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); intstatus = brcmf_sdiod_readl(bus->sdiodev, intstat_addr, &ret); sdio_release_host(bus->sdiodev->func1); @@ -2442,7 +2462,7 @@ static void brcmf_sdio_bus_stop(struct device *dev) } if (sdiodev->state != BRCMF_SDIOD_NOMEDIUM) { - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); /* Enable clock for device interrupts */ brcmf_sdio_bus_sleep(bus, false, false); @@ -2552,7 +2572,7 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) brcmf_dbg(SDIO, "Enter\n"); - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); /* If waiting for HTAVAIL, check status */ if (!bus->sr_enabled && bus->clkstate == CLK_PENDING) { @@ -2658,7 +2678,7 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) if (bus->ctrl_frame_stat && (bus->clkstate == CLK_AVAIL) && data_ok(bus)) { - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); if (bus->ctrl_frame_stat) { err = brcmf_sdio_tx_ctrlframe(bus, bus->ctrl_frame_buf, bus->ctrl_frame_len); @@ -2682,7 +2702,8 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) brcmf_err("failed backplane access over SDIO, halting operation\n"); atomic_set(&bus->intstatus, 0); if (bus->ctrl_frame_stat) { - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); if (bus->ctrl_frame_stat) { bus->ctrl_frame_err = -ENODEV; wmb(); @@ -2904,7 +2925,7 @@ brcmf_sdio_bus_txctl(struct device *dev, unsigned char *msg, uint msglen) CTL_DONE_TIMEOUT); ret = 0; if (bus->ctrl_frame_stat) { - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); if (bus->ctrl_frame_stat) { brcmf_dbg(SDIO, "ctrl_frame timeout\n"); bus->ctrl_frame_stat = false; @@ -3048,7 +3069,7 @@ static int brcmf_sdio_assert_info(struct seq_file *seq, struct brcmf_sdio *bus, return 0; } - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); if (sh->assert_file_addr != 0) { error = brcmf_sdiod_ramrw(bus->sdiodev, false, sh->assert_file_addr, (u8 *)file, 80); @@ -3340,7 +3361,7 @@ static int brcmf_sdio_download_firmware(struct brcmf_sdio *bus, int bcmerror; u32 rstvec; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); brcmf_sdio_clkctl(bus, CLK_AVAIL, false); rstvec = get_unaligned_le32(fw->data); @@ -3564,7 +3585,7 @@ static int brcmf_sdio_bus_get_memdump(struct device *dev, void *data, address = bus->ci->rambase; offset = err = 0; - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); while (offset < mem_size) { len = ((offset + MEMBLOCK) < mem_size) ? MEMBLOCK : mem_size - offset; @@ -3637,7 +3658,8 @@ static void brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus) if (!bus->dpc_triggered) { u8 devpend; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); devpend = brcmf_sdiod_func0_rb(bus->sdiodev, SDIO_CCCR_INTx, NULL); sdio_release_host(bus->sdiodev->func1); @@ -3666,7 +3688,8 @@ static void brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus) bus->console.count += jiffies_to_msecs(BRCMF_WD_POLL); if (bus->console.count >= bus->console_interval) { bus->console.count -= bus->console_interval; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); /* Make sure backplane clock is on */ brcmf_sdio_bus_sleep(bus, false, false); if (brcmf_sdio_readconsole(bus) < 0) @@ -3685,7 +3708,8 @@ static void brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus) bus->idlecount++; if (bus->idlecount > bus->idletime) { brcmf_dbg(SDIO, "idle\n"); - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_wd_timer(bus, false); bus->idlecount = 0; brcmf_sdio_bus_sleep(bus, true, false); @@ -3903,7 +3927,7 @@ brcmf_sdio_probe_attach(struct brcmf_sdio *bus) u32 drivestrength; sdiodev = bus->sdiodev; - sdio_claim_host(sdiodev->func1); + sdio_claim_host_with_descr(sdiodev->func1, __func__); pr_debug("F1 signature read @0x18000000=0x%4x\n", brcmf_sdiod_readl(sdiodev, SI_ENUM_BASE, NULL)); @@ -4147,7 +4171,7 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, bus->sdcnt.tickcnt = 0; brcmf_sdio_wd_timer(bus, true); - sdio_claim_host(sdiod->func1); + sdio_claim_host_with_descr(sdiod->func1, __func__); /* Make sure backplane clock is on, needed to generate F2 interrupt */ brcmf_sdio_clkctl(bus, CLK_AVAIL, false); @@ -4255,7 +4279,7 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, err = brcmf_attach(sdiod->dev, sdiod->settings); if (err != 0) { brcmf_err("brcmf_attach failed\n"); - sdio_claim_host(sdiod->func1); + sdio_claim_host_with_descr(sdiod->func1, __func__); goto checkdied; } @@ -4361,7 +4385,7 @@ struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) bus->blocksize = bus->sdiodev->func2->cur_blksize; bus->roundup = min(max_roundup, bus->blocksize); - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); /* Disable F2 to clear any intermediate frame state on the dongle */ sdio_disable_func(bus->sdiodev->func2); @@ -4428,7 +4452,8 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus) if (bus->ci) { if (bus->sdiodev->state != BRCMF_SDIOD_NOMEDIUM) { - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, + __func__); brcmf_sdio_wd_timer(bus, false); brcmf_sdio_clkctl(bus, CLK_AVAIL, false); /* Leave the device in state where it is @@ -4485,7 +4510,7 @@ int brcmf_sdio_sleep(struct brcmf_sdio *bus, bool sleep) { int ret; - sdio_claim_host(bus->sdiodev->func1); + sdio_claim_host_with_descr(bus->sdiodev->func1, __func__); ret = brcmf_sdio_bus_sleep(bus, sleep, false); sdio_release_host(bus->sdiodev->func1); diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 43d0f0c496f6..6ccc76150f45 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -267,6 +267,7 @@ struct mmc_supply { }; struct mmc_ctx { + const char *descr; struct task_struct *task; };
On 26/05/19 10:58 PM, Brian Masney wrote: > On Sun, May 26, 2019 at 08:42:21PM +0200, Arend Van Spriel wrote: >> On 5/26/2019 2:21 PM, Brian Masney wrote: >>> + Broadcom wireless maintainers >>> >>> On Fri, May 24, 2019 at 11:49:58AM -0400, Brian Masney wrote: >>>> On Fri, May 24, 2019 at 03:17:13PM +0300, Adrian Hunter wrote: >>>>> On 24/05/19 2:10 PM, Brian Masney wrote: >>>>>> WiFi stopped working on the LG Nexus 5 phone and the issue was bisected >>>>>> to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that >>>>>> moved from using a tasklet to a work queue. That patch also changed >>>>>> sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when >>>>>> sdhci_defer_done() is true. Change it to queue work to the complete work >>>>>> queue if sdhci_defer_done() is true so that the functionality is >>>>>> equilivent to what was there when the finish_tasklet was present. This >>>>>> corrects the WiFi breakage on the Nexus 5 phone. >>>>>> >>>>>> Signed-off-by: Brian Masney <masneyb@onstation.org> >>>>>> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") >>>>>> --- >>>>>> [ ... ] >>>>>> >>>>>> drivers/mmc/host/sdhci.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c >>>>>> index 97158344b862..3563c3bc57c9 100644 >>>>>> --- a/drivers/mmc/host/sdhci.c >>>>>> +++ b/drivers/mmc/host/sdhci.c >>>>>> @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) >>>>>> continue; >>>>>> if (sdhci_defer_done(host, mrq)) { >>>>>> - result = IRQ_WAKE_THREAD; >>>>>> + queue_work(host->complete_wq, &host->complete_work); >>>>> >>>>> The IRQ thread has a lot less latency than the work queue, which is why it >>>>> is done that way. >>>>> >>>>> I am not sure why you say this change is equivalent to what was there >>>>> before, nor why it fixes your problem. >>>>> >>>>> Can you explain some more? >>>> >>>> [ ... ] >>>> >>>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c calls >>>> sdio_claim_host() and it appears to never return. This is because SDHCI is using the IRQ thread to process the SDIO card interrupt (sdio_run_irqs()). When the card driver tries to use the card, it causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") has moved the tasklet processing to the IRQ thread. I would expect to be able to use the IRQ thread to complete requests, and it is desirable to do so because it is lower latency. Probably, SDHCI should use sdio_signal_irq() which queues a work item, and is what other drivers are doing. I will investigate some more and send a patch.
On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote: > I attached a patch that shows how I was able to determine what had > already claimed the host. I realized this morning that I had a flaw with my test patch that diagnosed what was deadlocked. The mmc_ctx structure was allocated on the stack. I attached a second version of that patch that uses kmalloc() for that structure. It didn't change what I reported yesterday: brcmf_sdiod_ramrw is deadlocked by brcmf_sdio_download_firmware. On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote: > This is because SDHCI is using the IRQ thread to process the SDIO card > interrupt (sdio_run_irqs()). When the card driver tries to use the card, it > causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove > finish_tasklet") has moved the tasklet processing to the IRQ thread. > > I would expect to be able to use the IRQ thread to complete requests, and it > is desirable to do so because it is lower latency. > > Probably, SDHCI should use sdio_signal_irq() which queues a work item, and > is what other drivers are doing. > > I will investigate some more and send a patch. Thank you! Brian From acc78b5e581d2c3ebc994a6ad6e7b367f2b81935 Mon Sep 17 00:00:00 2001 From: Brian Masney <masneyb@onstation.org> Date: Sun, 26 May 2019 15:36:40 -0400 Subject: [PATCH] troubleshoot broadcom wireless lockup in 5.2rc1 (v2) Signed-off-by: Brian Masney <masneyb@onstation.org> --- drivers/mmc/core/core.c | 15 ++ .../broadcom/brcm80211/brcmfmac/bcmsdh.c | 46 ++--- .../broadcom/brcm80211/brcmfmac/sdio.c | 160 ++++++++++-------- .../broadcom/brcm80211/brcmfmac/sdio.h | 3 + include/linux/mmc/host.h | 1 + 5 files changed, 136 insertions(+), 89 deletions(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 6db36dc870b5..768acabf029b 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -814,7 +814,22 @@ int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx, if (stop || !host->claimed || mmc_ctx_matches(host, ctx, task)) break; spin_unlock_irqrestore(&host->lock, flags); + + if (ctx != NULL && ctx->descr != NULL) { + WARN_ON(1); + dev_info(&host->class_dev, "%s: FIXME - before schedule() - descr=%s, claimer=%s\n", + __func__, ctx->descr, + host->claimer != NULL ? host->claimer->descr : NULL); + } + schedule(); + + if (ctx != NULL && ctx->descr != NULL) { + dev_info(&host->class_dev, "%s: FIXME - after schedule() - descr=%s, claimer=%s\n", + __func__, ctx->descr, + host->claimer != NULL ? host->claimer->descr : NULL); + } + spin_lock_irqsave(&host->lock, flags); } set_current_state(TASK_RUNNING); diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c index 60aede5abb4d..fdeaeb1353af 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c @@ -47,6 +47,8 @@ #include "sdio.h" #include "core.h" #include "common.h" +#include "../../../../../mmc/core/host.h" +#include "../../../../../mmc/core/core.h" #define SDIOH_API_ACCESS_RETRY_LIMIT 2 @@ -132,7 +134,7 @@ int brcmf_sdiod_intr_register(struct brcmf_sdio_dev *sdiodev) } sdiodev->irq_wake = true; - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); if (sdiodev->bus_if->chip == BRCM_CC_43362_CHIP_ID) { /* assign GPIO to SDIO core */ @@ -159,13 +161,13 @@ int brcmf_sdiod_intr_register(struct brcmf_sdio_dev *sdiodev) data |= SDIO_CCCR_BRCM_SEPINT_ACT_HI; brcmf_sdiod_func0_wb(sdiodev, SDIO_CCCR_BRCM_SEPINT, data, &ret); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); } else { brcmf_dbg(SDIO, "Entering\n"); - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); sdio_claim_irq(sdiodev->func1, brcmf_sdiod_ib_irqhandler); sdio_claim_irq(sdiodev->func2, brcmf_sdiod_dummy_irqhandler); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); sdiodev->sd_irq_requested = true; } @@ -183,10 +185,10 @@ void brcmf_sdiod_intr_unregister(struct brcmf_sdio_dev *sdiodev) struct brcmfmac_sdio_pd *pdata; pdata = &sdiodev->settings->bus.sdio; - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); brcmf_sdiod_func0_wb(sdiodev, SDIO_CCCR_BRCM_SEPINT, 0, NULL); brcmf_sdiod_func0_wb(sdiodev, SDIO_CCCR_IENx, 0, NULL); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); sdiodev->oob_irq_requested = false; if (sdiodev->irq_wake) { @@ -199,10 +201,10 @@ void brcmf_sdiod_intr_unregister(struct brcmf_sdio_dev *sdiodev) } if (sdiodev->sd_irq_requested) { - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); sdio_release_irq(sdiodev->func2); sdio_release_irq(sdiodev->func1); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); sdiodev->sd_irq_requested = false; } } @@ -695,7 +697,7 @@ brcmf_sdiod_ramrw(struct brcmf_sdio_dev *sdiodev, bool write, u32 address, else dsize = size; - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); /* Do the transfer(s) */ while (size) { @@ -742,7 +744,7 @@ brcmf_sdiod_ramrw(struct brcmf_sdio_dev *sdiodev, bool write, u32 address, dev_kfree_skb(pkt); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); return err; } @@ -827,17 +829,17 @@ static int brcmf_sdiod_freezer_on(struct brcmf_sdio_dev *sdiodev) brcmf_sdio_trigger_dpc(sdiodev->bus); wait_event(sdiodev->freezer->thread_freeze, atomic_read(expect) == sdiodev->freezer->frozen_count); - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); res = brcmf_sdio_sleep(sdiodev->bus, true); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); return res; } static void brcmf_sdiod_freezer_off(struct brcmf_sdio_dev *sdiodev) { - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); brcmf_sdio_sleep(sdiodev->bus, false); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); atomic_set(&sdiodev->freezer->freezing, 0); complete_all(&sdiodev->freezer->resumed); } @@ -887,14 +889,14 @@ static int brcmf_sdiod_remove(struct brcmf_sdio_dev *sdiodev) brcmf_sdiod_freezer_detach(sdiodev); /* Disable Function 2 */ - sdio_claim_host(sdiodev->func2); + brcmf_sdio_claim_host(sdiodev->func2, __func__); sdio_disable_func(sdiodev->func2); - sdio_release_host(sdiodev->func2); + brcmf_sdio_release_host(sdiodev->func2); /* Disable Function 1 */ - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); sdio_disable_func(sdiodev->func1); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); sg_free_table(&sdiodev->sgtable); sdiodev->sbwad = 0; @@ -915,18 +917,18 @@ static int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev) { int ret = 0; - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); ret = sdio_set_block_size(sdiodev->func1, SDIO_FUNC1_BLOCKSIZE); if (ret) { brcmf_err("Failed to set F1 blocksize\n"); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); goto out; } ret = sdio_set_block_size(sdiodev->func2, SDIO_FUNC2_BLOCKSIZE); if (ret) { brcmf_err("Failed to set F2 blocksize\n"); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); goto out; } @@ -935,7 +937,7 @@ static int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev) /* Enable Function 1 */ ret = sdio_enable_func(sdiodev->func1); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); if (ret) { brcmf_err("Failed to enable F1: err=%d\n", ret); goto out; diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 22b73da42822..ba836aa5da78 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -45,6 +45,8 @@ #include "core.h" #include "common.h" #include "bcdc.h" +#include "../../../../../mmc/core/host.h" +#include "../../../../../mmc/core/core.h" #define DCMD_RESP_TIMEOUT msecs_to_jiffies(2500) #define CTL_DONE_TIMEOUT msecs_to_jiffies(2500) @@ -651,6 +653,25 @@ static const struct brcmf_firmware_mapping brcmf_sdio_fwnames[] = { BRCMF_FW_ENTRY(CY_CC_43012_CHIP_ID, 0xFFFFFFFF, 43012) }; +void brcmf_sdio_claim_host(struct sdio_func *func, const char *descr) +{ + struct mmc_ctx *mmc_ctx; + + mmc_ctx = kmalloc(sizeof(*mmc_ctx), GFP_KERNEL); + mmc_ctx->task = NULL; + mmc_ctx->descr = descr; + __mmc_claim_host(func->card->host, mmc_ctx, NULL); +} + +void brcmf_sdio_release_host(struct sdio_func *func) +{ + struct mmc_ctx *tofree; + + tofree = func->card->host->claimer; + sdio_release_host(func); + kfree(tofree); +} + static void pkt_align(struct sk_buff *p, int len, int align) { uint datalign; @@ -995,7 +1016,7 @@ static int brcmf_sdio_readshared(struct brcmf_sdio *bus, struct sdpcm_shared_le sh_le; __le32 addr_le; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); brcmf_sdio_bus_sleep(bus, false, false); /* @@ -1029,7 +1050,7 @@ static int brcmf_sdio_readshared(struct brcmf_sdio *bus, if (rv < 0) goto fail; - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); /* Endianness */ sh->flags = le32_to_cpu(sh_le.flags); @@ -1051,7 +1072,7 @@ static int brcmf_sdio_readshared(struct brcmf_sdio *bus, fail: brcmf_err("unable to obtain sdpcm_shared info: rv=%d (addr=0x%x)\n", rv, addr); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); return rv; } @@ -1583,10 +1604,10 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) * read directly into the chained packet, or allocate a large * packet and and copy into the chain. */ - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); errcode = brcmf_sdiod_recv_chain(bus->sdiodev, &bus->glom, dlen); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); bus->sdcnt.f2rxdata++; /* On failure, kill the superframe */ @@ -1594,11 +1615,11 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) brcmf_err("glom read of %d bytes failed: %d\n", dlen, errcode); - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); brcmf_sdio_rxfail(bus, true, false); bus->sdcnt.rxglomfail++; brcmf_sdio_free_glom(bus); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); return 0; } @@ -1608,10 +1629,10 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) rd_new.seq_num = rxseq; rd_new.len = dlen; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); errcode = brcmf_sdio_hdparse(bus, pfirst->data, &rd_new, BRCMF_SDIO_FT_SUPER); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); bus->cur_read.len = rd_new.len_nxtfrm << 4; /* Remove superframe header, remember offset */ @@ -1626,10 +1647,10 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) rd_new.len = pnext->len; rd_new.seq_num = rxseq++; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); errcode = brcmf_sdio_hdparse(bus, pnext->data, &rd_new, BRCMF_SDIO_FT_SUB); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); brcmf_dbg_hex_dump(BRCMF_GLOM_ON(), pnext->data, 32, "subframe:\n"); @@ -1638,11 +1659,11 @@ static u8 brcmf_sdio_rxglom(struct brcmf_sdio *bus, u8 rxseq) if (errcode) { /* Terminate frame on error */ - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); brcmf_sdio_rxfail(bus, true, false); bus->sdcnt.rxglomfail++; brcmf_sdio_free_glom(bus); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); bus->cur_read.len = 0; return 0; } @@ -1849,7 +1870,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) rd->len_left = rd->len; /* read header first for unknow frame length */ - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); if (!rd->len) { ret = brcmf_sdiod_recv_buf(bus->sdiodev, bus->rxhdr, BRCMF_FIRSTREAD); @@ -1859,7 +1880,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) ret); bus->sdcnt.rx_hdrfail++; brcmf_sdio_rxfail(bus, true, true); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); continue; } @@ -1869,7 +1890,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) if (brcmf_sdio_hdparse(bus, bus->rxhdr, rd, BRCMF_SDIO_FT_NORMAL)) { - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); if (!bus->rxpending) break; else @@ -1885,7 +1906,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) rd->len_nxtfrm = 0; /* treat all packet as event if we don't know */ rd->channel = SDPCM_EVENT_CHANNEL; - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); continue; } rd->len_left = rd->len > BRCMF_FIRSTREAD ? @@ -1902,7 +1923,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) brcmf_err("brcmu_pkt_buf_get_skb failed\n"); brcmf_sdio_rxfail(bus, false, RETRYCHAN(rd->channel)); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); continue; } skb_pull(pkt, head_read); @@ -1910,16 +1931,16 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) ret = brcmf_sdiod_recv_pkt(bus->sdiodev, pkt); bus->sdcnt.f2rxdata++; - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); if (ret < 0) { brcmf_err("read %d bytes from channel %d failed: %d\n", rd->len, rd->channel, ret); brcmu_pkt_buf_free_skb(pkt); - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); brcmf_sdio_rxfail(bus, true, RETRYCHAN(rd->channel)); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); continue; } @@ -1930,7 +1951,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) } else { memcpy(bus->rxhdr, pkt->data, SDPCM_HDRLEN); rd_new.seq_num = rd->seq_num; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); if (brcmf_sdio_hdparse(bus, bus->rxhdr, &rd_new, BRCMF_SDIO_FT_NORMAL)) { rd->len = 0; @@ -1943,11 +1964,11 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) roundup(rd_new.len, 16) >> 4); rd->len = 0; brcmf_sdio_rxfail(bus, true, true); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); brcmu_pkt_buf_free_skb(pkt); continue; } - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); rd->len_nxtfrm = rd_new.len_nxtfrm; rd->channel = rd_new.channel; rd->dat_offset = rd_new.dat_offset; @@ -1963,9 +1984,10 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) rd_new.seq_num); /* Force retry w/normal header read */ rd->len = 0; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, false, true); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); brcmu_pkt_buf_free_skb(pkt); continue; } @@ -1988,9 +2010,10 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, uint maxframes) } else { brcmf_err("%s: glom superframe w/o " "descriptor!\n", __func__); - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, + __func__); brcmf_sdio_rxfail(bus, false, false); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); } /* prepare the descriptor for the next read */ rd->len = rd->len_nxtfrm << 4; @@ -2267,14 +2290,14 @@ static int brcmf_sdio_txpkt(struct brcmf_sdio *bus, struct sk_buff_head *pktq, if (ret) goto done; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); ret = brcmf_sdiod_send_pkt(bus->sdiodev, pktq); bus->sdcnt.f2txdata++; if (ret < 0) brcmf_sdio_txfail(bus); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); done: brcmf_sdio_txpkt_postp(bus, pktq); @@ -2330,10 +2353,10 @@ static uint brcmf_sdio_sendfromq(struct brcmf_sdio *bus, uint maxframes) /* In poll mode, need to check for other events */ if (!bus->intr) { /* Check device status, signal pending interrupt */ - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); intstatus = brcmf_sdiod_readl(bus->sdiodev, intstat_addr, &ret); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); bus->sdcnt.f2txdata++; if (ret != 0) @@ -2442,7 +2465,7 @@ static void brcmf_sdio_bus_stop(struct device *dev) } if (sdiodev->state != BRCMF_SDIOD_NOMEDIUM) { - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); /* Enable clock for device interrupts */ brcmf_sdio_bus_sleep(bus, false, false); @@ -2477,7 +2500,7 @@ static void brcmf_sdio_bus_stop(struct device *dev) brcmf_sdiod_writel(sdiodev, core->base + SD_REG(intstatus), local_hostintmask, NULL); - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); } /* Clear the data packet queues */ brcmu_pktq_flush(&bus->txq, true, NULL, NULL); @@ -2552,7 +2575,7 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) brcmf_dbg(SDIO, "Enter\n"); - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); /* If waiting for HTAVAIL, check status */ if (!bus->sr_enabled && bus->clkstate == CLK_PENDING) { @@ -2615,7 +2638,7 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) intstatus |= brcmf_sdio_hostmail(bus); } - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); /* Generally don't ask for these, can get CRC errors... */ if (intstatus & I_WR_OOSYNC) { @@ -2658,7 +2681,7 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) if (bus->ctrl_frame_stat && (bus->clkstate == CLK_AVAIL) && data_ok(bus)) { - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); if (bus->ctrl_frame_stat) { err = brcmf_sdio_tx_ctrlframe(bus, bus->ctrl_frame_buf, bus->ctrl_frame_len); @@ -2666,7 +2689,7 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) wmb(); bus->ctrl_frame_stat = false; } - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); brcmf_sdio_wait_event_wakeup(bus); } /* Send queued frames (limit 1 if rx may still be pending) */ @@ -2682,14 +2705,14 @@ static void brcmf_sdio_dpc(struct brcmf_sdio *bus) brcmf_err("failed backplane access over SDIO, halting operation\n"); atomic_set(&bus->intstatus, 0); if (bus->ctrl_frame_stat) { - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); if (bus->ctrl_frame_stat) { bus->ctrl_frame_err = -ENODEV; wmb(); bus->ctrl_frame_stat = false; brcmf_sdio_wait_event_wakeup(bus); } - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); } } else if (atomic_read(&bus->intstatus) || atomic_read(&bus->ipend) > 0 || @@ -2904,13 +2927,13 @@ brcmf_sdio_bus_txctl(struct device *dev, unsigned char *msg, uint msglen) CTL_DONE_TIMEOUT); ret = 0; if (bus->ctrl_frame_stat) { - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); if (bus->ctrl_frame_stat) { brcmf_dbg(SDIO, "ctrl_frame timeout\n"); bus->ctrl_frame_stat = false; ret = -ETIMEDOUT; } - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); } if (!ret) { brcmf_dbg(SDIO, "ctrl_frame complete, err=%d\n", @@ -3048,7 +3071,7 @@ static int brcmf_sdio_assert_info(struct seq_file *seq, struct brcmf_sdio *bus, return 0; } - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); if (sh->assert_file_addr != 0) { error = brcmf_sdiod_ramrw(bus->sdiodev, false, sh->assert_file_addr, (u8 *)file, 80); @@ -3061,7 +3084,7 @@ static int brcmf_sdio_assert_info(struct seq_file *seq, struct brcmf_sdio *bus, if (error < 0) return error; } - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); seq_printf(seq, "dongle assert: %s:%d: assert(%s)\n", file, sh->assert_line, expr); @@ -3340,7 +3363,7 @@ static int brcmf_sdio_download_firmware(struct brcmf_sdio *bus, int bcmerror; u32 rstvec; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); brcmf_sdio_clkctl(bus, CLK_AVAIL, false); rstvec = get_unaligned_le32(fw->data); @@ -3369,7 +3392,7 @@ static int brcmf_sdio_download_firmware(struct brcmf_sdio *bus, err: brcmf_sdio_clkctl(bus, CLK_SDONLY, false); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); return bcmerror; } @@ -3564,7 +3587,7 @@ static int brcmf_sdio_bus_get_memdump(struct device *dev, void *data, address = bus->ci->rambase; offset = err = 0; - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); while (offset < mem_size) { len = ((offset + MEMBLOCK) < mem_size) ? MEMBLOCK : mem_size - offset; @@ -3580,7 +3603,7 @@ static int brcmf_sdio_bus_get_memdump(struct device *dev, void *data, } done: - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); return err; } @@ -3637,10 +3660,11 @@ static void brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus) if (!bus->dpc_triggered) { u8 devpend; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, + __func__); devpend = brcmf_sdiod_func0_rb(bus->sdiodev, SDIO_CCCR_INTx, NULL); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); intstatus = devpend & (INTR_STATUS_FUNC1 | INTR_STATUS_FUNC2); } @@ -3666,13 +3690,13 @@ static void brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus) bus->console.count += jiffies_to_msecs(BRCMF_WD_POLL); if (bus->console.count >= bus->console_interval) { bus->console.count -= bus->console_interval; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); /* Make sure backplane clock is on */ brcmf_sdio_bus_sleep(bus, false, false); if (brcmf_sdio_readconsole(bus) < 0) /* stop on error */ bus->console_interval = 0; - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); } } #endif /* DEBUG */ @@ -3685,11 +3709,12 @@ static void brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus) bus->idlecount++; if (bus->idlecount > bus->idletime) { brcmf_dbg(SDIO, "idle\n"); - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, + __func__); brcmf_sdio_wd_timer(bus, false); bus->idlecount = 0; brcmf_sdio_bus_sleep(bus, true, false); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); } } else { bus->idlecount = 0; @@ -3903,7 +3928,7 @@ brcmf_sdio_probe_attach(struct brcmf_sdio *bus) u32 drivestrength; sdiodev = bus->sdiodev; - sdio_claim_host(sdiodev->func1); + brcmf_sdio_claim_host(sdiodev->func1, __func__); pr_debug("F1 signature read @0x18000000=0x%4x\n", brcmf_sdiod_readl(sdiodev, SI_ENUM_BASE, NULL)); @@ -4010,7 +4035,7 @@ brcmf_sdio_probe_attach(struct brcmf_sdio *bus) if (err) goto fail; - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); brcmu_pktq_init(&bus->txq, (PRIOMASK + 1), TXQLEN); @@ -4031,7 +4056,7 @@ brcmf_sdio_probe_attach(struct brcmf_sdio *bus) return true; fail: - sdio_release_host(sdiodev->func1); + brcmf_sdio_release_host(sdiodev->func1); return false; } @@ -4147,7 +4172,7 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, bus->sdcnt.tickcnt = 0; brcmf_sdio_wd_timer(bus, true); - sdio_claim_host(sdiod->func1); + brcmf_sdio_claim_host(sdiod->func1, __func__); /* Make sure backplane clock is on, needed to generate F2 interrupt */ brcmf_sdio_clkctl(bus, CLK_AVAIL, false); @@ -4243,7 +4268,7 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, goto checkdied; } - sdio_release_host(sdiod->func1); + brcmf_sdio_release_host(sdiod->func1); /* Assign bus interface call back */ sdiod->bus_if->dev = sdiod->dev; @@ -4255,7 +4280,7 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, err = brcmf_attach(sdiod->dev, sdiod->settings); if (err != 0) { brcmf_err("brcmf_attach failed\n"); - sdio_claim_host(sdiod->func1); + brcmf_sdio_claim_host(sdiod->func1, __func__); goto checkdied; } @@ -4265,7 +4290,7 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, checkdied: brcmf_sdio_checkdied(bus); release: - sdio_release_host(sdiod->func1); + brcmf_sdio_release_host(sdiod->func1); fail: brcmf_dbg(TRACE, "failed: dev=%s, err=%d\n", dev_name(dev), err); device_release_driver(&sdiod->func2->dev); @@ -4361,7 +4386,7 @@ struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) bus->blocksize = bus->sdiodev->func2->cur_blksize; bus->roundup = min(max_roundup, bus->blocksize); - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); /* Disable F2 to clear any intermediate frame state on the dongle */ sdio_disable_func(bus->sdiodev->func2); @@ -4371,7 +4396,7 @@ struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) /* Done with backplane-dependent accesses, can drop clock... */ brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_CHIPCLKCSR, 0, NULL); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); /* ...and initialize clock/power states */ bus->clkstate = CLK_SDONLY; @@ -4428,7 +4453,8 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus) if (bus->ci) { if (bus->sdiodev->state != BRCMF_SDIOD_NOMEDIUM) { - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, + __func__); brcmf_sdio_wd_timer(bus, false); brcmf_sdio_clkctl(bus, CLK_AVAIL, false); /* Leave the device in state where it is @@ -4438,7 +4464,7 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus) msleep(20); brcmf_chip_set_passive(bus->ci); brcmf_sdio_clkctl(bus, CLK_NONE, false); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); } brcmf_chip_detach(bus->ci); } @@ -4485,9 +4511,9 @@ int brcmf_sdio_sleep(struct brcmf_sdio *bus, bool sleep) { int ret; - sdio_claim_host(bus->sdiodev->func1); + brcmf_sdio_claim_host(bus->sdiodev->func1, __func__); ret = brcmf_sdio_bus_sleep(bus, sleep, false); - sdio_release_host(bus->sdiodev->func1); + brcmf_sdio_release_host(bus->sdiodev->func1); return ret; } diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h index 34b031154da9..51ade937f5b0 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h @@ -388,4 +388,7 @@ void brcmf_sdio_wowl_config(struct device *dev, bool enabled); int brcmf_sdio_sleep(struct brcmf_sdio *bus, bool sleep); void brcmf_sdio_trigger_dpc(struct brcmf_sdio *bus); +void brcmf_sdio_claim_host(struct sdio_func *func, const char *descr); +void brcmf_sdio_release_host(struct sdio_func *func); + #endif /* BRCMFMAC_SDIO_H */ diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 43d0f0c496f6..6ccc76150f45 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -267,6 +267,7 @@ struct mmc_supply { }; struct mmc_ctx { + const char *descr; struct task_struct *task; };
On 27/05/19 12:37 PM, Brian Masney wrote: > On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote: >> I attached a patch that shows how I was able to determine what had >> already claimed the host. > On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote: >> This is because SDHCI is using the IRQ thread to process the SDIO card >> interrupt (sdio_run_irqs()). When the card driver tries to use the card, it >> causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove >> finish_tasklet") has moved the tasklet processing to the IRQ thread. >> >> I would expect to be able to use the IRQ thread to complete requests, and it >> is desirable to do so because it is lower latency. >> >> Probably, SDHCI should use sdio_signal_irq() which queues a work item, and >> is what other drivers are doing. >> >> I will investigate some more and send a patch. Please try the patch below: From: Adrian Hunter <adrian.hunter@intel.com> Date: Mon, 27 May 2019 14:45:55 +0300 Subject: [PATCH] mmc: sdhci: Fix SDIO IRQ thread deadlock Since commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet"), the IRQ thread might be used to complete requests, but the IRQ thread is also used to process SDIO card interrupts. This can cause a deadlock when the SDIO processing tries to access the card since that would also require the IRQ thread. Change SDHCI to use sdio_signal_irq() to schedule a work item instead. That also requires implementing the ->ack_sdio_irq() mmc host op. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") --- drivers/mmc/host/sdhci.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 97158344b862..0cd5f2ce98df 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -2137,6 +2137,17 @@ void sdhci_enable_sdio_irq(struct mmc_host *mmc, int enable) } EXPORT_SYMBOL_GPL(sdhci_enable_sdio_irq); +static void sdhci_ack_sdio_irq(struct mmc_host *mmc) +{ + struct sdhci_host *host = mmc_priv(mmc); + unsigned long flags; + + spin_lock_irqsave(&host->lock, flags); + if (host->flags & SDHCI_SDIO_IRQ_ENABLED) + sdhci_enable_sdio_irq_nolock(host, true); + spin_unlock_irqrestore(&host->lock, flags); +} + int sdhci_start_signal_voltage_switch(struct mmc_host *mmc, struct mmc_ios *ios) { @@ -2585,6 +2596,7 @@ static const struct mmc_host_ops sdhci_ops = { .get_ro = sdhci_get_ro, .hw_reset = sdhci_hw_reset, .enable_sdio_irq = sdhci_enable_sdio_irq, + .ack_sdio_irq = sdhci_ack_sdio_irq, .start_signal_voltage_switch = sdhci_start_signal_voltage_switch, .prepare_hs400_tuning = sdhci_prepare_hs400_tuning, .execute_tuning = sdhci_execute_tuning, @@ -3087,8 +3099,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) if ((intmask & SDHCI_INT_CARD_INT) && (host->ier & SDHCI_INT_CARD_INT)) { sdhci_enable_sdio_irq_nolock(host, false); - host->thread_isr |= SDHCI_INT_CARD_INT; - result = IRQ_WAKE_THREAD; + sdio_signal_irq(host->mmc); } intmask &= ~(SDHCI_INT_CARD_INSERT | SDHCI_INT_CARD_REMOVE | @@ -3160,15 +3171,6 @@ static irqreturn_t sdhci_thread_irq(int irq, void *dev_id) mmc_detect_change(mmc, msecs_to_jiffies(200)); } - if (isr & SDHCI_INT_CARD_INT) { - sdio_run_irqs(host->mmc); - - spin_lock_irqsave(&host->lock, flags); - if (host->flags & SDHCI_SDIO_IRQ_ENABLED) - sdhci_enable_sdio_irq_nolock(host, true); - spin_unlock_irqrestore(&host->lock, flags); - } - return IRQ_HANDLED; }
On Mon, May 27, 2019 at 03:08:07PM +0300, Adrian Hunter wrote: > On 27/05/19 12:37 PM, Brian Masney wrote: > > On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote: > >> I attached a patch that shows how I was able to determine what had > >> already claimed the host. > > On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote: > >> This is because SDHCI is using the IRQ thread to process the SDIO card > >> interrupt (sdio_run_irqs()). When the card driver tries to use the card, it > >> causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove > >> finish_tasklet") has moved the tasklet processing to the IRQ thread. > >> > >> I would expect to be able to use the IRQ thread to complete requests, and it > >> is desirable to do so because it is lower latency. > >> > >> Probably, SDHCI should use sdio_signal_irq() which queues a work item, and > >> is what other drivers are doing. > >> > >> I will investigate some more and send a patch. > > Please try the patch below: > > From: Adrian Hunter <adrian.hunter@intel.com> > Date: Mon, 27 May 2019 14:45:55 +0300 > Subject: [PATCH] mmc: sdhci: Fix SDIO IRQ thread deadlock > > Since commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet"), the IRQ > thread might be used to complete requests, but the IRQ thread is also used > to process SDIO card interrupts. This can cause a deadlock when the SDIO > processing tries to access the card since that would also require the IRQ > thread. Change SDHCI to use sdio_signal_irq() to schedule a work item > instead. That also requires implementing the ->ack_sdio_irq() mmc host op. > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") Yes, this fixes the issue for me. You can add my: Reported-by: Brian Masney <masneyb@onstation.org> Tested-by: Brian Masney <masneyb@onstation.org> Thanks, Brian
On Mon, 27 May 2019 at 14:50, Brian Masney <masneyb@onstation.org> wrote: > > On Mon, May 27, 2019 at 03:08:07PM +0300, Adrian Hunter wrote: > > On 27/05/19 12:37 PM, Brian Masney wrote: > > > On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote: > > >> I attached a patch that shows how I was able to determine what had > > >> already claimed the host. > > > On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote: > > >> This is because SDHCI is using the IRQ thread to process the SDIO card > > >> interrupt (sdio_run_irqs()). When the card driver tries to use the card, it > > >> causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove > > >> finish_tasklet") has moved the tasklet processing to the IRQ thread. > > >> > > >> I would expect to be able to use the IRQ thread to complete requests, and it > > >> is desirable to do so because it is lower latency. > > >> > > >> Probably, SDHCI should use sdio_signal_irq() which queues a work item, and > > >> is what other drivers are doing. > > >> > > >> I will investigate some more and send a patch. > > > > Please try the patch below: > > > > From: Adrian Hunter <adrian.hunter@intel.com> > > Date: Mon, 27 May 2019 14:45:55 +0300 > > Subject: [PATCH] mmc: sdhci: Fix SDIO IRQ thread deadlock > > > > Since commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet"), the IRQ > > thread might be used to complete requests, but the IRQ thread is also used > > to process SDIO card interrupts. This can cause a deadlock when the SDIO > > processing tries to access the card since that would also require the IRQ > > thread. Change SDHCI to use sdio_signal_irq() to schedule a work item > > instead. That also requires implementing the ->ack_sdio_irq() mmc host op. > > > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > > Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") > > Yes, this fixes the issue for me. You can add my: > > Reported-by: Brian Masney <masneyb@onstation.org> > Tested-by: Brian Masney <masneyb@onstation.org> > Applied for fixes, thanks! Kind regards Uffe
On 5/27/2019 2:08 PM, Adrian Hunter wrote: > On 27/05/19 12:37 PM, Brian Masney wrote: >> On Sun, May 26, 2019 at 03:58:19PM -0400, Brian Masney wrote: >>> I attached a patch that shows how I was able to determine what had >>> already claimed the host. >> On Mon, May 27, 2019 at 10:48:24AM +0300, Adrian Hunter wrote: >>> This is because SDHCI is using the IRQ thread to process the SDIO card >>> interrupt (sdio_run_irqs()). When the card driver tries to use the card, it >>> causes interrupts which deadlocks since c07a48c26519 ("mmc: sdhci: Remove >>> finish_tasklet") has moved the tasklet processing to the IRQ thread. >>> >>> I would expect to be able to use the IRQ thread to complete requests, and it >>> is desirable to do so because it is lower latency. >>> >>> Probably, SDHCI should use sdio_signal_irq() which queues a work item, and >>> is what other drivers are doing. >>> >>> I will investigate some more and send a patch. > > Please try the patch below: Finally got time to update my kernel to 5.2-rc2. This patch indeed resolves the issue. Thanks, Arend
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 97158344b862..3563c3bc57c9 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -3115,7 +3115,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id) continue; if (sdhci_defer_done(host, mrq)) { - result = IRQ_WAKE_THREAD; + queue_work(host->complete_wq, &host->complete_work); } else { mrqs_done[i] = mrq; host->mrqs_done[i] = NULL;
WiFi stopped working on the LG Nexus 5 phone and the issue was bisected to the commit c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") that moved from using a tasklet to a work queue. That patch also changed sdhci_irq() to return IRQ_WAKE_THREAD instead of finishing the work when sdhci_defer_done() is true. Change it to queue work to the complete work queue if sdhci_defer_done() is true so that the functionality is equilivent to what was there when the finish_tasklet was present. This corrects the WiFi breakage on the Nexus 5 phone. Signed-off-by: Brian Masney <masneyb@onstation.org> Fixes: c07a48c26519 ("mmc: sdhci: Remove finish_tasklet") --- See 'sdhci@f98a4900' in qcom-msm8974-lge-nexus5-hammerhead.dts for details about how the WiFi is wired into sdhci on this platform. bisect log: git bisect start # bad: [4dde821e4296e156d133b98ddc4c45861935a4fb] Merge tag 'xfs-5.2-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux git bisect bad 4dde821e4296e156d133b98ddc4c45861935a4fb # good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1 git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd # bad: [8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf] Merge tag 'docs-5.2' of git://git.lwn.net/linux git bisect bad 8c79f4cd441b27df6cadd11b70a50e06b3b3a2bf # bad: [67a242223958d628f0ba33283668e3ddd192d057] Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block git bisect bad 67a242223958d628f0ba33283668e3ddd192d057 # good: [8ff468c29e9a9c3afe9152c10c7b141343270bf3] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 8ff468c29e9a9c3afe9152c10c7b141343270bf3 # good: [e2a5be107f52cefb9010ccae6f569c3ddaa954cc] staging: kpc2000: kpc_spi: Fix build error for {read,write}q git bisect good e2a5be107f52cefb9010ccae6f569c3ddaa954cc # bad: [cf482a49af564a3044de3178ea28f10ad5921b38] Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core git bisect bad cf482a49af564a3044de3178ea28f10ad5921b38 # good: [9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e] Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux git bisect good 9f2e3a53f7ec9ef55e9d01bc29a6285d291c151e # good: [b4b52b881cf08e13d110eac811d4becc0775abbf] Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux git bisect good b4b52b881cf08e13d110eac811d4becc0775abbf # bad: [d5f758f2df8015b8dcf47b6403cc192e4cef734d] mmc: meson-gx: disable HS400 git bisect bad d5f758f2df8015b8dcf47b6403cc192e4cef734d # good: [b3fb9d64b497b890f7b779a9f0b40b5cc269ea18] mmc: mmci: define get_dctrl_cfg for legacy variant git bisect good b3fb9d64b497b890f7b779a9f0b40b5cc269ea18 # good: [ade024f130f742725da9219624b01666f04bc4a6] memstick: jmb38x_ms: remove set but not used variable 'data' git bisect good ade024f130f742725da9219624b01666f04bc4a6 # bad: [42c38d4a1bc41e78dedbf73b0fb35e44007789bb] mmc: core: Fix warning and undefined behavior in mmc voltage handling git bisect bad 42c38d4a1bc41e78dedbf73b0fb35e44007789bb # good: [19d2f695f4e82794df7465b029c02b104d1b9903] mmc: sdhci: Call mmc_request_done() from IRQ handler if possible git bisect good 19d2f695f4e82794df7465b029c02b104d1b9903 # bad: [71c733c4e1aeb83e8221e89caeec893d51f88b7b] mmc: tegra: add sdhci tegra suspend and resume git bisect bad 71c733c4e1aeb83e8221e89caeec893d51f88b7b # bad: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet git bisect bad c07a48c2651965e84d35cf193dfc0e5f7892d612 # first bad commit: [c07a48c2651965e84d35cf193dfc0e5f7892d612] mmc: sdhci: Remove finish_tasklet drivers/mmc/host/sdhci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)