brcmfmac: stop watchdog before detach and free everything

Message ID 1527493857-2220-1-git-send-email-michael@amarulasolutions.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo

Commit Message

Michael Nazzareno Trimarchi May 28, 2018, 7:50 a.m. UTC
The watchdog needs to be stopped in brcmf_sdio_remove() to avoid
the following NULL pointer dereference:

The system is going down NOW!
[ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
Sent SIGTERM to all processes
[ 1348.121412] Mem abort info:
[ 1348.126962]   ESR = 0x96000004
[ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
[ 1348.135948]   SET = 0, FnV = 0
[ 1348.138997]   EA = 0, S1PTW = 0
[ 1348.142154] Data abort info:
[ 1348.145045]   ISV = 0, ISS = 0x00000004
[ 1348.148884]   CM = 0, WnR = 0
[ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 1348.158475] [00000000000002f8] pgd=0000000000000000
[ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1348.168927] Modules linked in: ipv6
[ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted 4.17.0-rc5-next-20180517 #18
[ 1348.180757] Hardware name: Amarula A64-Relic (DT)
[ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
[ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
[ 1348.200253] sp : ffff00000b85be30
[ 1348.203561] x29: ffff00000b85be30 x28: 0000000000000000
[ 1348.208868] x27: ffff00000b6cb918 x26: ffff80003b990638
[ 1348.214176] x25: ffff0000087b1a20 x24: ffff80003b94f800
[ 1348.219483] x23: ffff000008e620c8 x22: ffff000008f0b660
[ 1348.224790] x21: ffff000008c6a858 x20: 00000000fffffe00
[ 1348.230097] x19: ffff80003b94f800 x18: 0000000000000001
[ 1348.235404] x17: 0000ffffab2e8a74 x16: ffff0000080d7de8
[ 1348.240711] x15: 0000000000000000 x14: 0000000000000400
[ 1348.246018] x13: 0000000000000400 x12: 0000000000000001
[ 1348.251324] x11: 00000000000002c4 x10: 0000000000000a10
[ 1348.256631] x9 : ffff00000b85bc40 x8 : ffff80003be11870
[ 1348.261937] x7 : ffff80003dfc7308 x6 : 000000078ff08b55
[ 1348.267243] x5 : 00000139e1058400 x4 : 0000000000000000
[ 1348.272550] x3 : dead000000000100 x2 : 958f2788d6618100
[ 1348.277856] x1 : 00000000fffffe00 x0 : 0000000000000000

Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Arend van Spriel May 28, 2018, 9:51 a.m. UTC | #1
On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
> Watchdog need to be stopped in brcmf_sdio_remove to avoid
> i
> The system is going down NOW!
> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
> Sent SIGTERM to all processes
> [ 1348.121412] Mem abort info:
> [ 1348.126962]   ESR = 0x96000004
> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
> [ 1348.135948]   SET = 0, FnV = 0
> [ 1348.138997]   EA = 0, S1PTW = 0
> [ 1348.142154] Data abort info:
> [ 1348.145045]   ISV = 0, ISS = 0x00000004
> [ 1348.148884]   CM = 0, WnR = 0
> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 1348.168927] Modules linked in: ipv6
> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted 4.17.0-rc5-next-20180517 #18
> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290

Hi Michael,

Thanks for the patch. In the normal scenario the call stack looks like this:

brcmf_sdio_remove()
	-> brcmf_detach()
		-> brcmf_bus_stop()
			-> brcmf_sdio_bus_stop()

In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario 
did you encounter this null pointer deref?

Regards,
Arend
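
The ordering concern raised here can be illustrated outside the kernel. The following is a minimal userspace sketch, not driver code: pthreads stand in for the kthread API, and every name (demo_bus, demo_probe, demo_remove, demo_stop_watchdog, bus_was_started) is hypothetical. It assumes that in the failing case the normal bus-stop path is never reached, so the remove path stops the watchdog unconditionally before freeing the state the thread uses.

```c
/* Userspace analogy only: pthreads stand in for the kernel kthread API. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the driver's bus structure. */
struct demo_bus {
	pthread_t watchdog;       /* plays the role of bus->watchdog_tsk */
	bool watchdog_running;    /* true while the thread still has to be joined */
	atomic_bool should_stop;  /* plays the role of kthread_should_stop() */
	atomic_int ticks;         /* state the watchdog keeps touching */
};

static void *demo_watchdog_thread(void *arg)
{
	struct demo_bus *bus = arg;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

	/* Every iteration dereferences bus, so bus must outlive the thread. */
	while (!atomic_load(&bus->should_stop)) {
		atomic_fetch_add(&bus->ticks, 1);
		nanosleep(&delay, NULL);
	}
	return NULL;
}

struct demo_bus *demo_probe(void)
{
	struct demo_bus *bus = calloc(1, sizeof(*bus));

	if (!bus)
		return NULL;
	atomic_init(&bus->should_stop, false);
	atomic_init(&bus->ticks, 0);
	if (pthread_create(&bus->watchdog, NULL, demo_watchdog_thread, bus) != 0) {
		free(bus);
		return NULL;
	}
	bus->watchdog_running = true;
	return bus;
}

/* Idempotent: safe to call whether or not the watchdog was already stopped. */
static void demo_stop_watchdog(struct demo_bus *bus)
{
	if (!bus->watchdog_running)
		return;
	atomic_store(&bus->should_stop, true);
	pthread_join(bus->watchdog, NULL);
	bus->watchdog_running = false;
}

/*
 * bus_was_started models whether the normal stop path runs (assumption:
 * it does not when the bus never came up, e.g. without firmware).
 */
int demo_remove(struct demo_bus *bus, bool bus_was_started)
{
	if (!bus)
		return -1;
	if (bus_was_started)
		demo_stop_watchdog(bus);  /* normal path: stopped during bus stop */
	demo_stop_watchdog(bus);          /* unconditional stop before freeing */
	assert(!bus->watchdog_running);
	free(bus);                        /* no thread can touch bus any more */
	return 0;
}
```

Calling demo_remove(demo_probe(), false) exercises the case where only the unconditional stop runs. Without that second call the thread would keep dereferencing freed memory, which is the userspace counterpart of the oops quoted above.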
Michael Nazzareno Trimarchi May 28, 2018, 9:54 a.m. UTC | #2
Hi Arend

On Mon, May 28, 2018 at 11:51 AM, Arend van Spriel
<arend.vanspriel@broadcom.com> wrote:
> On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
>>
>> Watchdog need to be stopped in brcmf_sdio_remove to avoid
>> i
>> The system is going down NOW!
>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual
>> address 000002f8
>> Sent SIGTERM to all processes
>> [ 1348.121412] Mem abort info:
>> [ 1348.126962]   ESR = 0x96000004
>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>> [ 1348.135948]   SET = 0, FnV = 0
>> [ 1348.138997]   EA = 0, S1PTW = 0
>> [ 1348.142154] Data abort info:
>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>> [ 1348.148884]   CM = 0, WnR = 0
>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [ 1348.168927] Modules linked in: ipv6
>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>> 4.17.0-rc5-next-20180517 #18
>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>
>
> Hi Michael,
>
> Thanks for the patch. In normal scenario the callstack looks like this:
>
> brcmf_sdio_remove()
>         -> brcmf_detach()
>                 -> brcmf_bus_stop()
>                         -> brcmf_sdio_bus_stop()
>
> In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario did
> you encounter this null pointer deref?

Does this happen even when there is no wifi firmware?
Boot without any firmware in the filesystem and then trigger a reboot.

Michael

>
> Regards,
> Arend
Andy Shevchenko May 28, 2018, 3:25 p.m. UTC | #3
On Mon, May 28, 2018 at 12:54 PM, Michael Nazzareno Trimarchi
<michael@amarulasolutions.com> wrote:
> Hi Arend
>
> On Mon, May 28, 2018 at 11:51 AM, Arend van Spriel
> <arend.vanspriel@broadcom.com> wrote:
>> On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
>>>
>>> Watchdog need to be stopped in brcmf_sdio_remove to avoid
>>> i
>>> The system is going down NOW!
>>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual
>>> address 000002f8
>>> Sent SIGTERM to all processes
>>> [ 1348.121412] Mem abort info:
>>> [ 1348.126962]   ESR = 0x96000004
>>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>>> [ 1348.135948]   SET = 0, FnV = 0
>>> [ 1348.138997]   EA = 0, S1PTW = 0
>>> [ 1348.142154] Data abort info:
>>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>>> [ 1348.148884]   CM = 0, WnR = 0
>>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [ 1348.168927] Modules linked in: ipv6
>>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>>> 4.17.0-rc5-next-20180517 #18
>>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>>
>>
>> Hi Michael,
>>
>> Thanks for the patch. In normal scenario the callstack looks like this:
>>
>> brcmf_sdio_remove()
>>         -> brcmf_detach()
>>                 -> brcmf_bus_stop()
>>                         -> brcmf_sdio_bus_stop()
>>
>> In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario did
>> you encounter this null pointer deref?
>
> Is this happen even when there is not wifi firmware?
> boot without any firmware in the filesystem and then trigger a reboot

I have noticed something like the above for a long time (a couple of
kernel releases?), but it wasn't a big priority for me.
I can test this on my side, though.

P.S. I think rmmod or echo > unbind will trigger that as well.
Michael Nazzareno Trimarchi May 28, 2018, 3:29 p.m. UTC | #4
Hi

On Mon, May 28, 2018 at 5:25 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Mon, May 28, 2018 at 12:54 PM, Michael Nazzareno Trimarchi
> <michael@amarulasolutions.com> wrote:
>> Hi Arend
>>
>> On Mon, May 28, 2018 at 11:51 AM, Arend van Spriel
>> <arend.vanspriel@broadcom.com> wrote:
>>> On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
>>>>
>>>> Watchdog need to be stopped in brcmf_sdio_remove to avoid
>>>> i
>>>> The system is going down NOW!
>>>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual
>>>> address 000002f8
>>>> Sent SIGTERM to all processes
>>>> [ 1348.121412] Mem abort info:
>>>> [ 1348.126962]   ESR = 0x96000004
>>>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>>>> [ 1348.135948]   SET = 0, FnV = 0
>>>> [ 1348.138997]   EA = 0, S1PTW = 0
>>>> [ 1348.142154] Data abort info:
>>>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>>>> [ 1348.148884]   CM = 0, WnR = 0
>>>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>>>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>> [ 1348.168927] Modules linked in: ipv6
>>>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>>>> 4.17.0-rc5-next-20180517 #18
>>>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>>>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>>>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>>>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>>>
>>>
>>> Hi Michael,
>>>
>>> Thanks for the patch. In normal scenario the callstack looks like this:
>>>
>>> brcmf_sdio_remove()
>>>         -> brcmf_detach()
>>>                 -> brcmf_bus_stop()
>>>                         -> brcmf_sdio_bus_stop()
>>>
>>> In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario did
>>> you encounter this null pointer deref?
>>
>> Is this happen even when there is not wifi firmware?
>> boot without any firmware in the filesystem and then trigger a reboot
>
> Something like the above I had noticed for a long (couple of kernel
> releases?) time, but wasn't a big priority to me.
> Though, I can test this on my side.
>
> P.S. I think rmmod or echo > unbind will trigger that as well.
>

Right now the module is compiled into the kernel. I can dig into this
tonight if needed.

Michael

> --
> With Best Regards,
> Andy Shevchenko
Arend van Spriel May 29, 2018, 9:25 a.m. UTC | #5
On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
> Watchdog need to be stopped in brcmf_sdio_remove to avoid
> i
> The system is going down NOW!
> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
> Sent SIGTERM to all processes

[snip]

Please send a V2 with your configuration details added to the commit
message, i.e. using a built-in driver, no firmware in place, etc.

Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
> ---
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++
>   1 file changed, 7 insertions(+)

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 412a05b..061f69d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4294,6 +4294,13 @@  void brcmf_sdio_remove(struct brcmf_sdio *bus)
 	brcmf_dbg(TRACE, "Enter\n");
 
 	if (bus) {
+		/* Stop watchdog task */
+		if (bus->watchdog_tsk) {
+			send_sig(SIGTERM, bus->watchdog_tsk, 1);
+			kthread_stop(bus->watchdog_tsk);
+			bus->watchdog_tsk = NULL;
+		}
+
 		/* De-register interrupt handler */
 		brcmf_sdiod_intr_unregister(bus->sdiodev);