diff mbox series

ath10k: fix kernel panic by moving pci flush after napi_disable

Message ID 1532931051-20118-1-git-send-email-tamizhr@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series ath10k: fix kernel panic by moving pci flush after napi_disable | expand

Commit Message

Tamizh chelvam July 30, 2018, 6:10 a.m. UTC
When continuously running wifi up/down sequence, the napi poll
can be scheduled after the CE buffers being freed by ath10k_pci_flush

Steps:
  In a certain condition, during wifi down below scenario might occur.

ath10k_stop->ath10k_hif_stop->napi_schedule->ath10k_pci_flush->napi_poll(napi_synchronize).

In the above scenario, CE buffer entries will be freed up and become NULL in
ath10k_pci_flush. And the napi_poll has been invoked after the flush process
and it will try to get the skb from the CE buffer entry and perform some action on that.
Since the CE buffer already cleaned by pci flush this action will create NULL
pointer dereference and trigger below kernel panic.

Unable to handle kernel NULL pointer dereference at virtual address 0000005c
PC is at ath10k_pci_htt_rx_cb+0x64/0x3ec [ath10k_pci]
ath10k_pci_htt_rx_cb [ath10k_pci]
ath10k_ce_per_engine_service+0x74/0xc4 [ath10k_pci]
ath10k_ce_per_engine_service [ath10k_pci]
ath10k_ce_per_engine_service_any+0x74/0x80 [ath10k_pci]
ath10k_ce_per_engine_service_any [ath10k_pci]
ath10k_pci_napi_poll+0x48/0xec [ath10k_pci]
ath10k_pci_napi_poll [ath10k_pci]
net_rx_action+0xac/0x160
net_rx_action
__do_softirq+0xdc/0x208
__do_softirq
irq_exit+0x84/0xe0
irq_exit
__handle_domain_irq+0x80/0xa0
__handle_domain_irq
gic_handle_irq+0x38/0x5c
gic_handle_irq
__irq_usr+0x44/0x60

Tested on QCA4019 and firmware version 10.4.3.2.1.1-00010

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/ahb.c | 4 ++--
 drivers/net/wireless/ath/ath10k/pci.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Ben Greear July 31, 2018, 10:24 p.m. UTC | #1
On 07/29/2018 11:10 PM, Tamizh chelvam wrote:
> When continuously running wifi up/down sequence, the napi poll
> can be scheduled after the CE buffers being freed by ath10k_pci_flush
>
> Steps:
>   In a certain condition, during wifi down below scenario might occur.
>
> ath10k_stop->ath10k_hif_stop->napi_schedule->ath10k_pci_flush->napi_poll(napi_synchronize).
>
> In the above scenario, CE buffer entries will be freed up and become NULL in
> ath10k_pci_flush. And the napi_poll has been invoked after the flush process
> and it will try to get the skb from the CE buffer entry and perform some action on that.
> Since the CE buffer already cleaned by pci flush this action will create NULL
> pointer dereference and trigger below kernel panic.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000005c
> PC is at ath10k_pci_htt_rx_cb+0x64/0x3ec [ath10k_pci]
> ath10k_pci_htt_rx_cb [ath10k_pci]
> ath10k_ce_per_engine_service+0x74/0xc4 [ath10k_pci]
> ath10k_ce_per_engine_service [ath10k_pci]
> ath10k_ce_per_engine_service_any+0x74/0x80 [ath10k_pci]
> ath10k_ce_per_engine_service_any [ath10k_pci]
> ath10k_pci_napi_poll+0x48/0xec [ath10k_pci]
> ath10k_pci_napi_poll [ath10k_pci]
> net_rx_action+0xac/0x160
> net_rx_action
> __do_softirq+0xdc/0x208
> __do_softirq
> irq_exit+0x84/0xe0
> irq_exit
> __handle_domain_irq+0x80/0xa0
> __handle_domain_irq
> gic_handle_irq+0x38/0x5c
> gic_handle_irq
> __irq_usr+0x44/0x60
>
> Tested on QCA4019 and firmware version 10.4.3.2.1.1-00010

I have been testing this for two days while bisecting buggy 10.4 9984 firmware.  I still see
crashes, but it is certainly no worse.  So, this patch seems OK to me.

Thanks,
Ben
Kalle Valo Aug. 24, 2018, 11:43 a.m. UTC | #2
Tamizh chelvam <tamizhr@codeaurora.org> wrote:

> When continuously running wifi up/down sequence, the napi poll
> can be scheduled after the CE buffers being freed by ath10k_pci_flush
> 
> Steps:
>   In a certain condition, during wifi down below scenario might occur.
> 
> ath10k_stop->ath10k_hif_stop->napi_schedule->ath10k_pci_flush->napi_poll(napi_synchronize).
> 
> In the above scenario, CE buffer entries will be freed up and become NULL in
> ath10k_pci_flush. And the napi_poll has been invoked after the flush process
> and it will try to get the skb from the CE buffer entry and perform some action on that.
> Since the CE buffer already cleaned by pci flush this action will create NULL
> pointer dereference and trigger below kernel panic.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000005c
> PC is at ath10k_pci_htt_rx_cb+0x64/0x3ec [ath10k_pci]
> ath10k_pci_htt_rx_cb [ath10k_pci]
> ath10k_ce_per_engine_service+0x74/0xc4 [ath10k_pci]
> ath10k_ce_per_engine_service [ath10k_pci]
> ath10k_ce_per_engine_service_any+0x74/0x80 [ath10k_pci]
> ath10k_ce_per_engine_service_any [ath10k_pci]
> ath10k_pci_napi_poll+0x48/0xec [ath10k_pci]
> ath10k_pci_napi_poll [ath10k_pci]
> net_rx_action+0xac/0x160
> net_rx_action
> __do_softirq+0xdc/0x208
> __do_softirq
> irq_exit+0x84/0xe0
> irq_exit
> __handle_domain_irq+0x80/0xa0
> __handle_domain_irq
> gic_handle_irq+0x38/0x5c
> gic_handle_irq
> __irq_usr+0x44/0x60
> 
> Tested on QCA4019 and firmware version 10.4.3.2.1.1-00010
> 
> Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

bd1d395070cc ath10k: fix kernel panic by moving pci flush after napi_disable
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c
index fa39fff..e244dc5 100644
--- a/drivers/net/wireless/ath/ath10k/ahb.c
+++ b/drivers/net/wireless/ath/ath10k/ahb.c
@@ -660,10 +660,10 @@  static void ath10k_ahb_hif_stop(struct ath10k *ar)
 	ath10k_ahb_irq_disable(ar);
 	synchronize_irq(ar_ahb->irq);
 
-	ath10k_pci_flush(ar);
-
 	napi_synchronize(&ar->napi);
 	napi_disable(&ar->napi);
+
+	ath10k_pci_flush(ar);
 }
 
 static int ath10k_ahb_hif_power_up(struct ath10k *ar)
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index af2cf55..97fa5c7 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2068,9 +2068,9 @@  static void ath10k_pci_hif_stop(struct ath10k *ar)
 
 	ath10k_pci_irq_disable(ar);
 	ath10k_pci_irq_sync(ar);
-	ath10k_pci_flush(ar);
 	napi_synchronize(&ar->napi);
 	napi_disable(&ar->napi);
+	ath10k_pci_flush(ar);
 
 	spin_lock_irqsave(&ar_pci->ps_lock, flags);
 	WARN_ON(ar_pci->ps_wake_refcount > 0);