diff mbox

ath10k: Fix soft lockup during firmware crash/hw-restart

Message ID 1480430762-15995-1-git-send-email-mohammed@qca.qualcomm.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show

Commit Message

Mohammed Shafi Shajakhan Nov. 29, 2016, 2:46 p.m. UTC
From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>

During firmware crash (or) user requested manual restart
the system gets into a soft lock up state because of the
below root cause.

During user requested hardware restart / firmware crash
the system goes into a soft lockup state as 'napi_synchronize'
is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
bit) and it sleeps into infinite loop as it waits for
'NAPI_STATE_SCHED' to be cleared. This condition is hit because
'ath10k_hif_stop' is called twice as below (resulting in calling
'napi_synchronize' after 'napi_disable')

'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
-> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)

Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
as it makes more sense before informing mac80211 to restart h/w
Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
---
[thanks to Kalle and Michal for their inputs]

 drivers/net/wireless/ath/ath10k/core.c | 2 +-
 drivers/net/wireless/ath/ath10k/mac.c  | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

Comments

Ben Greear Nov. 29, 2016, 6:16 p.m. UTC | #1
Is this something for stable?  And if so, how far back should it be applied?

I'll add this patch to my 4.9 tree and test it...

Thanks,
Ben

On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
>
> During firmware crash (or) user requested manual restart
> the system gets into a soft lock up state because of the
> below root cause.
>
> During user requested hardware restart / firmware crash
> the system goes into a soft lockup state as 'napi_synchronize'
> is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> bit) and it sleeps into infinite loop as it waits for
> 'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> 'ath10k_hif_stop' is called twice as below (resulting in calling
> 'napi_synchronize' after 'napi_disable')
>
> 'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> -> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> 'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
>
> Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> as it makes more sense before informing mac80211 to restart h/w
> Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
>
> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> ---
> [thanks to Kalle and Michal for their inputs]
>
>  drivers/net/wireless/ath/ath10k/core.c | 2 +-
>  drivers/net/wireless/ath/ath10k/mac.c  | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 7005e2a..5bc6847 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
>  	switch (ar->state) {
>  	case ATH10K_STATE_ON:
>  		ar->state = ATH10K_STATE_RESTARTING;
> -		ath10k_hif_stop(ar);
> +		ath10k_halt(ar);
>  		ath10k_scan_finish(ar);
>  		ieee80211_restart_hw(ar->hw);
>  		break;
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 717b2fa..481842b 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
>  		ar->state = ATH10K_STATE_ON;
>  		break;
>  	case ATH10K_STATE_RESTARTING:
> -		ath10k_halt(ar);
>  		ar->state = ATH10K_STATE_RESTARTED;
>  		break;
>  	case ATH10K_STATE_ON:
>
Ben Greear Nov. 30, 2016, 12:43 a.m. UTC | #2
I have not seen this hang since adding this patch, so
hopefully it has resolved the problem.

Thanks,
Ben

On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
>
> During firmware crash (or) user requested manual restart
> the system gets into a soft lock up state because of the
> below root cause.
>
> During user requested hardware restart / firmware crash
> the system goes into a soft lockup state as 'napi_synchronize'
> is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> bit) and it sleeps into infinite loop as it waits for
> 'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> 'ath10k_hif_stop' is called twice as below (resulting in calling
> 'napi_synchronize' after 'napi_disable')
>
> 'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> -> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> 'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
>
> Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> as it makes more sense before informing mac80211 to restart h/w
> Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
>
> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> ---
> [thanks to Kalle and Michal for their inputs]
>
>  drivers/net/wireless/ath/ath10k/core.c | 2 +-
>  drivers/net/wireless/ath/ath10k/mac.c  | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 7005e2a..5bc6847 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
>  	switch (ar->state) {
>  	case ATH10K_STATE_ON:
>  		ar->state = ATH10K_STATE_RESTARTING;
> -		ath10k_hif_stop(ar);
> +		ath10k_halt(ar);
>  		ath10k_scan_finish(ar);
>  		ieee80211_restart_hw(ar->hw);
>  		break;
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 717b2fa..481842b 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
>  		ar->state = ATH10K_STATE_ON;
>  		break;
>  	case ATH10K_STATE_RESTARTING:
> -		ath10k_halt(ar);
>  		ar->state = ATH10K_STATE_RESTARTED;
>  		break;
>  	case ATH10K_STATE_ON:
>
Mohammed Shafi Shajakhan Nov. 30, 2016, 6:10 a.m. UTC | #3
On Tue, Nov 29, 2016 at 04:43:48PM -0800, Ben Greear wrote:
> I have not seen this hang since adding this patch, so
> hopefully it has resolved the problem.

thanks Ben ! I had updated the commit log with Fixes tag
https://patchwork.kernel.org/patch/9453609/

> 
> On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> >From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> >
> >During firmware crash (or) user requested manual restart
> >the system gets into a soft lock up state because of the
> >below root cause.
> >
> >During user requested hardware restart / firmware crash
> >the system goes into a soft lockup state as 'napi_synchronize'
> >is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> >bit) and it sleeps into infinite loop as it waits for
> >'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> >'ath10k_hif_stop' is called twice as below (resulting in calling
> >'napi_synchronize' after 'napi_disable')
> >
> >'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> >-> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> >'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
> >
> >Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> >as it makes more sense before informing mac80211 to restart h/w
> >Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
> >
> >Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> >---
> >[thanks to Kalle and Michal for their inputs]
> >
> > drivers/net/wireless/ath/ath10k/core.c | 2 +-
> > drivers/net/wireless/ath/ath10k/mac.c  | 1 -
> > 2 files changed, 1 insertion(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> >index 7005e2a..5bc6847 100644
> >--- a/drivers/net/wireless/ath/ath10k/core.c
> >+++ b/drivers/net/wireless/ath/ath10k/core.c
> >@@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
> > 	switch (ar->state) {
> > 	case ATH10K_STATE_ON:
> > 		ar->state = ATH10K_STATE_RESTARTING;
> >-		ath10k_hif_stop(ar);
> >+		ath10k_halt(ar);
> > 		ath10k_scan_finish(ar);
> > 		ieee80211_restart_hw(ar->hw);
> > 		break;
> >diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> >index 717b2fa..481842b 100644
> >--- a/drivers/net/wireless/ath/ath10k/mac.c
> >+++ b/drivers/net/wireless/ath/ath10k/mac.c
> >@@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
> > 		ar->state = ATH10K_STATE_ON;
> > 		break;
> > 	case ATH10K_STATE_RESTARTING:
> >-		ath10k_halt(ar);
> > 		ar->state = ATH10K_STATE_RESTARTED;
> > 		break;
> > 	case ATH10K_STATE_ON:
> >
> 
> 
> -- 
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
Mohammed Shafi Shajakhan Dec. 1, 2016, 6:06 a.m. UTC | #4
On Tue, Nov 29, 2016 at 10:16:50AM -0800, Ben Greear wrote:
> Is this something for stable?  And if so, how far back should it be applied?

@Kalle,

[shafi] kindly suggest. If i am not wrong this is only needed for 4.9

> 
> I'll add this patch to my 4.9 tree and test it...
>

[shafi] thanks a lot Ben.

> 
> On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> >From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> >
> >During firmware crash (or) user requested manual restart
> >the system gets into a soft lock up state because of the
> >below root cause.
> >
> >During user requested hardware restart / firmware crash
> >the system goes into a soft lockup state as 'napi_synchronize'
> >is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> >bit) and it sleeps into infinite loop as it waits for
> >'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> >'ath10k_hif_stop' is called twice as below (resulting in calling
> >'napi_synchronize' after 'napi_disable')
> >
> >'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> >-> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> >'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
> >
> >Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> >as it makes more sense before informing mac80211 to restart h/w
> >Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
> >
> >Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> >---
> >[thanks to Kalle and Michal for their inputs]
> >
> > drivers/net/wireless/ath/ath10k/core.c | 2 +-
> > drivers/net/wireless/ath/ath10k/mac.c  | 1 -
> > 2 files changed, 1 insertion(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> >index 7005e2a..5bc6847 100644
> >--- a/drivers/net/wireless/ath/ath10k/core.c
> >+++ b/drivers/net/wireless/ath/ath10k/core.c
> >@@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
> > 	switch (ar->state) {
> > 	case ATH10K_STATE_ON:
> > 		ar->state = ATH10K_STATE_RESTARTING;
> >-		ath10k_hif_stop(ar);
> >+		ath10k_halt(ar);
> > 		ath10k_scan_finish(ar);
> > 		ieee80211_restart_hw(ar->hw);
> > 		break;
> >diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> >index 717b2fa..481842b 100644
> >--- a/drivers/net/wireless/ath/ath10k/mac.c
> >+++ b/drivers/net/wireless/ath/ath10k/mac.c
> >@@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
> > 		ar->state = ATH10K_STATE_ON;
> > 		break;
> > 	case ATH10K_STATE_RESTARTING:
> >-		ath10k_halt(ar);
> > 		ar->state = ATH10K_STATE_RESTARTED;
> > 		break;
> > 	case ATH10K_STATE_ON:
> >
> 
> 
> -- 
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
Kalle Valo Dec. 1, 2016, 10:34 a.m. UTC | #5
Mohammed Shafi Shajakhan <mohammed@codeaurora.org> writes:

> On Tue, Nov 29, 2016 at 10:16:50AM -0800, Ben Greear wrote:
>> Is this something for stable?  And if so, how far back should it be applied?
>
> @Kalle,
>
> [shafi] kindly suggest. If i am not wrong this is only needed for 4.9

Correct, commit 3c97f5de1f28 went to 4.9-rc1, it was not in 4.8. I'll
add CC stable to the commit log.
diff mbox

Patch

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 7005e2a..5bc6847 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -1536,7 +1536,7 @@  static void ath10k_core_restart(struct work_struct *work)
 	switch (ar->state) {
 	case ATH10K_STATE_ON:
 		ar->state = ATH10K_STATE_RESTARTING;
-		ath10k_hif_stop(ar);
+		ath10k_halt(ar);
 		ath10k_scan_finish(ar);
 		ieee80211_restart_hw(ar->hw);
 		break;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 717b2fa..481842b 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4449,7 +4449,6 @@  static int ath10k_start(struct ieee80211_hw *hw)
 		ar->state = ATH10K_STATE_ON;
 		break;
 	case ATH10K_STATE_RESTARTING:
-		ath10k_halt(ar);
 		ar->state = ATH10K_STATE_RESTARTED;
 		break;
 	case ATH10K_STATE_ON: