ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
diff mbox series

Message ID 1542163824-795-1-git-send-email-wgong@codeaurora.org
State New
Delegated to: Kalle Valo
Headers show
Series
  • ath10k: Remove ATH10K_STATE_RESTARTED in simulate fw crash
Related show

Commit Message

Wen Gong Nov. 14, 2018, 2:50 a.m. UTC
When test simulate firmware crash, it is easy to trigger error.
command:
echo soft > /sys/kernel/debug/ieee80211/phyxx/ath10k/simulate_fw_crash.

If input more than two times continuously, then it will have error.
Error message:
ath10k_pci 0000:02:00.0: failed to set vdev 1 RX wake policy: -108
ath10k_pci 0000:02:00.0: device is wedged, will not restart

It is because the state has not changed to ATH10K_STATE_ON immediately,
then it will have more than two simulate crash process running meanwhile,
and complete/wakeup some field twice, it destroy the normal recovery
process.

Tested with QCA6174 PCI with firmware
WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
It's not a regression with new firmware releases.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/debug.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Michał Kazior Nov. 14, 2018, 7:48 a.m. UTC | #1
On Wed, 14 Nov 2018 at 03:51, Wen Gong <wgong@codeaurora.org> wrote:
>
> When test simulate firmware crash, it is easy to trigger error.
> command:
> echo soft > /sys/kernel/debug/ieee80211/phyxx/ath10k/simulate_fw_crash.
>
> If input more than two times continuously, then it will have error.
> Error message:
> ath10k_pci 0000:02:00.0: failed to set vdev 1 RX wake policy: -108
> ath10k_pci 0000:02:00.0: device is wedged, will not restart
>
> It is because the state has not changed to ATH10K_STATE_ON immediately,
> then it will have more than two simulate crash process running meanwhile,
> and complete/wakeup some field twice, it destroy the normal recovery
> process.

This was intended to allow testing not only firmware crash path (and
recovery) but also firmware crash while recovering from a firmware
crash.


Michał
Wen Gong Jan. 7, 2019, 7:16 a.m. UTC | #2
> > It is because the state has not changed to ATH10K_STATE_ON
> > immediately, then it will have more than two simulate crash process
> > running meanwhile, and complete/wakeup some field twice, it destroy
> > the normal recovery process.
> 
> This was intended to allow testing not only firmware crash path (and
> recovery) but also firmware crash while recovering from a firmware crash.
> 
If firmware is recovering from crash, then simulate a new crash will trigger error.
So remove it.
> 
> Michał
> 
> _______________________________________________
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
Michał Kazior Jan. 7, 2019, 8:35 a.m. UTC | #3
On Mon, 7 Jan 2019 at 08:16, Wen Gong <wgong@qti.qualcomm.com> wrote:
>
> > > It is because the state has not changed to ATH10K_STATE_ON
> > > immediately, then it will have more than two simulate crash process
> > > running meanwhile, and complete/wakeup some field twice, it destroy
> > > the normal recovery process.
> >
> > This was intended to allow testing not only firmware crash path (and
> > recovery) but also firmware crash while recovering from a firmware crash.
> >
> If firmware is recovering from crash, then simulate a new crash will trigger error.
> So remove it.

That's actually a feature, not a bug. If firmware crashes while driver
is restarting after a crash then its likely going to fail again and
again causing a crash-restart loop which can affect system performance
and responsiveness. It's better to give up and let the system admin
take over.

If it's still bothering you then please consider a crash counter
threshold so that, e.g. after 5 crash-while-restarting it's going to
give up. However I doubt it's worth the effort. My experience tells me
firmware crashes during recovery are rarely, if at all, transient.

The simulated fw crash is not representative here. It's a mere tool to
test driver code.


Michał
Wen Gong Jan. 8, 2019, 8:45 a.m. UTC | #4
> >
> > > > It is because the state has not changed to ATH10K_STATE_ON
> > > > immediately, then it will have more than two simulate crash
> > > > process running meanwhile, and complete/wakeup some field twice,
> > > > it destroy the normal recovery process.
> > >
> > > This was intended to allow testing not only firmware crash path (and
> > > recovery) but also firmware crash while recovering from a firmware crash.
> > >
> > If firmware is recovering from crash, then simulate a new crash will trigger
> error.
> > So remove it.
> 
> That's actually a feature, not a bug. If firmware crashes while driver is
> restarting after a crash then its likely going to fail again and again causing a
> crash-restart loop which can affect system performance and responsiveness.
> It's better to give up and let the system admin take over.
> 
> If it's still bothering you then please consider a crash counter threshold so
> that, e.g. after 5 crash-while-restarting it's going to give up. However I doubt
> it's worth the effort. My experience tells me firmware crashes during
> recovery are rarely, if at all, transient.
> 
> The simulated fw crash is not representative here. It's a mere tool to test
> driver code.

The simulated fw crash is only a tool for user to trigger fw crash with command,
This change's purpose is to disallow user to trigger fw crash if the fw is not in a
Normal state.

If the fw is in recovering state triggered by user's command or by fw, then it will
disallow user to run command to trigger fw crash again until fw become to a normal
State.

> 
> 
> Michał

Patch
diff mbox series

diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index ada29a4..dc8700b 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -569,8 +569,7 @@  static ssize_t ath10k_write_simulate_fw_crash(struct file *file,
 
 	mutex_lock(&ar->conf_mutex);
 
-	if (ar->state != ATH10K_STATE_ON &&
-	    ar->state != ATH10K_STATE_RESTARTED) {
+	if (ar->state != ATH10K_STATE_ON) {
 		ret = -ENETDOWN;
 		goto exit;
 	}