
[v2,07/10] ath10k: add fw-powerup-fail to ethtool stats.

Message ID 1411507045-18973-7-git-send-email-greearb@candelatech.com (mailing list archive)
State Not Applicable, archived

Commit Message

Ben Greear Sept. 23, 2014, 9:17 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

This gives user-space a normal-ish way to detect that
firmware has failed to start and that a reboot is
probably required.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  New patch in this series; it complements the other
   ethtool patch.

 drivers/net/wireless/ath/ath10k/core.h  |  2 +-
 drivers/net/wireless/ath/ath10k/debug.c |  2 ++
 drivers/net/wireless/ath/ath10k/pci.c   | 12 +++++++++---
 3 files changed, 12 insertions(+), 4 deletions(-)

Comments

Kalle Valo Sept. 29, 2014, 8:24 a.m. UTC | #1
greearb@candelatech.com writes:

> From: Ben Greear <greearb@candelatech.com>
>
> This gives user-space a normal-ish way to detect that
> firmware has failed to start and that a reboot is
> probably required.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>

[...]

> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -1851,7 +1851,7 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>  			    ret);
>  
>  		if (ath10k_pci_reset_mode == ATH10K_PCI_RESET_WARM_ONLY)
> -			return ret;
> +			goto out;
>  
>  		ath10k_warn(ar, "trying cold reset\n");
>  
> @@ -1859,11 +1859,17 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>  		if (ret) {
>  			ath10k_err(ar, "failed to power up target using cold reset too (%d)\n",
>  				   ret);
> -			return ret;
> +			goto out;
>  		}
>  	}
>  
> -	return 0;
> +out:
> +	/* If we have failed to power-up, it may take a reboot to
> +	 * get the NIC back online.
> +	 * Set flag accordingly so that user-space can know.
> +	 */
> +	ar->fw_powerup_failed = !!ret;
> +	return ret;
>  }

Would it be better to use ATH10K_STATE_WEDGED for this and then just
export the state value to user space? Or should we have two different
states, like FW_WEDGED and HW_WEDGED?
Ben Greear Sept. 29, 2014, 4:05 p.m. UTC | #2
On 09/29/2014 01:24 AM, Kalle Valo wrote:
> greearb@candelatech.com writes:
> 
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This gives user-space a normal-ish way to detect that
>> firmware has failed to start and that a reboot is
>> probably required.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
> 
> [...]
> 
>> --- a/drivers/net/wireless/ath/ath10k/pci.c
>> +++ b/drivers/net/wireless/ath/ath10k/pci.c
>> @@ -1851,7 +1851,7 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>>  			    ret);
>>  
>>  		if (ath10k_pci_reset_mode == ATH10K_PCI_RESET_WARM_ONLY)
>> -			return ret;
>> +			goto out;
>>  
>>  		ath10k_warn(ar, "trying cold reset\n");
>>  
>> @@ -1859,11 +1859,17 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>>  		if (ret) {
>>  			ath10k_err(ar, "failed to power up target using cold reset too (%d)\n",
>>  				   ret);
>> -			return ret;
>> +			goto out;
>>  		}
>>  	}
>>  
>> -	return 0;
>> +out:
>> +	/* If we have failed to power-up, it may take a reboot to
>> +	 * get the NIC back online.
>> +	 * Set flag accordingly so that user-space can know.
>> +	 */
>> +	ar->fw_powerup_failed = !!ret;
>> +	return ret;
>>  }
> 
> Would it be better to use ATH10K_STATE_WEDGED for this and then just
> export the state value to user space? Or should we have two different
> states, like FW_WEDGED and HW_WEDGED?

I didn't want to mess with the state machine.  This counter
is just a clue to users that things might be badly wrong.  Some systems
might recover with another hard reset, some will hang the entire
system hard, and some will just stick in this state unable to
recover.  Some of my systems exhibit this last behaviour, so at
least with this patch I can warn the user that they need to
reboot to regain wifi functionality.

If you want to tie it to a state machine, that is OK with me, but
I don't want to mess with it because that code is already tricky
enough.

Thanks,
Ben
Michal Kazior Sept. 30, 2014, 12:27 p.m. UTC | #3
On 29 September 2014 18:05, Ben Greear <greearb@candelatech.com> wrote:
> On 09/29/2014 01:24 AM, Kalle Valo wrote:
>> greearb@candelatech.com writes:
>>
>>> From: Ben Greear <greearb@candelatech.com>
>>>
>>> This gives user-space a normal-ish way to detect that
>>> firmware has failed to start and that a reboot is
>>> probably required.
>>>
>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
[...]
>>> -    return 0;
>>> +out:
>>> +    /* If we have failed to power-up, it may take a reboot to
>>> +     * get the NIC back online.
>>> +     * Set flag accordingly so that user-space can know.
>>> +     */
>>> +    ar->fw_powerup_failed = !!ret;
>>> +    return ret;
>>>  }
>>
>> Would it be better to use ATH10K_STATE_WEDGED for this and then just
>> export the state value to user space? Or should we have two different
>> states, like FW_WEDGED and HW_WEDGED?

Current WEDGED state is more like ON. It assumes mac80211 will call
ath10k_stop().

Adding another state just for the sake of handling power up / reset
issues seems like overkill to me.


> I didn't want to mess with the state machine.  This counter
> is just a clue to users that things might be badly wrong.  Some systems
> might recover with another hard reset, some will hang the entire
> system hard, and some will just stick in this state unable to
> recover.  Some of my systems exhibit this last behaviour, so at
> least with this patch I can warn the user that they need to
> reboot to regain wifi functionality.

If power up fails the error should propagate to `ifconfig wlanX up`
(or whatever calling drv_start) eventually so I don't see the point in
having this counter.


Michał
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Sept. 30, 2014, 3:53 p.m. UTC | #4
On 09/30/2014 05:27 AM, Michal Kazior wrote:
> On 29 September 2014 18:05, Ben Greear <greearb@candelatech.com> wrote:
>> On 09/29/2014 01:24 AM, Kalle Valo wrote:
>>> greearb@candelatech.com writes:
>>>
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> This gives user-space a normal-ish way to detect that
>>>> firmware has failed to start and that a reboot is
>>>> probably required.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
> [...]
>>>> -    return 0;
>>>> +out:
>>>> +    /* If we have failed to power-up, it may take a reboot to
>>>> +     * get the NIC back online.
>>>> +     * Set flag accordingly so that user-space can know.
>>>> +     */
>>>> +    ar->fw_powerup_failed = !!ret;
>>>> +    return ret;
>>>>  }
>>>
>>> Would it be better to use ATH10K_STATE_WEDGED for this and then just
>>> export the state value to user space? Or should we have two different
>>> states, like FW_WEDGED and HW_WEDGED?
> 
> Current WEDGED state is more like ON. It assumes mac80211 will call
> ath10k_stop().
> 
> Adding another state just for the sake of handling power up / reset
> issues seems like an overkill to me.
> 
> 
>> I didn't want to mess with the state machine.  This counter
>> is just a clue to users that things might be badly wrong.  Some systems
>> might recover with another hard reset, some will hang the entire
>> system hard, and some will just stick in this state unable to
>> recover.  Some of my systems exhibit this last behaviour, so at
>> least with this patch I can warn the user that they need to
>> reboot to regain wifi functionality.
> 
> If power up fails the error should propagate to `ifconfig wlanX up`
> (or whatever calling drv_start) eventually so I don't see the point in
> having this counter.

Supplicant manages this, and programmatically it is not at all easy to
figure out that the network is failing to come up because the firmware
is broken-and-can't-be-fixed-without-reboot.

The counter I added can be queried by a management tool, which can then
propagate a clear error to the end user.

Thanks,
Ben

Patch

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 7b220b1..601d573 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -399,7 +399,7 @@  struct ath10k {
 	struct ieee80211_hw *hw;
 	struct device *dev;
 	u8 mac_addr[ETH_ALEN];
-
+	bool fw_powerup_failed; /* If true, might take reboot to recover. */
 	u32 chip_id;
 	u32 target_version;
 	u8 fw_version_major;
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index af1ca3e..77b60f4 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -1091,6 +1091,7 @@  static const char ath10k_gstrings_stats[][ETH_GSTRING_LEN] = {
 	"d_fw_crash_count",
 	"d_fw_warm_reset_count",
 	"d_fw_cold_reset_count",
+	"d_fw_powerup_failed", /* boolean */
 };
 #define ATH10K_SSTATS_LEN ARRAY_SIZE(ath10k_gstrings_stats)
 
@@ -1175,6 +1176,7 @@  void ath10k_get_et_stats(struct ieee80211_hw *hw,
 	data[i++] = ar->fw_crash_counter;
 	data[i++] = ar->fw_warm_reset_counter;
 	data[i++] = ar->fw_cold_reset_counter;
+	data[i++] = ar->fw_powerup_failed;
 
 	spin_unlock_bh(&ar->data_lock);
 
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 38f7386..ebe2ee1 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1851,7 +1851,7 @@  static int ath10k_pci_hif_power_up(struct ath10k *ar)
 			    ret);
 
 		if (ath10k_pci_reset_mode == ATH10K_PCI_RESET_WARM_ONLY)
-			return ret;
+			goto out;
 
 		ath10k_warn(ar, "trying cold reset\n");
 
@@ -1859,11 +1859,17 @@  static int ath10k_pci_hif_power_up(struct ath10k *ar)
 		if (ret) {
 			ath10k_err(ar, "failed to power up target using cold reset too (%d)\n",
 				   ret);
-			return ret;
+			goto out;
 		}
 	}
 
-	return 0;
+out:
+	/* If we have failed to power-up, it may take a reboot to
+	 * get the NIC back online.
+	 * Set flag accordingly so that user-space can know.
+	 */
+	ar->fw_powerup_failed = !!ret;
+	return ret;
 }
 
 static void ath10k_pci_hif_power_down(struct ath10k *ar)
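The control flow of the patched `ath10k_pci_hif_power_up()` can be exercised outside the kernel with a small model. This is a sketch, not driver code: `struct fake_dev`, `try_warm_reset()`, `try_cold_reset()` and `enum reset_mode` are hypothetical stand-ins for the real reset helpers and the `ath10k_pci_reset_mode` parameter. It shows how routing every exit through the single `out:` label keeps `fw_powerup_failed` in step with the returned error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ath10k_pci_reset_mode knob. */
enum reset_mode { RESET_AUTO, RESET_WARM_ONLY };

/* Hypothetical device model: canned results for each reset type. */
struct fake_dev {
	int warm_reset_result;	/* 0 on success, negative errno on failure */
	int cold_reset_result;
	int cold_attempts;
	bool fw_powerup_failed;
};

static int try_warm_reset(struct fake_dev *d)
{
	return d->warm_reset_result;
}

static int try_cold_reset(struct fake_dev *d)
{
	d->cold_attempts++;
	return d->cold_reset_result;
}

/* Same shape as the patched function: every exit goes through `out`,
 * so the flag always reflects the result of this attempt. */
static int power_up(struct fake_dev *d, enum reset_mode mode)
{
	int ret = try_warm_reset(d);

	if (ret) {
		/* Warm reset failed; stop here unless cold reset is allowed. */
		if (mode == RESET_WARM_ONLY)
			goto out;

		ret = try_cold_reset(d);
		if (ret)
			goto out; /* mirrors the patch; `out` follows anyway */
	}

out:
	d->fw_powerup_failed = !!ret;
	return ret;
}
```

One consequence of `ar->fw_powerup_failed = !!ret` is that the flag is not sticky: a later successful power-up clears it, so it reports the outcome of the most recent attempt rather than counting failures.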