Message ID | 1411507045-18973-7-git-send-email-greearb@candelatech.com |
---|---|
State | Not Applicable, archived |
greearb@candelatech.com writes:

> From: Ben Greear <greearb@candelatech.com>
>
> This gives user-space a normal-ish way to detect that
> firmware has failed to start and that a reboot is
> probably required.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>

[...]

> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -1851,7 +1851,7 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>  			    ret);
>
>  		if (ath10k_pci_reset_mode == ATH10K_PCI_RESET_WARM_ONLY)
> -			return ret;
> +			goto out;
>
>  		ath10k_warn(ar, "trying cold reset\n");
>
> @@ -1859,11 +1859,17 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>  		if (ret) {
>  			ath10k_err(ar, "failed to power up target using cold reset too (%d)\n",
>  				   ret);
> -			return ret;
> +			goto out;
>  		}
>  	}
>
> -	return 0;
> +out:
> +	/* If we have failed to power-up, it may take a reboot to
> +	 * get the NIC back online.
> +	 * Set flag accordingly so that user-space can know.
> +	 */
> +	ar->fw_powerup_failed = !!ret;
> +	return ret;
>  }

Would it be better to use ATH10K_STATE_WEDGED for this and then just
export the state value to user space? Or should we have two different
states, like FW_WEDGED and HW_WEDGED?
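Kalle's alternative, a state value exported to user space instead of a separate flag, might look roughly like the sketch below. It assumes a split of the wedged state into firmware and hardware variants and a read-only text file through which user space reads the state. `enum demo_state`, `demo_state_names` and `demo_state_show()` are hypothetical illustrations, not ath10k's real `enum ath10k_state` or any existing debugfs file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical states modelling the FW_WEDGED / HW_WEDGED split; these are
 * NOT the values of ath10k's real enum ath10k_state. */
enum demo_state {
	DEMO_STATE_OFF,
	DEMO_STATE_ON,
	DEMO_STATE_FW_WEDGED, /* hypothetical: firmware-level failure */
	DEMO_STATE_HW_WEDGED, /* hypothetical: hardware power-up/reset failure */
};

/* Stable names that user space could match on. */
static const char *const demo_state_names[] = {
	[DEMO_STATE_OFF] = "off",
	[DEMO_STATE_ON] = "on",
	[DEMO_STATE_FW_WEDGED] = "fw-wedged",
	[DEMO_STATE_HW_WEDGED] = "hw-wedged",
};

/* Format the state the way a read-only debugfs or sysfs file might show it.
 * Returns what snprintf() returns: the length of the full string. */
static int demo_state_show(enum demo_state state, char *buf, size_t len)
{
	const char *name = "unknown";

	assert(buf != NULL && len > 0);
	if ((size_t)state < sizeof(demo_state_names) / sizeof(demo_state_names[0]))
		name = demo_state_names[state];
	return snprintf(buf, len, "%s\n", name);
}
```

With this shape the wedged conditions become mutually exclusive with ON and OFF, which is the trade-off the thread goes on to debate: the existing WEDGED state is entered while mac80211 still expects to call ath10k_stop().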
On 09/29/2014 01:24 AM, Kalle Valo wrote:
> greearb@candelatech.com writes:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This gives user-space a normal-ish way to detect that
>> firmware has failed to start and that a reboot is
>> probably required.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>
> [...]
>
>> --- a/drivers/net/wireless/ath/ath10k/pci.c
>> +++ b/drivers/net/wireless/ath/ath10k/pci.c
>> @@ -1851,7 +1851,7 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>>  			    ret);
>>
>>  		if (ath10k_pci_reset_mode == ATH10K_PCI_RESET_WARM_ONLY)
>> -			return ret;
>> +			goto out;
>>
>>  		ath10k_warn(ar, "trying cold reset\n");
>>
>> @@ -1859,11 +1859,17 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>>  		if (ret) {
>>  			ath10k_err(ar, "failed to power up target using cold reset too (%d)\n",
>>  				   ret);
>> -			return ret;
>> +			goto out;
>>  		}
>>  	}
>>
>> -	return 0;
>> +out:
>> +	/* If we have failed to power-up, it may take a reboot to
>> +	 * get the NIC back online.
>> +	 * Set flag accordingly so that user-space can know.
>> +	 */
>> +	ar->fw_powerup_failed = !!ret;
>> +	return ret;
>>  }
>
> Would it be better to use ATH10K_STATE_WEDGED for this and then just
> export the state value to user space? Or should we have two different
> states, like FW_WEDGED and HW_WEDGED?

I didn't want to mess with the state machine. This counter
is just a clue to users that things might be badly wrong. Some systems
might recover with another hard reset, some will hang the entire
system hard, and some will just stick in this state unable to
recover. Some of my systems exhibit this last behaviour, so at
least with this patch I can warn the user that they need to
reboot to regain wifi functionality.

If you want to tie it to a state machine, that is OK with me, but
I don't want to mess with it because that code is already tricky
enough.

Thanks,
Ben
On 29 September 2014 18:05, Ben Greear <greearb@candelatech.com> wrote:
> On 09/29/2014 01:24 AM, Kalle Valo wrote:
>> greearb@candelatech.com writes:
>>
>>> From: Ben Greear <greearb@candelatech.com>
>>>
>>> This gives user-space a normal-ish way to detect that
>>> firmware has failed to start and that a reboot is
>>> probably required.
>>>
>>> Signed-off-by: Ben Greear <greearb@candelatech.com>

[...]

>>> -	return 0;
>>> +out:
>>> +	/* If we have failed to power-up, it may take a reboot to
>>> +	 * get the NIC back online.
>>> +	 * Set flag accordingly so that user-space can know.
>>> +	 */
>>> +	ar->fw_powerup_failed = !!ret;
>>> +	return ret;
>>>  }
>>
>> Would it be better to use ATH10K_STATE_WEDGED for this and then just
>> export the state value to user space? Or should we have two different
>> states, like FW_WEDGED and HW_WEDGED?

Current WEDGED state is more like ON. It assumes mac80211 will call
ath10k_stop().

Adding another state just for the sake of handling power up / reset
issues seems like an overkill to me.


> I didn't want to mess with the state machine. This counter
> is just a clue to users that things might be badly wrong. Some systems
> might recover with another hard reset, some will hang the entire
> system hard, and some will just stick in this state unable to
> recover. Some of my systems exhibit this last behaviour, so at
> least with this patch I can warn the user that they need to
> reboot to regain wifi functionality.

If power up fails the error should propagate to `ifconfig wlanX up`
(or whatever calling drv_start) eventually so I don't see the point in
having this counter.


Michał
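Michał's point can be seen from user space: `ifconfig wlanX up` sets `IFF_UP` with the `SIOCSIFFLAGS` ioctl, and an error from the driver's start path comes back as that ioctl's `errno`. A minimal sketch, assuming a Linux host; `iface_up()` is a hypothetical helper, and which errno an ath10k power-up failure produces is not stated in this thread.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set IFF_UP on `ifname`, as `ifconfig <ifname> up`
 * does. Returns 0 or a negative errno. */
static int iface_up(const char *ifname)
{
	struct ifreq ifr;
	int fd, ret = 0;

	assert(ifname != NULL);
	if (strlen(ifname) >= IFNAMSIZ)
		return -EINVAL;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	/* Read the current flags first so that only IFF_UP changes. */
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0) {
		ret = -errno;
		goto out;
	}

	ifr.ifr_flags |= IFF_UP;
	if (ioctl(fd, SIOCSIFFLAGS, &ifr) < 0)
		ret = -errno; /* the driver's start error surfaces here */
out:
	close(fd);
	return ret;
}
```

The caller only sees an errno, though; nothing in it distinguishes a transient failure from one that needs a reboot, which is the gap Ben's reply describes.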
On 09/30/2014 05:27 AM, Michal Kazior wrote:
> On 29 September 2014 18:05, Ben Greear <greearb@candelatech.com> wrote:
>> On 09/29/2014 01:24 AM, Kalle Valo wrote:
>>> greearb@candelatech.com writes:
>>>
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> This gives user-space a normal-ish way to detect that
>>>> firmware has failed to start and that a reboot is
>>>> probably required.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
> [...]
>>>> -	return 0;
>>>> +out:
>>>> +	/* If we have failed to power-up, it may take a reboot to
>>>> +	 * get the NIC back online.
>>>> +	 * Set flag accordingly so that user-space can know.
>>>> +	 */
>>>> +	ar->fw_powerup_failed = !!ret;
>>>> +	return ret;
>>>>  }
>>>
>>> Would it be better to use ATH10K_STATE_WEDGED for this and then just
>>> export the state value to user space? Or should we have two different
>>> states, like FW_WEDGED and HW_WEDGED?
>
> Current WEDGED state is more like ON. It assumes mac80211 will call
> ath10k_stop().
>
> Adding another state just for the sake of handling power up / reset
> issues seems like an overkill to me.
>
>
>> I didn't want to mess with the state machine. This counter
>> is just a clue to users that things might be badly wrong. Some systems
>> might recover with another hard reset, some will hang the entire
>> system hard, and some will just stick in this state unable to
>> recover. Some of my systems exhibit this last behaviour, so at
>> least with this patch I can warn the user that they need to
>> reboot to regain wifi functionality.
>
> If power up fails the error should propagate to `ifconfig wlanX up`
> (or whatever calling drv_start) eventually so I don't see the point in
> having this counter.

Supplicant manages this, and programmatically it is not at all easy to
figure out that the network is failing to come up because the firmware
is broken-and-can't-be-fixed-without-reboot.

The counter I added can be queried by a management tool, which can then
report a clear error to the end user.

Thanks,
Ben
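A management tool could read the flag through the ethtool statistics interface, where the patch exposes it as `d_fw_powerup_failed` (the same table that `ethtool -S <iface>` prints). A sketch assuming a Linux host and the `SIOCETHTOOL` ioctl; `read_ethtool_stat()` and `find_stat()` are hypothetical helpers, and the statistic count is taken from `ETHTOOL_GDRVINFO`'s `n_stats`.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Index of `name` in an ethtool string table of `n` fixed-width
 * (ETH_GSTRING_LEN) entries, or -1 if it is absent. */
static int find_stat(const uint8_t *strings, uint32_t n, const char *name)
{
	for (uint32_t i = 0; i < n; i++)
		if (strncmp((const char *)strings + i * ETH_GSTRING_LEN, name,
			    ETH_GSTRING_LEN) == 0)
			return (int)i;
	return -1;
}

/* Read one named driver statistic into *value.
 * Returns 0 on success or a negative errno. */
static int read_ethtool_stat(const char *ifname, const char *name,
			     uint64_t *value)
{
	struct ethtool_drvinfo drvinfo = { .cmd = ETHTOOL_GDRVINFO };
	struct ethtool_gstrings *strings = NULL;
	struct ethtool_stats *stats = NULL;
	struct ifreq ifr;
	int fd, idx, ret = 0;
	uint32_t n;

	if (strlen(ifname) >= IFNAMSIZ)
		return -EINVAL;
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	/* GDRVINFO reports how many statistics the driver exports. */
	ifr.ifr_data = (void *)&drvinfo;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		ret = -errno;
		goto out;
	}
	n = drvinfo.n_stats;

	strings = calloc(1, sizeof(*strings) + (size_t)n * ETH_GSTRING_LEN);
	stats = calloc(1, sizeof(*stats) + (size_t)n * sizeof(uint64_t));
	if (!strings || !stats) {
		ret = -ENOMEM;
		goto out;
	}

	/* Fetch the names, then the values; both tables share one index. */
	strings->cmd = ETHTOOL_GSTRINGS;
	strings->string_set = ETH_SS_STATS;
	strings->len = n;
	ifr.ifr_data = (void *)strings;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		ret = -errno;
		goto out;
	}

	stats->cmd = ETHTOOL_GSTATS;
	stats->n_stats = n;
	ifr.ifr_data = (void *)stats;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		ret = -errno;
		goto out;
	}

	idx = find_stat(strings->data, n, name);
	if (idx < 0)
		ret = -ENOENT;
	else
		*value = stats->data[idx];
out:
	free(strings);
	free(stats);
	close(fd);
	return ret;
}
```

Because the names and values are matched by position, the patch has to append `"d_fw_powerup_failed"` to `ath10k_gstrings_stats` and `ar->fw_powerup_failed` to `ath10k_get_et_stats()` at the same index. A tool might call `read_ethtool_stat("wlan0", "d_fw_powerup_failed", &v)` (the interface name is an assumption) and tell the user to reboot when it returns 0 with `v == 1`.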
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 7b220b1..601d573 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -399,7 +399,7 @@ struct ath10k {
 	struct ieee80211_hw *hw;
 	struct device *dev;
 	u8 mac_addr[ETH_ALEN];
-
+	bool fw_powerup_failed; /* If true, might take reboot to recover. */
 	u32 chip_id;
 	u32 target_version;
 	u8 fw_version_major;
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index af1ca3e..77b60f4 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -1091,6 +1091,7 @@ static const char ath10k_gstrings_stats[][ETH_GSTRING_LEN] = {
 	"d_fw_crash_count",
 	"d_fw_warm_reset_count",
 	"d_fw_cold_reset_count",
+	"d_fw_powerup_failed", /* boolean */
 };
 
 #define ATH10K_SSTATS_LEN ARRAY_SIZE(ath10k_gstrings_stats)
@@ -1175,6 +1176,7 @@ void ath10k_get_et_stats(struct ieee80211_hw *hw,
 	data[i++] = ar->fw_crash_counter;
 	data[i++] = ar->fw_warm_reset_counter;
 	data[i++] = ar->fw_cold_reset_counter;
+	data[i++] = ar->fw_powerup_failed;
 
 	spin_unlock_bh(&ar->data_lock);
 
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 38f7386..ebe2ee1 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1851,7 +1851,7 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
 			    ret);
 
 		if (ath10k_pci_reset_mode == ATH10K_PCI_RESET_WARM_ONLY)
-			return ret;
+			goto out;
 
 		ath10k_warn(ar, "trying cold reset\n");
 
@@ -1859,11 +1859,17 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
 		if (ret) {
 			ath10k_err(ar, "failed to power up target using cold reset too (%d)\n",
 				   ret);
-			return ret;
+			goto out;
 		}
 	}
 
-	return 0;
+out:
+	/* If we have failed to power-up, it may take a reboot to
+	 * get the NIC back online.
+	 * Set flag accordingly so that user-space can know.
+	 */
+	ar->fw_powerup_failed = !!ret;
+	return ret;
 }
 
 static void ath10k_pci_hif_power_down(struct ath10k *ar)
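The control-flow change in `ath10k_pci_hif_power_up()` can be modelled in isolation. In this sketch `struct fake_dev`, `enum reset_mode` and `fake_power_up()` are hypothetical stand-ins for the real reset helpers, which the diff does not show: a warm reset is tried first, a cold reset follows unless the mode is warm-only, and every path leaves through the single `out:` label so the flag is written on each return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver pieces touched by the patch. */
enum reset_mode { RESET_AUTO, RESET_WARM_ONLY };

struct fake_dev {
	int warm_reset_result; /* 0 or a negative errno */
	int cold_reset_result; /* 0 or a negative errno */
	bool fw_powerup_failed;
};

/* Mirrors the patched control flow: warm reset first, cold reset as a
 * fallback, and one exit label that always refreshes the flag. */
static int fake_power_up(struct fake_dev *dev, enum reset_mode mode)
{
	int ret;

	ret = dev->warm_reset_result;
	if (ret) {
		if (mode == RESET_WARM_ONLY)
			goto out;

		ret = dev->cold_reset_result;
		if (ret)
			goto out;
	}

out:
	/* !!ret maps any negative errno to 1 and success to 0. */
	dev->fw_powerup_failed = !!ret;
	return ret;
}
```

Because the success path also passes through `out:`, a later successful power-up clears a stale `fw_powerup_failed`. Since the kernel's `bool` is C's `_Bool`, a plain `= ret` would already store 0 or 1; the `!!` mainly makes the intent explicit.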