ath10k: fix vdev-start timeout on error
diff mbox

Message ID 1531501683-32299-1-git-send-email-greearb@candelatech.com
State New
Headers show

Commit Message

Ben Greear July 13, 2018, 5:08 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

The vdev-start-response message should cause the
completion to fire, even in the error case.  Otherwise,
the user still gets no useful information and everything
is blocked until the timeout period.

Add some warning text to print out the invalid status
code to aid debugging.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Michał Kazior July 13, 2018, 5:17 p.m. UTC | #1
On 13 July 2018 at 19:08,  <greearb@candelatech.com> wrote:
> From: Ben Greear <greearb@candelatech.com>
>
> The vdev-start-response message should cause the
> completion to fire, even in the error case.  Otherwise,
> the user still gets no useful information and everything
> is blocked until the timeout period.
>
> Add some warning text to print out the invalid status
> code to aid debugging.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
>  drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
> index a60de71..ec4cd1e 100644
> --- a/drivers/net/wireless/ath/ath10k/wmi.c
> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct ath10k *ar, struct sk_buff *skb)
>         ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>         if (ret) {
>                 ath10k_warn(ar, "failed to parse vdev start event: %d\n", ret);
> -               return;
> +               goto out;
>         }
>
> -       if (WARN_ON(__le32_to_cpu(arg.status)))
> -               return;
> +       if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
> +               ath10k_warn(ar, "vdev-start-response reports status error: %d\n",
> +                           __le32_to_cpu(arg.status));
> +               /* Setup is done one way or another though, so we should still
> +                * do the completion, so don't return here.
> +                */
> +       }
>
> +out:
>         complete(&ar->vdev_setup_done);

With this the waiter can no longer tell if vdev_start succeeded or
not. It'll always think it succeeded even if arg.status or parsing
failed. Waiter instead of erroring out will continue to play out happy
scenario and may end up crashing firmware.

Not stalling is nice, but I'd argue the status should be propagated
back to the waiter so it can error-check.


Michal
Ben Greear July 13, 2018, 5:21 p.m. UTC | #2
On 07/13/2018 10:17 AM, Michał Kazior wrote:
> On 13 July 2018 at 19:08,  <greearb@candelatech.com> wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> The vdev-start-response message should cause the
>> completion to fire, even in the error case.  Otherwise,
>> the user still gets no useful information and everything
>> is blocked until the timeout period.
>>
>> Add some warning text to print out the invalid status
>> code to aid debugging.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>> ---
>>  drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
>> index a60de71..ec4cd1e 100644
>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct ath10k *ar, struct sk_buff *skb)
>>         ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>>         if (ret) {
>>                 ath10k_warn(ar, "failed to parse vdev start event: %d\n", ret);
>> -               return;
>> +               goto out;
>>         }
>>
>> -       if (WARN_ON(__le32_to_cpu(arg.status)))
>> -               return;
>> +       if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
>> +               ath10k_warn(ar, "vdev-start-response reports status error: %d\n",
>> +                           __le32_to_cpu(arg.status));
>> +               /* Setup is done one way or another though, so we should still
>> +                * do the completion, so don't return here.
>> +                */
>> +       }
>>
>> +out:
>>         complete(&ar->vdev_setup_done);
>
> With this the waiter can no longer tell if vdev_start succeeded or
> not. It'll always think it succeeded even if arg.status or parsing
> failed. Waiter instead of erroring out will continue to play out happy
> scenario and may end up crashing firmware.
>
> Not stalling is nice, but I'd argue the status should be propagated
> back to the waiter so it can error-check.

So, maybe set ar->last_wmi_error = __le32_to_cpu(arg.status) before calling
the complete and change the code that waits for vdev_setup_done to check that
error code?

Or, maybe we need an error code specific to this call, ar->last_wmi_vdev_start_status?

Thanks,
Ben
Michał Kazior July 13, 2018, 5:37 p.m. UTC | #3
On 13 July 2018 at 19:21, Ben Greear <greearb@candelatech.com> wrote:
> On 07/13/2018 10:17 AM, Michał Kazior wrote:
>>
>> On 13 July 2018 at 19:08,  <greearb@candelatech.com> wrote:
>>>
>>> From: Ben Greear <greearb@candelatech.com>
>>>
>>> The vdev-start-response message should cause the
>>> completion to fire, even in the error case.  Otherwise,
>>> the user still gets no useful information and everything
>>> is blocked until the timeout period.
>>>
>>> Add some warning text to print out the invalid status
>>> code to aid debugging.
>>>
>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>> ---
>>>  drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c
>>> b/drivers/net/wireless/ath/ath10k/wmi.c
>>> index a60de71..ec4cd1e 100644
>>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct
>>> ath10k *ar, struct sk_buff *skb)
>>>         ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>>>         if (ret) {
>>>                 ath10k_warn(ar, "failed to parse vdev start event: %d\n",
>>> ret);
>>> -               return;
>>> +               goto out;
>>>         }
>>>
>>> -       if (WARN_ON(__le32_to_cpu(arg.status)))
>>> -               return;
>>> +       if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
>>> +               ath10k_warn(ar, "vdev-start-response reports status
>>> error: %d\n",
>>> +                           __le32_to_cpu(arg.status));
>>> +               /* Setup is done one way or another though, so we should
>>> still
>>> +                * do the completion, so don't return here.
>>> +                */
>>> +       }
>>>
>>> +out:
>>>         complete(&ar->vdev_setup_done);
>>
>>
>> With this the waiter can no longer tell if vdev_start succeeded or
>> not. It'll always think it succeeded even if arg.status or parsing
>> failed. Waiter instead of erroring out will continue to play out happy
>> scenario and may end up crashing firmware.
>>
>> Not stalling is nice, but I'd argue the status should be propagated
>> back to the waiter so it can error-check.
>
>
> So, maybe set ar->last_wmi_error = __le32_to_cpu(arg.status) before calling
> the complete and change the code that waits for vdev_setup_done to check
> that
> error code?
>
> Or, maybe we need an error code specific to this call,
> ar->last_wmi_vdev_start_status?

Tough call. I can't find any compelling argument to prefer one over
the other. Maybe last_wmi_error is a bit too generic?


Michał
Ben Greear July 13, 2018, 5:40 p.m. UTC | #4
On 07/13/2018 10:37 AM, Michał Kazior wrote:
> On 13 July 2018 at 19:21, Ben Greear <greearb@candelatech.com> wrote:
>> On 07/13/2018 10:17 AM, Michał Kazior wrote:
>>>
>>> On 13 July 2018 at 19:08,  <greearb@candelatech.com> wrote:
>>>>
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> The vdev-start-response message should cause the
>>>> completion to fire, even in the error case.  Otherwise,
>>>> the user still gets no useful information and everything
>>>> is blocked until the timeout period.
>>>>
>>>> Add some warning text to print out the invalid status
>>>> code to aid debugging.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>> ---
>>>>  drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c
>>>> b/drivers/net/wireless/ath/ath10k/wmi.c
>>>> index a60de71..ec4cd1e 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>>>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct
>>>> ath10k *ar, struct sk_buff *skb)
>>>>         ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>>>>         if (ret) {
>>>>                 ath10k_warn(ar, "failed to parse vdev start event: %d\n",
>>>> ret);
>>>> -               return;
>>>> +               goto out;
>>>>         }
>>>>
>>>> -       if (WARN_ON(__le32_to_cpu(arg.status)))
>>>> -               return;
>>>> +       if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
>>>> +               ath10k_warn(ar, "vdev-start-response reports status
>>>> error: %d\n",
>>>> +                           __le32_to_cpu(arg.status));
>>>> +               /* Setup is done one way or another though, so we should
>>>> still
>>>> +                * do the completion, so don't return here.
>>>> +                */
>>>> +       }
>>>>
>>>> +out:
>>>>         complete(&ar->vdev_setup_done);
>>>
>>>
>>> With this the waiter can no longer tell if vdev_start succeeded or
>>> not. It'll always think it succeeded even if arg.status or parsing
>>> failed. Waiter instead of erroring out will continue to play out happy
>>> scenario and may end up crashing firmware.
>>>
>>> Not stalling is nice, but I'd argue the status should be propagated
>>> back to the waiter so it can error-check.
>>
>>
>> So, maybe set ar->last_wmi_error = __le32_to_cpu(arg.status) before calling
>> the complete and change the code that waits for vdev_setup_done to check
>> that
>> error code?
>>
>> Or, maybe we need an error code specific to this call,
>> ar->last_wmi_vdev_start_status?
>
> Tough call. I can't find any compelling argument to prefer one over
> the other. Maybe last_wmi_error is a bit too generic?

I was thinking some actions might require multiple wmi calls, and so
each one we track would need its own variable.

I'll implement it with a more specific error code and post a new patch.

Thanks,
Ben

Patch
diff mbox

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index a60de71..ec4cd1e 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -3443,12 +3443,18 @@  void ath10k_wmi_event_vdev_start_resp(struct ath10k *ar, struct sk_buff *skb)
 	ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
 	if (ret) {
 		ath10k_warn(ar, "failed to parse vdev start event: %d\n", ret);
-		return;
+		goto out;
 	}
 
-	if (WARN_ON(__le32_to_cpu(arg.status)))
-		return;
+	if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
+		ath10k_warn(ar, "vdev-start-response reports status error: %d\n",
+			    __le32_to_cpu(arg.status));
+		/* Setup is done one way or another though, so we should still
+		 * do the completion, so don't return here.
+		 */
+	}
 
+out:
 	complete(&ar->vdev_setup_done);
 }