wmi: Retry if CE logic is out of buffers.

Message ID 1459368643-13300-1-git-send-email-greearb@candelatech.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo

Commit Message

Ben Greear March 30, 2016, 8:10 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

I believe the CE tx buffer reaping logic may be able to fall
behind in certain cases (lots of serial console logging, lots
of WMI messages).

Dropping WMI messages is a very serious problem, so it is worth
waiting a bit in hopes the tx buffers become available again.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

Probably the ath10k_err should be made dbg or rate-limited before
this goes upstream..in meantime, it might help shed some light on
this problem.

 drivers/net/wireless/ath/ath10k/wmi.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Michal Kazior March 31, 2016, 6:55 a.m. UTC | #1
On 30 March 2016 at 22:10,  <greearb@candelatech.com> wrote:
> From: Ben Greear <greearb@candelatech.com>
>
> I believe the CE tx buffer reaping logic may be able to fall
> behind in certain cases (lots of serial console logging, lots
> of WMI messages).
>
> Dropping WMI messages is a very serious problem, so it is worth
> waiting a bit in hopes the tx buffers become available again.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
>
> Probably the ath10k_err should be made dbg or rate-limited before
> this goes upstream..in meantime, it might help shed some light on
> this problem.
>
>  drivers/net/wireless/ath/ath10k/wmi.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
> index f042711..43d23fc 100644
> --- a/drivers/net/wireless/ath/ath10k/wmi.c
> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
> @@ -1819,6 +1819,7 @@ static void ath10k_wmi_op_ep_tx_credits(struct ath10k *ar)
>  int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
>  {
>         int ret = -EOPNOTSUPP;
> +       int retry = 1000;
>
>         might_sleep();
>
> @@ -1832,7 +1833,19 @@ int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
>                 /* try to send pending beacons first. they take priority */
>                 ath10k_wmi_tx_beacons_nowait(ar);
>
> -               ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
> +               while (--retry) {
> +                       ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
> +                       if ((ret == -ENOBUFS) &&
> +                           !test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) {
> +                               /* CE transport logic is full, maybe we cannot reap entries fast
> +                                * enough?
> +                                */
> +                               ath10k_err(ar, "CE transport is full, sleeping for 1ms\n");
> +                               msleep(1);
> +                               continue;
> +                       }
> +                       break;
> +               }

This looks like a workaround to me. This problem shouldn't be
happening in the first place as far as design is concerned.

If it does, the only reason I can think of is if MSI-range support is exercised.

Anyway, it'd be a lot more sane to poll WMI's CE Tx pipe when
processing WMI's CE Rx pipe (but that's still arguably unnecessary)
instead of retrying in WMI.


Michał
Adrian Chadd March 31, 2016, 3:31 p.m. UTC | #2
On 30 March 2016 at 23:55, Michal Kazior <michal.kazior@tieto.com> wrote:
> On 30 March 2016 at 22:10,  <greearb@candelatech.com> wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> I believe the CE tx buffer reaping logic may be able to fall
>> behind in certain cases (lots of serial console logging, lots
>> of WMI messages).
>>
>> Dropping WMI messages is a very serious problem, so it is worth
>> waiting a bit in hopes the tx buffers become available again.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>> ---
>>
>> Probably the ath10k_err should be made dbg or rate-limited before
>> this goes upstream..in meantime, it might help shed some light on
>> this problem.
>>
>>  drivers/net/wireless/ath/ath10k/wmi.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
>> index f042711..43d23fc 100644
>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>> @@ -1819,6 +1819,7 @@ static void ath10k_wmi_op_ep_tx_credits(struct ath10k *ar)
>>  int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
>>  {
>>         int ret = -EOPNOTSUPP;
>> +       int retry = 1000;
>>
>>         might_sleep();
>>
>> @@ -1832,7 +1833,19 @@ int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
>>                 /* try to send pending beacons first. they take priority */
>>                 ath10k_wmi_tx_beacons_nowait(ar);
>>
>> -               ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
>> +               while (--retry) {
>> +                       ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
>> +                       if ((ret == -ENOBUFS) &&
>> +                           !test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) {
>> +                               /* CE transport logic is full, maybe we cannot reap entries fast
>> +                                * enough?
>> +                                */
>> +                               ath10k_err(ar, "CE transport is full, sleeping for 1ms\n");
>> +                               msleep(1);
>> +                               continue;
>> +                       }
>> +                       break;
>> +               }
>
> This looks like a workaround to me. This problem shouldn't be
> happening in the first place as far as design is concerned.
>
> If it does, the only reason I can think of is if MSI-range support is exercised.
>
> Anyway, it'd be a lot more sane to poll WMI's CE Tx pipe when
> processing WMI's CE Rx pipe (but that's still arguably unnecessary)
> instead of retrying in WMI.

I think polling TX when processing WMI RX isn't a bad idea. If Ben's
doing a lot of WMI stuff then the interrupt latency or scheduling may
be getting in the way.

But I'd like to first verify that the problem isn't something silly
like you're not seeing the interrupt because it didn't fire in the
first place...


-adrian
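
The alternative discussed in the comments above (reclaim finished Tx descriptors while servicing the Rx pipe, rather than sleeping in the WMI send path) can be sketched in plain userspace C. This is only a toy model under stated assumptions: `struct fake_ce_pipe`, `fake_tx_reap()`, `fake_tx_send()` and `fake_rx_process()` are hypothetical names invented for illustration and are not ath10k functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one CE Tx pipe; not an ath10k structure. */
struct fake_ce_pipe {
	int free_slots;     /* Tx descriptors currently available */
	int done_unreaped;  /* completed Tx descriptors not yet reclaimed */
};

/* Reclaim every completed descriptor; returns how many were reaped. */
static int fake_tx_reap(struct fake_ce_pipe *tx)
{
	int reaped = tx->done_unreaped;

	tx->free_slots += reaped;
	tx->done_unreaped = 0;
	return reaped;
}

/* Non-blocking send: fails with -ENOBUFS when no descriptor is free. */
static int fake_tx_send(struct fake_ce_pipe *tx)
{
	if (tx->free_slots == 0)
		return -ENOBUFS;
	tx->free_slots--;
	return 0;
}

/* Rx handler that opportunistically polls the Tx side first, so a
 * delayed Tx-completion interrupt cannot starve later sends.
 */
static void fake_rx_process(struct fake_ce_pipe *tx)
{
	fake_tx_reap(tx);
	/* ... handling of received events would follow here ... */
}
```

In this model a send that fails with -ENOBUFS succeeds after the next `fake_rx_process()` call, without any retry loop in the sender. Whether the real driver benefits depends on why completions are late, which is the open question raised in the thread.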

Patch

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index f042711..43d23fc 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1819,6 +1819,7 @@  static void ath10k_wmi_op_ep_tx_credits(struct ath10k *ar)
 int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
 {
 	int ret = -EOPNOTSUPP;
+	int retry = 1000;
 
 	might_sleep();
 
@@ -1832,7 +1833,19 @@  int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
 		/* try to send pending beacons first. they take priority */
 		ath10k_wmi_tx_beacons_nowait(ar);
 
-		ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
+		while (--retry) {
+			ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
+			if ((ret == -ENOBUFS) &&
+			    !test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) {
+				/* CE transport logic is full, maybe we cannot reap entries fast
+				 * enough?
+				 */
+				ath10k_err(ar, "CE transport is full, sleeping for 1ms\n");
+				msleep(1);
+				continue;
+			}
+			break;
+		}
 
 		if (ret && test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags))
 			ret = -ESHUTDOWN;
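
For readers who want to check the loop bounds, here is a minimal userspace sketch of the same retry shape, with msleep() and the crash-flush test left out. `struct fake_pipe`, `fake_send_nowait()` and `send_with_retry()` are hypothetical stand-ins written for this illustration, not driver code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the CE transport: the next `busy_calls`
 * send attempts fail with -ENOBUFS, later ones succeed.
 */
struct fake_pipe {
	int busy_calls;  /* attempts that will still fail */
	int attempts;    /* total attempts made so far */
};

static int fake_send_nowait(struct fake_pipe *p)
{
	p->attempts++;
	if (p->busy_calls > 0) {
		p->busy_calls--;
		return -ENOBUFS;
	}
	return 0;
}

/* Same loop shape as the patch: pre-decrement, retry only on -ENOBUFS. */
static int send_with_retry(struct fake_pipe *p, int retry)
{
	int ret = -EOPNOTSUPP;

	while (--retry) {
		ret = fake_send_nowait(p);
		if (ret == -ENOBUFS)
			continue;  /* the patch sleeps for 1 ms here */
		break;
	}
	return ret;
}
```

Because the counter is pre-decremented, `retry = 1000` allows at most 999 send attempts, so with msleep(1) the caller waits at least roughly one second before giving up with -ENOBUFS (msleep() may sleep longer than requested). Passing `retry = 1` would skip the send entirely and return the initial -EOPNOTSUPP.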