diff mbox

[v4] brcmfmac: Decrease 8021x_cnt for dropped packets

Message ID CAC0pXTKk1AA5k-i10PKTxHujPG2jPGbWMnNp6y6=v0ngK5+L4Q@mail.gmail.com (mailing list archive)
State Not Applicable
Delegated to: Kalle Valo
Headers show

Commit Message

per.forlin@gmail.com July 13, 2016, 6:52 p.m. UTC
2016-07-13 13:20 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
> On 12-7-2016 12:23, Per Förlin wrote:
>> 2016-07-12 11:48 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
>>>
>>>
>>> On 12-7-2016 10:35, Per Förlin wrote:
>>>> 2016-07-06 11:53 GMT+02:00 Per Förlin <per.forlin@gmail.com>:
>>>>> I have now verified this patch on backports 4.4.
>>>>>
>>>>> 2016-04-12 23:55 GMT+02:00  <per.forlin@gmail.com>:
>>>>>> From: Per Forlin <per.forlin@gmail.com>
>>>>>>
>>>>>> This patch resolves an issue where pend_8021x_cnt was not decreased
>>>>>> on txfinalize. This caused brcmf_netdev_wait_pend8021x to timeout
>>>>>> because the counter indicated pending packets.
>>>>>>
>>>>>> WARNING: at .../brcmfmac/core.c:1289 brcmf_netdev_wait_pend8021x
>>>>>>   (warn_slowpath_common)
>>>>>>   (warn_slowpath_null)
>>>>>>   (brcmf_netdev_wait_pend8021x [brcmfmac])
>>>>>>   (send_key_to_dongle [brcmfmac])
>>>>>>   (brcmf_cfg80211_del_key [brcmfmac])
>>>>>>   (nl80211_del_key [cfg80211])
>>>>>>   (genl_rcv_msg)
>>>>>>   (netlink_rcv_skb)
>>>>>>   (genl_rcv)
>>>>>>   (netlink_unicast)
>>>>>>   (netlink_sendmsg)
>>>>>>   (sock_sendmsg)
>>>>>>   (___sys_sendmsg)
>>>>>>   (__sys_sendmsg)
>>>>>>   (SyS_sendmsg)
>>>>>>
>>>>>> The solution is to pull back the header offset in case
>>>>>> of an error in txdata(), which may happen in case of
>>>> Clarification:
>>>>
>>>> txdata=brcmf_proto_bcdc_txdata()
>>>> brcmf_proto_bcdc_txdata(): Calls brcmf_proto_bcdc_hdrpush()
>>>>
>>>> The header needs to be pulled back in case of error otherwise
>>>> the error handling and cleanup up will fail to decrease the counter
>>>> of pending packets.
>>>
>>> Yes, this part of the patch is clear to me.
>>>
>> Thanks, I wasn't sure.
>>
>>>>>> packet overload in brcmf_sdio_bus_txdata.
>>>>>>
>>>>>> Overloading an WLAN interface is not an unlikely scenario.
>>>
>>> So here is where things start to look suspicious and I have mentioned
>>> this before. My thought here was "How the hell can you end up with a
>>> 2048 packets on the sdio queue", which I mentioned to you before. There
>>> is a high watermark on the queue upon which we do a netif_stop_queue()
>>> so network layer does not keep pushing tx packets our way. Looking
>>> further into this I found that we introduced a bug with commit
>>> 9cd18359d31e ("brcmfmac: Make FWS queueing configurable.") so we ended
>>> up doing nothing except increasing as statistics debug counter :-(
>>>
>> Is there a fix available for the high watermark issue or is it
>> something you will look into?
>>
>> To produce a load on the wlan interface I run
>> iperf -c 239.255.1.3 -u -b 10m -f m -i 60 -t 3000
>> and this is enough in my case to fill up the 2048 queue.
>>
>>>>>> In case of packet overload the error print "out of bus->txq"
>>>>>> is very verbose. To reduce SPAM degrade it to a debug print.
>>>>>>
>>>>>> Signed-off-by: Per Forlin <per.forlin@gmail.com>
>>>>>> ---
>>>>>> Change log:
>>>>>>  v2 - Add variable to know whether the counter is increased or not
>>>>>>  v3 - txfinalize should decrease the counter. Adjust skb header offset
>>>>>>  v4 - Fix build error
>>>>>>
>>>>>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c     | 4 ++++
>>>>>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c | 4 +++-
>>>>>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c     | 2 +-
>>>>>>  3 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>> index ed9998b..f342f7c 100644
>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>> @@ -541,6 +541,9 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
>>>>>>         struct ethhdr *eh;
>>>>>>         u16 type;
>>>>>>
>>>>>> +       if (!ifp)
>>>>>> +               goto free;
>>>>>> +
>>>
>>> This may not be needed.
>>>
>> This is not strictly needed. I can remove it.
>>
>>>>>>         eh = (struct ethhdr *)(txp->data);
>>>>>>         type = ntohs(eh->h_proto);
>>>>>>
>>>>>> @@ -553,6 +556,7 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
>>>>>>         if (!success)
>>>>>>                 ifp->stats.tx_errors++;
>>>>>>
>>>>>> +free:
>>>>>>         brcmu_pkt_buf_free_skb(txp);
>>>>>>  }
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>> index f82c9ab..98cb83f 100644
>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>> @@ -1899,8 +1899,10 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp, struct sk_buff *skb)
>>>>>>
>>>>>>         if (fws->avoid_queueing) {
>>>>>>                 rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
>>>>>> -               if (rc < 0)
>>>>>> +               if (rc < 0) {
>>>>>> +                       (void)brcmf_proto_hdrpull(drvr, false, skb, &ifp);
>>>
>>> Could it be that the ifp is NULL pointer after brcmf_proto_hdrpull().
>>> Can you check. Better use tmp_ifp variable in this call as you have a
>>> valid ifp before this call for sure.
>>>
>> To be on the safe side I can use NULL as in param like you propose,
>> and use the available ifp.
>>
>>>>>>                         brcmf_txfinalize(ifp, skb, false);
>>>>>> +               }
>>>>>>                 return rc;
>>>>>>         }
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>> index a14d9d9d..485e2ad 100644
>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>> @@ -2721,7 +2721,7 @@ static int brcmf_sdio_bus_txdata(struct device *dev, struct sk_buff *pkt)
>>>>>>         *(u16 *)(pkt->cb) = 0;
>>>>>>         if (!brcmf_sdio_prec_enq(&bus->txq, pkt, prec)) {
>>>>>>                 skb_pull(pkt, bus->tx_hdrlen);
>>>>>> -               brcmf_err("out of bus->txq !!!\n");
>>>>>> +               brcmf_dbg(INFO, "out of bus->txq !!!\n");
>>>
>>> Now that I understand the issue I want to keep this as error print as it
>>> should be very unlikely.
>> I would like to test this patch with the watermark fix to confirm this.
>
> Can you try this?
>
> Regards,
> Arend
>
> ---
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
> b/drive
> index cd221ab..9f9024a 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
> @@ -2469,10 +2469,22 @@ void brcmf_fws_bustxfail(struct brcmf_fws_info
> *fws, str
>  void brcmf_fws_bus_blocked(struct brcmf_pub *drvr, bool flow_blocked)
>  {
>         struct brcmf_fws_info *fws = drvr->fws;
> +       struct brcmf_if *ifp;
> +       int i;
>
> -       fws->bus_flow_blocked = flow_blocked;
> -       if (!flow_blocked)
> -               brcmf_fws_schedule_deq(fws);
> -       else
> -               fws->stats.bus_flow_block++;
> +       if (fws->avoid_queueing) {
> +               for (i = 0; i < BRCMF_MAX_IFS; i++) {
> +                       ifp = drvr->iflist[i];
> +                       if (!ifp || !ifp->ndev)
> +                               continue;
> +                       brcmf_txflowblock_if(ifp,
> BRCMF_NETIF_STOP_REASON_FLOW,
> +                                            flow_blocked);
> +               }
> +       } else {
> +               fws->bus_flow_blocked = flow_blocked;
> +               if (!flow_blocked)
> +                       brcmf_fws_schedule_deq(fws);
> +               else
> +                       fws->stats.bus_flow_block++;
> +       }
>  }
>
Thanks for the code. I run a quick test and it looks fine.
I added some prints to check the progress:
len - is number of items in the queue
max -  is max number of entries in the queue

[   89.407856] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1789 max 2048
[   89.414682] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1790 max 2048
[   89.421497] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1791 max 2048
[   93.970466] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1524 max 2048
[   93.977520] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1525 max 2048
[   93.984572] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1526 max 2048

I will some run more WLAN tests tomorrow to make sure.
When I'm done testing I will update my patch as well and let you know.

I came across this issue when I tried to connect to my WLAN access
point. The bug was triggered due to a lot of background traffic
(broadcast and multicast) in the network filling up the queue.
It's now fixed so that the queue is not flooded. Now I move on to the
next issue. It's still difficult to connect because the background
eats up all the bandwidth (this is not a constant issue but it happens
from time to time depending on the background load).
For queueing-mode there seems to exist logic for handling multicast
traffic separately but for the SDIO case (no-queueing) normal traffic
gets same precedence as multicast traffic.

I'm considering something like this:
                        (void)brcmf_proto_hdrpull(drvr, false, skb, NULL);

It feels a little hacky.

I would like to use the drvr->tx_multicast instead, if possible.
The problem is that the "drvr" struct is not passed down to the sdio layer.
One could update bcdc.c : brcmf_proto_bcdc_txdata() to pass
"drcr->multicast" to the SDIO layer but not all bus implementations
need this information (USB for instance)

Any suggestions?

BR
Per Forlin
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Arend Van Spriel July 14, 2016, 8:57 a.m. UTC | #1
On 13-7-2016 20:52, Per Förlin wrote:
> 2016-07-13 13:20 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
>> On 12-7-2016 12:23, Per Förlin wrote:
>>> 2016-07-12 11:48 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
>>>>
>>>>
>>>> On 12-7-2016 10:35, Per Förlin wrote:
>>>>> 2016-07-06 11:53 GMT+02:00 Per Förlin <per.forlin@gmail.com>:
>>>>>> I have now verified this patch on backports 4.4.
>>>>>>
>>>>>> 2016-04-12 23:55 GMT+02:00  <per.forlin@gmail.com>:
>>>>>>> From: Per Forlin <per.forlin@gmail.com>
>>>>>>>
>>>>>>> This patch resolves an issue where pend_8021x_cnt was not decreased
>>>>>>> on txfinalize. This caused brcmf_netdev_wait_pend8021x to timeout
>>>>>>> because the counter indicated pending packets.
>>>>>>>
>>>>>>> WARNING: at .../brcmfmac/core.c:1289 brcmf_netdev_wait_pend8021x
>>>>>>>   (warn_slowpath_common)
>>>>>>>   (warn_slowpath_null)
>>>>>>>   (brcmf_netdev_wait_pend8021x [brcmfmac])
>>>>>>>   (send_key_to_dongle [brcmfmac])
>>>>>>>   (brcmf_cfg80211_del_key [brcmfmac])
>>>>>>>   (nl80211_del_key [cfg80211])
>>>>>>>   (genl_rcv_msg)
>>>>>>>   (netlink_rcv_skb)
>>>>>>>   (genl_rcv)
>>>>>>>   (netlink_unicast)
>>>>>>>   (netlink_sendmsg)
>>>>>>>   (sock_sendmsg)
>>>>>>>   (___sys_sendmsg)
>>>>>>>   (__sys_sendmsg)
>>>>>>>   (SyS_sendmsg)
>>>>>>>
>>>>>>> The solution is to pull back the header offset in case
>>>>>>> of an error in txdata(), which may happen in case of
>>>>> Clarification:
>>>>>
>>>>> txdata=brcmf_proto_bcdc_txdata()
>>>>> brcmf_proto_bcdc_txdata(): Calls brcmf_proto_bcdc_hdrpush()
>>>>>
>>>>> The header needs to be pulled back in case of error otherwise
>>>>> the error handling and cleanup up will fail to decrease the counter
>>>>> of pending packets.
>>>>
>>>> Yes, this part of the patch is clear to me.
>>>>
>>> Thanks, I wasn't sure.
>>>
>>>>>>> packet overload in brcmf_sdio_bus_txdata.
>>>>>>>
>>>>>>> Overloading an WLAN interface is not an unlikely scenario.
>>>>
>>>> So here is where things start to look suspicious and I have mentioned
>>>> this before. My thought here was "How the hell can you end up with a
>>>> 2048 packets on the sdio queue", which I mentioned to you before. There
>>>> is a high watermark on the queue upon which we do a netif_stop_queue()
>>>> so network layer does not keep pushing tx packets our way. Looking
>>>> further into this I found that we introduced a bug with commit
>>>> 9cd18359d31e ("brcmfmac: Make FWS queueing configurable.") so we ended
>>>> up doing nothing except increasing as statistics debug counter :-(
>>>>
>>> Is there a fix available for the high watermark issue or is it
>>> something you will look into?
>>>
>>> To produce a load on the wlan interface I run
>>> iperf -c 239.255.1.3 -u -b 10m -f m -i 60 -t 3000
>>> and this is enough in my case to fill up the 2048 queue.
>>>
>>>>>>> In case of packet overload the error print "out of bus->txq"
>>>>>>> is very verbose. To reduce SPAM degrade it to a debug print.
>>>>>>>
>>>>>>> Signed-off-by: Per Forlin <per.forlin@gmail.com>
>>>>>>> ---
>>>>>>> Change log:
>>>>>>>  v2 - Add variable to know whether the counter is increased or not
>>>>>>>  v3 - txfinalize should decrease the counter. Adjust skb header offset
>>>>>>>  v4 - Fix build error
>>>>>>>
>>>>>>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c     | 4 ++++
>>>>>>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c | 4 +++-
>>>>>>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c     | 2 +-
>>>>>>>  3 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>>> index ed9998b..f342f7c 100644
>>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>>> @@ -541,6 +541,9 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
>>>>>>>         struct ethhdr *eh;
>>>>>>>         u16 type;
>>>>>>>
>>>>>>> +       if (!ifp)
>>>>>>> +               goto free;
>>>>>>> +
>>>>
>>>> This may not be needed.
>>>>
>>> This is not strictly needed. I can remove it.
>>>
>>>>>>>         eh = (struct ethhdr *)(txp->data);
>>>>>>>         type = ntohs(eh->h_proto);
>>>>>>>
>>>>>>> @@ -553,6 +556,7 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
>>>>>>>         if (!success)
>>>>>>>                 ifp->stats.tx_errors++;
>>>>>>>
>>>>>>> +free:
>>>>>>>         brcmu_pkt_buf_free_skb(txp);
>>>>>>>  }
>>>>>>>
>>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>>> index f82c9ab..98cb83f 100644
>>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>>> @@ -1899,8 +1899,10 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp, struct sk_buff *skb)
>>>>>>>
>>>>>>>         if (fws->avoid_queueing) {
>>>>>>>                 rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
>>>>>>> -               if (rc < 0)
>>>>>>> +               if (rc < 0) {
>>>>>>> +                       (void)brcmf_proto_hdrpull(drvr, false, skb, &ifp);
>>>>
>>>> Could it be that the ifp is NULL pointer after brcmf_proto_hdrpull().
>>>> Can you check. Better use tmp_ifp variable in this call as you have a
>>>> valid ifp before this call for sure.
>>>>
>>> To be on the safe side I can use NULL as in param like you propose,
>>> and use the available ifp.
>>>
>>>>>>>                         brcmf_txfinalize(ifp, skb, false);
>>>>>>> +               }
>>>>>>>                 return rc;
>>>>>>>         }
>>>>>>>
>>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>>> index a14d9d9d..485e2ad 100644
>>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>>> @@ -2721,7 +2721,7 @@ static int brcmf_sdio_bus_txdata(struct device *dev, struct sk_buff *pkt)
>>>>>>>         *(u16 *)(pkt->cb) = 0;
>>>>>>>         if (!brcmf_sdio_prec_enq(&bus->txq, pkt, prec)) {
>>>>>>>                 skb_pull(pkt, bus->tx_hdrlen);
>>>>>>> -               brcmf_err("out of bus->txq !!!\n");
>>>>>>> +               brcmf_dbg(INFO, "out of bus->txq !!!\n");
>>>>
>>>> Now that I understand the issue I want to keep this as error print as it
>>>> should be very unlikely.
>>> I would like to test this patch with the watermark fix to confirm this.
>>
>> Can you try this?
>>
>> Regards,
>> Arend
>>
>> ---
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>> b/drive
>> index cd221ab..9f9024a 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>> @@ -2469,10 +2469,22 @@ void brcmf_fws_bustxfail(struct brcmf_fws_info
>> *fws, str
>>  void brcmf_fws_bus_blocked(struct brcmf_pub *drvr, bool flow_blocked)
>>  {
>>         struct brcmf_fws_info *fws = drvr->fws;
>> +       struct brcmf_if *ifp;
>> +       int i;
>>
>> -       fws->bus_flow_blocked = flow_blocked;
>> -       if (!flow_blocked)
>> -               brcmf_fws_schedule_deq(fws);
>> -       else
>> -               fws->stats.bus_flow_block++;
>> +       if (fws->avoid_queueing) {
>> +               for (i = 0; i < BRCMF_MAX_IFS; i++) {
>> +                       ifp = drvr->iflist[i];
>> +                       if (!ifp || !ifp->ndev)
>> +                               continue;
>> +                       brcmf_txflowblock_if(ifp,
>> BRCMF_NETIF_STOP_REASON_FLOW,
>> +                                            flow_blocked);
>> +               }
>> +       } else {
>> +               fws->bus_flow_blocked = flow_blocked;
>> +               if (!flow_blocked)
>> +                       brcmf_fws_schedule_deq(fws);
>> +               else
>> +                       fws->stats.bus_flow_block++;
>> +       }
>>  }
>>
> Thanks for the code. I run a quick test and it looks fine.
> I added some prints to check the progress:
> len - is number of items in the queue
> max -  is max number of entries in the queue
> 
> [   89.407856] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1789 max 2048
> [   89.414682] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1790 max 2048
> [   89.421497] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1791 max 2048
> [   93.970466] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1524 max 2048
> [   93.977520] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1525 max 2048
> [   93.984572] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1526 max 2048
> 
> I will some run more WLAN tests tomorrow to make sure.
> When I'm done testing I will update my patch as well and let you know.
> 
> I came across this issue when I tried to connect to my WLAN access
> point. The bug was triggered due to a lot of background traffic
> (broadcast and multicast) in the network filling up the queue.
> It's now fixed so that the queue is not flooded. Now I move on to the
> next issue. It's still difficult to connect because the background
> eats up all the bandwidth (this is not a constant issue but it happens
> from time to time depending on the background load).
> For queueing-mode there seems to exist logic for handling multicast
> traffic separately but for the SDIO case (no-queueing) normal traffic
> gets same precedence as multicast traffic.

I am a bit confused by the term "background traffic". There is a
specific WMM-AC_BK that I would refer to as background traffic. Most
traffic is set to AC_BE regardless of it being unicast or bc/mc.
Changing the priority like that for multicast is a very bad idea.

> I'm considering something like this:
> --- a/drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c
> +++ b/drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c
> @@ -1900,6 +1902,8 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp,
> struct sk_buff *skb)
>         drvr->tx_multicast += !!multicast;
> 
>         if (fws->avoid_queueing) {
> +               if (multicast)
> +                       skb->priority = PRIO_8021D_NONE;
>                 rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
>                 if (rc < 0) {
>                         (void)brcmf_proto_hdrpull(drvr, false, skb, NULL);
> 
> It feels a little hacky.
> 
> I would like to use the drvr->tx_multicast instead, if possible.
> The problem is that the "drvr" struct is not passed down to the sdio layer.
> One could update bcdc.c : brcmf_proto_bcdc_txdata() to pass
> "drcr->multicast" to the SDIO layer but not all bus implementations
> need this information (USB for instance)
> 
> Any suggestions?

I suspect that we fixed your issue with commit 92121e69de8a ("brcmfmac:
Properly set carrier state of netdev."). That is if your issue was
caused on an older kernel.

Regards,
Arend
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
per.forlin@gmail.com July 14, 2016, 2:31 p.m. UTC | #2
2016-07-14 10:57 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
> On 13-7-2016 20:52, Per Förlin wrote:
>> 2016-07-13 13:20 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
>>> On 12-7-2016 12:23, Per Förlin wrote:
>>>> 2016-07-12 11:48 GMT+02:00 Arend Van Spriel <arend.vanspriel@broadcom.com>:
>>>>>
>>>>>
>>>>> On 12-7-2016 10:35, Per Förlin wrote:
>>>>>> 2016-07-06 11:53 GMT+02:00 Per Förlin <per.forlin@gmail.com>:
>>>>>>> I have now verified this patch on backports 4.4.
>>>>>>>
>>>>>>> 2016-04-12 23:55 GMT+02:00  <per.forlin@gmail.com>:
>>>>>>>> From: Per Forlin <per.forlin@gmail.com>
>>>>>>>>
>>>>>>>> This patch resolves an issue where pend_8021x_cnt was not decreased
>>>>>>>> on txfinalize. This caused brcmf_netdev_wait_pend8021x to timeout
>>>>>>>> because the counter indicated pending packets.
>>>>>>>>
>>>>>>>> WARNING: at .../brcmfmac/core.c:1289 brcmf_netdev_wait_pend8021x
>>>>>>>>   (warn_slowpath_common)
>>>>>>>>   (warn_slowpath_null)
>>>>>>>>   (brcmf_netdev_wait_pend8021x [brcmfmac])
>>>>>>>>   (send_key_to_dongle [brcmfmac])
>>>>>>>>   (brcmf_cfg80211_del_key [brcmfmac])
>>>>>>>>   (nl80211_del_key [cfg80211])
>>>>>>>>   (genl_rcv_msg)
>>>>>>>>   (netlink_rcv_skb)
>>>>>>>>   (genl_rcv)
>>>>>>>>   (netlink_unicast)
>>>>>>>>   (netlink_sendmsg)
>>>>>>>>   (sock_sendmsg)
>>>>>>>>   (___sys_sendmsg)
>>>>>>>>   (__sys_sendmsg)
>>>>>>>>   (SyS_sendmsg)
>>>>>>>>
>>>>>>>> The solution is to pull back the header offset in case
>>>>>>>> of an error in txdata(), which may happen in case of
>>>>>> Clarification:
>>>>>>
>>>>>> txdata=brcmf_proto_bcdc_txdata()
>>>>>> brcmf_proto_bcdc_txdata(): Calls brcmf_proto_bcdc_hdrpush()
>>>>>>
>>>>>> The header needs to be pulled back in case of error otherwise
>>>>>> the error handling and cleanup up will fail to decrease the counter
>>>>>> of pending packets.
>>>>>
>>>>> Yes, this part of the patch is clear to me.
>>>>>
>>>> Thanks, I wasn't sure.
>>>>
>>>>>>>> packet overload in brcmf_sdio_bus_txdata.
>>>>>>>>
>>>>>>>> Overloading an WLAN interface is not an unlikely scenario.
>>>>>
>>>>> So here is where things start to look suspicious and I have mentioned
>>>>> this before. My thought here was "How the hell can you end up with a
>>>>> 2048 packets on the sdio queue", which I mentioned to you before. There
>>>>> is a high watermark on the queue upon which we do a netif_stop_queue()
>>>>> so network layer does not keep pushing tx packets our way. Looking
>>>>> further into this I found that we introduced a bug with commit
>>>>> 9cd18359d31e ("brcmfmac: Make FWS queueing configurable.") so we ended
>>>>> up doing nothing except increasing as statistics debug counter :-(
>>>>>
>>>> Is there a fix available for the high watermark issue or is it
>>>> something you will look into?
>>>>
>>>> To produce a load on the wlan interface I run
>>>> iperf -c 239.255.1.3 -u -b 10m -f m -i 60 -t 3000
>>>> and this is enough in my case to fill up the 2048 queue.
>>>>
>>>>>>>> In case of packet overload the error print "out of bus->txq"
>>>>>>>> is very verbose. To reduce SPAM degrade it to a debug print.
>>>>>>>>
>>>>>>>> Signed-off-by: Per Forlin <per.forlin@gmail.com>
>>>>>>>> ---
>>>>>>>> Change log:
>>>>>>>>  v2 - Add variable to know whether the counter is increased or not
>>>>>>>>  v3 - txfinalize should decrease the counter. Adjust skb header offset
>>>>>>>>  v4 - Fix build error
>>>>>>>>
>>>>>>>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c     | 4 ++++
>>>>>>>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c | 4 +++-
>>>>>>>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c     | 2 +-
>>>>>>>>  3 files changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>>>> index ed9998b..f342f7c 100644
>>>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>>>>>> @@ -541,6 +541,9 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
>>>>>>>>         struct ethhdr *eh;
>>>>>>>>         u16 type;
>>>>>>>>
>>>>>>>> +       if (!ifp)
>>>>>>>> +               goto free;
>>>>>>>> +
>>>>>
>>>>> This may not be needed.
>>>>>
>>>> This is not strictly needed. I can remove it.
>>>>
>>>>>>>>         eh = (struct ethhdr *)(txp->data);
>>>>>>>>         type = ntohs(eh->h_proto);
>>>>>>>>
>>>>>>>> @@ -553,6 +556,7 @@ void brcmf_txfinalize(struct brcmf_if *ifp, struct sk_buff *txp, bool success)
>>>>>>>>         if (!success)
>>>>>>>>                 ifp->stats.tx_errors++;
>>>>>>>>
>>>>>>>> +free:
>>>>>>>>         brcmu_pkt_buf_free_skb(txp);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>>>> index f82c9ab..98cb83f 100644
>>>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>>>>>>> @@ -1899,8 +1899,10 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp, struct sk_buff *skb)
>>>>>>>>
>>>>>>>>         if (fws->avoid_queueing) {
>>>>>>>>                 rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
>>>>>>>> -               if (rc < 0)
>>>>>>>> +               if (rc < 0) {
>>>>>>>> +                       (void)brcmf_proto_hdrpull(drvr, false, skb, &ifp);
>>>>>
>>>>> Could it be that the ifp is NULL pointer after brcmf_proto_hdrpull().
>>>>> Can you check. Better use tmp_ifp variable in this call as you have a
>>>>> valid ifp before this call for sure.
>>>>>
>>>> To be on the safe side I can use NULL as in param like you propose,
>>>> and use the available ifp.
>>>>
>>>>>>>>                         brcmf_txfinalize(ifp, skb, false);
>>>>>>>> +               }
>>>>>>>>                 return rc;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>>>> index a14d9d9d..485e2ad 100644
>>>>>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>>>>>>>> @@ -2721,7 +2721,7 @@ static int brcmf_sdio_bus_txdata(struct device *dev, struct sk_buff *pkt)
>>>>>>>>         *(u16 *)(pkt->cb) = 0;
>>>>>>>>         if (!brcmf_sdio_prec_enq(&bus->txq, pkt, prec)) {
>>>>>>>>                 skb_pull(pkt, bus->tx_hdrlen);
>>>>>>>> -               brcmf_err("out of bus->txq !!!\n");
>>>>>>>> +               brcmf_dbg(INFO, "out of bus->txq !!!\n");
>>>>>
>>>>> Now that I understand the issue I want to keep this as error print as it
>>>>> should be very unlikely.
>>>> I would like to test this patch with the watermark fix to confirm this.
>>>
>>> Can you try this?
>>>
>>> Regards,
>>> Arend
>>>
>>> ---
>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>> b/drive
>>> index cd221ab..9f9024a 100644
>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
>>> @@ -2469,10 +2469,22 @@ void brcmf_fws_bustxfail(struct brcmf_fws_info
>>> *fws, str
>>>  void brcmf_fws_bus_blocked(struct brcmf_pub *drvr, bool flow_blocked)
>>>  {
>>>         struct brcmf_fws_info *fws = drvr->fws;
>>> +       struct brcmf_if *ifp;
>>> +       int i;
>>>
>>> -       fws->bus_flow_blocked = flow_blocked;
>>> -       if (!flow_blocked)
>>> -               brcmf_fws_schedule_deq(fws);
>>> -       else
>>> -               fws->stats.bus_flow_block++;
>>> +       if (fws->avoid_queueing) {
>>> +               for (i = 0; i < BRCMF_MAX_IFS; i++) {
>>> +                       ifp = drvr->iflist[i];
>>> +                       if (!ifp || !ifp->ndev)
>>> +                               continue;
>>> +                       brcmf_txflowblock_if(ifp,
>>> BRCMF_NETIF_STOP_REASON_FLOW,
>>> +                                            flow_blocked);
>>> +               }
>>> +       } else {
>>> +               fws->bus_flow_blocked = flow_blocked;
>>> +               if (!flow_blocked)
>>> +                       brcmf_fws_schedule_deq(fws);
>>> +               else
>>> +                       fws->stats.bus_flow_block++;
>>> +       }
>>>  }
>>>
>> Thanks for the code. I run a quick test and it looks fine.
>> I added some prints to check the progress:
>> len - is number of items in the queue
>> max -  is max number of entries in the queue
>>
>> [   89.407856] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1789 max 2048
>> [   89.414682] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1790 max 2048
>> [   89.421497] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1791 max 2048
>> [   93.970466] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1524 max 2048
>> [   93.977520] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1525 max 2048
>> [   93.984572] [brcmf_sdio_prec_enq] prec 2 prio 0 len 1526 max 2048
>>
>> I will some run more WLAN tests tomorrow to make sure.
>> When I'm done testing I will update my patch as well and let you know.
>>
>> I came across this issue when I tried to connect to my WLAN access
>> point. The bug was triggered due to a lot of background traffic
>> (broadcast and multicast) in the network filling up the queue.
>> It's now fixed so that the queue is not flooded. Now I move on to the
>> next issue. It's still difficult to connect because the background
>> eats up all the bandwidth (this is not a constant issue but it happens
>> from time to time depending on the background load).
>> For queueing-mode there seems to exist logic for handling multicast
>> traffic separately but for the SDIO case (no-queueing) normal traffic
>> gets same precedence as multicast traffic.
>
> I am a bit confused by the term "background traffic". There is a
> specific WMM-AC_BK that I would refer to as background traffic. Most
> traffic is set to AC_BE regardless of it being unicast or bc/mc.
> Changing the priority like that for multicast is a very bad idea.
>
Sorry for using "background traffic", I simply mean broadcast and
multicast traffic.
Normally broadcast traffic and multicast traffic are less time
critical. In my case
the broadcast traffic prevents clients to connect to the access point.
The connection traffic and the broadcast traffic have the same prio.

I will consider other options to improve this. I going away for a
couple of weeks now
and I hope to pick up this work again in mid August.

>> I'm considering something like this:
>> --- a/drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c
>> +++ b/drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c
>> @@ -1900,6 +1902,8 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp,
>> struct sk_buff *skb)
>>         drvr->tx_multicast += !!multicast;
>>
>>         if (fws->avoid_queueing) {
>> +               if (multicast)
>> +                       skb->priority = PRIO_8021D_NONE;
>>                 rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
>>                 if (rc < 0) {
>>                         (void)brcmf_proto_hdrpull(drvr, false, skb, NULL);
>>
>> It feels a little hacky.
>>
>> I would like to use the drvr->tx_multicast instead, if possible.
>> The problem is that the "drvr" struct is not passed down to the sdio layer.
>> One could update bcdc.c : brcmf_proto_bcdc_txdata() to pass
>> "drcr->multicast" to the SDIO layer but not all bus implementations
>> need this information (USB for instance)
>>
>> Any suggestions?
>
> I suspect that we fixed your issue with commit 92121e69de8a ("brcmfmac:
> Properly set carrier state of netdev."). That is if your issue was
> caused on an older kernel.
I'm running on 4.4.6 so I should have this patch in place already.

About the watermark fix.
It looks like the fix makes the eviction handling in sdio.c
brcmf_sdio_prec_enq()
obsolete. The function evicts packets with lower prio to make way for
packets with
higher prio. This only kicks in when the queue is full.

if (pktq_pfull(q, prec) || pktq_full(q))
  // Give precedence
If pktq_full(q) is not full, pktq_pfull(q, prec) will not get full either

With the watermark-fix in place the queue never gets full.

>
> Regards,
> Arend
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c
@@ -1900,6 +1902,8 @@  int brcmf_fws_process_skb(struct brcmf_if *ifp,
struct sk_buff *skb)
        drvr->tx_multicast += !!multicast;

        if (fws->avoid_queueing) {
+               if (multicast)
+                       skb->priority = PRIO_8021D_NONE;
                rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
                if (rc < 0) {