diff mbox

[net] brcmfmac: clear skb head state on xmit

Message ID e5bb3c8e8be4417cc648cfb53add2c89d2c89aa9.1486486128.git.pabeni@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show

Commit Message

Paolo Abeni Feb. 7, 2017, 4:50 p.m. UTC
the skbs can be held by the driver for a long time, so we need
to clear any state on xmit to avoid hanging other subsystems.
The skbs are already orphaned later in cmsg code, so we just
need to clear the nf/dst/secpath.
Do it early, while the relevant entries are hopefully still
hot in the cache.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Arend van Spriel Feb. 7, 2017, 7:23 p.m. UTC | #1
On 7-2-2017 17:50, Paolo Abeni wrote:
> the skbs can be held by the driver for a long time, so we need
> to clear any state on xmit to avoid hanging other subsystems.
> The skbs are already orphaned later in cmsg code, so we just
> need to clear the nf/dst/secpath.
> Do it early, while the relevant entries are hopefully still
> hot in the cache.

What is this about really? A bit more background about the issue might
help understanding the need for this patch. Is this really specific to
brcmfmac. For instance is something similar already done in mac80211?

Regards,
Arend

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> index 9e6f60a..5a8d57b 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> @@ -22,6 +22,7 @@
>  #include <net/rtnetlink.h>
>  #include <net/addrconf.h>
>  #include <net/ipv6.h>
> +#include <net/xfrm.h>
>  #include <brcmu_utils.h>
>  #include <brcmu_wifi.h>
>  
> @@ -243,6 +244,13 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
>  	if ((skb->priority == 0) || (skb->priority > 7))
>  		skb->priority = cfg80211_classify8021d(skb, NULL);
>  
> +	/* we can keep the skb for a long time; avoid starving other
> +	 * subsystems
> +	 */
> +	nf_reset(skb);
> +	skb_dst_drop(skb);
> +	secpath_reset(skb);
> +
>  	ret = brcmf_proto_tx_queue_data(drvr, ifp->ifidx, skb);
>  	if (ret < 0)
>  		brcmf_txfinalize(ifp, skb, false);
>
Paolo Abeni Feb. 8, 2017, 8:38 a.m. UTC | #2
On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote:
> On 7-2-2017 17:50, Paolo Abeni wrote:
> > the skbs can be held by the driver for a long time, so we need
> > to clear any state on xmit to avoid hanging other subsystems.
> > The skbs are already orphaned later in cmsg code, so we just
> > need to clear the nf/dst/secpath.
> > Do it early, while the relevant entries are hopefully still
> > hot in the cache.
> 
> What is this about really? A bit more background about the issue
> might
> help understanding the need for this patch. Is this really specific
> to
> brcmfmac. For instance is something similar already done in mac80211?

The issue is apparently driver specific, as reported in:

https://bugzilla.redhat.com/show_bug.cgi?id=1294415

This is caused by xmit skbs carrying a notrack ct entry not being freed
by the device driver in a timely manner. Removing the ct module waits
for such entries refcount going to zero and hangs the kernel in busy
loop (for several minutes).

The relevant skbs are icmp6 packets (ND if I recall correctly, they
bcast packets at the mac level).

The only other known device driver suffering for the issue is the
infiniband ipoib driver, I send a separate patch for it.

I lack the broadcom h/w, but with infiniband the bug can be reproduced
with the following steps:

- ensure ipv6 is enabled on the target device, and firewalld is running
(e.g. the module nf_conntrack_ipv6 is loaded)
- assign a static ip to the device
- shut down the firewall (e.g. try to remove the module nf_conntrack)

For the brcmfmac driver most probably it is necessary being
disassociated from the AP before shutting down the firewall (but I
can't double check). This is probably why mac80211 does not suffer this
issue.

The root cause for the issue could be actually a firmware issue, any
better clues are more than welcome!

Thank you,

Paolo
Rafał Miłecki Feb. 8, 2017, 8:52 a.m. UTC | #3
On 8 February 2017 at 09:38, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote:
>> On 7-2-2017 17:50, Paolo Abeni wrote:
>> > the skbs can be held by the driver for a long time, so we need
>> > to clear any state on xmit to avoid hanging other subsystems.
>> > The skbs are already orphaned later in cmsg code, so we just
>> > need to clear the nf/dst/secpath.
>> > Do it early, while the relevant entries are hopefully still
>> > hot in the cache.
>>
>> What is this about really? A bit more background about the issue
>> might
>> help understanding the need for this patch. Is this really specific
>> to
>> brcmfmac. For instance is something similar already done in mac80211?
>
> The issue is apparently driver specific, as reported in:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1294415
>
> This is caused by xmit skbs carrying a notrack ct entry not being freed
> by the device driver in a timely manner. Removing the ct module waits
> for such entries refcount going to zero and hangs the kernel in busy
> loop (for several minutes).
>
> The relevant skbs are icmp6 packets (ND if I recall correctly, they
> bcast packets at the mac level).
>
> The only other known device driver suffering for the issue is the
> infiniband ipoib driver, I send a separate patch for it.
>
> I lack the broadcom h/w, but with infiniband the bug can be reproduced
> with the following steps:
>
> - ensure ipv6 is enabled on the target device, and firewalld is running
> (e.g. the module nf_conntrack_ipv6 is loaded)
> - assign a static ip to the device
> - shut down the firewall (e.g. try to remove the module nf_conntrack)
>
> For the brcmfmac driver most probably it is necessary being
> disassociated from the AP before shutting down the firewall (but I
> can't double check). This is probably why mac80211 does not suffer this
> issue.
>
> The root cause for the issue could be actually a firmware issue, any
> better clues are more than welcome!

Do I get this correctly brcmfmac gets some skb for transmitting it and
doesn't free it for few minutes? It sounds like some bug that should
be fixed instead of hiding it.
Paolo Abeni Feb. 8, 2017, 9:29 a.m. UTC | #4
On Wed, 2017-02-08 at 09:52 +0100, Rafał Miłecki wrote:
> On 8 February 2017 at 09:38, Paolo Abeni <pabeni@redhat.com> wrote:
> > On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote:
> > > On 7-2-2017 17:50, Paolo Abeni wrote:
> > > > the skbs can be held by the driver for a long time, so we need
> > > > to clear any state on xmit to avoid hanging other subsystems.
> > > > The skbs are already orphaned later in cmsg code, so we just
> > > > need to clear the nf/dst/secpath.
> > > > Do it early, while the relevant entries are hopefully still
> > > > hot in the cache.
> > > 
> > > What is this about really? A bit more background about the issue
> > > might
> > > help understanding the need for this patch. Is this really
> > > specific
> > > to
> > > brcmfmac. For instance is something similar already done in
> > > mac80211?
> > 
> > The issue is apparently driver specific, as reported in:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1294415
> > 
> > This is caused by xmit skbs carrying a notrack ct entry not being
> > freed
> > by the device driver in a timely manner. Removing the ct module
> > waits
> > for such entries refcount going to zero and hangs the kernel in
> > busy
> > loop (for several minutes).
> > 
> > The relevant skbs are icmp6 packets (ND if I recall correctly, they
> > bcast packets at the mac level).
> > 
> > The only other known device driver suffering for the issue is the
> > infiniband ipoib driver, I send a separate patch for it.
> > 
> > I lack the broadcom h/w, but with infiniband the bug can be
> > reproduced
> > with the following steps:
> > 
> > - ensure ipv6 is enabled on the target device, and firewalld is
> > running
> > (e.g. the module nf_conntrack_ipv6 is loaded)
> > - assign a static ip to the device
> > - shut down the firewall (e.g. try to remove the module
> > nf_conntrack)
> > 
> > For the brcmfmac driver most probably it is necessary being
> > disassociated from the AP before shutting down the firewall (but I
> > can't double check). This is probably why mac80211 does not suffer
> > this
> > issue.
> > 
> > The root cause for the issue could be actually a firmware issue,
> > any
> > better clues are more than welcome!
> 
> Do I get this correctly brcmfmac gets some skb for transmitting it
> and
> doesn't free it for few minutes? It sounds like some bug that should
> be fixed instead of hiding it.

I mostly agreed, but please also note that early clearing the skb head
state makes sense from a performance pov: we can do the costly,
required, atomic operations while the data is still hot in the cpu L1
cache.

I'm unable to find anything obvious at the driver level, and I think
the root cause could be in the firmware. I hope some more knowledgeable
than me on this topics can have a better look.

Thank you,

Paolo
Kalle Valo Feb. 8, 2017, 10:27 a.m. UTC | #5
Paolo Abeni <pabeni@redhat.com> writes:

> On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote:
>> On 7-2-2017 17:50, Paolo Abeni wrote:
>> > the skbs can be held by the driver for a long time, so we need
>> > to clear any state on xmit to avoid hanging other subsystems.
>> > The skbs are already orphaned later in cmsg code, so we just
>> > need to clear the nf/dst/secpath.
>> > Do it early, while the relevant entries are hopefully still
>> > hot in the cache.
>> 
>> What is this about really? A bit more background about the issue
>> might
>> help understanding the need for this patch. Is this really specific
>> to
>> brcmfmac. For instance is something similar already done in mac80211?
>
> The issue is apparently driver specific, as reported in:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1294415
>
> This is caused by xmit skbs carrying a notrack ct entry not being freed
> by the device driver in a timely manner. Removing the ct module waits
> for such entries refcount going to zero and hangs the kernel in busy
> loop (for several minutes).
>
> The relevant skbs are icmp6 packets (ND if I recall correctly, they
> bcast packets at the mac level).
>
> The only other known device driver suffering for the issue is the
> infiniband ipoib driver, I send a separate patch for it.
>
> I lack the broadcom h/w, but with infiniband the bug can be reproduced
> with the following steps:
>
> - ensure ipv6 is enabled on the target device, and firewalld is running
> (e.g. the module nf_conntrack_ipv6 is loaded)
> - assign a static ip to the device
> - shut down the firewall (e.g. try to remove the module nf_conntrack)
>
> For the brcmfmac driver most probably it is necessary being
> disassociated from the AP before shutting down the firewall (but I
> can't double check). This is probably why mac80211 does not suffer this
> issue.
>
> The root cause for the issue could be actually a firmware issue, any
> better clues are more than welcome!

BTW, you should have added all this to the commit log. After reading the
original description it left more questions open than answered them.
Paolo Abeni Feb. 8, 2017, 10:34 a.m. UTC | #6
On Wed, 2017-02-08 at 12:27 +0200, Kalle Valo wrote:
> Paolo Abeni <pabeni@redhat.com> writes:
> 
> > On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote:
> > > On 7-2-2017 17:50, Paolo Abeni wrote:
> > > > the skbs can be held by the driver for a long time, so we need
> > > > to clear any state on xmit to avoid hanging other subsystems.
> > > > The skbs are already orphaned later in cmsg code, so we just
> > > > need to clear the nf/dst/secpath.
> > > > Do it early, while the relevant entries are hopefully still
> > > > hot in the cache.
> > > 
> > > What is this about really? A bit more background about the issue
> > > might
> > > help understanding the need for this patch. Is this really specific
> > > to
> > > brcmfmac. For instance is something similar already done in mac80211?
> > 
> > The issue is apparently driver specific, as reported in:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1294415
> > 
> > This is caused by xmit skbs carrying a notrack ct entry not being freed
> > by the device driver in a timely manner. Removing the ct module waits
> > for such entries refcount going to zero and hangs the kernel in busy
> > loop (for several minutes).
> > 
> > The relevant skbs are icmp6 packets (ND if I recall correctly, they
> > bcast packets at the mac level).
> > 
> > The only other known device driver suffering for the issue is the
> > infiniband ipoib driver, I send a separate patch for it.
> > 
> > I lack the broadcom h/w, but with infiniband the bug can be reproduced
> > with the following steps:
> > 
> > - ensure ipv6 is enabled on the target device, and firewalld is running
> > (e.g. the module nf_conntrack_ipv6 is loaded)
> > - assign a static ip to the device
> > - shut down the firewall (e.g. try to remove the module nf_conntrack)
> > 
> > For the brcmfmac driver most probably it is necessary being
> > disassociated from the AP before shutting down the firewall (but I
> > can't double check). This is probably why mac80211 does not suffer this
> > issue.
> > 
> > The root cause for the issue could be actually a firmware issue, any
> > better clues are more than welcome!
> 
> BTW, you should have added all this to the commit log. After reading the
> original description it left more questions open than answered them.

I'm sorry for the lack of clarity, thank you for the advice, will do
next time. 

I tried to keep the commit message short since others reviewer in other
area of the kernel really prefer shorter ones.

Cheers,

Paolo
Arend van Spriel Feb. 8, 2017, 10:43 a.m. UTC | #7
On 8-2-2017 10:29, Paolo Abeni wrote:
> On Wed, 2017-02-08 at 09:52 +0100, Rafał Miłecki wrote:
>> On 8 February 2017 at 09:38, Paolo Abeni <pabeni@redhat.com> wrote:
>>> On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote:
>>>> On 7-2-2017 17:50, Paolo Abeni wrote:
>>>>> the skbs can be held by the driver for a long time, so we need
>>>>> to clear any state on xmit to avoid hanging other subsystems.
>>>>> The skbs are already orphaned later in cmsg code, so we just
>>>>> need to clear the nf/dst/secpath.
>>>>> Do it early, while the relevant entries are hopefully still
>>>>> hot in the cache.
>>>>
>>>> What is this about really? A bit more background about the issue
>>>> might
>>>> help understanding the need for this patch. Is this really
>>>> specific
>>>> to
>>>> brcmfmac. For instance is something similar already done in
>>>> mac80211?
>>>
>>> The issue is apparently driver specific, as reported in:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1294415
>>>
>>> This is caused by xmit skbs carrying a notrack ct entry not being
>>> freed
>>> by the device driver in a timely manner. Removing the ct module
>>> waits
>>> for such entries refcount going to zero and hangs the kernel in
>>> busy
>>> loop (for several minutes).
>>>
>>> The relevant skbs are icmp6 packets (ND if I recall correctly, they
>>> bcast packets at the mac level).
>>>
>>> The only other known device driver suffering for the issue is the
>>> infiniband ipoib driver, I send a separate patch for it.
>>>
>>> I lack the broadcom h/w, but with infiniband the bug can be
>>> reproduced
>>> with the following steps:
>>>
>>> - ensure ipv6 is enabled on the target device, and firewalld is
>>> running
>>> (e.g. the module nf_conntrack_ipv6 is loaded)
>>> - assign a static ip to the device
>>> - shut down the firewall (e.g. try to remove the module
>>> nf_conntrack)
>>>
>>> For the brcmfmac driver most probably it is necessary being
>>> disassociated from the AP before shutting down the firewall (but I
>>> can't double check). This is probably why mac80211 does not suffer
>>> this
>>> issue.
>>>
>>> The root cause for the issue could be actually a firmware issue,
>>> any
>>> better clues are more than welcome!
>>
>> Do I get this correctly brcmfmac gets some skb for transmitting it
>> and
>> doesn't free it for few minutes? It sounds like some bug that should
>> be fixed instead of hiding it.
> 
> I mostly agreed, but please also note that early clearing the skb head
> state makes sense from a performance pov: we can do the costly,
> required, atomic operations while the data is still hot in the cpu L1
> cache.
> 
> I'm unable to find anything obvious at the driver level, and I think
> the root cause could be in the firmware. I hope some more knowledgeable
> than me on this topics can have a better look.

Hi Paolo,

I agree with Rafał that several minutes for an skb to be freed is a red
flag. Now I know our driver and firmware but not played to much with
IPv6 and firewalls. Hopefully I have all the netfilter stuff in my
kernel when I try to reproduce it over here.

Regards,
Arend
diff mbox

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 9e6f60a..5a8d57b 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -22,6 +22,7 @@ 
 #include <net/rtnetlink.h>
 #include <net/addrconf.h>
 #include <net/ipv6.h>
+#include <net/xfrm.h>
 #include <brcmu_utils.h>
 #include <brcmu_wifi.h>
 
@@ -243,6 +244,13 @@  static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 	if ((skb->priority == 0) || (skb->priority > 7))
 		skb->priority = cfg80211_classify8021d(skb, NULL);
 
+	/* we can keep the skb for a long time; avoid starving other
+	 * subsystems
+	 */
+	nf_reset(skb);
+	skb_dst_drop(skb);
+	secpath_reset(skb);
+
 	ret = brcmf_proto_tx_queue_data(drvr, ifp->ifidx, skb);
 	if (ret < 0)
 		brcmf_txfinalize(ifp, skb, false);