Message ID | e5bb3c8e8be4417cc648cfb53add2c89d2c89aa9.1486486128.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Headers | show |
On 7-2-2017 17:50, Paolo Abeni wrote: > the skbs can be held by the driver for a long time, so we need > to clear any state on xmit to avoid hanging other subsystems. > The skbs are already orphaned later in cmsg code, so we just > need to clear the nf/dst/secpath. > Do it early, while the relevant entries are hopefully still > hot in the cache. What is this about really? A bit more background about the issue might help understanding the need for this patch. Is this really specific to brcmfmac. For instance is something similar already done in mac80211? Regards, Arend > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c > index 9e6f60a..5a8d57b 100644 > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c > @@ -22,6 +22,7 @@ > #include <net/rtnetlink.h> > #include <net/addrconf.h> > #include <net/ipv6.h> > +#include <net/xfrm.h> > #include <brcmu_utils.h> > #include <brcmu_wifi.h> > > @@ -243,6 +244,13 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb, > if ((skb->priority == 0) || (skb->priority > 7)) > skb->priority = cfg80211_classify8021d(skb, NULL); > > + /* we can keep the skb for a long time; avoid starving other > + * subsystems > + */ > + nf_reset(skb); > + skb_dst_drop(skb); > + secpath_reset(skb); > + > ret = brcmf_proto_tx_queue_data(drvr, ifp->ifidx, skb); > if (ret < 0) > brcmf_txfinalize(ifp, skb, false); >
On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote: > On 7-2-2017 17:50, Paolo Abeni wrote: > > the skbs can be held by the driver for a long time, so we need > > to clear any state on xmit to avoid hanging other subsystems. > > The skbs are already orphaned later in cmsg code, so we just > > need to clear the nf/dst/secpath. > > Do it early, while the relevant entries are hopefully still > > hot in the cache. > > What is this about really? A bit more background about the issue > might > help understanding the need for this patch. Is this really specific > to > brcmfmac. For instance is something similar already done in mac80211? The issue is apparently driver specific, as reported in: https://bugzilla.redhat.com/show_bug.cgi?id=1294415 This is caused by xmit skbs carrying a notrack ct entry not being freed by the device driver in a timely manner. Removing the ct module waits for such entries refcount going to zero and hangs the kernel in busy loop (for several minutes). The relevant skbs are icmp6 packets (ND if I recall correctly, they bcast packets at the mac level). The only other known device driver suffering for the issue is the infiniband ipoib driver, I send a separate patch for it. I lack the broadcom h/w, but with infiniband the bug can be reproduced with the following steps: - ensure ipv6 is enabled on the target device, and firewalld is running (e.g. the module nf_conntrack_ipv6 is loaded) - assign a static ip to the device - shut down the firewall (e.g. try to remove the module nf_conntrack) For the brcmfmac driver most probably it is necessary being disassociated from the AP before shutting down the firewall (but I can't double check). This is probably why mac80211 does not suffer this issue. The root cause for the issue could be actually a firmware issue, any better clues are more than welcome! Thank you, Paolo
On 8 February 2017 at 09:38, Paolo Abeni <pabeni@redhat.com> wrote: > On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote: >> On 7-2-2017 17:50, Paolo Abeni wrote: >> > the skbs can be held by the driver for a long time, so we need >> > to clear any state on xmit to avoid hanging other subsystems. >> > The skbs are already orphaned later in cmsg code, so we just >> > need to clear the nf/dst/secpath. >> > Do it early, while the relevant entries are hopefully still >> > hot in the cache. >> >> What is this about really? A bit more background about the issue >> might >> help understanding the need for this patch. Is this really specific >> to >> brcmfmac. For instance is something similar already done in mac80211? > > The issue is apparently driver specific, as reported in: > > https://bugzilla.redhat.com/show_bug.cgi?id=1294415 > > This is caused by xmit skbs carrying a notrack ct entry not being freed > by the device driver in a timely manner. Removing the ct module waits > for such entries refcount going to zero and hangs the kernel in busy > loop (for several minutes). > > The relevant skbs are icmp6 packets (ND if I recall correctly, they > bcast packets at the mac level). > > The only other known device driver suffering for the issue is the > infiniband ipoib driver, I send a separate patch for it. > > I lack the broadcom h/w, but with infiniband the bug can be reproduced > with the following steps: > > - ensure ipv6 is enabled on the target device, and firewalld is running > (e.g. the module nf_conntrack_ipv6 is loaded) > - assign a static ip to the device > - shut down the firewall (e.g. try to remove the module nf_conntrack) > > For the brcmfmac driver most probably it is necessary being > disassociated from the AP before shutting down the firewall (but I > can't double check). This is probably why mac80211 does not suffer this > issue. > > The root cause for the issue could be actually a firmware issue, any > better clues are more than welcome! Do I get this correctly brcmfmac gets some skb for transmitting it and doesn't free it for few minutes? It sounds like some bug that should be fixed instead of hiding it.
On Wed, 2017-02-08 at 09:52 +0100, Rafał Miłecki wrote: > On 8 February 2017 at 09:38, Paolo Abeni <pabeni@redhat.com> wrote: > > On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote: > > > On 7-2-2017 17:50, Paolo Abeni wrote: > > > > the skbs can be held by the driver for a long time, so we need > > > > to clear any state on xmit to avoid hanging other subsystems. > > > > The skbs are already orphaned later in cmsg code, so we just > > > > need to clear the nf/dst/secpath. > > > > Do it early, while the relevant entries are hopefully still > > > > hot in the cache. > > > > > > What is this about really? A bit more background about the issue > > > might > > > help understanding the need for this patch. Is this really > > > specific > > > to > > > brcmfmac. For instance is something similar already done in > > > mac80211? > > > > The issue is apparently driver specific, as reported in: > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1294415 > > > > This is caused by xmit skbs carrying a notrack ct entry not being > > freed > > by the device driver in a timely manner. Removing the ct module > > waits > > for such entries refcount going to zero and hangs the kernel in > > busy > > loop (for several minutes). > > > > The relevant skbs are icmp6 packets (ND if I recall correctly, they > > bcast packets at the mac level). > > > > The only other known device driver suffering for the issue is the > > infiniband ipoib driver, I send a separate patch for it. > > > > I lack the broadcom h/w, but with infiniband the bug can be > > reproduced > > with the following steps: > > > > - ensure ipv6 is enabled on the target device, and firewalld is > > running > > (e.g. the module nf_conntrack_ipv6 is loaded) > > - assign a static ip to the device > > - shut down the firewall (e.g. try to remove the module > > nf_conntrack) > > > > For the brcmfmac driver most probably it is necessary being > > disassociated from the AP before shutting down the firewall (but I > > can't double check). This is probably why mac80211 does not suffer > > this > > issue. > > > > The root cause for the issue could be actually a firmware issue, > > any > > better clues are more than welcome! > > Do I get this correctly brcmfmac gets some skb for transmitting it > and > doesn't free it for few minutes? It sounds like some bug that should > be fixed instead of hiding it. I mostly agreed, but please also note that early clearing the skb head state makes sense from a performance pov: we can do the costly, required, atomic operations while the data is still hot in the cpu L1 cache. I'm unable to find anything obvious at the driver level, and I think the root cause could be in the firmware. I hope some more knowledgeable than me on this topics can have a better look. Thank you, Paolo
Paolo Abeni <pabeni@redhat.com> writes: > On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote: >> On 7-2-2017 17:50, Paolo Abeni wrote: >> > the skbs can be held by the driver for a long time, so we need >> > to clear any state on xmit to avoid hanging other subsystems. >> > The skbs are already orphaned later in cmsg code, so we just >> > need to clear the nf/dst/secpath. >> > Do it early, while the relevant entries are hopefully still >> > hot in the cache. >> >> What is this about really? A bit more background about the issue >> might >> help understanding the need for this patch. Is this really specific >> to >> brcmfmac. For instance is something similar already done in mac80211? > > The issue is apparently driver specific, as reported in: > > https://bugzilla.redhat.com/show_bug.cgi?id=1294415 > > This is caused by xmit skbs carrying a notrack ct entry not being freed > by the device driver in a timely manner. Removing the ct module waits > for such entries refcount going to zero and hangs the kernel in busy > loop (for several minutes). > > The relevant skbs are icmp6 packets (ND if I recall correctly, they > bcast packets at the mac level). > > The only other known device driver suffering for the issue is the > infiniband ipoib driver, I send a separate patch for it. > > I lack the broadcom h/w, but with infiniband the bug can be reproduced > with the following steps: > > - ensure ipv6 is enabled on the target device, and firewalld is running > (e.g. the module nf_conntrack_ipv6 is loaded) > - assign a static ip to the device > - shut down the firewall (e.g. try to remove the module nf_conntrack) > > For the brcmfmac driver most probably it is necessary being > disassociated from the AP before shutting down the firewall (but I > can't double check). This is probably why mac80211 does not suffer this > issue. > > The root cause for the issue could be actually a firmware issue, any > better clues are more than welcome! BTW, you should have added all this to the commit log. After reading the original description it left more questions open than answered them.
On Wed, 2017-02-08 at 12:27 +0200, Kalle Valo wrote: > Paolo Abeni <pabeni@redhat.com> writes: > > > On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote: > > > On 7-2-2017 17:50, Paolo Abeni wrote: > > > > the skbs can be held by the driver for a long time, so we need > > > > to clear any state on xmit to avoid hanging other subsystems. > > > > The skbs are already orphaned later in cmsg code, so we just > > > > need to clear the nf/dst/secpath. > > > > Do it early, while the relevant entries are hopefully still > > > > hot in the cache. > > > > > > What is this about really? A bit more background about the issue > > > might > > > help understanding the need for this patch. Is this really specific > > > to > > > brcmfmac. For instance is something similar already done in mac80211? > > > > The issue is apparently driver specific, as reported in: > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1294415 > > > > This is caused by xmit skbs carrying a notrack ct entry not being freed > > by the device driver in a timely manner. Removing the ct module waits > > for such entries refcount going to zero and hangs the kernel in busy > > loop (for several minutes). > > > > The relevant skbs are icmp6 packets (ND if I recall correctly, they > > bcast packets at the mac level). > > > > The only other known device driver suffering for the issue is the > > infiniband ipoib driver, I send a separate patch for it. > > > > I lack the broadcom h/w, but with infiniband the bug can be reproduced > > with the following steps: > > > > - ensure ipv6 is enabled on the target device, and firewalld is running > > (e.g. the module nf_conntrack_ipv6 is loaded) > > - assign a static ip to the device > > - shut down the firewall (e.g. try to remove the module nf_conntrack) > > > > For the brcmfmac driver most probably it is necessary being > > disassociated from the AP before shutting down the firewall (but I > > can't double check). This is probably why mac80211 does not suffer this > > issue. > > > > The root cause for the issue could be actually a firmware issue, any > > better clues are more than welcome! > > BTW, you should have added all this to the commit log. After reading the > original description it left more questions open than answered them. I'm sorry for the lack of clarity, thank you for the advice, will do next time. I tried to keep the commit message short since others reviewer in other area of the kernel really prefer shorter ones. Cheers, Paolo
On 8-2-2017 10:29, Paolo Abeni wrote: > On Wed, 2017-02-08 at 09:52 +0100, Rafał Miłecki wrote: >> On 8 February 2017 at 09:38, Paolo Abeni <pabeni@redhat.com> wrote: >>> On Tue, 2017-02-07 at 20:23 +0100, Arend Van Spriel wrote: >>>> On 7-2-2017 17:50, Paolo Abeni wrote: >>>>> the skbs can be held by the driver for a long time, so we need >>>>> to clear any state on xmit to avoid hanging other subsystems. >>>>> The skbs are already orphaned later in cmsg code, so we just >>>>> need to clear the nf/dst/secpath. >>>>> Do it early, while the relevant entries are hopefully still >>>>> hot in the cache. >>>> >>>> What is this about really? A bit more background about the issue >>>> might >>>> help understanding the need for this patch. Is this really >>>> specific >>>> to >>>> brcmfmac. For instance is something similar already done in >>>> mac80211? >>> >>> The issue is apparently driver specific, as reported in: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1294415 >>> >>> This is caused by xmit skbs carrying a notrack ct entry not being >>> freed >>> by the device driver in a timely manner. Removing the ct module >>> waits >>> for such entries refcount going to zero and hangs the kernel in >>> busy >>> loop (for several minutes). >>> >>> The relevant skbs are icmp6 packets (ND if I recall correctly, they >>> bcast packets at the mac level). >>> >>> The only other known device driver suffering for the issue is the >>> infiniband ipoib driver, I send a separate patch for it. >>> >>> I lack the broadcom h/w, but with infiniband the bug can be >>> reproduced >>> with the following steps: >>> >>> - ensure ipv6 is enabled on the target device, and firewalld is >>> running >>> (e.g. the module nf_conntrack_ipv6 is loaded) >>> - assign a static ip to the device >>> - shut down the firewall (e.g. try to remove the module >>> nf_conntrack) >>> >>> For the brcmfmac driver most probably it is necessary being >>> disassociated from the AP before shutting down the firewall (but I >>> can't double check). This is probably why mac80211 does not suffer >>> this >>> issue. >>> >>> The root cause for the issue could be actually a firmware issue, >>> any >>> better clues are more than welcome! >> >> Do I get this correctly brcmfmac gets some skb for transmitting it >> and >> doesn't free it for few minutes? It sounds like some bug that should >> be fixed instead of hiding it. > > I mostly agreed, but please also note that early clearing the skb head > state makes sense from a performance pov: we can do the costly, > required, atomic operations while the data is still hot in the cpu L1 > cache. > > I'm unable to find anything obvious at the driver level, and I think > the root cause could be in the firmware. I hope some more knowledgeable > than me on this topics can have a better look. Hi Paolo, I agree with Rafał that several minutes for an skb to be freed is a red flag. Now I know our driver and firmware but not played to much with IPv6 and firewalls. Hopefully I have all the netfilter stuff in my kernel when I try to reproduce it over here. Regards, Arend
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c index 9e6f60a..5a8d57b 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c @@ -22,6 +22,7 @@ #include <net/rtnetlink.h> #include <net/addrconf.h> #include <net/ipv6.h> +#include <net/xfrm.h> #include <brcmu_utils.h> #include <brcmu_wifi.h> @@ -243,6 +244,13 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb, if ((skb->priority == 0) || (skb->priority > 7)) skb->priority = cfg80211_classify8021d(skb, NULL); + /* we can keep the skb for a long time; avoid starving other + * subsystems + */ + nf_reset(skb); + skb_dst_drop(skb); + secpath_reset(skb); + ret = brcmf_proto_tx_queue_data(drvr, ifp->ifidx, skb); if (ret < 0) brcmf_txfinalize(ifp, skb, false);
the skbs can be held by the driver for a long time, so we need to clear any state on xmit to avoid hanging other subsystems. The skbs are already orphaned later in cmsg code, so we just need to clear the nf/dst/secpath. Do it early, while the relevant entries are hopefully still hot in the cache. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 8 ++++++++ 1 file changed, 8 insertions(+)