Message ID | 1480645800-2148-1-git-send-email-greearb@candelatech.com |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
On 2 December 2016 at 03:29, <greearb@candelatech.com> wrote:
> From: Ben Greear <greearb@candelatech.com>
>
> This appears to fix a problem where ath10k firmware would crash,
> mac80211 would start re-adding interfaces to the driver, but the
> iterate-active-interfaces logic would then try to use the half-built
> interfaces. With a bit of extra debug to catch the problem, the
> ath10k crash looks like this:
>
> ath10k_pci 0000:05:00.0: Initializing arvif: ffff8801ce97e320 on vif: ffff8801ce97e1d8
>
> [the print that happens after arvif->ar is assigned is not shown, so code did not make it that far before
> the tx-beacon-nowait method was called]
>
> tx-beacon-nowait: arvif: ffff8801ce97e320 ar: (null)
[...]
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
>  net/mac80211/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
> index 863f2c1..abe1f64 100644
> --- a/net/mac80211/util.c
> +++ b/net/mac80211/util.c
> @@ -705,7 +705,7 @@ static void __iterate_interfaces(struct ieee80211_local *local,
>  			break;
>  		}
>  		if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) &&
> -		    active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))
> +		    (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))))
>  			continue;

Doesn't this effectively prevent you from iterating over interfaces
completely during reconfig? As you bring up interfaces you might
need/want to iterate over others to re-adjust your own state.

I'd argue there should be another flag, IEEE80211_SDATA_RESUMING, used
with sdata->flags for resuming so that once it is re-added to the
driver it can be cleared (and therefore properly iterated over).

Michał
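Michał's point can be made concrete with a minimal user-space C model of the skip test. This is not mac80211 code. It assumes a simplified `struct model_sdata` with two hypothetical flag bits. `SDATA_IN_DRIVER` stands in for `IEEE80211_SDATA_IN_DRIVER`. `SDATA_RESUMING` is the flag Michał proposes, which does not exist in mac80211. The model also leaves out the `IEEE80211_IFACE_ITER_RESUME_ALL` and running-state checks. With the patch, every interface is skipped while `in_reconfig` is set. With the proposed flag, an interface becomes visible again as soon as it has been re-added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; the real IEEE80211_SDATA_*
 * values are defined inside mac80211 and are not reproduced here. */
#define SDATA_IN_DRIVER 0x1u
#define SDATA_RESUMING  0x2u /* proposed in the thread, not in mac80211 */

struct model_sdata {
	unsigned int flags;
};

/* Skip test as changed by the patch (active-only iteration):
 * while in_reconfig is set, every interface is skipped. */
static bool skip_patched(bool in_reconfig, bool active_only,
			 const struct model_sdata *s)
{
	return active_only &&
	       (in_reconfig || !(s->flags & SDATA_IN_DRIVER));
}

/* Skip test with the proposed flag: skip interfaces that are not in the
 * driver, or that are still waiting to be re-added after a restart. */
static bool skip_resuming(bool active_only, const struct model_sdata *s)
{
	return active_only &&
	       (!(s->flags & SDATA_IN_DRIVER) || (s->flags & SDATA_RESUMING));
}

/* At restart, mark every interface that was in the driver as resuming. */
static void mark_resuming(struct model_sdata *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (s[i].flags & SDATA_IN_DRIVER)
			s[i].flags |= SDATA_RESUMING;
}

/* Once the driver's add_interface has returned for this interface,
 * clear the flag so that iteration sees it again. */
static void readded(struct model_sdata *s)
{
	assert(s->flags & SDATA_IN_DRIVER);
	s->flags &= ~SDATA_RESUMING;
}
```

Take two interfaces where only the first has been re-added. `skip_patched()` skips both of them, while `skip_resuming()` skips only the second.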
On Mon, 2016-12-05 at 09:13 +0100, Michal Kazior wrote:
> On 2 December 2016 at 03:29, <greearb@candelatech.com> wrote:
> > From: Ben Greear <greearb@candelatech.com>
> >
> > This appears to fix a problem where ath10k firmware would crash,
> > mac80211 would start re-adding interfaces to the driver, but the
> > iterate-active-interfaces logic would then try to use the half-built
> > interfaces.
[...]
> >  		if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) &&
> > -		    active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))
> > +		    (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))))
> >  			continue;
>
> Doesn't this effectively prevent you from iterating over interfaces
> completely during reconfig? As you bring up interfaces you might
> need/want to iterate over others to re-adjust your own state.

Agree, that doesn't really make sense.

> I'd argue there should be another flag, IEEE80211_SDATA_RESUMING,
> used with sdata->flags for resuming so that once it is re-added to
> the driver it can be cleared (and therefore properly iterated over).

That would make some sense, or perhaps the sdata_in_driver should be
cleared (and remembered elsewhere) at some point during the restart.

johannes
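Johannes's alternative can be modelled the same way. Instead of adding a new flag, the in-driver state is moved aside when the restart begins. It is then restored for each interface once that interface has been re-added, so the existing active-only test needs no change. This is a hypothetical sketch: `in_driver`, `was_in_driver`, `restart_begin()` and `readd_done()` are illustrative names, not mac80211 fields or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; field names are illustrative, not mac80211's. */
struct model_sdata {
	bool in_driver;     /* what the active-only iterator checks */
	bool was_in_driver; /* "remembered elsewhere" across the restart */
};

/* Start of restart: move the in-driver state aside so the unchanged
 * iterator skips every interface until it has been re-added. */
static void restart_begin(struct model_sdata *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		s[i].was_in_driver = s[i].in_driver;
		s[i].in_driver = false;
	}
}

/* Reconfig re-adds only interfaces that were in the driver before. */
static bool needs_readd(const struct model_sdata *s)
{
	return s->was_in_driver;
}

/* Mirror normal bring-up: set the flag only after add_interface returns. */
static void readd_done(struct model_sdata *s)
{
	assert(s->was_in_driver);
	s->in_driver = true;
	s->was_in_driver = false;
}

/* Unchanged skip test for active-only iteration. */
static bool iter_skips(bool active_only, const struct model_sdata *s)
{
	return active_only && !s->in_driver;
}
```

Setting `in_driver` only after the re-add completes matches what normal bring-up already does. The trade-off is that a driver expecting an iteration to find every interface in the middle of a restart would see different behaviour.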
On 12/05/2016 05:52 AM, Johannes Berg wrote:
> On Mon, 2016-12-05 at 09:13 +0100, Michal Kazior wrote:
>> On 2 December 2016 at 03:29, <greearb@candelatech.com> wrote:
[...]
>> I'd argue there should be another flag, IEEE80211_SDATA_RESUMING,
>> used with sdata->flags for resuming so that once it is re-added to
>> the driver it can be cleared (and therefore properly iterated over).
>
> That would make some sense, or perhaps the sdata_in_driver should be
> cleared (and remembered elsewhere) at some point during the restart.

I think clearing sdata-in-driver would fix the ath10k problem, at least,
but I was afraid it would break something else in mac80211 or maybe in
other thick firmware drivers.

One way or another, we cannot be iterating over interfaces while the
interfaces are at the same time being (re)added.

Maybe mac80211 should explicitly remove all interfaces from the driver
during crash recovery?

And the behaviour needs to be clearly documented somewhere easy to find
so that we can think about and program to the correct API behaviour.

Thanks,
Ben
On Mon, 2016-12-05 at 06:57 -0800, Ben Greear wrote:
> I think clearing sdata-in-driver would fix the ath10k problem, at
> least, but I was afraid it would break something else in mac80211 or
> maybe in other thick firmware drivers.

It's pretty much an internal thing - not sure what it'd break. OTOH,
some drivers might actually assume that iterating finds them all, if
they never clear the data even across a restart?

> One way or another, we cannot be iterating over interfaces while
> the interfaces are at the same time being (re)added.

Well, we obviously *can* be, and we do in fact do that - it's just that
ath10k specifically has issues with the data it's putting there, no?

> Maybe mac80211 should explicitly remove all interfaces from the
> driver during crash recovery?

I don't think that'll work. Removing them would interact with the
firmware, which is dead, etc. That'd just cause trouble.

> And the behaviour needs to be clearly documented somewhere
> easy to find so that we can think about and program to the correct
> API behaviour.

We assume that the driver resets all its internal state - this whole
interface iteration is a corner case we hadn't considered, I suppose.

johannes
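On the driver side, one possible way to stop an iterator from dereferencing half-initialised per-vif data is a readiness marker. `add_interface` publishes the marker as its last step, and the iterator callback checks it first. The sketch below is user-space C11 and purely hypothetical. `struct model_arvif`, its `ready` field and the helper functions are illustrative stand-ins, not ath10k's `struct ath10k_vif` or its beacon code. Release/acquire ordering ensures that an iterator which sees `ready == true` also sees the assigned `ar`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ar {
	int id;
};

/* Hypothetical per-vif private data; only the fields needed here. */
struct model_arvif {
	struct model_ar *ar;
	atomic_bool ready; /* hypothetical: published last in add_interface */
};

/* add_interface: initialise everything, then publish with release order. */
static void arvif_init(struct model_arvif *arvif, struct model_ar *ar)
{
	arvif->ar = ar;
	atomic_store_explicit(&arvif->ready, true, memory_order_release);
}

/* Restart: unpublish. Assumes no iteration can run concurrently here
 * (e.g. before reconfig starts); otherwise teardown needs a lock. */
static void arvif_reset(struct model_arvif *arvif)
{
	atomic_store_explicit(&arvif->ready, false, memory_order_release);
	arvif->ar = NULL;
}

/* Iterator callback: skip vifs whose private data is not yet published.
 * Returns true if the vif was handled. */
static bool beacon_iter(struct model_arvif *arvif)
{
	struct model_ar *ar;

	if (!atomic_load_explicit(&arvif->ready, memory_order_acquire))
		return false;
	ar = arvif->ar;
	assert(ar != NULL); /* guaranteed by the release/acquire pair */
	(void)ar;           /* a real driver would build the beacon here */
	return true;
}
```

This guard covers only the bring-up direction. It does not make teardown race-free, which fits Ben's reply that the races are hard to resolve in the driver alone.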
On 12/05/2016 07:00 AM, Johannes Berg wrote:
> On Mon, 2016-12-05 at 06:57 -0800, Ben Greear wrote:
>
>> I think clearing sdata-in-driver would fix the ath10k problem, at
>> least, but I was afraid it would break something else in mac80211 or
>> maybe in other thick firmware drivers.
>
> It's pretty much an internal thing - not sure what it'd break. OTOH,
> some drivers might actually assume that iterating finds them all, if
> they never clear the data even across a restart?
>
>> One way or another, we cannot be iterating over interfaces while
>> the interfaces are at the same time being (re)added.
>
> Well, we obviously *can* be, and we do in fact do that - it's just that
> ath10k specifically has issues with the data it's putting there, no?

It causes races that appear to be very difficult to resolve in the
driver alone. On normal bringup of an interface, the sdata-in-driver
flag is only set at the bottom of the add-interface. In case of
re-config, the flag is already set, and never cleared, so behaviour is
different with regard to the iteration.

>> Maybe mac80211 should explicitly remove all interfaces from the
>> driver during crash recovery?
>
> I don't think that'll work. Removing them would interact with the
> firmware, which is dead, etc. That'd just cause trouble.

That issue already causes trouble and is dealt with in ath10k, I think,
but clearing the flag in mac80211 would probably be enough to fix the
iterate logic.

>> And the behaviour needs to be clearly documented somewhere
>> easy to find so that we can think about and program to the correct
>> API behaviour.
>
> We assume that the driver resets all its internal state - this whole
> interface iteration is a corner case we hadn't considered, I suppose.

Yeah, tricky beastie. I think the txq issue is also part of this since
there are references up in mac80211 and also down in ath10k. Part of my
hack to clean up that crash might be resolved by mac80211 providing a
better cleanup API when firmware crashes.

Thanks,
Ben
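A small model reproduces the asymmetry Ben describes. The following are assumptions for illustration, not taken from mac80211. `struct model_iface` has an `in_driver` field standing in for `IEEE80211_SDATA_IN_DRIVER`. Its `priv_ready` field stands in for the driver having finished initialising its private data (in the report, `arvif->ar` being assigned). `add_interface()` evaluates the pre-patch active-only test part-way through. This models an iteration, such as the tx-beacon-nowait path, that happens to run while the interface is being added.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of one interface; names are illustrative only. */
struct model_iface {
	bool in_driver;  /* stands in for IEEE80211_SDATA_IN_DRIVER */
	bool priv_ready; /* driver finished initialising its private data */
};

/* Pre-patch active-only test (RESUME_ALL and running checks omitted). */
static bool iter_visits(const struct model_iface *ifc)
{
	return ifc->in_driver;
}

/* Add an interface, with an iteration running part-way through.
 * Returns true if that iteration saw this interface half-built. */
static bool add_interface(struct model_iface *ifc)
{
	bool saw_half_built = iter_visits(ifc) && !ifc->priv_ready;

	ifc->priv_ready = true; /* e.g. arvif->ar assigned */
	ifc->in_driver = true;  /* flag set at the bottom, after the call */
	return saw_half_built;
}

/* Firmware crash: driver state is reset, but the flag is left set. */
static void crash(struct model_iface *ifc)
{
	assert(ifc->in_driver);
	ifc->priv_ready = false;
}
```

On first bring-up, the iteration skips the interface because the flag is not yet set. After `crash()` the flag is still set, so the same iteration visits an interface whose private data is not ready.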
fwiw, I'm facing the same kinds of cleanup problems with my port of
(Oct 2015) ath10k to FreeBSD. The Oct 2015 ath10k tree doesn't have
the firmware per-txq/tid/peer feedback stuff in it, so this hasn't yet
bitten me, but the rest of the races have - mostly surrounding handling
pending TX frames when a VAP (vdev/interface in ath10k/mac80211
language) is deleted and if any TX frames were stuck. Stuck TX frames
happen more often than I'd like because of how earlier firmware
required peer entries to first appear in the hardware.

Maybe we need some kind of lifecycle checkpoint for things like peer
addition/removal (for the txq issues Ben had before) and the ability to
ask the firmware to stop/flush HTT TX and re-start it. That way we can
cleanly add/remove interfaces at any point without worrying about any
dangling frames in the transmit queue waiting for completion.

-adrian
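Adrian's stop/flush/restart idea can be sketched as a small state machine. All names here are hypothetical and not part of ath10k or HTT: `enum tx_state`, `struct model_tx`, `tx_stop()`, `tx_flush()`, `tx_start()` and `iface_change_allowed()`. `tx_flush()` simply completes pending frames locally. A real driver would have to ask the firmware to drop or complete them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX-path lifecycle; none of these names exist in ath10k. */
enum tx_state { TX_RUNNING, TX_STOPPING, TX_STOPPED };

struct model_tx {
	enum tx_state state;
	unsigned int pending; /* frames handed to firmware, not completed */
};

/* Refuse new frames as soon as a stop has been requested. */
static bool tx_enqueue(struct model_tx *tx)
{
	if (tx->state != TX_RUNNING)
		return false;
	tx->pending++;
	return true;
}

/* A completion; the last one after a stop request finishes the stop. */
static void tx_complete(struct model_tx *tx)
{
	assert(tx->pending > 0);
	if (--tx->pending == 0 && tx->state == TX_STOPPING)
		tx->state = TX_STOPPED;
}

static void tx_stop(struct model_tx *tx)
{
	tx->state = tx->pending ? TX_STOPPING : TX_STOPPED;
}

/* Stand-in for asking the firmware to flush: complete every frame. */
static void tx_flush(struct model_tx *tx)
{
	while (tx->pending)
		tx_complete(tx);
}

/* Interface add/remove only once nothing is left in flight. */
static bool iface_change_allowed(const struct model_tx *tx)
{
	return tx->state == TX_STOPPED;
}

static void tx_start(struct model_tx *tx)
{
	assert(tx->state == TX_STOPPED);
	tx->state = TX_RUNNING;
}
```

The `TX_STOPPING` state means new frames are refused as soon as a stop is requested. From then on, the pending count can only go down, and interface changes wait for it to reach zero.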
```diff
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 863f2c1..abe1f64 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -705,7 +705,7 @@ static void __iterate_interfaces(struct ieee80211_local *local,
 			break;
 		}
 		if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) &&
-		    active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))
+		    (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))))
 			continue;
 		if (ieee80211_sdata_running(sdata) || !active_only)
 			iterator(data, sdata->vif.addr,
```