Message ID | 1533180646-8028-1-git-send-email-sushant2k1513@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
Series | [v3,1/2] mac80211: invoke sw_scan if hw_scan returns EPERM | expand |
On Thu, 2018-08-02 at 09:00 +0530, Sushant Kumar Mishra wrote: > From: Sanjay Kumar Koduri <sanjay.konduri@redpinesignals.com> > > Currently, software scan in mac80211 is used by drivers, which don't > implement hardware scan. However some drivers which have implemented > hardware scan may also sometimes want to use software scan in mac80211. > Such drivers can return '-EPERM' and ask mac80211 to fallback to > software scan with this patch. > > Signed-off-by: Sanjay Kumar konduri <sanjay.konduri@redpinesignals.com> > Signed-off-by: Siva Rebbagondla <siva.rebbagondla@redpinesignals.com> > Signed-off-by: Sushant Kumar Mishra <sushant.mishra@redpinesignals.com> > --- > changes in v3: Set SCAN_HW_CANCELLED bit, before SW_SCAN triggered. I'm not convinced - why would you set that? It seems to me that drivers might, for example, still do one band in hardware and the other in software, or something like that? You might also run into the WARN_ON here? > @@ -686,6 +686,11 @@ static int __ieee80211_start_scan(struct ieee80211_sub_if_data *sdata, > if (local->ops->hw_scan) { > WARN_ON(!ieee80211_prep_hw_scan(local)); > rc = drv_hw_scan(local, sdata, local->hw_scan_req); > + if (rc == -EPERM) { > + set_bit(SCAN_HW_CANCELLED, &local->scanning); > + __set_bit(SCAN_SW_SCANNING, &local->scanning); > + rc = ieee80211_start_sw_scan(local, sdata); > + } Also, -EPERM is probably not a good idea - we might want to let the driver propagate arbitrary return values up. There's precedent for using a positive number (just the value 1) for such "special" behaviour, so I think that'd be better. Obviously this is also lacking documentation in mac80211.h one way or the other. johannes
Huh, why did you send this like 10 times? Also, HTML is dropped by the list ... > > I'm not convinced - why would you set that? It seems to me that drivers > > might, for example, still do one band in hardware and the other in > > software, or something like that? You might also run into the WARN_ON > > here? > > If we don't set this bit, observed "scan aborted" when "iw dev wlan0 > scan" command > is given in redpine dual band modules(Didn't see any issue with single > band module). > sh# iw dev wlan0 scan > scan aborted! What happened underneath here? I'm having a hard time understanding - perhaps you can at least capture some mac80211 event tracing for this? perhaps with function graph tracing too, so we see where exactly the abort happens. Unless you already know why this happens, and that's why you set the cancel bit? So I think the clearer thing to do would be to have a separate bit for this and check it in the right place. johannes
On Wed, Aug 29, 2018 at 12:54 PM Johannes Berg <johannes@sipsolutions.net> wrote: > > Huh, why did you send this like 10 times? Also, HTML is dropped by the > list ... > Hi Johannes, There was an issue with our mailing server and we received Mail Delivery Failure notification. Hence, we tried multiple times by doing the changes. Apologies for the inconvenience. > > > I'm not convinced - why would you set that? It seems to me that drivers > > > might, for example, still do one band in hardware and the other in > > > software, or something like that? You might also run into the WARN_ON > > > here? > > > > If we don't set this bit, observed "scan aborted" when "iw dev wlan0 > > scan" command > > is given in redpine dual band modules(Didn't see any issue with single > > band module). > > sh# iw dev wlan0 scan > > scan aborted! > > What happened underneath here? I'm having a hard time understanding - > perhaps you can at least capture some mac80211 event tracing for this? > perhaps with function graph tracing too, so we see where exactly the > abort happens. > > Unless you already know why this happens, and that's why you set the > cancel bit? Sure Johannes. I will provide event trace or function graph details for this and send you the updatd patch. Is that fine?. > > So I think the clearer thing to do would be to have a separate bit for > this and check it in the right place. > > johannes Thanks Siva Rebbagondla
[re-adding the previous thread] > As discussed earlier, please find attached tar ball for driver logs, > traces and debug patches. > I have added a README for reference in attached folder, go through the > steps and let me know your feedback. Thanks! The tracing doesn't seem to have worked, but the logging helps a lot. So what happens, to summarize, is that we have hardware that doesn't support scanning on both bands at the same time, i.e. SINGLE_SCAN_ON_ALL_BANDS isn't set. This is with the patch to make -EPERM into "please do software scan" (we can debate whether that should be -EPERM or 1 or something, doesn't matter for this discussion). This logic only happens here: if (local->ops->hw_scan) { WARN_ON(!ieee80211_prep_hw_scan(local)); rc = drv_hw_scan(local, sdata, local->hw_scan_req); if (rc == -EPERM) { set_bit(SCAN_HW_CANCELLED, &local->scanning); __set_bit(SCAN_SW_SCANNING, &local->scanning); rc = ieee80211_start_sw_scan(local, sdata); } (in __ieee80211_start_scan). Next thing is that the software scan completes. This calls __ieee80211_scan_completed(), which assumes hardware scan is running, since bool hw_scan = local->ops->hw_scan; This now again runs drv_hw_scan(): if (hw_scan && !aborted && !ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) && ieee80211_prep_hw_scan(local)) { int rc; rc = drv_hw_scan(local, rcu_dereference_protected(local->scan_sdata, lockdep_is_held(&local->mtx)), local->hw_scan_req); which presumably again returns -EPERM (conditions didn't change), and thus we end up with scan abort. To work around this, you were setting and testing HW_SCAN_CANCELLED in two new places (you can see the setting above). Looking at this in more detail, I think we have multiple issues with the way the fallback is implemented. First of all, this issue at hand. That's worked around, and could be - more properly IMHO - worked around by setting a new bit (HW_SCAN_SW_FALLBACK or so). However, note that due to the way you implemented this, the software scan requests that happens as a fallback behaves differently from a regular software scan. This doesn't matter to you (right now) because you only fall back to software if you're not connected, but in the case that you _are_ connected, it would behave differently due to all the code in __ieee80211_start_scan() that handles a single-channel software scan differently if that matches the currently used channel. So ultimately, I think the (combined) problem is that the fallback is implemented badly. My suggestion would be to separate the decision of whether to do software or hardware scan from the hw_scan callback. This could be done by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which would be your case IIRC, though TBD what that means for AP mode. Obviously, you'll have to change all the (four) things that are checking for "local->ops->hw_scan" to check something else. The three in __ieee80211_start_scan() obviously just check the result of the condition, and the one in __ieee80211_scan_completed() can already check local->scanning today since that's set by then (and the scanning==0 case is only valid with aborted, so not relevant for the hw_scan condition). Perhaps we can actually do the return value from drv_hw_scan(), but in that case I'd say it should look something like this: https://p.sipsolutions.net/6ca3554b24e09ffb.txt Yes, that's less efficient (due to the whole idle recalculation), but it's much easier to understand what's going on, IMHO. There's also a question about whether or not capabilities match, e.g. if we do software scan we report 4 SSIDs, low prio scan, AP scan, random scan, minimal scan content ... these are OK, but if there are ever features that mac80211 *doesn't* support, we'll be in trouble if somebody tries to use them. Hope this helps. If my patch actually works (I obviously haven't tested) then I can send it to the list with signed-off-by and all. johannes
On Tue, Sep 11, 2018 at 5:55 PM Johannes Berg <johannes@sipsolutions.net> wrote: > > [re-adding the previous thread] > > > As discussed earlier, please find attached tar ball for driver logs, > > traces and debug patches. > > I have added a README for reference in attached folder, go through the > > steps and let me know your feedback. > > Thanks! The tracing doesn't seem to have worked, but the logging helps a > lot. > > So what happens, to summarize, is that we have hardware that doesn't > support scanning on both bands at the same time, i.e. > SINGLE_SCAN_ON_ALL_BANDS isn't set. > > This is with the patch to make -EPERM into "please do software scan" (we > can debate whether that should be -EPERM or 1 or something, doesn't > matter for this discussion). > > This logic only happens here: > > if (local->ops->hw_scan) { > WARN_ON(!ieee80211_prep_hw_scan(local)); > rc = drv_hw_scan(local, sdata, local->hw_scan_req); > if (rc == -EPERM) { > set_bit(SCAN_HW_CANCELLED, &local->scanning); > __set_bit(SCAN_SW_SCANNING, &local->scanning); > rc = ieee80211_start_sw_scan(local, sdata); > } > (in __ieee80211_start_scan). > > Next thing is that the software scan completes. This calls > __ieee80211_scan_completed(), which assumes hardware scan is running, > since > bool hw_scan = local->ops->hw_scan; > > This now again runs drv_hw_scan(): > > if (hw_scan && !aborted && > !ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) && > ieee80211_prep_hw_scan(local)) { > int rc; > > rc = drv_hw_scan(local, > rcu_dereference_protected(local->scan_sdata, > lockdep_is_held(&local->mtx)), > local->hw_scan_req); > > which presumably again returns -EPERM (conditions didn't change), and > thus we end up with scan abort. > > To work around this, you were setting and testing HW_SCAN_CANCELLED in > two new places (you can see the setting above). > > Looking at this in more detail, I think we have multiple issues with the > way the fallback is implemented. > > First of all, this issue at hand. That's worked around, and could be - > more properly IMHO - worked around by setting a new bit > (HW_SCAN_SW_FALLBACK or so). > > However, note that due to the way you implemented this, the software > scan requests that happens as a fallback behaves differently from a > regular software scan. This doesn't matter to you (right now) because > you only fall back to software if you're not connected, but in the case > that you _are_ connected, it would behave differently due to all the > code in __ieee80211_start_scan() that handles a single-channel software > scan differently if that matches the currently used channel. > > > So ultimately, I think the (combined) problem is that the fallback is > implemented badly. > > My suggestion would be to separate the decision of whether to do > software or hardware scan from the hw_scan callback. This could be done > by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which > would be your case IIRC, though TBD what that means for AP mode. > > Obviously, you'll have to change all the (four) things that are checking > for "local->ops->hw_scan" to check something else. The three in > __ieee80211_start_scan() obviously just check the result of the > condition, and the one in __ieee80211_scan_completed() can already check > local->scanning today since that's set by then (and the scanning==0 case > is only valid with aborted, so not relevant for the hw_scan condition). > > Perhaps we can actually do the return value from drv_hw_scan(), but in > that case I'd say it should look something like this: > > https://p.sipsolutions.net/6ca3554b24e09ffb.txt > > Yes, that's less efficient (due to the whole idle recalculation), but > it's much easier to understand what's going on, IMHO. > > There's also a question about whether or not capabilities match, e.g. if > we do software scan we report 4 SSIDs, low prio scan, AP scan, random > scan, minimal scan content ... these are OK, but if there are ever > features that mac80211 *doesn't* support, we'll be in trouble if > somebody tries to use them. > > Hope this helps. If my patch actually works (I obviously haven't tested) > then I can send it to the list with signed-off-by and all. > > johannes Hi Johannes, Thanks for the brief explanation and As you mentioned above, we made the changes w.r.t our module. We will understand, what we missed here and will test with your patch and update you. Regards, Siva Rebbagondla.
On Tue, Sep 11, 2018 at 6:50 PM Siva Rebbagondla <siva8118@gmail.com> wrote: > > On Tue, Sep 11, 2018 at 5:55 PM Johannes Berg <johannes@sipsolutions.net> wrote: > > > > [re-adding the previous thread] > > > > > As discussed earlier, please find attached tar ball for driver logs, > > > traces and debug patches. > > > I have added a README for reference in attached folder, go through the > > > steps and let me know your feedback. > > > > Thanks! The tracing doesn't seem to have worked, but the logging helps a > > lot. > > > > So what happens, to summarize, is that we have hardware that doesn't > > support scanning on both bands at the same time, i.e. > > SINGLE_SCAN_ON_ALL_BANDS isn't set. Hi Johannes, Actually, we have missed this in code. Our module supports scanning on both bands at the same time. Thanks for the info. And also we ran the scan on our dual band module by setting "SINGLE_SCAN_ON_ALL_BANDS" flag in driver and removing SCAN_HW_CANCELLED bit in mac80211 stack. We didn't see any issues in scan and as i mentioned earlier, I didn't observe "scan aborted". Can i resubmit patch with these changes?. Thanks Siva Rebbagondla > > > > This is with the patch to make -EPERM into "please do software scan" (we > > can debate whether that should be -EPERM or 1 or something, doesn't > > matter for this discussion). > > > > This logic only happens here: > > > > if (local->ops->hw_scan) { > > WARN_ON(!ieee80211_prep_hw_scan(local)); > > rc = drv_hw_scan(local, sdata, local->hw_scan_req); > > if (rc == -EPERM) { > > set_bit(SCAN_HW_CANCELLED, &local->scanning); > > __set_bit(SCAN_SW_SCANNING, &local->scanning); > > rc = ieee80211_start_sw_scan(local, sdata); > > } > > (in __ieee80211_start_scan). > > > > Next thing is that the software scan completes. This calls > > __ieee80211_scan_completed(), which assumes hardware scan is running, > > since > > bool hw_scan = local->ops->hw_scan; > > > > This now again runs drv_hw_scan(): > > > > if (hw_scan && !aborted && > > !ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) && > > ieee80211_prep_hw_scan(local)) { > > int rc; > > > > rc = drv_hw_scan(local, > > rcu_dereference_protected(local->scan_sdata, > > lockdep_is_held(&local->mtx)), > > local->hw_scan_req); > > > > which presumably again returns -EPERM (conditions didn't change), and > > thus we end up with scan abort. > > > > To work around this, you were setting and testing HW_SCAN_CANCELLED in > > two new places (you can see the setting above). > > > > Looking at this in more detail, I think we have multiple issues with the > > way the fallback is implemented. > > > > First of all, this issue at hand. That's worked around, and could be - > > more properly IMHO - worked around by setting a new bit > > (HW_SCAN_SW_FALLBACK or so). > > > > However, note that due to the way you implemented this, the software > > scan requests that happens as a fallback behaves differently from a > > regular software scan. This doesn't matter to you (right now) because > > you only fall back to software if you're not connected, but in the case > > that you _are_ connected, it would behave differently due to all the > > code in __ieee80211_start_scan() that handles a single-channel software > > scan differently if that matches the currently used channel. > > > > > > So ultimately, I think the (combined) problem is that the fallback is > > implemented badly. > > > > My suggestion would be to separate the decision of whether to do > > software or hardware scan from the hw_scan callback. This could be done > > by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which > > would be your case IIRC, though TBD what that means for AP mode. > > > > Obviously, you'll have to change all the (four) things that are checking > > for "local->ops->hw_scan" to check something else. The three in > > __ieee80211_start_scan() obviously just check the result of the > > condition, and the one in __ieee80211_scan_completed() can already check > > local->scanning today since that's set by then (and the scanning==0 case > > is only valid with aborted, so not relevant for the hw_scan condition). > > > > Perhaps we can actually do the return value from drv_hw_scan(), but in > > that case I'd say it should look something like this: > > > > https://p.sipsolutions.net/6ca3554b24e09ffb.txt > > > > Yes, that's less efficient (due to the whole idle recalculation), but > > it's much easier to understand what's going on, IMHO. > > > > There's also a question about whether or not capabilities match, e.g. if > > we do software scan we report 4 SSIDs, low prio scan, AP scan, random > > scan, minimal scan content ... these are OK, but if there are ever > > features that mac80211 *doesn't* support, we'll be in trouble if > > somebody tries to use them. > > > > Hope this helps. If my patch actually works (I obviously haven't tested) > > then I can send it to the list with signed-off-by and all. > > > > johannes > > Hi Johannes, > Thanks for the brief explanation and As you mentioned above, we made > the changes w.r.t our module. > We will understand, what we missed here and will test with your patch > and update you. > > Regards, > Siva Rebbagondla.
Hi, > Actually, we have missed this in code. Our module supports scanning on > both bands at the same time. Yeah, but we still have to support drivers that don't support this. > Thanks for the info. > And also we ran the scan on our dual band module by setting > "SINGLE_SCAN_ON_ALL_BANDS" flag in > driver and removing SCAN_HW_CANCELLED bit in mac80211 stack. We didn't > see any issues in scan and > as i mentioned earlier, I didn't observe "scan aborted". > Can i resubmit patch with these changes?. Were you able to try my patch instead? I tend to like the structure of it better - i.e. how we fall back after cleaning up the state, rather than directly in the middle. Like I said above, I don't think we should take your patch - you've seen a bug there, and if we just work around it by making your driver do both-bands-at-once, we still have the bug... johannes
diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c index 2e917a6..bb1029b 100644 --- a/net/mac80211/scan.c +++ b/net/mac80211/scan.c @@ -412,7 +412,7 @@ static void __ieee80211_scan_completed(struct ieee80211_hw *hw, bool aborted) /* Set power back to normal operating levels. */ ieee80211_hw_config(local, 0); - if (!hw_scan) { + if (!test_bit(SCAN_SW_SCANNING, &local->scanning)) { ieee80211_configure_filter(local); drv_sw_scan_complete(local, scan_sdata); ieee80211_offchannel_return(local); @@ -686,6 +686,11 @@ static int __ieee80211_start_scan(struct ieee80211_sub_if_data *sdata, if (local->ops->hw_scan) { WARN_ON(!ieee80211_prep_hw_scan(local)); rc = drv_hw_scan(local, sdata, local->hw_scan_req); + if (rc == -EPERM) { + set_bit(SCAN_HW_CANCELLED, &local->scanning); + __set_bit(SCAN_SW_SCANNING, &local->scanning); + rc = ieee80211_start_sw_scan(local, sdata); + } } else { rc = ieee80211_start_sw_scan(local, sdata); }