[v3,1/2] mac80211: invoke sw_scan if hw_scan returns EPERM

Message ID	1533180646-8028-1-git-send-email-sushant2k1513@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	Johannes Berg
Headers	show Return-Path: <linux-wireless-owner@kernel.org> From: Sushant Kumar Mishra <sushant2k1513@gmail.com> To: Kalle Valo <kvalo@codeaurora.org> Cc: linux-wireless@vger.kernel.org, Siva Rebbagondla <siva.rebbagondla@redpinesignals.com>, Sanjay Kumar Konduri <sanjay.konduri@redpinesignals.com>, Sushant Kumar Mishra <sushant.mishra@redpinesignals.com> Subject: [PATCH v3 1/2] mac80211: invoke sw_scan if hw_scan returns EPERM Date: Thu, 2 Aug 2018 09:00:45 +0530 Message-Id: <1533180646-8028-1-git-send-email-sushant2k1513@gmail.com> Sender: linux-wireless-owner@vger.kernel.org Precedence: bulk
Series	[v3,1/2] mac80211: invoke sw_scan if hw_scan returns EPERM \| expand [v3,1/2] mac80211: invoke sw_scan if hw_scan returns EPERM [v3,2/2] rsi: add support for hardware scan offload

Sushant Kumar Mishra Aug. 2, 2018, 3:30 a.m. UTC

From: Sanjay Kumar Koduri <sanjay.konduri@redpinesignals.com>

Currently, software scan in mac80211 is used by drivers, which don't
implement hardware scan. However some drivers which have implemented
hardware scan may also sometimes want to use software scan in mac80211.
Such drivers can return '-EPERM' and ask mac80211 to fallback to
software scan with this patch.

Signed-off-by: Sanjay Kumar konduri <sanjay.konduri@redpinesignals.com>
Signed-off-by: Siva Rebbagondla <siva.rebbagondla@redpinesignals.com>
Signed-off-by: Sushant Kumar Mishra <sushant.mishra@redpinesignals.com>
---
changes in v3: Set SCAN_HW_CANCELLED bit, before SW_SCAN triggered.
changes in v2: Nothing
---
 net/mac80211/scan.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Johannes Berg Aug. 14, 2018, 11:33 a.m. UTC | #1

On Thu, 2018-08-02 at 09:00 +0530, Sushant Kumar Mishra wrote:
> From: Sanjay Kumar Koduri <sanjay.konduri@redpinesignals.com>
> 
> Currently, software scan in mac80211 is used by drivers, which don't
> implement hardware scan. However some drivers which have implemented
> hardware scan may also sometimes want to use software scan in mac80211.
> Such drivers can return '-EPERM' and ask mac80211 to fallback to
> software scan with this patch.
> 
> Signed-off-by: Sanjay Kumar konduri <sanjay.konduri@redpinesignals.com>
> Signed-off-by: Siva Rebbagondla <siva.rebbagondla@redpinesignals.com>
> Signed-off-by: Sushant Kumar Mishra <sushant.mishra@redpinesignals.com>
> ---
> changes in v3: Set SCAN_HW_CANCELLED bit, before SW_SCAN triggered.

I'm not convinced - why would you set that? It seems to me that drivers
might, for example, still do one band in hardware and the other in
software, or something like that? You might also run into the WARN_ON
here?

> @@ -686,6 +686,11 @@ static int __ieee80211_start_scan(struct ieee80211_sub_if_data *sdata,
>  	if (local->ops->hw_scan) {
>  		WARN_ON(!ieee80211_prep_hw_scan(local));

>  		rc = drv_hw_scan(local, sdata, local->hw_scan_req);
> +		if (rc == -EPERM) {
> +			set_bit(SCAN_HW_CANCELLED, &local->scanning);
> +			__set_bit(SCAN_SW_SCANNING, &local->scanning);
> +			rc = ieee80211_start_sw_scan(local, sdata);
> +		}

Also, -EPERM is probably not a good idea - we might want to let the
driver propagate arbitrary return values up. There's precedent for using
a positive number (just the value 1) for such "special" behaviour, so I
think that'd be better.

Obviously this is also lacking documentation in mac80211.h one way or
the other.

johannes

Johannes Berg Aug. 29, 2018, 7:24 a.m. UTC | #2

Huh, why did you send this like 10 times? Also, HTML is dropped by the
list ...

> > I'm not convinced - why would you set that? It seems to me that drivers
> > might, for example, still do one band in hardware and the other in
> > software, or something like that? You might also run into the WARN_ON
> > here?
> 
> If we don't set this bit, observed "scan aborted" when "iw dev wlan0
> scan" command
> is given in redpine dual band modules(Didn't see any issue with single
> band module).
> sh# iw dev wlan0 scan
> scan aborted!

What happened underneath here? I'm having a hard time understanding -
perhaps you can at least capture some mac80211 event tracing for this?
perhaps with function graph tracing too, so we see where exactly the
abort happens.

Unless you already know why this happens, and that's why you set the
cancel bit?

So I think the clearer thing to do would be to have a separate bit for
this and check it in the right place.

johannes

Siva Rebbagondla Aug. 31, 2018, 1:34 p.m. UTC | #3

On Wed, Aug 29, 2018 at 12:54 PM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> Huh, why did you send this like 10 times? Also, HTML is dropped by the
> list ...
>
Hi Johannes,
There was an issue with our mailing server and we received Mail
Delivery Failure notification.
Hence, we tried multiple times by doing the changes.
Apologies for the inconvenience.
> > > I'm not convinced - why would you set that? It seems to me that drivers
> > > might, for example, still do one band in hardware and the other in
> > > software, or something like that? You might also run into the WARN_ON
> > > here?
> >
> > If we don't set this bit, observed "scan aborted" when "iw dev wlan0
> > scan" command
> > is given in redpine dual band modules(Didn't see any issue with single
> > band module).
> > sh# iw dev wlan0 scan
> > scan aborted!
>
> What happened underneath here? I'm having a hard time understanding -
> perhaps you can at least capture some mac80211 event tracing for this?
> perhaps with function graph tracing too, so we see where exactly the
> abort happens.
>
> Unless you already know why this happens, and that's why you set the
> cancel bit?
Sure Johannes. I will provide event trace or function graph details
for this and send you
the updatd patch. Is that fine?.
>
> So I think the clearer thing to do would be to have a separate bit for
> this and check it in the right place.
>
> johannes

Thanks
Siva Rebbagondla

Johannes Berg Sept. 11, 2018, 12:25 p.m. UTC | #4

[re-adding the previous thread]

> As discussed earlier, please find attached tar ball for driver logs,
> traces and debug patches.
> I have added a README for reference in attached folder, go through the
> steps and let me know your feedback.

Thanks! The tracing doesn't seem to have worked, but the logging helps a
lot.

So what happens, to summarize, is that we have hardware that doesn't
support scanning on both bands at the same time, i.e.
SINGLE_SCAN_ON_ALL_BANDS isn't set.

This is with the patch to make -EPERM into "please do software scan" (we
can debate whether that should be -EPERM or 1 or something, doesn't
matter for this discussion).

This logic only happens here:

        if (local->ops->hw_scan) {
                WARN_ON(!ieee80211_prep_hw_scan(local));
                rc = drv_hw_scan(local, sdata, local->hw_scan_req);
                if (rc == -EPERM) {
                        set_bit(SCAN_HW_CANCELLED, &local->scanning);
                        __set_bit(SCAN_SW_SCANNING, &local->scanning);
                        rc = ieee80211_start_sw_scan(local, sdata);
                }
(in __ieee80211_start_scan).

Next thing is that the software scan completes. This calls
__ieee80211_scan_completed(), which assumes hardware scan is running,
since
	bool hw_scan = local->ops->hw_scan;

This now again runs drv_hw_scan():

        if (hw_scan && !aborted &&
            !ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) &&
            ieee80211_prep_hw_scan(local)) {
                int rc;

                rc = drv_hw_scan(local,
                        rcu_dereference_protected(local->scan_sdata,
                                                  lockdep_is_held(&local->mtx)),
                        local->hw_scan_req);

which presumably again returns -EPERM (conditions didn't change), and
thus we end up with scan abort.

To work around this, you were setting and testing HW_SCAN_CANCELLED in
two new places (you can see the setting above).

Looking at this in more detail, I think we have multiple issues with the
way the fallback is implemented.

First of all, this issue at hand. That's worked around, and could be -
more properly IMHO - worked around by setting a new bit
(HW_SCAN_SW_FALLBACK or so).

However, note that due to the way you implemented this, the software
scan requests that happens as a fallback behaves differently from a
regular software scan. This doesn't matter to you (right now) because
you only fall back to software if you're not connected, but in the case
that you _are_ connected, it would behave differently due to all the
code in __ieee80211_start_scan() that handles a single-channel software
scan differently if that matches the currently used channel.


So ultimately, I think the (combined) problem is that the fallback is
implemented badly.

My suggestion would be to separate the decision of whether to do
software or hardware scan from the hw_scan callback. This could be done
by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which
would be your case IIRC, though TBD what that means for AP mode.

Obviously, you'll have to change all the (four) things that are checking
for "local->ops->hw_scan" to check something else. The three in
__ieee80211_start_scan() obviously just check the result of the
condition, and the one in __ieee80211_scan_completed() can already check
local->scanning today since that's set by then (and the scanning==0 case
is only valid with aborted, so not relevant for the hw_scan condition).

Perhaps we can actually do the return value from drv_hw_scan(), but in
that case I'd say it should look something like this:

https://p.sipsolutions.net/6ca3554b24e09ffb.txt

Yes, that's less efficient (due to the whole idle recalculation), but
it's much easier to understand what's going on, IMHO.

There's also a question about whether or not capabilities match, e.g. if
we do software scan we report 4 SSIDs, low prio scan, AP scan, random
scan, minimal scan content ... these are OK, but if there are ever
features that mac80211 *doesn't* support, we'll be in trouble if
somebody tries to use them.

Hope this helps. If my patch actually works (I obviously haven't tested)
then I can send it to the list with signed-off-by and all.

johannes

Siva Rebbagondla Sept. 11, 2018, 1:20 p.m. UTC | #5

On Tue, Sep 11, 2018 at 5:55 PM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> [re-adding the previous thread]
>
> > As discussed earlier, please find attached tar ball for driver logs,
> > traces and debug patches.
> > I have added a README for reference in attached folder, go through the
> > steps and let me know your feedback.
>
> Thanks! The tracing doesn't seem to have worked, but the logging helps a
> lot.
>
> So what happens, to summarize, is that we have hardware that doesn't
> support scanning on both bands at the same time, i.e.
> SINGLE_SCAN_ON_ALL_BANDS isn't set.
>
> This is with the patch to make -EPERM into "please do software scan" (we
> can debate whether that should be -EPERM or 1 or something, doesn't
> matter for this discussion).
>
> This logic only happens here:
>
>         if (local->ops->hw_scan) {
>                 WARN_ON(!ieee80211_prep_hw_scan(local));
>                 rc = drv_hw_scan(local, sdata, local->hw_scan_req);
>                 if (rc == -EPERM) {
>                         set_bit(SCAN_HW_CANCELLED, &local->scanning);
>                         __set_bit(SCAN_SW_SCANNING, &local->scanning);
>                         rc = ieee80211_start_sw_scan(local, sdata);
>                 }
> (in __ieee80211_start_scan).
>
> Next thing is that the software scan completes. This calls
> __ieee80211_scan_completed(), which assumes hardware scan is running,
> since
>         bool hw_scan = local->ops->hw_scan;
>
> This now again runs drv_hw_scan():
>
>         if (hw_scan && !aborted &&
>             !ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) &&
>             ieee80211_prep_hw_scan(local)) {
>                 int rc;
>
>                 rc = drv_hw_scan(local,
>                         rcu_dereference_protected(local->scan_sdata,
>                                                   lockdep_is_held(&local->mtx)),
>                         local->hw_scan_req);
>
> which presumably again returns -EPERM (conditions didn't change), and
> thus we end up with scan abort.
>
> To work around this, you were setting and testing HW_SCAN_CANCELLED in
> two new places (you can see the setting above).
>
> Looking at this in more detail, I think we have multiple issues with the
> way the fallback is implemented.
>
> First of all, this issue at hand. That's worked around, and could be -
> more properly IMHO - worked around by setting a new bit
> (HW_SCAN_SW_FALLBACK or so).
>
> However, note that due to the way you implemented this, the software
> scan requests that happens as a fallback behaves differently from a
> regular software scan. This doesn't matter to you (right now) because
> you only fall back to software if you're not connected, but in the case
> that you _are_ connected, it would behave differently due to all the
> code in __ieee80211_start_scan() that handles a single-channel software
> scan differently if that matches the currently used channel.
>
>
> So ultimately, I think the (combined) problem is that the fallback is
> implemented badly.
>
> My suggestion would be to separate the decision of whether to do
> software or hardware scan from the hw_scan callback. This could be done
> by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which
> would be your case IIRC, though TBD what that means for AP mode.
>
> Obviously, you'll have to change all the (four) things that are checking
> for "local->ops->hw_scan" to check something else. The three in
> __ieee80211_start_scan() obviously just check the result of the
> condition, and the one in __ieee80211_scan_completed() can already check
> local->scanning today since that's set by then (and the scanning==0 case
> is only valid with aborted, so not relevant for the hw_scan condition).
>
> Perhaps we can actually do the return value from drv_hw_scan(), but in
> that case I'd say it should look something like this:
>
> https://p.sipsolutions.net/6ca3554b24e09ffb.txt
>
> Yes, that's less efficient (due to the whole idle recalculation), but
> it's much easier to understand what's going on, IMHO.
>
> There's also a question about whether or not capabilities match, e.g. if
> we do software scan we report 4 SSIDs, low prio scan, AP scan, random
> scan, minimal scan content ... these are OK, but if there are ever
> features that mac80211 *doesn't* support, we'll be in trouble if
> somebody tries to use them.
>
> Hope this helps. If my patch actually works (I obviously haven't tested)
> then I can send it to the list with signed-off-by and all.
>
> johannes

Hi Johannes,
Thanks for the brief explanation and As you mentioned above, we made
the changes w.r.t our module.
We will understand, what we missed here and will test with your patch
and update you.

Regards,
Siva Rebbagondla.

Siva Rebbagondla Sept. 22, 2018, 5:29 a.m. UTC | #6

On Tue, Sep 11, 2018 at 6:50 PM Siva Rebbagondla <siva8118@gmail.com> wrote:
>
> On Tue, Sep 11, 2018 at 5:55 PM Johannes Berg <johannes@sipsolutions.net> wrote:
> >
> > [re-adding the previous thread]
> >
> > > As discussed earlier, please find attached tar ball for driver logs,
> > > traces and debug patches.
> > > I have added a README for reference in attached folder, go through the
> > > steps and let me know your feedback.
> >
> > Thanks! The tracing doesn't seem to have worked, but the logging helps a
> > lot.
> >
> > So what happens, to summarize, is that we have hardware that doesn't
> > support scanning on both bands at the same time, i.e.
> > SINGLE_SCAN_ON_ALL_BANDS isn't set.
Hi Johannes,
Actually, we have missed this in code. Our module supports scanning on
both bands at the same time.
Thanks for the info.
And also we ran the scan on our dual band module by setting
"SINGLE_SCAN_ON_ALL_BANDS" flag in
driver and removing SCAN_HW_CANCELLED bit in mac80211 stack. We didn't
see any issues in scan and
as i mentioned earlier, I didn't observe "scan aborted".
Can i resubmit patch with these changes?.

Thanks
Siva Rebbagondla
> >
> > This is with the patch to make -EPERM into "please do software scan" (we
> > can debate whether that should be -EPERM or 1 or something, doesn't
> > matter for this discussion).
> >
> > This logic only happens here:
> >
> >         if (local->ops->hw_scan) {
> >                 WARN_ON(!ieee80211_prep_hw_scan(local));
> >                 rc = drv_hw_scan(local, sdata, local->hw_scan_req);
> >                 if (rc == -EPERM) {
> >                         set_bit(SCAN_HW_CANCELLED, &local->scanning);
> >                         __set_bit(SCAN_SW_SCANNING, &local->scanning);
> >                         rc = ieee80211_start_sw_scan(local, sdata);
> >                 }
> > (in __ieee80211_start_scan).
> >
> > Next thing is that the software scan completes. This calls
> > __ieee80211_scan_completed(), which assumes hardware scan is running,
> > since
> >         bool hw_scan = local->ops->hw_scan;
> >
> > This now again runs drv_hw_scan():
> >
> >         if (hw_scan && !aborted &&
> >             !ieee80211_hw_check(&local->hw, SINGLE_SCAN_ON_ALL_BANDS) &&
> >             ieee80211_prep_hw_scan(local)) {
> >                 int rc;
> >
> >                 rc = drv_hw_scan(local,
> >                         rcu_dereference_protected(local->scan_sdata,
> >                                                   lockdep_is_held(&local->mtx)),
> >                         local->hw_scan_req);
> >
> > which presumably again returns -EPERM (conditions didn't change), and
> > thus we end up with scan abort.
> >
> > To work around this, you were setting and testing HW_SCAN_CANCELLED in
> > two new places (you can see the setting above).
> >
> > Looking at this in more detail, I think we have multiple issues with the
> > way the fallback is implemented.
> >
> > First of all, this issue at hand. That's worked around, and could be -
> > more properly IMHO - worked around by setting a new bit
> > (HW_SCAN_SW_FALLBACK or so).
> >
> > However, note that due to the way you implemented this, the software
> > scan requests that happens as a fallback behaves differently from a
> > regular software scan. This doesn't matter to you (right now) because
> > you only fall back to software if you're not connected, but in the case
> > that you _are_ connected, it would behave differently due to all the
> > code in __ieee80211_start_scan() that handles a single-channel software
> > scan differently if that matches the currently used channel.
> >
> >
> > So ultimately, I think the (combined) problem is that the fallback is
> > implemented badly.
> >
> > My suggestion would be to separate the decision of whether to do
> > software or hardware scan from the hw_scan callback. This could be done
> > by flags, e.g. hw->scanflags & IEEE80211_SW_SCAN_WHILE_UNASSOC, which
> > would be your case IIRC, though TBD what that means for AP mode.
> >
> > Obviously, you'll have to change all the (four) things that are checking
> > for "local->ops->hw_scan" to check something else. The three in
> > __ieee80211_start_scan() obviously just check the result of the
> > condition, and the one in __ieee80211_scan_completed() can already check
> > local->scanning today since that's set by then (and the scanning==0 case
> > is only valid with aborted, so not relevant for the hw_scan condition).
> >
> > Perhaps we can actually do the return value from drv_hw_scan(), but in
> > that case I'd say it should look something like this:
> >
> > https://p.sipsolutions.net/6ca3554b24e09ffb.txt
> >
> > Yes, that's less efficient (due to the whole idle recalculation), but
> > it's much easier to understand what's going on, IMHO.
> >
> > There's also a question about whether or not capabilities match, e.g. if
> > we do software scan we report 4 SSIDs, low prio scan, AP scan, random
> > scan, minimal scan content ... these are OK, but if there are ever
> > features that mac80211 *doesn't* support, we'll be in trouble if
> > somebody tries to use them.
> >
> > Hope this helps. If my patch actually works (I obviously haven't tested)
> > then I can send it to the list with signed-off-by and all.
> >
> > johannes
>
> Hi Johannes,
> Thanks for the brief explanation and As you mentioned above, we made
> the changes w.r.t our module.
> We will understand, what we missed here and will test with your patch
> and update you.
>
> Regards,
> Siva Rebbagondla.

Johannes Berg Sept. 28, 2018, 7:21 a.m. UTC | #7

Hi,

> Actually, we have missed this in code. Our module supports scanning on
> both bands at the same time.

Yeah, but we still have to support drivers that don't support this.

> Thanks for the info.
> And also we ran the scan on our dual band module by setting
> "SINGLE_SCAN_ON_ALL_BANDS" flag in
> driver and removing SCAN_HW_CANCELLED bit in mac80211 stack. We didn't
> see any issues in scan and
> as i mentioned earlier, I didn't observe "scan aborted".
> Can i resubmit patch with these changes?.

Were you able to try my patch instead? I tend to like the structure of
it better - i.e. how we fall back after cleaning up the state, rather
than directly in the middle.

Like I said above, I don't think we should take your patch - you've seen
a bug there, and if we just work around it by making your driver do
both-bands-at-once, we still have the bug...

johannes

[v3,1/2] mac80211: invoke sw_scan if hw_scan returns EPERM

Commit Message

Comments

Patch