diff mbox series

[2/2] wifi: mwifiex: Stop rejecting connection attempts while connected

Message ID 20230602225751.164525-2-kglund@google.com (mailing list archive)
State Rejected
Delegated to: Kalle Valo
Series [1/2] wifi: cfg80211: Reject (re-)association to the same BSSID

Commit Message

Kevin Lund June 2, 2023, 10:57 p.m. UTC
Currently, the Marvell WiFi driver rejects any connection attempt while
we are currently connected. This is poor logic, since there are several
legitimate use-cases for initiating a connection attempt while
connected, including re-association and BSS Transitioning. This logic
means that it's impossible for userspace to request the driver to
connect to a different BSS on the same ESS without explicitly requesting
a disconnect first.

Remove the check from the driver so that we can complete BSS transitions
on the first attempt.

Testing on Chrome OS has shown that this change resolves some issues
with failed BSS transitions.

Signed-off-by: Kevin Lund <kglund@google.com>
---
 drivers/net/wireless/marvell/mwifiex/cfg80211.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Brian Norris Aug. 4, 2023, 1:21 a.m. UTC | #1
On Fri, Jun 02, 2023 at 04:57:51PM -0600, Kevin Lund wrote:
> Currently, the Marvell WiFi driver rejects any connection attempt while
> we are currently connected. This is poor logic, since there are several
> legitimate use-cases for initiating a connection attempt while
> connected, including re-association and BSS Transitioning. This logic
> means that it's impossible for userspace to request the driver to
> connect to a different BSS on the same ESS without explicitly requesting
> a disconnect first.
> 
> Remove the check from the driver so that we can complete BSS transitions
> on the first attempt.
> 
> Testing on Chrome OS has shown that this change resolves some issues
> with failed BSS transitions.
> 
> Signed-off-by: Kevin Lund <kglund@google.com>
> ---
>  drivers/net/wireless/marvell/mwifiex/cfg80211.c | 6 ------
>  1 file changed, 6 deletions(-)

I've been told this one might need an extra look, but first off, it's
marked Rejected, likely due to feedback on patch 1, so probably needs a
resubmit if it needs consideration:

https://patchwork.kernel.org/project/linux-wireless/patch/20230602225751.164525-2-kglund@google.com/

But, did you attempt any background work on this, to determine why it
exists, or what other mitigations are in place? For example, I see that
sme.c's cfg80211_connect() makes a similar check, but only rejects
things if the SSID is different. So with that understanding, it's a
reasonable guess to say that mwifiex would be OK with just relying on
the existing cfg80211 checks instead.

In other words, I think this patch may be OK, but probably could use a
bit more explanation.

Also, how does "BSS Transitioning" (in your description) fit in here?
IIUC, cfg80211_connect() doesn't support that, as it only allows
reassociation to the same BSSID.
(Or, I might be confused here.)

Brian
Kevin Lund Aug. 7, 2023, 10:35 p.m. UTC | #2
Hey Brian, thanks for the response. I'm revising the patch now to
better illustrate the context and purpose for the patch, but I'll also
respond to your comments here so there is a clear chain.

> But, did you attempt any background work on this, to determine why it
> exists, or what other mitigations are in place?

Yes, I'll share some of my background checks here:

The check was initially added in the following patch:

https://github.com/torvalds/linux/commit/71954f24c93fd569314985e9a7319b68e0b918e6

Before this patch, the driver would explicitly deauthenticate
from its current AP whenever a new connection was requested.
Apparently this was racy, so the deauthentication was removed.
The commit message states that they want to "Avoid
re-association while the device is already associated to a
network." My assertion is that this is invalid, since
re-associating while connected is a valid action, and it happens
during BSS transitions.

I cross checked this behavior with the userspace SME side
and looked into a driver with userspace SME, and I could
not find any indication that the driver rejects association
attempts while connected.

> For example, I see that
> sme.c's cfg80211_connect() makes a similar check, but only rejects
> things if the SSID is different. So with that understanding, it's a
> reasonable guess to say that mwifiex would be OK with just relying on
> the existing cfg80211 checks instead.

> In other words, I think this patch may be OK, but probably could use a
> bit more explanation.

Yes, sme.c's cfg80211_connect() is a very useful point of
reference. I believe it makes sufficient checks that the
Marvell driver can rely on them, and that is part of what
drove me to make this patch in the first place. The function
makes two checks that carve out exactly the re-association
functionality that I'm enabling in the Marvell driver.

These checks are commented as such:

 /*
  * If we have an ssid_len, we're trying to connect or are
  * already connected, so reject a new SSID unless it's the
  * same (which is the case for re-association.)
  */

and

 /*
  * If connected, reject (re-)association unless prev_bssid
  * matches the current BSSID.
  */

The first condition asserts that if we're connected,
then the only network we should be trying to
connect to should be the network which we're
currently connected to. Basically, re-association
is only valid within the same ESS.

The second condition says that if we are currently
connected, then

1. We must have a prev_bssid value. This is the BSSID
that we were connected to previously, and its presence
is the indication that we are requesting a re-association
rather than a normal association. Basically, if we're
connected, then we must be re-associating, not
normal associating.

2. prev_bssid must be the same as our current bssid.
All this is saying is that the previous BSSID which was
supplied by the caller must match the BSSID which the
driver thinks it is connected to. Basically, "the caller is
saying we should move from A to B, let's make sure
we're actually connected to A".
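
As a rough illustration of the two checks described above, here is a
standalone userspace sketch in C. It is not the kernel's code: struct
sta_state, struct connect_req and check_connect() are hypothetical names
invented for this example, and the error codes it returns are assumptions
of the sketch rather than a statement of what cfg80211 returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define SSID_MAX 32

/* Hypothetical stand-in for the per-interface connection state. */
struct sta_state {
	bool connected;
	uint8_t ssid[SSID_MAX];
	size_t ssid_len;           /* 0 when idle */
	uint8_t bssid[ETH_ALEN];   /* current BSSID, valid when connected */
};

/* Hypothetical stand-in for a connect request from userspace. */
struct connect_req {
	const uint8_t *ssid;
	size_t ssid_len;
	const uint8_t *bssid;      /* target BSS ("B") */
	const uint8_t *prev_bssid; /* BSS we claim to leave ("A"), or NULL */
};

/* Returns 0 if the request may proceed, a negative errno otherwise. */
static int check_connect(const struct sta_state *sta,
			 const struct connect_req *req)
{
	assert(sta != NULL && req != NULL && req->ssid != NULL);

	/* Check 1: while connecting or connected, only the same SSID is OK. */
	if (sta->ssid_len &&
	    (sta->ssid_len != req->ssid_len ||
	     memcmp(sta->ssid, req->ssid, sta->ssid_len) != 0))
		return -EALREADY;

	/* Check 2: while connected, the request must be a re-association
	 * whose prev_bssid names the BSS we are actually connected to.
	 * Note that the target BSSID (req->bssid) is not compared here. */
	if (sta->connected) {
		if (!req->prev_bssid)
			return -EALREADY;
		if (memcmp(req->prev_bssid, sta->bssid, ETH_ALEN) != 0)
			return -ENOTCONN;
	}

	return 0;
}
```

In this model, a request to move from A to B on the same SSID with
prev_bssid set to A passes both checks. The same request without
prev_bssid, or with a prev_bssid that is not the current BSSID, is
rejected.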

So, based on these checks it is abundantly clear that
cfg80211 absolutely intends it to be a normal flow to
request a connection while currently connected, and
it makes deliberate checks to ensure we're in a good
state when that happens.

> Also, how does "BSS Transitioning" (in your description) fit in here?
> IIUC, cfg80211_connect() doesn't support that, as it only allows
> reassociation to the same BSSID.

This is covered above, but cfg80211_connect()
doesn't actually assert that we're re-associating to
the same BSSID. It only asserts that the BSSID the
caller thinks we're transitioning from is the same
BSSID that the driver thinks we are currently
connected to.

Thanks,
Kevin


Brian Norris Aug. 15, 2023, 12:21 a.m. UTC | #3
On Mon, Aug 07, 2023 at 04:35:49PM -0600, Kevin Lund wrote:
> Hey Brian, thanks for the response. I'm revising the patch now to
> better illustrate the context and purpose for the patch, but I'll also
> respond to your comments here so there is a clear chain.

Yes, good idea :)

> > But, did you attempt any background work on this, to determine why it
> > exists, or what other mitigations are in place?
> 
> Yes, I'll share some of my background checks here:
> 
> The check was initially added in the following patch:
> 
> https://github.com/torvalds/linux/commit/71954f24c93fd569314985e9a7319b68e0b918e6

Ah, good context. I didn't notice that part, and instead assumed that
the quirk (like so many quirks in this driver) existed since its
introduction.

Definitely note this in your followup patch, and while you're at it,
maybe avoid linking github and instead use the preferred commit format.
I think checkpatch.pl would tell you that's something like:
commit 71954f24c93f ("mwifiex: do not re-associate when already
connected")

And you'll want to explain how you account for the original problem
being solved, or else explain that the problem doesn't apply.

> Before this patch, the driver would explicitly deauthenticate
> from its current AP whenever a new connection was requested.
> Apparently this was racy, so the deauthentication was removed.
> The commit message states that they want to "Avoid
> re-association while the device is already associated to a
> network." My assertion is that this is invalid, since
> re-associating while connected is a valid action, and it happens
> during BSS transitions.

Actually, I believe that commit was primarily addressing a particular
WARN()/WARN_ON() issue seen when doing this kind of reassociation,
because of how mwifiex was doing an internal deassociation, and losing
the context of the SSID that it was going to report back to cfg80211
upon (re)association.

So you'd need to show that you don't hit that warning again. (I doubt
you will, because the aforementioned commit also dropped the internal
deauthentication call.)

But I think you'd also need to explain why (or even better,
explain+test) the internal deauthentication was present in the first
place. The previous commit removed the internal deauthentication, and
instead just rejected the request. The former is OK to do, but only(?)
because we did the latter.

So, you'd have to help me remove that question mark: why is it OK for
mwifiex to run its connect (HostCmd_CMD_802_11_ASSOCIATE) flow without
an intervening disconnect? It's not clear to me that the firmware
protocol actually supports this, or that it's been vetted very much, given
how the original mwifiex used to behave (with a full disconnect and
reconnect).

Maybe David Lin @ NXP (CC'd) would be able to help here, as this starts
to ask questions we can only answer by either inspecting the firmware
(i.e., ask NXP) or by testing.

Regarding testing: what kind of testing has been done? On your multi-BSS
setup, have you ensured we're really sending appropriate 802.11 level
frames with this patch? e.g., do mwifiex clients end up sending a proper
REASSOCIATION REQUEST frame, or just ASSOCIATION REQUEST? It doesn't
look like mwifiex actually pushes the required "Current AP Address" down
into the association command, but it's always possible the firmware just
memorizes this and rewrites it...
...or alternatively, maybe mwifiex doesn't actually do Reassociation at
all here, and it just kinda happens to work OK when sending a regular
Association. I'm not sure if that makes this a good patch, or if we'll
end up with interop problems where cfg80211 thinks we're reassociating,
but the AP thinks we're associating, and eventually things break down.
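
For anyone checking a capture by hand, the distinction asked about above
can be sketched in C as follows. This is only an illustration under
stated assumptions: the buffer starts at the 802.11 MAC header (any
radiotap header already stripped), the management header is the plain
24-byte form with no HT Control field, and classify_frame() and
reassoc_current_ap() are hypothetical helper names, not mwifiex or
cfg80211 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MGMT_HDR_LEN         24   /* FC, duration, 3 addresses, seq ctrl */
#define FC_TYPE_MASK         0x0c /* bits 2-3 of the first FC byte */
#define FC_SUBTYPE_MASK      0xf0 /* bits 4-7 of the first FC byte */
#define FC_TYPE_MGMT         0x00
#define FC_STYPE_ASSOC_REQ   0x00 /* subtype 0 */
#define FC_STYPE_REASSOC_REQ 0x20 /* subtype 2 */

enum assoc_kind { NOT_ASSOC, ASSOC_REQ, REASSOC_REQ };

/* Classify a captured frame by the type/subtype bits of Frame Control. */
static enum assoc_kind classify_frame(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);

	if (len < MGMT_HDR_LEN)
		return NOT_ASSOC;
	if ((frame[0] & FC_TYPE_MASK) != FC_TYPE_MGMT)
		return NOT_ASSOC;

	switch (frame[0] & FC_SUBTYPE_MASK) {
	case FC_STYPE_ASSOC_REQ:
		return ASSOC_REQ;
	case FC_STYPE_REASSOC_REQ:
		return REASSOC_REQ;
	default:
		return NOT_ASSOC;
	}
}

/*
 * A Reassociation Request body starts with Capability Information
 * (2 bytes) and Listen Interval (2 bytes), followed by the 6-byte
 * Current AP Address. Returns a pointer to that address, or NULL if
 * the frame is not a (long enough) Reassociation Request.
 */
static const uint8_t *reassoc_current_ap(const uint8_t *frame, size_t len)
{
	if (classify_frame(frame, len) != REASSOC_REQ)
		return NULL;
	if (len < MGMT_HDR_LEN + 4 + 6)
		return NULL;
	return frame + MGMT_HDR_LEN + 4;
}
```

If the client's frame classifies as ASSOC_REQ during a roam, or the
Current AP Address does not match the BSSID it was previously connected
to, that would suggest the firmware is not performing a true
reassociation.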

Sorry if that's just a bunch of "unknowns" here, but that's life when
trying to retrofit things into an old full-MAC driver with no support
from the owners of the proprietary firmware. Maybe 802.11 protocol dumps
would make us happy enough though.

> I cross checked this behavior with the userspace SME side
> and looked into a driver with userspace SME, and I could
> not find any indication that the driver rejects association
> attempts while connected.

I don't think comparison with other drivers gives much evidence here.
It's a question of whether the firmware is properly tracking
"reassociation" (to the same or different BSS), or whether it really
needs a DEAUTH in between.

> > For example, I see that
> > sme.c's cfg80211_connect() makes a similar check, but only rejects
> > things if the SSID is different. So with that understanding, it's a
> > reasonable guess to say that mwifiex would be OK with just relying on
> > the existing cfg80211 checks instead.
> 
> > In other words, I think this patch may be OK, but probably could use a
> > bit more explanation.
> 
[...]
> The second condition says that if we are currently
> connected, then
> 
> 1. We must have a prev_bssid value. This is the BSSID
> that we were connected to previously, and its presence
> is the indication that we are requesting a re-association
> rather than a normal association. Basically, if we're
> connected, then we must be re-associating, not
> normal associating.
> 
> 2. prev_bssid must be the same as our current bssid.
> All this is saying is that the previous BSSID which was
> supplied by the caller must match the BSSID which the
> driver thinks it is connected to. Basically, "the caller is
> saying we should move from A to B, let's make sure
> we're actually connected to A".

I misread what cfg80211_connect() was looking for -- thanks for the
additional explanation.

> So, based on these checks it is abundantly clear that
> cfg80211 absolutely intends it to be a normal flow to
> request a connection while currently connected, and
> it makes deliberate checks to ensure we're in a good
> state when that happens.

Right, thanks, I misunderstood cfg80211 for a bit. So I agree on the
cfg80211 expectation, but that still doesn't tell us how the mwifiex
firmware really behaves. I guess I retain some confusion (per above) of
why mwifiex would have forcibly DEAUTH'd on each reassociation request
in the past, if it wasn't necessary.

Brian

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index bcd564dc3554a..84d650c9dceb0 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -2414,12 +2414,6 @@  mwifiex_cfg80211_connect(struct wiphy *wiphy, struct net_device *dev,
 		return -EINVAL;
 	}
 
-	if (priv->wdev.connected) {
-		mwifiex_dbg(adapter, ERROR,
-			    "%s: already connected\n", dev->name);
-		return -EALREADY;
-	}
-
 	if (priv->scan_block)
 		priv->scan_block = false;