diff mbox series

scan: make scan requests fail if trigger returns -EBUSY

Message ID 20221117152155.2310412-1-alvin@pqrs.dk (mailing list archive)
State New
Headers show
Series scan: make scan requests fail if trigger returns -EBUSY | expand

Checks

Context Check Description
tedd_an/pre-ci_am success Success
prestwoj/iwd-alpine-ci-fetch success Fetch PR
prestwoj/iwd-ci-gitlint success GitLint
prestwoj/iwd-ci-fetch success Fetch PR
prestwoj/iwd-ci-makedistcheck success Make Distcheck
prestwoj/iwd-ci-incremental_build success Incremental build not run PASS
prestwoj/iwd-alpine-ci-makedistcheck success Make Distcheck
prestwoj/iwd-alpine-ci-incremental_build success Incremental build not run PASS
prestwoj/iwd-ci-build success Build - Configure
prestwoj/iwd-alpine-ci-build success Build - Configure
prestwoj/iwd-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-ci-clang success clang PASS
prestwoj/iwd-ci-makecheck success Make Check
prestwoj/iwd-alpine-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-alpine-ci-makecheck success Make Check
prestwoj/iwd-ci-testrunner success test-runner PASS

Commit Message

Alvin Šipraga Nov. 17, 2022, 3:21 p.m. UTC
From: Alvin Šipraga <alsi@bang-olufsen.dk>

If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.

However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.

As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.

If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.

From a user perspective, this leads to a situation where iwd reports
over DBus that it is not scanning, but it never queues up another
periodic scan. Or if a scan was explicitly requested over DBus and the
trigger for that scan ended in an -EBUSY for the aforementioned reason,
the DBus method call stall indefinitely.

Since scan contexts are handled on a per-wdev basis, and since multiple
wdevs may in principle share the same wiphy, solve the above problem by
instead having the scan module simply propagate the -EBUSY error. In
practice this will mean that scans may fail more often, in which case at
least one default error message will be logged each time:

iwd[1451]: Received error during CMD_TRIGGER_SCAN: Device or resource busy (16)

It can be debated whether or not this message is needed: maybe the
upper-layer may wish to implement its own retry logic instead. In the
case of a periodic scan for example, a failure is met with ambivalence:
no additional error is logged, and a new scan is simply scheduled
according to the periodic scan interval. In other cases, a scan might be
part of a DBus request to connect to a hidden network. Here, the error
will be different:

Failure to connect because the network wasn't found after scanning:

    $ iwctl station wlan0 connect-hidden foobar
    Object not found

Failure to connect because another scan was active on the same wiphy:

    $ iw dev uap0 scan >/dev/null &
    $ iwctl station wlan0 connect-hidden foobar
    Operation failed

This is the trade-off made by this change in order not to complicate the
scan context with lookup-by-wiphy and an internal retry mechanism, while
still providing robustness and not stalling the scan logic of the
daemon.

Additionally, in the event of a failed scan trigger via the DBus scan
method, do not automatically jumpstart the autoconnect logic, as this
would not make sense.
---
 src/scan.c    | 22 ++++------------------
 src/station.c |  2 +-
 2 files changed, 5 insertions(+), 19 deletions(-)

Comments

Denis Kenzior Nov. 17, 2022, 4:29 p.m. UTC | #1
Hi Alvin,

On 11/17/22 09:21, Alvin Šipraga wrote:
> From: Alvin Šipraga <alsi@bang-olufsen.dk>
> 
> If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
> that a scan is ongoing on the underlying wdev and will retry the same
> command when that scan is complete. It gets notified of that completion
> via the scan_notify() function, and kicks the scan logic to try again.
> 
> However, if there is another wdev on the same wiphy and that wdev has a
> scan request in flight, the kernel will also return -EBUSY. In other
> words, only one scan request per wiphy is permitted.

This should really not be possible since all scans go through the wiphy work queue?

> 
> As an example, the brcmfmac driver can create an AP interface on the
> same wiphy as the default station interface, and scans can be triggered
> on that AP interface.

Or are you setting up a second interface out of iwd's control?

> 
> If -EBUSY is returned because another wdev is scanning, then iwd won't
> know when it can retry the original trigger request because the relevant
> netlink event will arrive on a different wdev. Indeed, if no scan
> context exists for that other wdev, then scan_notify will return early
> and the scan logic will stall indefinitely.

So I think the way to handle this is to make scan aware of all interfaces, even 
foreign ones, and consider restarting a scan request whenever a scan finished 
notification on a given phy is received.

> 
>  From a user perspective, this leads to a situation where iwd reports
> over DBus that it is not scanning, but it never queues up another
> periodic scan. Or if a scan was explicitly requested over DBus and the
> trigger for that scan ended in an -EBUSY for the aforementioned reason,
> the DBus method call stall indefinitely.
> 
> Since scan contexts are handled on a per-wdev basis, and since multiple
> wdevs may in principle share the same wiphy, solve the above problem by
> instead having the scan module simply propagate the -EBUSY error. In
> practice this will mean that scans may fail more often, in which case at
> least one default error message will be logged each time:

I doubt this is a good idea unless you want to visit every call site that 
invokes the scan logic and make it handle -EBUSY gracefully, since it is a bit 
of a special case.

Regards,
-Denis
Alvin Šipraga Nov. 17, 2022, 4:46 p.m. UTC | #2
Hi Denis,

Thanks for your quick reply.

On Thu, Nov 17, 2022 at 10:29:22AM -0600, Denis Kenzior wrote:
> Hi Alvin,
> 
> On 11/17/22 09:21, Alvin Šipraga wrote:
> > From: Alvin Šipraga <alsi@bang-olufsen.dk>
> > 
> > If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
> > that a scan is ongoing on the underlying wdev and will retry the same
> > command when that scan is complete. It gets notified of that completion
> > via the scan_notify() function, and kicks the scan logic to try again.
> > 
> > However, if there is another wdev on the same wiphy and that wdev has a
> > scan request in flight, the kernel will also return -EBUSY. In other
> > words, only one scan request per wiphy is permitted.
> 
> This should really not be possible since all scans go through the wiphy work queue?
> 
> > 
> > As an example, the brcmfmac driver can create an AP interface on the
> > same wiphy as the default station interface, and scans can be triggered
> > on that AP interface.
> 
> Or are you setting up a second interface out of iwd's control?

This.

> 
> > 
> > If -EBUSY is returned because another wdev is scanning, then iwd won't
> > know when it can retry the original trigger request because the relevant
> > netlink event will arrive on a different wdev. Indeed, if no scan
> > context exists for that other wdev, then scan_notify will return early
> > and the scan logic will stall indefinitely.
> 
> So I think the way to handle this is to make scan aware of all interfaces,
> even foreign ones, and consider restarting a scan request whenever a scan
> finished notification on a given phy is received.

Yes, I also considered doing something like that:

- have scan_notify take notice of all NEW_SCAN_RESULTS/SCAN_ABORTED
  events;
- if a scan context is found for the wdev associated with the event,
  preserve the old logic;
- if a scan context isn't found, then search the same queue with an
  alternate match function like this:

static bool scan_context_match_wiphy(const void *a, const void *b)
{
	const struct scan_context *sc = a;
	const uint64_t *wiphy_id = b;

	/* Find a scan context on the same wiphy, as it might be held up */
	if (sc->wiphy_id != *wiphy_id)
		return false;

	/* It should have a request pending... */
	if (!l_queue_peek_head(sc->requests))
		return false;

	/* This means the next request is worth retrying */
	return true;
}

- then if such a scan context is found, we can reset the scanning state
  to SCAN_STATE_NOT_RUNNING and call start_next_scan_request()

If that sounds semi-reasonable I can draft a patch and we can discuss it
further. One theoretical shortcoming of the above is that it is not
"fair", i.e. if you have multiple wdevs on the same wiphy then their
scan request queues will be fully drained before the next wdev gets a
chance to scan. But I guess it's OK.

> 
> > 
> >  From a user perspective, this leads to a situation where iwd reports
> > over DBus that it is not scanning, but it never queues up another
> > periodic scan. Or if a scan was explicitly requested over DBus and the
> > trigger for that scan ended in an -EBUSY for the aforementioned reason,
> > the DBus method call stall indefinitely.
> > 
> > Since scan contexts are handled on a per-wdev basis, and since multiple
> > wdevs may in principle share the same wiphy, solve the above problem by
> > instead having the scan module simply propagate the -EBUSY error. In
> > practice this will mean that scans may fail more often, in which case at
> > least one default error message will be logged each time:
> 
> I doubt this is a good idea unless you want to visit every call site that
> invokes the scan logic and make it handle -EBUSY gracefully, since it is a
> bit of a special case.

OK, let me try the other way then :)

Kind regards,
Alvin
Denis Kenzior Nov. 17, 2022, 5 p.m. UTC | #3
Hi Alvin,

>> Or are you setting up a second interface out of iwd's control?
> 
> This.
> 

Ok, makes sense.

>>
>>>
>>> If -EBUSY is returned because another wdev is scanning, then iwd won't
>>> know when it can retry the original trigger request because the relevant
>>> netlink event will arrive on a different wdev. Indeed, if no scan
>>> context exists for that other wdev, then scan_notify will return early
>>> and the scan logic will stall indefinitely.
>>
>> So I think the way to handle this is to make scan aware of all interfaces,
>> even foreign ones, and consider restarting a scan request whenever a scan
>> finished notification on a given phy is received.
> 
> Yes, I also considered doing something like that:
> 
> - have scan_notify take notice of all NEW_SCAN_RESULTS/SCAN_ABORTED
>    events;
> - if a scan context is found for the wdev associated with the event,
>    preserve the old logic;

I'm not aware of any devices that can invoke multiple scans.  It is always 
limited to 1 / phy AFAIK, although that might change in the future.  So really a 
scan finished / aborted notification should be treated as per phy and not 
per-wdev.

> - if a scan context isn't found, then search the same queue with an
>    alternate match function like this:
> 
> static bool scan_context_match_wiphy(const void *a, const void *b)
> {
> 	const struct scan_context *sc = a;
> 	const uint64_t *wiphy_id = b;
> 
> 	/* Find a scan context on the same wiphy, as it might be held up */
> 	if (sc->wiphy_id != *wiphy_id)
> 		return false;
> 
> 	/* It should have a request pending... */
> 	if (!l_queue_peek_head(sc->requests))
> 		return false;
> 
> 	/* This means the next request is worth retrying */
> 	return true;
> }

Yep, something like this.  The wiphy work queue would have a 'running' request 
and if that request is a scan, then that's the one that should be restarted.

> 
> - then if such a scan context is found, we can reset the scanning state
>    to SCAN_STATE_NOT_RUNNING and call start_next_scan_request()
> 
> If that sounds semi-reasonable I can draft a patch and we can discuss it
> further. One theoretical shortcoming of the above is that it is not
> "fair", i.e. if you have multiple wdevs on the same wiphy then their
> scan request queues will be fully drained before the next wdev gets a
> chance to scan. But I guess it's OK.

Not sure what you mean?  Internally iwd operates on 'logical' scan requests. 
These might be broken up into multiple actual scan requests to the driver, but 
each logical scan request is wrapped by a wiphy work request.  So multiple 
logical scan requests from multiple wdevs would be interleaved / scheduled 
appropriately.

But yes, I think what you propose here is the right approach.

Regards,
-Denis
diff mbox series

Patch

diff --git a/src/scan.c b/src/scan.c
index 5548914a12de..b487c342129d 100644
--- a/src/scan.c
+++ b/src/scan.c
@@ -229,12 +229,6 @@  static void scan_request_triggered(struct l_genl_msg *msg, void *userdata)
 
 	err = l_genl_msg_get_error(msg);
 	if (err < 0) {
-		/* Scan in progress, assume another scan is running */
-		if (err == -EBUSY) {
-			sc->state = SCAN_STATE_PASSIVE;
-			return;
-		}
-
 		scan_request_failed(sc, sr, err);
 
 		l_error("Received error during CMD_TRIGGER_SCAN: %s (%d)",
@@ -2094,7 +2088,6 @@  static void scan_notify(struct l_genl_msg *msg, void *user_data)
 	{
 		struct scan_freq_set *freqs;
 		bool send_next = false;
-		bool retry = false;
 		bool get_results = false;
 
 		sc->state = SCAN_STATE_NOT_RUNNING;
@@ -2138,24 +2131,17 @@  static void scan_notify(struct l_genl_msg *msg, void *user_data)
 
 			/*
 			 * Drop the ongoing scan if an external scan flushed
-			 * our results.  Otherwise, try to retry the trigger
-			 * request if it failed with an -EBUSY.
+			 * our results.
 			 */
 			if (sr && sr->started &&
 					scan_parse_flush_flag_from_msg(msg))
 				scan_finished(sc, -EAGAIN, NULL, NULL, sr);
-			else
-				retry = true;
 
 			sr = NULL;
 		}
 
-		/*
-		 * Send the next command of an ongoing request, or continue
-		 * with a previously busy scan attempt due to an external
-		 * scan.
-		 */
-		if (send_next || retry) {
+		/* Send the next command of an ongoing request */
+		if (send_next) {
 			struct scan_request *next = l_queue_peek_head(
 								sc->requests);
 
@@ -2220,7 +2206,7 @@  static void scan_notify(struct l_genl_msg *msg, void *user_data)
 			 * we may be able to now queue our own scan although
 			 * the abort could also have been triggered by the
 			 * hardware or the driver because of another activity
-			 * starting in which case we should just get an EBUSY.
+			 * starting in which case we should get an EBUSY.
 			 */
 			start_next_scan_request(&sr->work);
 		}
diff --git a/src/station.c b/src/station.c
index eab16eff5afa..f12ca22a8648 100644
--- a/src/station.c
+++ b/src/station.c
@@ -3845,7 +3845,7 @@  static void station_dbus_scan_triggered(int err, void *user_data)
 			dbus_pending_reply(&station->scan_pending, reply);
 		}
 
-		station_dbus_scan_done(station, true);
+		station_dbus_scan_done(station, false);
 		return;
 	}