
[RFC,0/4] Roam based on packet loss event

Message ID 20220809230428.215970-1-prestwoj@gmail.com (mailing list archive)

Message

James Prestwood Aug. 9, 2022, 11:04 p.m. UTC
A few weeks ago a user (CC'ed) reported that his driver was not roaming
as expected, and would get disconnected before any CQM RSSI events got
to userspace. He did see that the kernel was sending packet loss events
which should also be a good reason to roam.

These patches do just that, roam based on packet loss events.

Sending as an RFC so Michael can (hopefully) give this a trial run. I
can simulate packet loss events in a virtual environment, but testing
on the driver in question would be best.

James Prestwood (4):
  station: react to (new) netdev packet loss event
  netdev: handle packet loss notification
  auto-t: add generic tx_packet function
  auto-t: add packet loss test to testPSK-roam

 autotests/testPSK-roam/connection_test.py | 46 ++++++++++++++++++++++-
 autotests/util/testutil.py                |  7 ++++
 src/netdev.c                              |  8 ++--
 src/netdev.h                              |  1 +
 src/station.c                             | 39 ++++++++++++++++---
 5 files changed, 90 insertions(+), 11 deletions(-)
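
As a rough illustration of the idea in this series (not the actual C code in src/station.c or src/netdev.c), the sketch below models a station that starts a roam scan on either a low-RSSI event or a packet loss event. The names `NetdevEvent`, `Station`, `start_roam_scan` and `on_netdev_event` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NetdevEvent(Enum):
    """Hypothetical event types a netdev layer could report to station."""
    RSSI_THRESHOLD_LOW = auto()
    RSSI_THRESHOLD_HIGH = auto()
    PACKET_LOSS = auto()


@dataclass
class Station:
    """Toy model of a connected station; not iwd's real data structure."""
    connected: bool = True
    signal_low: bool = False
    roam_scan_pending: bool = False
    scans_started: int = 0

    def start_roam_scan(self) -> bool:
        # Do not stack scans, and never roam while disconnected.
        if not self.connected or self.roam_scan_pending:
            return False
        self.roam_scan_pending = True
        self.scans_started += 1
        return True

    def on_netdev_event(self, event: NetdevEvent, packets_lost: int = 0) -> bool:
        """Return True if this event caused a roam scan to start."""
        if event is NetdevEvent.RSSI_THRESHOLD_LOW:
            self.signal_low = True
            return self.start_roam_scan()
        if event is NetdevEvent.RSSI_THRESHOLD_HIGH:
            self.signal_low = False
            return False
        if event is NetdevEvent.PACKET_LOSS:
            # Packet loss is treated as a reason to roam even if the
            # RSSI never dropped below the low threshold.
            return self.start_roam_scan()
        return False


station = Station()
started = station.on_netdev_event(NetdevEvent.PACKET_LOSS, packets_lost=10)
```

In this model a second packet loss event that arrives while a scan is still pending is ignored, which is one plausible way to avoid scanning repeatedly.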

Comments

Michael Johnson Aug. 15, 2022, 2:17 p.m. UTC | #1
Hi James,

I've been testing this today and it does seem to do what is intended.
When we get a packet loss event we start a scan.

```
Aug 15 12:07:17 p3-1337 iwd[447]: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Aug 15 12:07:17 p3-1337 iwd[447]: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
Aug 15 12:08:20 p3-1337 iwd[447]: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_packets_lost() Packets lost event: 10
Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_roam_scan() ifindex: 5
Aug 15 12:08:20 p3-1337 iwd[447]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 61
Aug 15 12:08:20 p3-1337 iwd[447]: src/wiphy.c:wiphy_radio_work_next() Starting work item 61
Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_start_roam() Using cached neighbor report for roam
Aug 15 12:08:20 p3-1337 iwd[447]: src/scan.c:scan_notify() Scan notification Trigger Scan(33)
```
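
For anyone wanting to check their own logs for the same sequence, here is a minimal sketch that looks for a `station_packets_lost()` entry followed by `station_roam_scan()`. It assumes each log entry sits on a single line, as the journal emits it, and that the debug message layout matches the excerpt above; the helper name `packet_loss_triggered_scan` is made up for this example.

```python
import re

# Matches "iwd[PID]: src/file.c:function() message" as seen in the excerpt.
LOG_ENTRY = re.compile(
    r"iwd\[\d+\]:\s*(?P<src>src/\S+?):(?P<func>\w+)\(\)\s*(?P<msg>.*)"
)


def packet_loss_triggered_scan(lines):
    """Return True if a packet loss entry is later followed by a roam scan."""
    saw_loss = False
    for line in lines:
        match = LOG_ENTRY.search(line)
        if match is None:
            continue  # not an iwd debug line in the expected layout
        if match["func"] == "station_packets_lost":
            saw_loss = True
        elif saw_loss and match["func"] == "station_roam_scan":
            return True
    return False


SAMPLE = [
    "Aug 15 12:08:20 p3-1337 iwd[447]: src/netdev.c:netdev_mlme_notify() "
    "MLME notification Notify CQM(64)",
    "Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_packets_lost() "
    "Packets lost event: 10",
    "Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_roam_scan() "
    "ifindex: 5",
]

found = packet_loss_triggered_scan(SAMPLE)
```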

In most cases, this does result in a better BSSID being found and a
roam occurring. Occasionally a better BSSID isn't found and the roam
fails, presumably when the packet loss event is not directly caused by
a low RSSI. Either way the new behaviour does add more opportunities
for fixing itself so seems an improvement, thanks!

Regards,
Michael

James Prestwood Aug. 15, 2022, 4:25 p.m. UTC | #2
On Mon, 2022-08-15 at 15:17 +0100, Michael Johnson wrote:
> Hi James,
> 
> I've been testing this today and it does seem to do what is intended.
> When we get a packet loss event we start a scan.
> 
> ```
> Aug 15 12:07:17 p3-1337 iwd[447]: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
> Aug 15 12:07:17 p3-1337 iwd[447]: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
> Aug 15 12:08:20 p3-1337 iwd[447]: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
> Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_packets_lost() Packets lost event: 10
> Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_roam_scan() ifindex: 5
> Aug 15 12:08:20 p3-1337 iwd[447]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 61
> Aug 15 12:08:20 p3-1337 iwd[447]: src/wiphy.c:wiphy_radio_work_next() Starting work item 61
> Aug 15 12:08:20 p3-1337 iwd[447]: src/station.c:station_start_roam() Using cached neighbor report for roam
> Aug 15 12:08:20 p3-1337 iwd[447]: src/scan.c:scan_notify() Scan notification Trigger Scan(33)
> ```
> 
> In most cases, this does result in a better BSSID being found and a
> roam occurring. Occasionally a better BSSID isn't found and the roam
> fails, presumably when the packet loss event is not directly caused by
> a low RSSI. Either way the new behaviour does add more opportunities
> for fixing itself so seems an improvement, thanks!

Great, thanks for testing!

As for not finding a better BSS, things become a bit harder to handle,
since we don't want to affect the existing roam behavior, which has
worked well for the majority of users. We are still discussing ideas
for handling this case.

Anyway, at least it's an improvement.

Thanks,
James

 

Denis Kenzior Aug. 16, 2022, 8:31 p.m. UTC | #3
Hi James,

On 8/9/22 18:04, James Prestwood wrote:
> A few weeks ago a user (CC'ed) reported that his driver was not roaming
> as expected, and would get disconnected before any CQM RSSI events got
> to userspace. He did see that the kernel was sending packet loss events
> which should also be a good reason to roam.
> 
> These patches do just that, roam based on packet loss events.
> 
> Sending as an RFC so Michael can (hopefully) give this a trial run. I
> can simulate packet loss events in a virtual environment, but testing
> on the driver in question would be best.
> 

I went ahead and applied this series.  Hopefully the default packet loss 
threshold is good enough.  Worst case we might have to make it configurable or 
disable it on certain drivers.

Regards,
-Denis
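
If the threshold ever did need to be configurable or switched off for certain drivers, one possible shape for that policy is sketched below. This is purely hypothetical: `PacketLossPolicy`, its fields, the placeholder threshold value and the driver name `example_driver` are all assumptions for illustration and do not correspond to any existing iwd setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketLossPolicy:
    """Hypothetical policy deciding whether a packet loss event may roam."""
    # Placeholder value only; not a known kernel or iwd default.
    threshold: int = 10
    # Drivers for which packet-loss roaming is turned off entirely.
    disabled_drivers: frozenset = frozenset()

    def should_roam(self, driver: str, packets_lost: int) -> bool:
        if driver in self.disabled_drivers:
            return False
        return packets_lost >= self.threshold


# Example: raise the threshold and opt one (made-up) driver out.
policy = PacketLossPolicy(threshold=20,
                          disabled_drivers=frozenset({"example_driver"}))
decision = policy.should_roam("other_driver", packets_lost=25)
```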