Message ID | 20240827141037.745738-1-prestwoj@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | station: check for roam timeout before rearming | expand |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
prestwoj/iwd-alpine-ci-fetch | success | Fetch PR |
prestwoj/iwd-ci-gitlint | success | GitLint |
prestwoj/iwd-ci-fetch | success | Fetch PR |
prestwoj/iwd-alpine-ci-setupell | success | Prep - Setup ELL |
prestwoj/iwd-ci-setupell | success | Prep - Setup ELL |
prestwoj/iwd-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-ci-incremental_build | success | Incremental build not run PASS |
prestwoj/iwd-alpine-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-alpine-ci-incremental_build | success | Incremental build not run PASS |
prestwoj/iwd-ci-build | success | Build - Configure |
prestwoj/iwd-alpine-ci-build | success | Build - Configure |
prestwoj/iwd-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-ci-makecheck | success | Make Check |
prestwoj/iwd-ci-clang | success | clang PASS |
prestwoj/iwd-alpine-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-alpine-ci-makecheck | success | Make Check |
prestwoj/iwd-ci-testrunner | success | test-runner PASS |
Hi James, On 8/27/24 9:10 AM, James Prestwood wrote: > A user reported a crash which was due to the roam trigger timeout > being overwritten, followed by a disconnect. Post-disconnect the > timer would fire and result in a crash. Its not clear exactly where > the overwrite was happening but upon code inspection it could > happen in the following scenario: > > 1. Beacon loss event, start roam timeout > 2. Signal low event, no check if timeout is running and the timeout > gets overwritten. > > The reported crash actually didn't appear to be from the above > scenario but something else, so now all instances where the timer > is rearmed we also check if there is already a timer set. > --- > src/station.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/src/station.c b/src/station.c > index 30a1232a..e188aed4 100644 > --- a/src/station.c > +++ b/src/station.c > @@ -2221,7 +2221,7 @@ static void station_roamed(struct station *station) > * Schedule another roaming attempt in case the signal continues to > * remain low. A subsequent high signal notification will cancel it. > */ > - if (station->signal_low) > + if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout)) I'm reading this as: if signal_low and roam_trigger_timeout is NULL: 1. print a warning 2. rearm the timer Is 1. intended? > station_roam_timeout_rearm(station, roam_retry_interval); > > if (station->netconfig) > @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station *station) > station->roam_scan_full = false; > station->ap_directed_roaming = false; > > - if (station->signal_low) > + if (station->signal_low && !station->roam_trigger_timeout) > station_roam_timeout_rearm(station, roam_retry_interval); > } > > @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station *station) > if (station_cannot_roam(station)) > return; > > + if (station->roam_trigger_timeout) > + return; > + One thing I'm a bit worried about with such a simple approach is that we might have different roam timeouts. For example, the 60 second roam retry interval is active and we get a low rssi event. In this case we probably should schedule a roam sooner. Maybe we should book-keep timeout information better better and make sure the timeout happens within the minimum time (active time remaining / new timeout)? > /* Set a 5-second initial timeout */ > station_roam_timeout_rearm(station, 5); > } Regards, -Denis
Hi Denis, On 8/27/24 7:36 PM, Denis Kenzior wrote: > Hi James, > > On 8/27/24 9:10 AM, James Prestwood wrote: >> A user reported a crash which was due to the roam trigger timeout >> being overwritten, followed by a disconnect. Post-disconnect the >> timer would fire and result in a crash. Its not clear exactly where >> the overwrite was happening but upon code inspection it could >> happen in the following scenario: >> >> 1. Beacon loss event, start roam timeout >> 2. Signal low event, no check if timeout is running and the timeout >> gets overwritten. >> >> The reported crash actually didn't appear to be from the above >> scenario but something else, so now all instances where the timer >> is rearmed we also check if there is already a timer set. >> --- >> src/station.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/src/station.c b/src/station.c >> index 30a1232a..e188aed4 100644 >> --- a/src/station.c >> +++ b/src/station.c >> @@ -2221,7 +2221,7 @@ static void station_roamed(struct station >> *station) >> * Schedule another roaming attempt in case the signal >> continues to >> * remain low. A subsequent high signal notification will >> cancel it. >> */ >> - if (station->signal_low) >> + if (station->signal_low && >> L_WARN_ON(!station->roam_trigger_timeout)) > > I'm reading this as: > if signal_low and roam_trigger_timeout is NULL: > 1. print a warning > 2. rearm the timer > > Is 1. intended? > >> station_roam_timeout_rearm(station, roam_retry_interval); >> if (station->netconfig) >> @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station >> *station) >> station->roam_scan_full = false; >> station->ap_directed_roaming = false; >> - if (station->signal_low) >> + if (station->signal_low && !station->roam_trigger_timeout) >> station_roam_timeout_rearm(station, roam_retry_interval); >> } >> @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station >> *station) >> if (station_cannot_roam(station)) >> return; >> + if (station->roam_trigger_timeout) >> + return; >> + > > One thing I'm a bit worried about with such a simple approach is that > we might have different roam timeouts. For example, the 60 second > roam retry interval is active and we get a low rssi event. In this > case we probably should schedule a roam sooner. Maybe we should > book-keep timeout information better better and make sure the timeout > happens within the minimum time (active time remaining / new timeout)? Sure, the thought crossed my mind as well. Looks like there is no way to get the active timeout within the l_timeout itself, it doesn't track that so we can manually. > >> /* Set a 5-second initial timeout */ >> station_roam_timeout_rearm(station, 5); >> } > > Regards, > -Denis
Hi James, >> >> One thing I'm a bit worried about with such a simple approach is that we might >> have different roam timeouts. For example, the 60 second roam retry interval >> is active and we get a low rssi event. In this case we probably should >> schedule a roam sooner. Maybe we should book-keep timeout information better >> better and make sure the timeout happens within the minimum time (active time >> remaining / new timeout)? > Sure, the thought crossed my mind as well. Looks like there is no way to get the > active timeout within the l_timeout itself, it doesn't track that so we can > manually. You might be able to add something into ell using timerfd_gettime. Regards, -Denis
diff --git a/src/station.c b/src/station.c index 30a1232a..e188aed4 100644 --- a/src/station.c +++ b/src/station.c @@ -2221,7 +2221,7 @@ static void station_roamed(struct station *station) * Schedule another roaming attempt in case the signal continues to * remain low. A subsequent high signal notification will cancel it. */ - if (station->signal_low) + if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout)) station_roam_timeout_rearm(station, roam_retry_interval); if (station->netconfig) @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station *station) station->roam_scan_full = false; station->ap_directed_roaming = false; - if (station->signal_low) + if (station->signal_low && !station->roam_trigger_timeout) station_roam_timeout_rearm(station, roam_retry_interval); } @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station *station) if (station_cannot_roam(station)) return; + if (station->roam_trigger_timeout) + return; + /* Set a 5-second initial timeout */ station_roam_timeout_rearm(station, 5); }