diff mbox series

station: check for roam timeout before rearming

Message ID 20240827141037.745738-1-prestwoj@gmail.com (mailing list archive)
State New
Headers show
Series station: check for roam timeout before rearming | expand

Checks

Context Check Description
tedd_an/pre-ci_am success Success
prestwoj/iwd-alpine-ci-fetch success Fetch PR
prestwoj/iwd-ci-gitlint success GitLint
prestwoj/iwd-ci-fetch success Fetch PR
prestwoj/iwd-alpine-ci-setupell success Prep - Setup ELL
prestwoj/iwd-ci-setupell success Prep - Setup ELL
prestwoj/iwd-ci-makedistcheck success Make Distcheck
prestwoj/iwd-ci-incremental_build success Incremental build not run PASS
prestwoj/iwd-alpine-ci-makedistcheck success Make Distcheck
prestwoj/iwd-alpine-ci-incremental_build success Incremental build not run PASS
prestwoj/iwd-ci-build success Build - Configure
prestwoj/iwd-alpine-ci-build success Build - Configure
prestwoj/iwd-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-ci-makecheck success Make Check
prestwoj/iwd-ci-clang success clang PASS
prestwoj/iwd-alpine-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-alpine-ci-makecheck success Make Check
prestwoj/iwd-ci-testrunner success test-runner PASS

Commit Message

James Prestwood Aug. 27, 2024, 2:10 p.m. UTC
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:

1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
   gets overwritten.

The reported crash actually didn't appear to be from the above
scenario but something else, so now all instances where the timer
is rearmed we also check if there is already a timer set.
---
 src/station.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Denis Kenzior Aug. 28, 2024, 2:36 a.m. UTC | #1
Hi James,

On 8/27/24 9:10 AM, James Prestwood wrote:
> A user reported a crash which was due to the roam trigger timeout
> being overwritten, followed by a disconnect. Post-disconnect the
> timer would fire and result in a crash. Its not clear exactly where
> the overwrite was happening but upon code inspection it could
> happen in the following scenario:
> 
> 1. Beacon loss event, start roam timeout
> 2. Signal low event, no check if timeout is running and the timeout
>     gets overwritten.
> 
> The reported crash actually didn't appear to be from the above
> scenario but something else, so now all instances where the timer
> is rearmed we also check if there is already a timer set.
> ---
>   src/station.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/station.c b/src/station.c
> index 30a1232a..e188aed4 100644
> --- a/src/station.c
> +++ b/src/station.c
> @@ -2221,7 +2221,7 @@ static void station_roamed(struct station *station)
>   	 * Schedule another roaming attempt in case the signal continues to
>   	 * remain low. A subsequent high signal notification will cancel it.
>   	 */
> -	if (station->signal_low)
> +	if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout))

I'm reading this as:
if signal_low and roam_trigger_timeout is NULL:
	1. print a warning
	2. rearm the timer

Is 1. intended?

>   		station_roam_timeout_rearm(station, roam_retry_interval);
>   
>   	if (station->netconfig)
> @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station *station)
>   	station->roam_scan_full = false;
>   	station->ap_directed_roaming = false;
>   
> -	if (station->signal_low)
> +	if (station->signal_low && !station->roam_trigger_timeout)
>   		station_roam_timeout_rearm(station, roam_retry_interval);
>   }
>   
> @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station *station)
>   	if (station_cannot_roam(station))
>   		return;
>   
> +	if (station->roam_trigger_timeout)
> +		return;
> +

One thing I'm a bit worried about with such a simple approach is that we might 
have different roam timeouts.  For example, the 60 second roam retry interval is 
active and we get a low rssi event.  In this case we probably should schedule a 
roam sooner.  Maybe we should book-keep timeout information better better and 
make sure the timeout happens within the minimum time (active time remaining / 
new timeout)?

>   	/* Set a 5-second initial timeout */
>   	station_roam_timeout_rearm(station, 5);
>   }

Regards,
-Denis
James Prestwood Aug. 28, 2024, 12:08 p.m. UTC | #2
Hi Denis,

On 8/27/24 7:36 PM, Denis Kenzior wrote:
> Hi James,
>
> On 8/27/24 9:10 AM, James Prestwood wrote:
>> A user reported a crash which was due to the roam trigger timeout
>> being overwritten, followed by a disconnect. Post-disconnect the
>> timer would fire and result in a crash. Its not clear exactly where
>> the overwrite was happening but upon code inspection it could
>> happen in the following scenario:
>>
>> 1. Beacon loss event, start roam timeout
>> 2. Signal low event, no check if timeout is running and the timeout
>>     gets overwritten.
>>
>> The reported crash actually didn't appear to be from the above
>> scenario but something else, so now all instances where the timer
>> is rearmed we also check if there is already a timer set.
>> ---
>>   src/station.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/station.c b/src/station.c
>> index 30a1232a..e188aed4 100644
>> --- a/src/station.c
>> +++ b/src/station.c
>> @@ -2221,7 +2221,7 @@ static void station_roamed(struct station 
>> *station)
>>        * Schedule another roaming attempt in case the signal 
>> continues to
>>        * remain low. A subsequent high signal notification will 
>> cancel it.
>>        */
>> -    if (station->signal_low)
>> +    if (station->signal_low && 
>> L_WARN_ON(!station->roam_trigger_timeout))
>
> I'm reading this as:
> if signal_low and roam_trigger_timeout is NULL:
>     1. print a warning
>     2. rearm the timer
>
> Is 1. intended?
>
>> station_roam_timeout_rearm(station, roam_retry_interval);
>>         if (station->netconfig)
>> @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station 
>> *station)
>>       station->roam_scan_full = false;
>>       station->ap_directed_roaming = false;
>>   -    if (station->signal_low)
>> +    if (station->signal_low && !station->roam_trigger_timeout)
>>           station_roam_timeout_rearm(station, roam_retry_interval);
>>   }
>>   @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station 
>> *station)
>>       if (station_cannot_roam(station))
>>           return;
>>   +    if (station->roam_trigger_timeout)
>> +        return;
>> +
>
> One thing I'm a bit worried about with such a simple approach is that 
> we might have different roam timeouts.  For example, the 60 second 
> roam retry interval is active and we get a low rssi event. In this 
> case we probably should schedule a roam sooner.  Maybe we should 
> book-keep timeout information better better and make sure the timeout 
> happens within the minimum time (active time remaining / new timeout)?
Sure, the thought crossed my mind as well. Looks like there is no way to 
get the active timeout within the l_timeout itself, it doesn't track 
that so we can manually.
>
>>       /* Set a 5-second initial timeout */
>>       station_roam_timeout_rearm(station, 5);
>>   }
>
> Regards,
> -Denis
Denis Kenzior Aug. 28, 2024, 2:41 p.m. UTC | #3
Hi James,

>>
>> One thing I'm a bit worried about with such a simple approach is that we might 
>> have different roam timeouts.  For example, the 60 second roam retry interval 
>> is active and we get a low rssi event. In this case we probably should 
>> schedule a roam sooner.  Maybe we should book-keep timeout information better 
>> better and make sure the timeout happens within the minimum time (active time 
>> remaining / new timeout)?
> Sure, the thought crossed my mind as well. Looks like there is no way to get the 
> active timeout within the l_timeout itself, it doesn't track that so we can 
> manually.

You might be able to add something into ell using timerfd_gettime.

Regards,
-Denis
diff mbox series

Patch

diff --git a/src/station.c b/src/station.c
index 30a1232a..e188aed4 100644
--- a/src/station.c
+++ b/src/station.c
@@ -2221,7 +2221,7 @@  static void station_roamed(struct station *station)
 	 * Schedule another roaming attempt in case the signal continues to
 	 * remain low. A subsequent high signal notification will cancel it.
 	 */
-	if (station->signal_low)
+	if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout))
 		station_roam_timeout_rearm(station, roam_retry_interval);
 
 	if (station->netconfig)
@@ -2260,7 +2260,7 @@  static void station_roam_retry(struct station *station)
 	station->roam_scan_full = false;
 	station->ap_directed_roaming = false;
 
-	if (station->signal_low)
+	if (station->signal_low && !station->roam_trigger_timeout)
 		station_roam_timeout_rearm(station, roam_retry_interval);
 }
 
@@ -3202,6 +3202,9 @@  static void station_low_rssi(struct station *station)
 	if (station_cannot_roam(station))
 		return;
 
+	if (station->roam_trigger_timeout)
+		return;
+
 	/* Set a 5-second initial timeout */
 	station_roam_timeout_rearm(station, 5);
 }