Message ID | 20240905134315.374800-1-prestwoj@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2] scan: check scan request in get_survey_done before deref | expand |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
prestwoj/iwd-alpine-ci-fetch | success | Fetch PR |
prestwoj/iwd-ci-gitlint | success | GitLint |
prestwoj/iwd-alpine-ci-setupell | success | Prep - Setup ELL |
prestwoj/iwd-ci-fetch | success | Fetch PR |
prestwoj/iwd-ci-setupell | success | Prep - Setup ELL |
prestwoj/iwd-alpine-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-ci-build | success | Build - Configure |
prestwoj/iwd-alpine-ci-build | success | Build - Configure |
prestwoj/iwd-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-ci-clang | success | clang PASS |
prestwoj/iwd-ci-makecheck | success | Make Check |
prestwoj/iwd-alpine-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-alpine-ci-makecheck | success | Make Check |
prestwoj/iwd-ci-incremental_build | success | Incremental Build with patches |
prestwoj/iwd-alpine-ci-incremental_build | success | Incremental Build with patches |
prestwoj/iwd-ci-testrunner | success | test-runner PASS |
Daniel, On 9/5/24 6:43 AM, James Prestwood wrote: > Due to the possibility of external scans the scan request pointer > could be NULL. Prior to surveys IWD would still get the results in > order for periodic scans to utilize them. This behavior can be > retained by checking both if we don't have a request or if the > request was canceled. This check is identical to the one in > get_scan_done. > > This fixes a crash when checking if the NULL scan request has been > canceled: > > 0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059 > 0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0 > 0x0000749646a243cb in ?? () from /usr/lib/libell.so.0 > 0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0 > 0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0 > 0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0 > 0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614 > > Reported-by: Daniel Bond <danielbondno@gmail.com> > --- > src/scan.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/scan.c b/src/scan.c > index debdeb1f..205365cd 100644 > --- a/src/scan.c > +++ b/src/scan.c > @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data) > > sc->get_survey_cmd_id = 0; > > - if (!results->sr->canceled) > + if (!results->sr || !results->sr->canceled) > get_results(results); > else > get_scan_done(user_data); Feel free to re-submit under your name as this is nearly the same as your patch, but the difference in logic change is important. I also forgot about Co-authored-by, which maybe is more fitting for this case. I can send that in v2, or maybe it could be added before merging. Thanks, James
Hi James, On 9/5/24 8:43 AM, James Prestwood wrote: > Due to the possibility of external scans the scan request pointer Do we know why external scans are happening? I can understand a one off because someone triggered iw scan, but from the bug report it sounded like it was crashing iwd repeatedly? > could be NULL. Prior to surveys IWD would still get the results in > order for periodic scans to utilize them. This behavior can be > retained by checking both if we don't have a request or if the > request was canceled. This check is identical to the one in > get_scan_done. > > This fixes a crash when checking if the NULL scan request has been > canceled: > > 0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059 > 0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0 > 0x0000749646a243cb in ?? () from /usr/lib/libell.so.0 > 0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0 > 0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0 > 0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0 > 0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614 > > Reported-by: Daniel Bond <danielbondno@gmail.com> > --- > src/scan.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/scan.c b/src/scan.c > index debdeb1f..205365cd 100644 > --- a/src/scan.c > +++ b/src/scan.c > @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data) > > sc->get_survey_cmd_id = 0; > > - if (!results->sr->canceled) > + if (!results->sr || !results->sr->canceled) I still don't understand why we're even bothering requesting a survey for a scan we didn't trigger? In other words, we shouldn't even be in this function. > get_results(results); > else > get_scan_done(user_data); Regards, -Denis
Hi Denis, On 9/5/24 7:37 AM, Denis Kenzior wrote: > Hi James, > > On 9/5/24 8:43 AM, James Prestwood wrote: >> Due to the possibility of external scans the scan request pointer > > Do we know why external scans are happening? I can understand a one > off because someone triggered iw scan, but from the bug report it > sounded like it was crashing iwd repeatedly? NetworkManager? wpa_supplicant running? I really have no idea. Either way its out of our control. > >> could be NULL. Prior to surveys IWD would still get the results in >> order for periodic scans to utilize them. This behavior can be >> retained by checking both if we don't have a request or if the >> request was canceled. This check is identical to the one in >> get_scan_done. >> >> This fixes a crash when checking if the NULL scan request has been >> canceled: >> >> 0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at >> src/scan.c:2059 >> 0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0 >> 0x0000749646a243cb in ?? () from /usr/lib/libell.so.0 >> 0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0 >> 0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0 >> 0x0000749646a263a4 in l_main_run_with_signal () from >> /usr/lib/libell.so.0 >> 0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized >> out>) at src/main.c:614 >> >> Reported-by: Daniel Bond <danielbondno@gmail.com> >> --- >> src/scan.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/scan.c b/src/scan.c >> index debdeb1f..205365cd 100644 >> --- a/src/scan.c >> +++ b/src/scan.c >> @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data) >> sc->get_survey_cmd_id = 0; >> - if (!results->sr->canceled) >> + if (!results->sr || !results->sr->canceled) > > I still don't understand why we're even bothering requesting a survey > for a scan we didn't trigger? In other words, we shouldn't even be in > this function. Because we still utilize external scan results for periodic scans. So if periodic scans are running we will still try and get the survey/results. > >> get_results(results); >> else >> get_scan_done(user_data); > > Regards, > -Denis
Hi James, >> Do we know why external scans are happening? I can understand a one off >> because someone triggered iw scan, but from the bug report it sounded like it >> was crashing iwd repeatedly? > NetworkManager? wpa_supplicant running? I really have no idea. Either way its > out of our control. Would be good to know and fix that. >> >> I still don't understand why we're even bothering requesting a survey for a >> scan we didn't trigger? In other words, we shouldn't even be in this function. > Because we still utilize external scan results for periodic scans. So if > periodic scans are running we will still try and get the survey/results. Should we though? I'm not really sure how surveys are implemented at the driver level. Are you sure this isn't introducing other timing problems? Regards, -Denis
Hi Denis, On 9/5/24 7:57 AM, Denis Kenzior wrote: > Hi James, > >>> Do we know why external scans are happening? I can understand a one >>> off because someone triggered iw scan, but from the bug report it >>> sounded like it was crashing iwd repeatedly? >> NetworkManager? wpa_supplicant running? I really have no idea. Either >> way its out of our control. > > Would be good to know and fix that. > >>> >>> I still don't understand why we're even bothering requesting a >>> survey for a scan we didn't trigger? In other words, we shouldn't >>> even be in this function. >> Because we still utilize external scan results for periodic scans. So >> if periodic scans are running we will still try and get the >> survey/results. > > Should we though? I'm not really sure how surveys are implemented at > the driver level. Are you sure this isn't introducing other timing > problems? So looking at the nl80211 code it is actually making a driver call for surveys, versus GET_SCAN simply returns the list. I can't speak to the implementation but based on testing several cards it appears that GET_SURVEY was always gathering data from the last scan request. E.g. if you do a limited scan then survey it will only have updated results for those limited frequencies. But anyways, if this seems sketchy to you I'm also fine skipping the survey and only doing GET_SCAN in the case of external scans. I guess without knowing what the driver is doing a survey could in theory take an excessive amount of time and cause unintended behavior, like if our own internal scan got issued during that. Thanks, James > > Regards, > -Denis
diff --git a/src/scan.c b/src/scan.c index debdeb1f..205365cd 100644 --- a/src/scan.c +++ b/src/scan.c @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data) sc->get_survey_cmd_id = 0; - if (!results->sr->canceled) + if (!results->sr || !results->sr->canceled) get_results(results); else get_scan_done(user_data);