Message ID | 20230822152931.276136-3-prestwoj@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Support fallback if FT fails | expand |
Hi James, On 8/22/23 10:29, James Prestwood wrote: > The auth/action status is now tracked in ft.c. If an AP rejects the > FT attempt with "Invalid PMKID" we can now assume this AP is either > mis-configured for FT or is lagging behind getting the proper keys > from neighboring APs (e.g. was just rebooted). > > If we see this condition IWD can now fall back to reassociation in > an attempt to still roam to the best candidate. > > The motivation behind this isn't necissarily to always force a roam, typo, 'necessarily' > but instead to handle the case where there are no other BSS's > around. Currently IWD can get into a situation where the current > BSS link is very poor but the only cadidate is stuck in this state typo, 'candidate' > where its rejecting the PMKID. When this happens it eventually leads > to a disconnect due to packet/beacon loss. Its a much better > approach to at least try reassociation. Okay, but you should really explain that what is the strategy if IWD has multiple FT candidates and the first (or many) reject the FT authentication step. > --- > src/station.c | 129 ++++++++++++++++++++++++++++++++------------------ > 1 file changed, 82 insertions(+), 47 deletions(-) > > diff --git a/src/station.c b/src/station.c > index 2473de2a..733b4196 100644 > --- a/src/station.c > +++ b/src/station.c > @@ -147,15 +147,49 @@ struct roam_bss { > uint8_t addr[6]; > uint16_t rank; > int32_t signal_strength; > + bool reassoc: 1; > }; > <snip> > -static void station_transition_start(struct station *station); > +static void station_transition_start(struct station *station, bool no_ft); > > static bool station_ft_work_ready(struct wiphy_radio_work_item *item) > { > struct station *station = l_container_of(item, struct station, ft_work); > - struct roam_bss *rbss = l_queue_pop_head(station->roam_bss_list); > - struct scan_bss *bss = network_bss_find_by_addr( > - station->connected_network, rbss->addr); > + L_AUTO_FREE_VAR(struct roam_bss *, rbss) = l_queue_pop_head( > + station->roam_bss_list); _auto_(l_free) is prefered to L_AUTO_FREE_VAR > + struct scan_bss *bss; > int ret; > > - l_free(rbss); > + /* Already failed FT, then failed reassociation, disregard this BSS */ > + if (rbss->reassoc) { > + l_free(l_steal_ptr(rbss)); > + > + rbss = l_queue_pop_head(station->roam_bss_list); > + if (!rbss) > + goto assoc_failed; > + } > > /* Very unlikely, but the BSS could have gone away */ > + bss = network_bss_find_by_addr(station->connected_network, rbss->addr); > if (!bss) > goto try_next; > > ret = ft_associate(netdev_get_ifindex(station->netdev), bss->addr); > - if (ret == -ENOENT) { > + if (ret > 0) { > + if (ret != MMPDU_STATUS_CODE_INVALID_PMKID) { > + station_debug_event(station, "ft-roam-failed"); > + goto try_next; > + } > + > + rbss->reassoc = true; > + > + l_queue_push_head(station->roam_bss_list, rbss); > + > + station_debug_event(station, "ft-fallback-to-reassoc"); > + > + /* FT failed, attempt reassociation as a last resort */ > + station_transition_start(station, true); > + l_steal_ptr(rbss); > + return true; So shouldn't you prefer FT over associate? If there are multiple FT candidates, then should iwd try those first? > + } else if (ret == -ENOENT) { > station_debug_event(station, "ft-roam-failed"); > try_next: > - station_transition_start(station); > + station_transition_start(station, false); > return true; > } else if (ret < 0) > goto assoc_failed; Regards, -Denis
Hi Denis, <snip> > >> where its rejecting the PMKID. When this happens it eventually leads >> to a disconnect due to packet/beacon loss. Its a much better >> approach to at least try reassociation. > > Okay, but you should really explain that what is the strategy if IWD has > multiple FT candidates and the first (or many) reject the FT > authentication step. In this patch the strategy is to always try the first candidate: try FT, and if failed with status=53, try reassociate. There are drawbacks and benefits to this: - The drawback is if reassociate also fails we have to disconnect, and can't FT to any other candidates. This would mainly hurt if there are other good candidates to connect to. - The benefit is if there is only one available BSS to roam to and we failed FT but reassociation succeeded. This prevents us from trying and failing FT over and over. In addition if there are additional FT candidates but poor ranking we would still connect to the best candidate even if we do so via reassociation. The decision to retry with reassociation mainly boils down to if there are other "good" candidates. If yes we should try FT on those since its faster and failure tolerant. But if all the other candidates or lack thereof are poor, we should reassociate to the one that failed FT, since its the best. Doing this logically is hard because we need to define whats "good" and whats "bad". One option to make this better would be to recompute the BSS rank after FT fails, without RANK_FT_FACTOR, and re-insert into the queue. Although a 1.3x factor likely won't make a huge difference. The idea here being that if other candidates are closely ranked we try those via FT before falling back to reassociation. Another option is to introduce some cutoff RSSI or rank value that determines if we retry with reassociation or move on to the next BSS with FT. At the very least (mostly applicable to 1 roam candidate) we should keep track that all FT attempts failed and try reassociation on the next round. Thanks, James >> --- >> src/station.c | 129 ++++++++++++++++++++++++++++++++------------------ >> 1 file changed, 82 insertions(+), 47 deletions(-) >> >> diff --git a/src/station.c b/src/station.c >> index 2473de2a..733b4196 100644 >> --- a/src/station.c >> +++ b/src/station.c >> @@ -147,15 +147,49 @@ struct roam_bss { >> uint8_t addr[6]; >> uint16_t rank; >> int32_t signal_strength; >> + bool reassoc: 1; >> }; > > <snip> > >> -static void station_transition_start(struct station *station); >> +static void station_transition_start(struct station *station, bool >> no_ft); >> static bool station_ft_work_ready(struct wiphy_radio_work_item *item) >> { >> struct station *station = l_container_of(item, struct station, >> ft_work); >> - struct roam_bss *rbss = l_queue_pop_head(station->roam_bss_list); >> - struct scan_bss *bss = network_bss_find_by_addr( >> - station->connected_network, rbss->addr); >> + L_AUTO_FREE_VAR(struct roam_bss *, rbss) = l_queue_pop_head( >> + station->roam_bss_list); > > _auto_(l_free) is prefered to L_AUTO_FREE_VAR > >> + struct scan_bss *bss; >> int ret; >> - l_free(rbss); >> + /* Already failed FT, then failed reassociation, disregard this >> BSS */ >> + if (rbss->reassoc) { >> + l_free(l_steal_ptr(rbss)); >> + >> + rbss = l_queue_pop_head(station->roam_bss_list); >> + if (!rbss) >> + goto assoc_failed; >> + } >> /* Very unlikely, but the BSS could have gone away */ >> + bss = network_bss_find_by_addr(station->connected_network, >> rbss->addr); >> if (!bss) >> goto try_next; >> ret = ft_associate(netdev_get_ifindex(station->netdev), bss->addr); >> - if (ret == -ENOENT) { >> + if (ret > 0) { >> + if (ret != MMPDU_STATUS_CODE_INVALID_PMKID) { >> + station_debug_event(station, "ft-roam-failed"); >> + goto try_next; >> + } >> + >> + rbss->reassoc = true; >> + >> + l_queue_push_head(station->roam_bss_list, rbss); >> + >> + station_debug_event(station, "ft-fallback-to-reassoc"); >> + >> + /* FT failed, attempt reassociation as a last resort */ >> + station_transition_start(station, true); >> + l_steal_ptr(rbss); >> + return true; > > So shouldn't you prefer FT over associate? If there are multiple FT > candidates, then should iwd try those first? > >> + } else if (ret == -ENOENT) { >> station_debug_event(station, "ft-roam-failed"); >> try_next: >> - station_transition_start(station); >> + station_transition_start(station, false); >> return true; >> } else if (ret < 0) >> goto assoc_failed; > > Regards, > -Denis >
Hi James, > In this patch the strategy is to always try the first candidate: try FT, and if > failed with status=53, try reassociate. > > There are drawbacks and benefits to this: > - The drawback is if reassociate also fails we have to disconnect, and can't > FT to any other candidates. This would mainly hurt if there are other good > candidates to connect to. So that is a pretty significant drawback in my opinion. There's really no guarantee that the behavior you're seeing is the same for every vendor. As you point out, trying FT first has very little drawback. Falling back to the slow re-associate path does, especially on 802.1X networks. > - The benefit is if there is only one available BSS to roam to and we failed > FT but reassociation succeeded. This prevents us from trying and failing FT over > and over. In addition if there are additional FT candidates but poor ranking we > would still connect to the best candidate even if we do so via reassociation. If there's only a single candidate, there's no difference between the two approaches. I'm not following the argument in the second sentence? > > The decision to retry with reassociation mainly boils down to if there are other > "good" candidates. If yes we should try FT on those since its faster and failure > tolerant. But if all the other candidates or lack thereof are poor, we should > reassociate to the one that failed FT, since its the best. Doing this logically > is hard because we need to define whats "good" and whats "bad". That sounds like an argument for trying all FT candidates first. Doesn't iwd only consider candidates that have a higher rank than the current BSS? Another possibility would be to insert the BSS twice (or reinsert once FT fails), once with FT and once without FT and try them in rank order. Still, there's almost no penalty to trying all FT candidates, so unless the ranking calculation is really off, I think that is a better approach. > > One option to make this better would be to recompute the BSS rank after FT > fails, without RANK_FT_FACTOR, and re-insert into the queue. Although a 1.3x > factor likely won't make a huge difference. The idea here being that if other > candidates are closely ranked we try those via FT before falling back to > reassociation. Right. > > Another option is to introduce some cutoff RSSI or rank value that determines if > we retry with reassociation or move on to the next BSS with FT. I doubt any absolute RSSI or rank cutoff will work. These are very hardware dependent. Better to refine the ranking calculation, but in the end the information we base the rank off of is imperfect, so sub-optimal decisions are going to be a fact of life. > > At the very least (mostly applicable to 1 roam candidate) we should keep track > that all FT attempts failed and try reassociation on the next round. Or better yet, have the vendor fix the network. Regards, -Denis
Hi Denis, On 8/28/23 8:02 AM, Denis Kenzior wrote: > Hi James, > >> In this patch the strategy is to always try the first candidate: try >> FT, and if failed with status=53, try reassociate. >> >> There are drawbacks and benefits to this: >> - The drawback is if reassociate also fails we have to disconnect, >> and can't FT to any other candidates. This would mainly hurt if there >> are other good candidates to connect to. > > So that is a pretty significant drawback in my opinion. There's really > no guarantee that the behavior you're seeing is the same for every > vendor. As you point out, trying FT first has very little drawback. > Falling back to the slow re-associate path does, especially on 802.1X > networks. There isn't a guarantee about this behavior, but it is how hostapd behaves if the FT key pull mechanism is slow or not working yet. Either if the AP backend DHCP server is slow, or some other network issue. I really have no visibility into why it happens, its hidden behind their proprietary backend. Anyways handling this specific error code, if we can minimize the drawbacks, shouldn't hurt clients in other network deployments who's vendors don't have this behavior. > >> - The benefit is if there is only one available BSS to roam to and >> we failed FT but reassociation succeeded. This prevents us from trying >> and failing FT over and over. In addition if there are additional FT >> candidates but poor ranking we would still connect to the best >> candidate even if we do so via reassociation. > > If there's only a single candidate, there's no difference between the > two approaches. I'm not following the argument in the second sentence? Yes a single candidate won't make a difference. What I was talking about is if you have the following: Current BSS: -85dbm Candidate A: -55dbm Candidate B: -84dbm If candidate A failed FT as described we would then FT to candidate B which would then immediately trigger another roam. I know its a very specific scenario but definitely can happen. > >> >> The decision to retry with reassociation mainly boils down to if there >> are other "good" candidates. If yes we should try FT on those since >> its faster and failure tolerant. But if all the other candidates or >> lack thereof are poor, we should reassociate to the one that failed >> FT, since its the best. Doing this logically is hard because we need >> to define whats "good" and whats "bad". > > That sounds like an argument for trying all FT candidates first. > > Doesn't iwd only consider candidates that have a higher rank than the > current BSS? > > Another possibility would be to insert the BSS twice (or reinsert once > FT fails), once with FT and once without FT and try them in rank order. > Still, there's almost no penalty to trying all FT candidates, so unless > the ranking calculation is really off, I think that is a better approach. Ok, I'd rather go this route then, re-insert the BSS without FT after this specific failure. Would you want me to put this behind a config option? Or you think reassociation based on ranking is ok? Again, it would only be with status=53 so I think the impact is minimal. > >> >> One option to make this better would be to recompute the BSS rank >> after FT fails, without RANK_FT_FACTOR, and re-insert into the queue. >> Although a 1.3x factor likely won't make a huge difference. The idea >> here being that if other candidates are closely ranked we try those >> via FT before falling back to reassociation. > > Right. > >> >> Another option is to introduce some cutoff RSSI or rank value that >> determines if we retry with reassociation or move on to the next BSS >> with FT. > > I doubt any absolute RSSI or rank cutoff will work. These are very > hardware dependent. Better to refine the ranking calculation, but in > the end the information we base the rank off of is imperfect, so > sub-optimal decisions are going to be a fact of life. > >> >> At the very least (mostly applicable to 1 roam candidate) we should >> keep track that all FT attempts failed and try reassociation on the >> next round. > > Or better yet, have the vendor fix the network. Agreed... but easier said than done. This is just one of several issues we're trying to work around on the client side that we shouldn't have to :( > > Regards, > -Denis
Hi James, >> Another possibility would be to insert the BSS twice (or reinsert once FT >> fails), once with FT and once without FT and try them in rank order. Still, >> there's almost no penalty to trying all FT candidates, so unless the ranking >> calculation is really off, I think that is a better approach. > > Ok, I'd rather go this route then, re-insert the BSS without FT after this > specific failure. Ok, lets do that then. > > Would you want me to put this behind a config option? Or you think reassociation > based on ranking is ok? Again, it would only be with status=53 so I think the > impact is minimal. > Make it the default behavior. Regards, -Denis
diff --git a/src/station.c b/src/station.c index 2473de2a..733b4196 100644 --- a/src/station.c +++ b/src/station.c @@ -147,15 +147,49 @@ struct roam_bss { uint8_t addr[6]; uint16_t rank; int32_t signal_strength; + bool reassoc: 1; }; -static struct roam_bss *roam_bss_from_scan_bss(const struct scan_bss *bss) +static bool station_can_fast_transition(struct handshake_state *hs, + const struct scan_bss *bss) +{ + uint16_t mdid; + + if (!hs->mde) + return false; + + if (ie_parse_mobility_domain_from_data(hs->mde, hs->mde[1] + 2, + &mdid, NULL, NULL) < 0) + return false; + + if (!(bss->mde_present && l_get_le16(bss->mde) == mdid)) + return false; + + if (hs->supplicant_ie != NULL) { + struct ie_rsn_info rsn_info; + + if (!IE_AKM_IS_FT(hs->akm_suite)) + return false; + + if (scan_bss_get_rsn_info(bss, &rsn_info) < 0) + return false; + + if (!IE_AKM_IS_FT(rsn_info.akm_suites)) + return false; + } + + return true; +} + +static struct roam_bss *roam_bss_from_scan_bss(struct handshake_state *hs, + const struct scan_bss *bss) { struct roam_bss *rbss = l_new(struct roam_bss, 1); memcpy(rbss->addr, bss->addr, 6); rbss->rank = bss->rank; rbss->signal_strength = bss->signal_strength; + rbss->reassoc = !station_can_fast_transition(hs, bss); return rbss; } @@ -1942,37 +1976,6 @@ static void station_early_neighbor_report_cb(struct netdev *netdev, int err, &station->roam_freqs); } -static bool station_can_fast_transition(struct handshake_state *hs, - struct scan_bss *bss) -{ - uint16_t mdid; - - if (!hs->mde) - return false; - - if (ie_parse_mobility_domain_from_data(hs->mde, hs->mde[1] + 2, - &mdid, NULL, NULL) < 0) - return false; - - if (!(bss->mde_present && l_get_le16(bss->mde) == mdid)) - return false; - - if (hs->supplicant_ie != NULL) { - struct ie_rsn_info rsn_info; - - if (!IE_AKM_IS_FT(hs->akm_suite)) - return false; - - if (scan_bss_get_rsn_info(bss, &rsn_info) < 0) - return false; - - if (!IE_AKM_IS_FT(rsn_info.akm_suites)) - return false; - } - - return true; -} - static void station_disconnect_on_error_cb(struct netdev *netdev, bool success, void *user_data) { @@ -2169,10 +2172,14 @@ static int station_transition_reassociate(struct station *station, if (ret < 0) return ret; + l_debug(""); + station->connected_bss = bss; station->preparing_roam = false; station_enter_state(station, STATION_STATE_ROAMING); + station_debug_event(station, "reassoc-roam"); + return 0; } @@ -2259,27 +2266,51 @@ static void station_preauthenticate_cb(struct netdev *netdev, } } -static void station_transition_start(struct station *station); +static void station_transition_start(struct station *station, bool no_ft); static bool station_ft_work_ready(struct wiphy_radio_work_item *item) { struct station *station = l_container_of(item, struct station, ft_work); - struct roam_bss *rbss = l_queue_pop_head(station->roam_bss_list); - struct scan_bss *bss = network_bss_find_by_addr( - station->connected_network, rbss->addr); + L_AUTO_FREE_VAR(struct roam_bss *, rbss) = l_queue_pop_head( + station->roam_bss_list); + struct scan_bss *bss; int ret; - l_free(rbss); + /* Already failed FT, then failed reassociation, disregard this BSS */ + if (rbss->reassoc) { + l_free(l_steal_ptr(rbss)); + + rbss = l_queue_pop_head(station->roam_bss_list); + if (!rbss) + goto assoc_failed; + } /* Very unlikely, but the BSS could have gone away */ + bss = network_bss_find_by_addr(station->connected_network, rbss->addr); if (!bss) goto try_next; ret = ft_associate(netdev_get_ifindex(station->netdev), bss->addr); - if (ret == -ENOENT) { + if (ret > 0) { + if (ret != MMPDU_STATUS_CODE_INVALID_PMKID) { + station_debug_event(station, "ft-roam-failed"); + goto try_next; + } + + rbss->reassoc = true; + + l_queue_push_head(station->roam_bss_list, rbss); + + station_debug_event(station, "ft-fallback-to-reassoc"); + + /* FT failed, attempt reassociation as a last resort */ + station_transition_start(station, true); + l_steal_ptr(rbss); + return true; + } else if (ret == -ENOENT) { station_debug_event(station, "ft-roam-failed"); try_next: - station_transition_start(station); + station_transition_start(station, false); return true; } else if (ret < 0) goto assoc_failed; @@ -2288,6 +2319,8 @@ try_next: station->preparing_roam = false; station_enter_state(station, STATION_STATE_FT_ROAMING); + station_debug_event(station, "ft-roam"); + return true; assoc_failed: @@ -2349,7 +2382,8 @@ done: } static bool station_try_next_transition(struct station *station, - struct scan_bss *bss) + struct scan_bss *bss, + bool no_ft) { struct handshake_state *hs = netdev_get_handshake(station->netdev); struct network *connected = station->connected_network; @@ -2364,7 +2398,7 @@ static bool station_try_next_transition(struct station *station, station->ap_directed_roaming = false; /* Can we use Fast Transition? */ - if (station_can_fast_transition(hs, bss)) + if (station_can_fast_transition(hs, bss) && !no_ft) return station_fast_transition(station, bss); /* Non-FT transition */ @@ -2412,7 +2446,7 @@ static bool station_try_next_transition(struct station *station, return true; } -static void station_transition_start(struct station *station) +static void station_transition_start(struct station *station, bool no_ft) { struct roam_bss *rbss; bool roaming = false; @@ -2425,7 +2459,7 @@ static void station_transition_start(struct station *station) struct scan_bss *bss = network_bss_find_by_addr( station->connected_network, rbss->addr); - roaming = station_try_next_transition(station, bss); + roaming = station_try_next_transition(station, bss, no_ft); if (roaming) break; @@ -2581,7 +2615,7 @@ static bool station_roam_scan_notify(int err, struct l_queue *bss_list, */ station_update_roam_bss(station, bss); - rbss = roam_bss_from_scan_bss(bss); + rbss = roam_bss_from_scan_bss(hs, bss); l_queue_insert(station->roam_bss_list, rbss, roam_bss_rank_compare, NULL); @@ -2600,7 +2634,7 @@ next: goto fail; } - station_transition_start(station); + station_transition_start(station, false); return true; @@ -4584,6 +4618,7 @@ static bool station_force_roam_scan_notify(int err, struct l_queue *bss_list, { struct station_roam_data *data = user_data; struct station *station = data->station; + struct handshake_state *hs = netdev_get_handshake(station->netdev); struct scan_bss *target; struct l_dbus_message *reply; @@ -4603,11 +4638,11 @@ static bool station_force_roam_scan_notify(int err, struct l_queue *bss_list, /* The various roam routines expect this to be set from scanning */ station->preparing_roam = true; l_queue_push_tail(station->roam_bss_list, - roam_bss_from_scan_bss(target)); + roam_bss_from_scan_bss(hs, target)); station_update_roam_bss(station, target); - station_transition_start(station); + station_transition_start(station, false); reply = l_dbus_message_new_method_return(data->pending);