diff mbox series

netdev: handle disconnect event during the 4-way handshake

Message ID 20240717140009.2648885-1-prestwoj@gmail.com (mailing list archive)
State New
Headers show
Series netdev: handle disconnect event during the 4-way handshake | expand

Checks

Context Check Description
tedd_an/pre-ci_am success Success
prestwoj/iwd-alpine-ci-fetch success Fetch PR
prestwoj/iwd-ci-gitlint fail netdev: handle disconnect event during the 4-way handshake 20: B1 Line exceeds max length (84>80): "Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)" 22: B1 Line exceeds max length (87>80): "Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)" 24: B1 Line exceeds max length (84>80): "Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)" 26: B1 Line exceeds max length (85>80): "Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)" 28: B1 Line exceeds max length (82>80): "Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)" 33: B1 Line exceeds max length (103>80): "Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed" 34: B1 Line exceeds max length (92>80): "Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed" 36: B1 Line exceeds max length (82>80): "Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed" 41: B1 Line exceeds max length (111>80): "Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0" 43: B1 Line exceeds max length (98>80): "Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)" 46: B1 Line exceeds max length (96>80): "Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)" 47: B1 Line exceeds max length (83>80): "Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)" 48: B1 Line exceeds max length (89>80): "Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)" 50: B1 Line exceeds max length (84>80): "Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)" 51: B1 Line exceeds max length (87>80): "Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)" 53: B1 Line exceeds max length (83>80): "Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)" 60: B1 Line exceeds max length (86>80): "Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)" 61: B1 Line exceeds max length (101>80): "Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX"
prestwoj/iwd-alpine-ci-setupell success Prep - Setup ELL
prestwoj/iwd-ci-fetch success Fetch PR
prestwoj/iwd-ci-setupell success Prep - Setup ELL
prestwoj/iwd-ci-incremental_build success Incremental build not run PASS
prestwoj/iwd-alpine-ci-makedistcheck success Make Distcheck
prestwoj/iwd-alpine-ci-incremental_build success Incremental build not run PASS
prestwoj/iwd-alpine-ci-build success Build - Configure
prestwoj/iwd-ci-build success Build - Configure
prestwoj/iwd-alpine-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-alpine-ci-makecheck success Make Check
prestwoj/iwd-ci-clang success clang PASS
prestwoj/iwd-ci-makecheckvalgrind success Make Check w/Valgrind
prestwoj/iwd-ci-makecheck success Make Check
prestwoj/iwd-ci-makedistcheck success Make Distcheck
prestwoj/iwd-ci-testrunner fail test-runner - FAIL: testEncryptedProfiles

Commit Message

James Prestwood July 17, 2024, 2 p.m. UTC
If a disconnect arrives during the 4-way handshake this would result
in netdev sending a disconnect event to station. If this is a
reassociation this case is unhandled in station and causes a hang as
it expects any connection failure to be handled via the reassociation
callback, not a random disconnect event.

To handle this case the disconnect event can check if there is a
pending connect callback and call that instead of the event handler.

Below are logs showing the "Unexpected disconnect event" which
prevents IWD from cleaning up its state and ultimately results in a
hang:

Jul 16 18:16:13: src/station.c:station_transition_reassociate()
Jul 16 18:16:13: event: state, old: connected, new: roaming
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_done() Work item 65 done
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_next() Starting work item 66
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:13: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
Jul 16 18:16:13: src/station.c:station_netdev_event() Associating
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
Jul 16 18:16:13: src/netdev.c:netdev_authenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
Jul 16 18:16:13: src/netdev.c:netdev_associate_event()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
Jul 16 18:16:13: src/netdev.c:netdev_connect_event()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
Jul 16 18:16:13: src/netdev.c:parse_request_ies()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
Jul 16 18:16:13: src/netdev.c:netdev_get_oci()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0
Jul 16 18:16:13: src/eapol.c:eapol_start()
Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)
Jul 16 18:16:13: src/netdev.c:netdev_control_port_frame_event()
Jul 16 18:16:13: src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)
Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:17: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Jul 16 18:16:17: src/netdev.c:netdev_disconnect_event()
Jul 16 18:16:17: Received Deauthentication event, reason: 15, from_ap: true
Jul 16 18:16:17: src/wiphy.c:wiphy_radio_work_done() Work item 66 done
Jul 16 18:16:17: src/station.c:station_disconnect_event() 6
Jul 16 18:16:17: Unexpected disconnect event
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
---
 src/netdev.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Denis Kenzior July 17, 2024, 4:33 p.m. UTC | #1
Hi James,

On 7/17/24 9:00 AM, James Prestwood wrote:
> If a disconnect arrives during the 4-way handshake this would result
> in netdev sending a disconnect event to station. If this is a
> reassociation this case is unhandled in station and causes a hang as
> it expects any connection failure to be handled via the reassociation
> callback, not a random disconnect event.
> 
> To handle this case the disconnect event can check if there is a
> pending connect callback and call that instead of the event handler.
> 
> Below are logs showing the "Unexpected disconnect event" which
> prevents IWD from cleaning up its state and ultimately results in a
> hang:
> 
> Jul 16 18:16:13: src/station.c:station_transition_reassociate()
> Jul 16 18:16:13: event: state, old: connected, new: roaming
> Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_done() Work item 65 done
> Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_next() Starting work item 66
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> Jul 16 18:16:13: src/netdev.c:netdev_deauthenticate_event()
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
> Jul 16 18:16:13: src/station.c:station_netdev_event() Associating
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
> Jul 16 18:16:13: src/netdev.c:netdev_authenticate_event()
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
> Jul 16 18:16:13: src/netdev.c:netdev_associate_event()
> Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
> Jul 16 18:16:13: src/netdev.c:netdev_connect_event()
> Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
> Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
> Jul 16 18:16:13: src/netdev.c:parse_request_ies()
> Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
> Jul 16 18:16:13: src/netdev.c:netdev_get_oci()
> Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0
> Jul 16 18:16:13: src/eapol.c:eapol_start()
> Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)
> Jul 16 18:16:13: src/netdev.c:netdev_control_port_frame_event()
> Jul 16 18:16:13: src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=6
> Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)
> Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
> Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
> Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> Jul 16 18:16:17: src/netdev.c:netdev_deauthenticate_event()
> Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> Jul 16 18:16:17: src/netdev.c:netdev_disconnect_event()
> Jul 16 18:16:17: Received Deauthentication event, reason: 15, from_ap: true
> Jul 16 18:16:17: src/wiphy.c:wiphy_radio_work_done() Work item 66 done
> Jul 16 18:16:17: src/station.c:station_disconnect_event() 6
> Jul 16 18:16:17: Unexpected disconnect event
> Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
> Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
> ---
>   src/netdev.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 

<snip>

> @@ -1310,9 +1312,20 @@ static void netdev_disconnect_event(struct l_genl_msg *msg,
>   
>   	event_filter = netdev->event_filter;
>   	event_data = netdev->user_data;
> +	connect_cb = netdev->connect_cb;
> +	user_data = netdev->user_data;
>   	netdev_connect_free(netdev);
>   
> -	if (!event_filter)
> +	/*
> +	 * If a deauth frame come in after the connect event during the 4-way
> +	 * handshake we need to handle this via the connect callback since its
> +	 * still effectively a connection failure.
> +	 */
> +	if (connect_cb) {
> +		connect_cb(netdev, NETDEV_RESULT_AUTHENTICATION_FAILED,
> +				&reason_code, user_data);
> +		return;
> +	} else if (!event_filter)
>   		return;

Can we use netdev_disconnected() instead?

>   
>   	if (disconnect_by_ap)

Regards,
-Denis
James Prestwood July 17, 2024, 9:18 p.m. UTC | #2
Hi Alvin,

On 7/17/24 12:35 PM, Alvin Šipraga wrote:
> Hi James,
>
> On Wed, Jul 17, 2024 at 08:02:17AM GMT, James Prestwood wrote:
>> + CC Alvin
>>
>> You had originally reported [1] a similar issue but it was unclear exactly
>> what was happening at the time. I believe this is the same problem. This may
>> not be the final version of a fix, but I wanted to CC you in case you still
>> were seeing this and wanted to test it.
>>
>> [1] https://lore.kernel.org/iwd/20230403141927.235014-1-alvin@pqrs.dk/
> It's been a while since I dealt with this problem, and in the mean time
> we have been using the original patch I sent to the list as seen in your
> link. Unfortunately I don't recall how I reproduced this issue, but
> maybe my colleague Pedro (+cc) remembers.
>
> Regardless, a static code analysis suggests that your patch yields the
> same result, so I think this is good - one less patch for us to apply :)
>
> Do you have some easily communicable steps to reproduce this issue on
> your end?

I was able to write an autotest for it that took that code path. Its 
basically just as you thought from your previous thread. If a deauth 
frame comes in during the 4-way handshake or key settings during a 
reassociate roam it triggers that "Unexpected disconnect" path in 
station, causing IWD to hang up and not clean up its state.

I just wanted to check in with you that its being fixed. Denis and I 
have been discussing it and my ML patch still isn't quite how he wants 
it, but we think we know how it should be handled.

Thanks,
James

>
> Thanks for following up on this!
>
> Kind regards,
> Alvin
diff mbox series

Patch

diff --git a/src/netdev.c b/src/netdev.c
index 7a335894..c6be05b5 100644
--- a/src/netdev.c
+++ b/src/netdev.c
@@ -1270,6 +1270,8 @@  static void netdev_disconnect_event(struct l_genl_msg *msg,
 	uint16_t reason_code = 0;
 	bool disconnect_by_ap = false;
 	netdev_event_func_t event_filter;
+	netdev_connect_cb_t connect_cb;
+	void *user_data;
 	void *event_data;
 
 	l_debug("");
@@ -1310,9 +1312,20 @@  static void netdev_disconnect_event(struct l_genl_msg *msg,
 
 	event_filter = netdev->event_filter;
 	event_data = netdev->user_data;
+	connect_cb = netdev->connect_cb;
+	user_data = netdev->user_data;
 	netdev_connect_free(netdev);
 
-	if (!event_filter)
+	/*
+	 * If a deauth frame come in after the connect event during the 4-way
+	 * handshake we need to handle this via the connect callback since its
+	 * still effectively a connection failure.
+	 */
+	if (connect_cb) {
+		connect_cb(netdev, NETDEV_RESULT_AUTHENTICATION_FAILED,
+				&reason_code, user_data);
+		return;
+	} else if (!event_filter)
 		return;
 
 	if (disconnect_by_ap)