[0/1] Retry when ifup fails

Message ID 20231214190340.133011-1-prestwoj@gmail.com (mailing list archive)

Message

James Prestwood Dec. 14, 2023, 7:03 p.m. UTC
For some background, at its core this is a driver issue. The problem
seemed to happen much more frequently when power save was disabled
while the interface was down (this was reordered in a prior patch)
but I've received two reports of this happening now even after
that change was made. The driver logs are the same in both cases.
It appears to be some race between bringing the interface down and
up quickly.

This issue seems to have cropped up somewhat recently, in the last
few months (or nobody was reporting it before then). This specific code
path hasn't changed in a long time, so I suspect other IWD changes or
gcc variations slightly altered timing/scheduling and exposed this
ath10k bug.

After applying this patch I did see it happen again, and the retry
was able to bring the interface back up successfully, so to me this
seems like a viable option until the driver is fixed (if it ever is).
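
For readers who want the general shape of the workaround, here is a
minimal sketch of a bounded retry around ifup. This is not the code from
the patch. The names ifup_with_retry, ifup_fn, delay_fn, fake_driver and
fake_ifup are hypothetical, the loop is synchronous for brevity (IWD
itself is event driven), and retrying only on -ETIMEDOUT is an assumption
based on the "Connection timed out" error in the log below.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook: bring the interface up, return 0 or a negative errno. */
typedef int (*ifup_fn)(unsigned int ifindex, void *user_data);

/* Hypothetical hook: wait between attempts (may be NULL to skip waiting). */
typedef void (*delay_fn)(unsigned int seconds);

/*
 * Try to bring an interface up, retrying a bounded number of times.
 * Assumption: only -ETIMEDOUT is treated as transient; any other error
 * is returned to the caller immediately.
 */
static int ifup_with_retry(unsigned int ifindex, ifup_fn ifup, delay_fn delay,
				unsigned int max_retries,
				unsigned int delay_secs, void *user_data)
{
	unsigned int attempt;
	int err = 0;

	for (attempt = 0; attempt <= max_retries; attempt++) {
		err = ifup(ifindex, user_data);
		if (err == 0)
			return 0;

		/* Give up on non-transient errors or when out of retries */
		if (err != -ETIMEDOUT || attempt == max_retries)
			break;

		fprintf(stderr,
			"Error bringing interface %u up: %s, retrying in %us\n",
			ifindex, strerror(-err), delay_secs);

		if (delay)
			delay(delay_secs);
	}

	return err;
}

/* Stand-in for the driver: times out a fixed number of times, then works. */
struct fake_driver {
	unsigned int failures_left;
	unsigned int calls;
};

static int fake_ifup(unsigned int ifindex, void *user_data)
{
	struct fake_driver *drv = user_data;

	(void) ifindex;
	drv->calls++;

	if (drv->failures_left > 0) {
		drv->failures_left--;
		return -ETIMEDOUT;
	}

	return 0;
}
```

With a fake_driver that times out once, ifup_with_retry(14, fake_ifup,
NULL, 3, 1, &drv) returns 0 on the second call. Note that a successful
return only says the ifup request went through, it says nothing about
whether the driver is actually healthy afterwards.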

I do have an open thread with some ath10k engineers about this.

The one occurrence I've seen since the workaround was applied:

iwd[1571924]: src/manager.c:manager_new_p2p_interface_cb()
iwd[1571924]: src/p2p.c:p2p_device_update_from_genl() Created P2P device 15
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1571924]: Error bringing interface 14 up: Connection timed out, retrying in 1s
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
iwd[1571924]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
iwd[1571924]: src/netdev.c:netdev_set_4addr() netdev: 14 use_4addr: 0
iwd[1571924]: src/netdev.c:netdev_initial_up_cb() Interface 14 initialized
iwd[1571924]: src/netconfig.c:netconfig_new() Creating netconfig for interface: 14
iwd[1571924]: src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick

James Prestwood (1):
  netdev: retry on failed ifup

 src/netdev.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

Comments

James Prestwood Dec. 14, 2023, 8:09 p.m. UTC | #1
On 12/14/23 11:03, James Prestwood wrote:
> For some background, at its core this is a driver issue. The problem
> seemed to happen much more frequently when power save was disabled
> while the interface was down (this was reordered in a prior patch)
> but I've received two reports of this happening now even after
> that change was made. The driver logs are the same in both cases.
> It appears to be some race between bringing the interface down and
> up quickly.
>
> This issue seems to have cropped up somewhat recently, in the last
> few months (or nobody was reporting it). This specific code path
> hasn't changed in a long time so I suspect other IWD changes, or gcc
> variations slightly altered timing/scheduling and exposed this ath10k
> bug.
>
> After applying this patch I did see it happen again, and the retry
> was able to bring the interface back up successfully, so to me this
> seems like a viable option until the driver is fixed (if it ever is).
>
> I do have an open thread with some ath10k engineers about this.
>
> The one occurrence I've seen since the workaround was applied:
>
> iwd[1571924]: src/manager.c:manager_new_p2p_interface_cb()
> iwd[1571924]: src/p2p.c:p2p_device_update_from_genl() Created P2P device 15
> kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
> iwd[1571924]: Error bringing interface 14 up: Connection timed out, retrying in 1s
> kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
> iwd[1571924]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> iwd[1571924]: src/netdev.c:netdev_set_4addr() netdev: 14 use_4addr: 0
> iwd[1571924]: src/netdev.c:netdev_initial_up_cb() Interface 14 initialized
> iwd[1571924]: src/netconfig.c:netconfig_new() Creating netconfig for interface: 14
> iwd[1571924]: src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick

After closer inspection, while the ifup succeeded, the driver still 
seemed to be in a bad state. Probably better to hold off until more is 
known. I suspect the interface needs to be completely removed and 
created again in order to get the driver working when this happens.

Reported on linux-wireless:

https://lore.kernel.org/linux-wireless/abbb7874-7f7f-423b-b67c-6ef850ae5bd6@gmail.com/T/#u

>
> James Prestwood (1):
>    netdev: retry on failed ifup
>
>   src/netdev.c | 41 ++++++++++++++++++++++++++++++++++-------
>   1 file changed, 34 insertions(+), 7 deletions(-)
>