Message ID | 20230404012010.15261-1-ajay.kathat@microchip.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Headers | show |
Series | wifi: wilc1000: fix kernel oops during interface down during background scan | expand |
Hi, [+ wireless and cfg80211 maintainers because I'm not familiar with cfg80211 ] > Fix for kernel crash observed with following test procedure [1]: > while true; > do ifconfig wlan0 up; > iw dev wlan0 scan & > ifconfig wlan0 down; > done > > During the above test procedure, the scan results are received from > firmware > for 'iw scan' command gets queued even when the interface is going > down. It > was causing the kernel oops when dereferencing the freed pointers. > > For synchronization, 'mac_close()' calls flush_workqueue() to block its > execution till all pending work is completed. Afterwards 'wilc->close' > flag > which is set before the flush_workqueue() should avoid adding new work. > Added 'wilc->close' check in wilc_handle_isr() which is common for > SPI/SDIO bus to ignore the interrupts from firmware that inturns adds > the > work since the interface is getting closed. With this patch I'm now getting wilc1000_sdio mmc0:0001:1 wlan0: Failed to send setup multicast when you close the interface. > > 1. > https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/ should be Link: > Reported-by: Michael Walle <mwalle@kernel.org> > Signed-off-by: Ajay Singh <ajay.kathat@microchip.com> Missing Fixes: tag. In this regard, most of the previous wilc fixes patches miss a proper Fixes tag which makes the wilc1000 pretty unusable on stable kernels IMHO :/ > --- > drivers/net/wireless/microchip/wilc1000/netdev.c | 9 +++------ > drivers/net/wireless/microchip/wilc1000/wlan.c | 3 +++ > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c > b/drivers/net/wireless/microchip/wilc1000/netdev.c > index e9f59de31b0b..40edee10a81f 100644 > --- a/drivers/net/wireless/microchip/wilc1000/netdev.c > +++ b/drivers/net/wireless/microchip/wilc1000/netdev.c > @@ -38,11 +38,6 @@ static irqreturn_t isr_bh_routine(int irq, void > *userdata) > { > struct wilc *wilc = userdata; > > - if (wilc->close) { > - pr_err("Can't handle BH interrupt\n"); > - return IRQ_HANDLED; > - } > - This check is still in the top half of the interrupt processing. Shouldn't it be removed there, too? That way you can get rid of the top half entirely and just let the irq subsys use the default top half implementation. > wilc_handle_isr(wilc); > > return IRQ_HANDLED; > @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device > *ndev) > if (vif->ndev) { > netif_stop_queue(vif->ndev); > > + if (wl->open_ifcs == 0) > + wl->close = 1; Ignoring the fact that this isn't protected somehow and that there is no write barrier (maybe I'm overthinking this and it isn't really needed for an 'int' field), this and your reasoning with the flush_workqueue() sounds legit. But I'm still not convinced a lock is not required. wilc_user_scan_req::scan_result is at least updated in wilc_disconnect() and wilc_deinit(). wilc_disconnect() is called from the cfg80211_ops::disconnect callback. wilc_deinit() is called from net_device_ops::ndo_stop. Is there any lock which prevents both functions be called in parallel? wl->close is checked in the .disconnect op, but as mentioned above, it is not protected by any lock. -michael > + > wilc_handle_disconnect(vif); > wilc_deinit_host_int(vif->ndev); > } > > if (wl->open_ifcs == 0) { > netdev_dbg(ndev, "Deinitializing wilc1000\n"); > - wl->close = 1; > wilc_wlan_deinitialize(ndev); > } > > diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c > b/drivers/net/wireless/microchip/wilc1000/wlan.c > index 58bbf50081e4..700cb657be00 100644 > --- a/drivers/net/wireless/microchip/wilc1000/wlan.c > +++ b/drivers/net/wireless/microchip/wilc1000/wlan.c > @@ -1066,6 +1066,9 @@ void wilc_handle_isr(struct wilc *wilc) > { > u32 int_status; > > + if (wilc->close) > + return; > + > acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP); > wilc->hif_func->hif_read_int(wilc, &int_status); > > -- > 2.34.1
On Wed, 2023-04-05 at 13:40 +0200, Michael Walle wrote: > > wilc_disconnect() is called from the cfg80211_ops::disconnect > callback. wilc_deinit() is called from net_device_ops::ndo_stop. > Is there any lock which prevents both functions be called in > parallel? I don't _think_ there's any common lock, ndo_stop() holds the RTNL, but cfg80211 for a normal nl80211 disconnect command will only briefly hold the RTNL and drop it again before calling into the driver. The internal flags here don't indicate requiring RTNL and that wouldn't make much sense either: { .cmd = NL80211_CMD_DISCONNECT, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = nl80211_disconnect, .flags = GENL_UNS_ADMIN_PERM, .internal_flags = IFLAGS(NL80211_FLAG_NEED_NETDEV_UP), }, See commit a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver"). johannes
Hi Michael, On 4/5/23 04:40, Michael Walle wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know > the content is safe > > Hi, > > [+ wireless and cfg80211 maintainers because I'm not familiar with > cfg80211 ] > >> Fix for kernel crash observed with following test procedure [1]: >> while true; >> do ifconfig wlan0 up; >> iw dev wlan0 scan & >> ifconfig wlan0 down; >> done >> >> During the above test procedure, the scan results are received from >> firmware >> for 'iw scan' command gets queued even when the interface is going >> down. It >> was causing the kernel oops when dereferencing the freed pointers. >> >> For synchronization, 'mac_close()' calls flush_workqueue() to block its >> execution till all pending work is completed. Afterwards 'wilc->close' >> flag >> which is set before the flush_workqueue() should avoid adding new work. >> Added 'wilc->close' check in wilc_handle_isr() which is common for >> SPI/SDIO bus to ignore the interrupts from firmware that inturns adds >> the >> work since the interface is getting closed. > > With this patch I'm now getting > wilc1000_sdio mmc0:0001:1 wlan0: Failed to send setup multicast > > when you close the interface. > This is a false alarm. I will modify the patch to ignore this debug message when the mac_close() is in progress. >> >> 1. >> https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/ > > should be Link: Okay > >> Reported-by: Michael Walle <mwalle@kernel.org> >> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com> > > Missing Fixes: tag. In this regard, most of the previous wilc fixes > patches > miss a proper Fixes tag which makes the wilc1000 pretty unusable on > stable > kernels IMHO :/ > Sure. I will include the fixes tag in updated version. >> --- >> drivers/net/wireless/microchip/wilc1000/netdev.c | 9 +++------ >> drivers/net/wireless/microchip/wilc1000/wlan.c | 3 +++ >> 2 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c >> b/drivers/net/wireless/microchip/wilc1000/netdev.c >> index e9f59de31b0b..40edee10a81f 100644 >> --- a/drivers/net/wireless/microchip/wilc1000/netdev.c >> +++ b/drivers/net/wireless/microchip/wilc1000/netdev.c >> @@ -38,11 +38,6 @@ static irqreturn_t isr_bh_routine(int irq, void >> *userdata) >> { >> struct wilc *wilc = userdata; >> >> - if (wilc->close) { >> - pr_err("Can't handle BH interrupt\n"); >> - return IRQ_HANDLED; >> - } >> - > > This check is still in the top half of the interrupt processing. > Shouldn't it be removed there, too? That way you can get rid of > the top half entirely and just let the irq subsys use the default > top half implementation. > Yeah, it makes sense. I will include this change in the updated version. >> wilc_handle_isr(wilc); >> >> return IRQ_HANDLED; >> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device >> *ndev) >> if (vif->ndev) { >> netif_stop_queue(vif->ndev); >> >> + if (wl->open_ifcs == 0) >> + wl->close = 1; > > Ignoring the fact that this isn't protected somehow and that > there is no write barrier (maybe I'm overthinking this and > it isn't really needed for an 'int' field), this and your > reasoning with the flush_workqueue() sounds legit. > > But I'm still not convinced a lock is not required. > wilc_user_scan_req::scan_result is at least updated in > wilc_disconnect() and wilc_deinit(). > > wilc_disconnect() is called from the cfg80211_ops::disconnect > callback. wilc_deinit() is called from net_device_ops::ndo_stop. > Is there any lock which prevents both functions be called in > parallel? wl->close is checked in the .disconnect op, but as > mentioned above, it is not protected by any lock. Sure, I will prepare a separate patch to handle this. > > -michael > >> + >> wilc_handle_disconnect(vif); >> wilc_deinit_host_int(vif->ndev); >> } >> >> if (wl->open_ifcs == 0) { >> netdev_dbg(ndev, "Deinitializing wilc1000\n"); >> - wl->close = 1; >> wilc_wlan_deinitialize(ndev); >> } >> >> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c >> b/drivers/net/wireless/microchip/wilc1000/wlan.c >> index 58bbf50081e4..700cb657be00 100644 >> --- a/drivers/net/wireless/microchip/wilc1000/wlan.c >> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.c >> @@ -1066,6 +1066,9 @@ void wilc_handle_isr(struct wilc *wilc) >> { >> u32 int_status; >> >> + if (wilc->close) >> + return; >> + >> acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP); >> wilc->hif_func->hif_read_int(wilc, &int_status); >> >> -- >> 2.34.1
<Ajay.Kathat@microchip.com> writes: > Fix for kernel crash observed with following test procedure [1]: > while true; > do ifconfig wlan0 up; > iw dev wlan0 scan & > ifconfig wlan0 down; > done > > During the above test procedure, the scan results are received from firmware > for 'iw scan' command gets queued even when the interface is going down. It > was causing the kernel oops when dereferencing the freed pointers. > > For synchronization, 'mac_close()' calls flush_workqueue() to block its > execution till all pending work is completed. Afterwards 'wilc->close' flag > which is set before the flush_workqueue() should avoid adding new work. > Added 'wilc->close' check in wilc_handle_isr() which is common for > SPI/SDIO bus to ignore the interrupts from firmware that inturns adds the > work since the interface is getting closed. > > 1. https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/ > > Reported-by: Michael Walle <mwalle@kernel.org> > Signed-off-by: Ajay Singh <ajay.kathat@microchip.com> [...] > @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev) > if (vif->ndev) { > netif_stop_queue(vif->ndev); > > + if (wl->open_ifcs == 0) > + wl->close = 1; > + wl-close is an int, I wonder if it's racy to int as a flag like this? In cases like this I usually use set_bit() & co because those guarantee atomicity, though don't know if that's overkill.
Hi Kalle, On 5/5/23 08:47, Kalle Valo wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > <Ajay.Kathat@microchip.com> writes: > >> Fix for kernel crash observed with following test procedure [1]: >> while true; >> do ifconfig wlan0 up; >> iw dev wlan0 scan & >> ifconfig wlan0 down; >> done >> >> During the above test procedure, the scan results are received from firmware >> for 'iw scan' command gets queued even when the interface is going down. It >> was causing the kernel oops when dereferencing the freed pointers. >> >> For synchronization, 'mac_close()' calls flush_workqueue() to block its >> execution till all pending work is completed. Afterwards 'wilc->close' flag >> which is set before the flush_workqueue() should avoid adding new work. >> Added 'wilc->close' check in wilc_handle_isr() which is common for >> SPI/SDIO bus to ignore the interrupts from firmware that inturns adds the >> work since the interface is getting closed. >> >> 1. https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/ >> >> Reported-by: Michael Walle <mwalle@kernel.org> >> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com> > > [...] > >> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev) >> if (vif->ndev) { >> netif_stop_queue(vif->ndev); >> >> + if (wl->open_ifcs == 0) >> + wl->close = 1; >> + > > wl-close is an int, I wonder if it's racy to int as a flag like this? In > cases like this I usually use set_bit() & co because those guarantee > atomicity, though don't know if that's overkill. > I think it's a good idea to use an atomic operation but I am not sure if using atomic for 'wl->close' will have much impact. For instance, if any new work gets added to the workqueue before the 'wl->close=1' is fully completed, then that work would get executed as normal. However, I feel it's safe to define 'wl->close' as atomic_t type. I will prepare the conversion patch and will try to include it along with the updated version of this patch. Regards, Ajay
<Ajay.Kathat@microchip.com> writes: >>> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev) >>> if (vif->ndev) { >>> netif_stop_queue(vif->ndev); >>> >>> + if (wl->open_ifcs == 0) >>> + wl->close = 1; >>> + >> >> wl-close is an int, I wonder if it's racy to int as a flag like this? In >> cases like this I usually use set_bit() & co because those guarantee >> atomicity, though don't know if that's overkill. >> > > I think it's a good idea to use an atomic operation but I am not sure if > using atomic for 'wl->close' will have much impact. For instance, if any > new work gets added to the workqueue before the 'wl->close=1' is fully > completed, then that work would get executed as normal. Sure, this is most likely a small race condition. But still a race. > However, I feel it's safe to define 'wl->close' as atomic_t type. I will > prepare the conversion patch and will try to include it along with the > updated version of this patch. Why atomic_t? You only use values 0 and 1 so test_bit() and set_bit() sounds more approriate to me.
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c b/drivers/net/wireless/microchip/wilc1000/netdev.c index e9f59de31b0b..40edee10a81f 100644 --- a/drivers/net/wireless/microchip/wilc1000/netdev.c +++ b/drivers/net/wireless/microchip/wilc1000/netdev.c @@ -38,11 +38,6 @@ static irqreturn_t isr_bh_routine(int irq, void *userdata) { struct wilc *wilc = userdata; - if (wilc->close) { - pr_err("Can't handle BH interrupt\n"); - return IRQ_HANDLED; - } - wilc_handle_isr(wilc); return IRQ_HANDLED; @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev) if (vif->ndev) { netif_stop_queue(vif->ndev); + if (wl->open_ifcs == 0) + wl->close = 1; + wilc_handle_disconnect(vif); wilc_deinit_host_int(vif->ndev); } if (wl->open_ifcs == 0) { netdev_dbg(ndev, "Deinitializing wilc1000\n"); - wl->close = 1; wilc_wlan_deinitialize(ndev); } diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c b/drivers/net/wireless/microchip/wilc1000/wlan.c index 58bbf50081e4..700cb657be00 100644 --- a/drivers/net/wireless/microchip/wilc1000/wlan.c +++ b/drivers/net/wireless/microchip/wilc1000/wlan.c @@ -1066,6 +1066,9 @@ void wilc_handle_isr(struct wilc *wilc) { u32 int_status; + if (wilc->close) + return; + acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP); wilc->hif_func->hif_read_int(wilc, &int_status);
Fix for kernel crash observed with following test procedure [1]: while true; do ifconfig wlan0 up; iw dev wlan0 scan & ifconfig wlan0 down; done During the above test procedure, the scan results are received from firmware for 'iw scan' command gets queued even when the interface is going down. It was causing the kernel oops when dereferencing the freed pointers. For synchronization, 'mac_close()' calls flush_workqueue() to block its execution till all pending work is completed. Afterwards 'wilc->close' flag which is set before the flush_workqueue() should avoid adding new work. Added 'wilc->close' check in wilc_handle_isr() which is common for SPI/SDIO bus to ignore the interrupts from firmware that inturns adds the work since the interface is getting closed. 1. https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/ Reported-by: Michael Walle <mwalle@kernel.org> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com> --- drivers/net/wireless/microchip/wilc1000/netdev.c | 9 +++------ drivers/net/wireless/microchip/wilc1000/wlan.c | 3 +++ 2 files changed, 6 insertions(+), 6 deletions(-) -- 2.34.1