Message ID | 20220519135345.109936-1-duoming@zju.edu.cn (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs | expand |
Duoming Zhou <duoming@zju.edu.cn> writes: > There are sleep in atomic context bugs when uploading device dump > data on usb interface. The root cause is that the operations that > may sleep are called in fw_dump_timer_fn which is a timer handler. > The call tree shows the execution paths that could lead to bugs: > > (Interrupt context) > fw_dump_timer_fn > mwifiex_upload_device_dump > dev_coredumpv(..., GFP_KERNEL) > dev_coredumpm() > kzalloc(sizeof(*devcd), gfp); //may sleep > dev_set_name > kobject_set_name_vargs > kvasprintf_const(GFP_KERNEL, ...); //may sleep > kstrdup(s, GFP_KERNEL); //may sleep > > This patch moves the operations that may sleep into a work item. > The work item will run in another kernel thread which is in > process context to execute the bottom half of the interrupt. > So it could prevent atomic context from sleeping. > > Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface") > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> mwifiex patches go to wireless-next, not net tree. > --- > Changes in v2: > - Fix compile problem. So you don't even compile test your patches? That's bad and in that case I'll just directly drop this. We expect that the patches are properly tested.
Hello, On Thu, 19 May 2022 17:58:47 +0300 Kalle Valo wrote: > > There are sleep in atomic context bugs when uploading device dump > > data on usb interface. The root cause is that the operations that > > may sleep are called in fw_dump_timer_fn which is a timer handler. > > The call tree shows the execution paths that could lead to bugs: > > > > (Interrupt context) > > fw_dump_timer_fn > > mwifiex_upload_device_dump > > dev_coredumpv(..., GFP_KERNEL) > > dev_coredumpm() > > kzalloc(sizeof(*devcd), gfp); //may sleep > > dev_set_name > > kobject_set_name_vargs > > kvasprintf_const(GFP_KERNEL, ...); //may sleep > > kstrdup(s, GFP_KERNEL); //may sleep > > > > This patch moves the operations that may sleep into a work item. > > The work item will run in another kernel thread which is in > > process context to execute the bottom half of the interrupt. > > So it could prevent atomic context from sleeping. > > > > Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface") > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> > > mwifiex patches go to wireless-next, not net tree. > > > --- > > Changes in v2: > > - Fix compile problem. > > So you don't even compile test your patches? That's bad and in that case > I'll just directly drop this. We expect that the patches are properly > tested. Ok, I will properly test this patch. Best regards, Duoming Zhou
On 5/19/2022 8:14 AM, duoming@zju.edu.cn wrote: > Hello, > > On Thu, 19 May 2022 17:58:47 +0300 Kalle Valo wrote: > >>> There are sleep in atomic context bugs when uploading device dump >>> data on usb interface. The root cause is that the operations that >>> may sleep are called in fw_dump_timer_fn which is a timer handler. >>> The call tree shows the execution paths that could lead to bugs: >>> >>> (Interrupt context) >>> fw_dump_timer_fn >>> mwifiex_upload_device_dump >>> dev_coredumpv(..., GFP_KERNEL) just looking at this description, why isn't the simple fix just to change this call to use GFP_ATOMIC?
Hello, On Thu, 19 May 2022 08:48:44 -0700 Jeff Johnson wrote: > >>> There are sleep in atomic context bugs when uploading device dump > >>> data on usb interface. The root cause is that the operations that > >>> may sleep are called in fw_dump_timer_fn which is a timer handler. > >>> The call tree shows the execution paths that could lead to bugs: > >>> > >>> (Interrupt context) > >>> fw_dump_timer_fn > >>> mwifiex_upload_device_dump > >>> dev_coredumpv(..., GFP_KERNEL) > > just looking at this description, why isn't the simple fix just to > change this call to use GFP_ATOMIC? Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c which is not influenced by dev_coredumpv(). kobject_set_name_vargs kvasprintf_const(GFP_KERNEL, ...); //may sleep kstrdup(s, GFP_KERNEL); //may sleep Best regards, Duoming Zhou
On 5/19/2022 5:08 PM, duoming@zju.edu.cn wrote: > Hello, > > On Thu, 19 May 2022 08:48:44 -0700 Jeff Johnson wrote: > >>>>> There are sleep in atomic context bugs when uploading device dump >>>>> data on usb interface. The root cause is that the operations that >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler. >>>>> The call tree shows the execution paths that could lead to bugs: >>>>> >>>>> (Interrupt context) >>>>> fw_dump_timer_fn >>>>> mwifiex_upload_device_dump >>>>> dev_coredumpv(..., GFP_KERNEL) >> >> just looking at this description, why isn't the simple fix just to >> change this call to use GFP_ATOMIC? > > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c > which is not influenced by dev_coredumpv(). > > kobject_set_name_vargs > kvasprintf_const(GFP_KERNEL, ...); //may sleep > kstrdup(s, GFP_KERNEL); //may sleep Then it seems there is a problem with dev_coredumpm(). dev_coredumpm() takes a gfp param which means it expects to be called in any context, but it then calls dev_set_name() which, as you point out, cannot be called from an atomic context. So if we cannot change the fact that dev_set_name() cannot be called from an atomic context, then it would seem to follow that dev_coredumpv()/dev_coredumpm() also cannot be called from an atomic context and hence their gfp param is pointless and should presumably be removed. /jeff
Hello, On Fri, 20 May 2022 09:08:52 -0700 Jeff Johnson wrote: > >>>>> There are sleep in atomic context bugs when uploading device dump > >>>>> data on usb interface. The root cause is that the operations that > >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler. > >>>>> The call tree shows the execution paths that could lead to bugs: > >>>>> > >>>>> (Interrupt context) > >>>>> fw_dump_timer_fn > >>>>> mwifiex_upload_device_dump > >>>>> dev_coredumpv(..., GFP_KERNEL) > >> > >> just looking at this description, why isn't the simple fix just to > >> change this call to use GFP_ATOMIC? > > > > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve > > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c > > which is not influenced by dev_coredumpv(). > > > > kobject_set_name_vargs > > kvasprintf_const(GFP_KERNEL, ...); //may sleep > > kstrdup(s, GFP_KERNEL); //may sleep > > Then it seems there is a problem with dev_coredumpm(). > > dev_coredumpm() takes a gfp param which means it expects to be called in > any context, but it then calls dev_set_name() which, as you point out, > cannot be called from an atomic context. > > So if we cannot change the fact that dev_set_name() cannot be called > from an atomic context, then it would seem to follow that > dev_coredumpv also cannot be called from an atomic > context and hence their gfp param is pointless and should presumably be > removed. Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and dev_coredumpm may not be removed, because it could be used to pass value to gfp_t parameter of kzalloc in dev_coredumpm. What's more, there are also many other places use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many places that need to modify and these places are not in interrupt context. There are two solutions now: one is to moves the operations that may sleep into a work item. Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to "in_interrupt() ? GFP_ATOMIC : GFP_KERNEL". The detail of the first solution is shown below: diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c index 88c72d1827a..cc3f1121eb9 100644 --- a/drivers/net/wireless/marvell/mwifiex/init.c +++ b/drivers/net/wireless/marvell/mwifiex/init.c @@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t) adapter->if_ops.card_reset(adapter); } +static void fw_dump_work(struct work_struct *work) +{ + struct mwifiex_adapter *adapter = + container_of(work, struct mwifiex_adapter, devdump_work); + + mwifiex_upload_device_dump(adapter); +} + static void fw_dump_timer_fn(struct timer_list *t) { struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer); - mwifiex_upload_device_dump(adapter); + schedule_work(&adapter->devdump_work); } /* @@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter) adapter->active_scan_triggered = false; timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0); adapter->devdump_len = 0; + INIT_WORK(&adapter->devdump_work, fw_dump_work); timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0); } @@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter) { del_timer(&adapter->wakeup_timer); del_timer_sync(&adapter->devdump_timer); + cancel_work_sync(&adapter->devdump_work); mwifiex_cancel_all_pending_cmd(adapter); wake_up_interruptible(&adapter->cmd_wait_q.wait); wake_up_interruptible(&adapter->hs_activate_wait_q); diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 332dd1c8db3..c8ac2f57f18 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -900,6 +900,7 @@ struct mwifiex_adapter { struct work_struct rx_work; struct workqueue_struct *dfs_workqueue; struct work_struct dfs_work; + struct work_struct devdump_work; bool rx_work_enabled; bool rx_processing; bool delay_main_work; diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c index 7d42c5d2dbf..8e28d0107d7 100644 --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c @@ -644,6 +644,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv, upload_dump: del_timer_sync(&adapter->devdump_timer); + cancel_work_sync(&adapter->devdump_work); mwifiex_upload_device_dump(adapter); } The detail of the second solution is shown below: diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index ace7371c477..258906920a2 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter) mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump start\n"); dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len, - GFP_KERNEL); + GFP_ATOMIC); mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump end\n"); diff --git a/lib/kobject.c b/lib/kobject.c index 5f0e71ab292..dbb2e57ef78 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt, if (kobj->name && !fmt) return 0; - s = kvasprintf_const(GFP_KERNEL, fmt, vargs); + s = kvasprintf_const(in_interrupt() ? GFP_ATOMIC : GFP_KERNEL, fmt, vargs); if (!s) return -ENOMEM; @@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt, if (strchr(s, '/')) { char *t; - t = kstrdup(s, GFP_KERNEL); + t = kstrdup(s, in_interrupt() ? GFP_ATOMIC : GFP_KERNEL); kfree_const(s); if (!t) return -ENOMEM; Which one do you think is better? I will choose the better one to test. Best regards, Duoming Zhou
duoming@zju.edu.cn writes: > Hello, > > On Fri, 20 May 2022 09:08:52 -0700 Jeff Johnson wrote: > >> >>>>> There are sleep in atomic context bugs when uploading device dump >> >>>>> data on usb interface. The root cause is that the operations that >> >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler. >> >>>>> The call tree shows the execution paths that could lead to bugs: >> >>>>> >> >>>>> (Interrupt context) >> >>>>> fw_dump_timer_fn >> >>>>> mwifiex_upload_device_dump >> >>>>> dev_coredumpv(..., GFP_KERNEL) >> >> >> >> just looking at this description, why isn't the simple fix just to >> >> change this call to use GFP_ATOMIC? >> > >> > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve >> > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c >> > which is not influenced by dev_coredumpv(). >> > >> > kobject_set_name_vargs >> > kvasprintf_const(GFP_KERNEL, ...); //may sleep >> > kstrdup(s, GFP_KERNEL); //may sleep >> >> Then it seems there is a problem with dev_coredumpm(). >> >> dev_coredumpm() takes a gfp param which means it expects to be called in >> any context, but it then calls dev_set_name() which, as you point out, >> cannot be called from an atomic context. >> >> So if we cannot change the fact that dev_set_name() cannot be called >> from an atomic context, then it would seem to follow that >> dev_coredumpv also cannot be called from an atomic >> context and hence their gfp param is pointless and should presumably be >> removed. > > Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and > dev_coredumpm may not be removed, because it could be used to pass value to gfp_t > parameter of kzalloc in dev_coredumpm. What's more, there are also many other places > use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many > places that need to modify and these places are not in interrupt > context. "Too many users" is not a valid reason to leave a bug in place, either dev_coredumpv() should support GFP_ATOMIC or the gfp_t parameter should be removed. > There are two solutions now: one is to moves the operations that may > sleep into a work item. That does not fix the root cause that dev_coredumpv() claims it can be called in atomic contexts. > Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and > change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to > "in_interrupt() ? GFP_ATOMIC : GFP_KERNEL". in_interrupt() is deprecated and should not be used. And I don't think it detects all atomic contexts like spinlocks.
Hello, On Sat, 21 May 2022 09:32:37 +0300 Kalle Valo wrote: > >> >>>>> There are sleep in atomic context bugs when uploading device dump > >> >>>>> data on usb interface. The root cause is that the operations that > >> >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler. > >> >>>>> The call tree shows the execution paths that could lead to bugs: > >> >>>>> > >> >>>>> (Interrupt context) > >> >>>>> fw_dump_timer_fn > >> >>>>> mwifiex_upload_device_dump > >> >>>>> dev_coredumpv(..., GFP_KERNEL) > >> >> > >> >> just looking at this description, why isn't the simple fix just to > >> >> change this call to use GFP_ATOMIC? > >> > > >> > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve > >> > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c > >> > which is not influenced by dev_coredumpv(). > >> > > >> > kobject_set_name_vargs > >> > kvasprintf_const(GFP_KERNEL, ...); //may sleep > >> > kstrdup(s, GFP_KERNEL); //may sleep > >> > >> Then it seems there is a problem with dev_coredumpm(). > >> > >> dev_coredumpm() takes a gfp param which means it expects to be called in > >> any context, but it then calls dev_set_name() which, as you point out, > >> cannot be called from an atomic context. > >> > >> So if we cannot change the fact that dev_set_name() cannot be called > >> from an atomic context, then it would seem to follow that > >> dev_coredumpv also cannot be called from an atomic > >> context and hence their gfp param is pointless and should presumably be > >> removed. > > > > Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and > > dev_coredumpm may not be removed, because it could be used to pass value to gfp_t > > parameter of kzalloc in dev_coredumpm. What's more, there are also many other places > > use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many > > places that need to modify and these places are not in interrupt > > context. > > "Too many users" is not a valid reason to leave a bug in place, either > dev_coredumpv() should support GFP_ATOMIC or the gfp_t parameter should > be removed. The following is one method that letting dev_coredump() support GFP_ATOMIC: I think dev_set_name() is used to allocate memory to set a device name, which need only several bytes and there is little chance to sleep in the real world. However the dev_coredumpv() is used to allocate memory to store device coredump, which need lots of memory space and have more chances to sleep. So I think only change the gfp_t parameter of dev_coredumpv() from GFP_KERNEL to GFP_ATOMIC is ok. diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index ace7371c477..258906920a2 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter) mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump start\n"); dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len, - GFP_KERNEL); + GFP_ATOMIC); mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump end\n"); > > There are two solutions now: one is to moves the operations that may > > sleep into a work item. > > That does not fix the root cause that dev_coredumpv() claims it can be > called in atomic contexts. I agree with you. There is not GFP_ATOMIC in lib/kobject.c. Should we modify the gfp_t parameter in kobject.c in order to support atomic contexts? Do you have any other good methods? > > Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and > > change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to > > "in_interrupt() ? GFP_ATOMIC : GFP_KERNEL". > > in_interrupt() is deprecated and should not be used. And I don't think > it detects all atomic contexts like spinlocks. I agree with you, the in_interrupt() is not proper. Thanks for your time and suggestions! Best regards, Duoming Zhou
diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c index 88c72d1827a..cc3f1121eb9 100644 --- a/drivers/net/wireless/marvell/mwifiex/init.c +++ b/drivers/net/wireless/marvell/mwifiex/init.c @@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t) adapter->if_ops.card_reset(adapter); } +static void fw_dump_work(struct work_struct *work) +{ + struct mwifiex_adapter *adapter = + container_of(work, struct mwifiex_adapter, devdump_work); + + mwifiex_upload_device_dump(adapter); +} + static void fw_dump_timer_fn(struct timer_list *t) { struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer); - mwifiex_upload_device_dump(adapter); + schedule_work(&adapter->devdump_work); } /* @@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter) adapter->active_scan_triggered = false; timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0); adapter->devdump_len = 0; + INIT_WORK(&adapter->devdump_work, fw_dump_work); timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0); } @@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter) { del_timer(&adapter->wakeup_timer); del_timer_sync(&adapter->devdump_timer); + cancel_work_sync(&adapter->devdump_work); mwifiex_cancel_all_pending_cmd(adapter); wake_up_interruptible(&adapter->cmd_wait_q.wait); wake_up_interruptible(&adapter->hs_activate_wait_q); diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 332dd1c8db3..c8ac2f57f18 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -900,6 +900,7 @@ struct mwifiex_adapter { struct work_struct rx_work; struct workqueue_struct *dfs_workqueue; struct work_struct dfs_work; + struct work_struct devdump_work; bool rx_work_enabled; bool rx_processing; bool delay_main_work; diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c index 7d42c5d2dbf..8e28d0107d7 100644 --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c @@ -644,6 +644,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv, upload_dump: del_timer_sync(&adapter->devdump_timer); + cancel_work_sync(&adapter->devdump_work); mwifiex_upload_device_dump(adapter); }
There are sleep in atomic context bugs when uploading device dump data on usb interface. The root cause is that the operations that may sleep are called in fw_dump_timer_fn which is a timer handler. The call tree shows the execution paths that could lead to bugs: (Interrupt context) fw_dump_timer_fn mwifiex_upload_device_dump dev_coredumpv(..., GFP_KERNEL) dev_coredumpm() kzalloc(sizeof(*devcd), gfp); //may sleep dev_set_name kobject_set_name_vargs kvasprintf_const(GFP_KERNEL, ...); //may sleep kstrdup(s, GFP_KERNEL); //may sleep This patch moves the operations that may sleep into a work item. The work item will run in another kernel thread which is in process context to execute the bottom half of the interrupt. So it could prevent atomic context from sleeping. Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> --- Changes in v2: - Fix compile problem. drivers/net/wireless/marvell/mwifiex/init.c | 12 +++++++++++- drivers/net/wireless/marvell/mwifiex/main.h | 1 + drivers/net/wireless/marvell/mwifiex/sta_event.c | 1 + 3 files changed, 13 insertions(+), 1 deletion(-)