diff mbox series

[v3] mwifiex: fix sleep in atomic context bugs caused by dev_coredumpv

Message ID 20220523052810.24767-1-duoming@zju.edu.cn (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series [v3] mwifiex: fix sleep in atomic context bugs caused by dev_coredumpv | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Duoming Zhou May 23, 2022, 5:28 a.m. UTC
There are sleep in atomic context bugs when uploading device dump
data in mwifiex. The root cause is that dev_coredumpv could not
be used in atomic contexts, because it calls dev_set_name which
include operations that may sleep. The call tree shows execution
paths that could lead to bugs:

   (Interrupt context)
fw_dump_timer_fn
  mwifiex_upload_device_dump
    dev_coredumpv(..., GFP_KERNEL)
      dev_coredumpm()
        kzalloc(sizeof(*devcd), gfp); //may sleep
        dev_set_name
          kobject_set_name_vargs
            kvasprintf_const(GFP_KERNEL, ...); //may sleep
            kstrdup(s, GFP_KERNEL); //may sleep

In order to let dev_coredumpv support atomic contexts, this patch
changes the gfp_t parameter of kvasprintf_const and kstrdup in
kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
In order to mitigate bug, this patch changes the gfp_t parameter
of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.

Fixes: 57670ee882d4 ("mwifiex: device dump support via devcoredump framework")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v3:
  - Let dev_coredumpv support atomic contexts.

 drivers/net/wireless/marvell/mwifiex/main.c | 2 +-
 lib/kobject.c                               | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Greg Kroah-Hartman May 23, 2022, 6:31 a.m. UTC | #1
On Mon, May 23, 2022 at 01:28:10PM +0800, Duoming Zhou wrote:
> There are sleep in atomic context bugs when uploading device dump
> data in mwifiex. The root cause is that dev_coredumpv could not
> be used in atomic contexts, because it calls dev_set_name which
> include operations that may sleep. The call tree shows execution
> paths that could lead to bugs:
> 
>    (Interrupt context)
> fw_dump_timer_fn
>   mwifiex_upload_device_dump
>     dev_coredumpv(..., GFP_KERNEL)
>       dev_coredumpm()
>         kzalloc(sizeof(*devcd), gfp); //may sleep
>         dev_set_name
>           kobject_set_name_vargs
>             kvasprintf_const(GFP_KERNEL, ...); //may sleep
>             kstrdup(s, GFP_KERNEL); //may sleep
> 
> In order to let dev_coredumpv support atomic contexts, this patch
> changes the gfp_t parameter of kvasprintf_const and kstrdup in
> kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
> In order to mitigate bug, this patch changes the gfp_t parameter
> of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.
> 
> Fixes: 57670ee882d4 ("mwifiex: device dump support via devcoredump framework")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
> Changes in v3:
>   - Let dev_coredumpv support atomic contexts.
> 
>  drivers/net/wireless/marvell/mwifiex/main.c | 2 +-
>  lib/kobject.c                               | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
> index ace7371c477..258906920a2 100644
> --- a/drivers/net/wireless/marvell/mwifiex/main.c
> +++ b/drivers/net/wireless/marvell/mwifiex/main.c
> @@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
>  	mwifiex_dbg(adapter, MSG,
>  		    "== mwifiex dump information to /sys/class/devcoredump start\n");
>  	dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
> -		      GFP_KERNEL);
> +		      GFP_ATOMIC);
>  	mwifiex_dbg(adapter, MSG,
>  		    "== mwifiex dump information to /sys/class/devcoredump end\n");
>  
> diff --git a/lib/kobject.c b/lib/kobject.c
> index 5f0e71ab292..7672c54944c 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
>  	if (kobj->name && !fmt)
>  		return 0;
>  
> -	s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
> +	s = kvasprintf_const(GFP_ATOMIC, fmt, vargs);
>  	if (!s)
>  		return -ENOMEM;
>  
> @@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
>  	if (strchr(s, '/')) {
>  		char *t;
>  
> -		t = kstrdup(s, GFP_KERNEL);
> +		t = kstrdup(s, GFP_ATOMIC);
>  		kfree_const(s);
>  		if (!t)
>  			return -ENOMEM;

Please no, you are hurting the whole kernel because of one odd user.
Please do not make these calls under atomic context.

thanks,

greg k-h
Kalle Valo May 23, 2022, 11:31 a.m. UTC | #2
(adding Johannes)

duoming@zju.edu.cn writes:

>> > --- a/lib/kobject.c
>> > +++ b/lib/kobject.c
>> > @@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
>> >  	if (kobj->name && !fmt)
>> >  		return 0;
>> >  
>> > -	s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
>> > +	s = kvasprintf_const(GFP_ATOMIC, fmt, vargs);
>> >  	if (!s)
>> >  		return -ENOMEM;
>> >  
>> > @@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
>> >  	if (strchr(s, '/')) {
>> >  		char *t;
>> >  
>> > -		t = kstrdup(s, GFP_KERNEL);
>> > +		t = kstrdup(s, GFP_ATOMIC);
>> >  		kfree_const(s);
>> >  		if (!t)
>> >  			return -ENOMEM;
>> 
>> Please no, you are hurting the whole kernel because of one odd user.
>> Please do not make these calls under atomic context.
>
> Thanks for your time and suggestions. I will remove the gfp_t
> parameter of dev_coredumpv in order to show it could not be used in
> atomic context.

In a way it would be nice to be able to call dev_coredump from atomic
contexts, though I don't know how practical it actually is. Is there any
other option? What about adding a gfp_t parameter to dev_set_name()? Or
is there an alternative for dev_set_name() which can be called in atomic
contexts?

Johannes&Greg, any ideas?
Greg Kroah-Hartman May 23, 2022, 1:41 p.m. UTC | #3
On Mon, May 23, 2022 at 02:31:48PM +0300, Kalle Valo wrote:
> (adding Johannes)
> 
> duoming@zju.edu.cn writes:
> 
> >> > --- a/lib/kobject.c
> >> > +++ b/lib/kobject.c
> >> > @@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
> >> >  	if (kobj->name && !fmt)
> >> >  		return 0;
> >> >  
> >> > -	s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
> >> > +	s = kvasprintf_const(GFP_ATOMIC, fmt, vargs);
> >> >  	if (!s)
> >> >  		return -ENOMEM;
> >> >  
> >> > @@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
> >> >  	if (strchr(s, '/')) {
> >> >  		char *t;
> >> >  
> >> > -		t = kstrdup(s, GFP_KERNEL);
> >> > +		t = kstrdup(s, GFP_ATOMIC);
> >> >  		kfree_const(s);
> >> >  		if (!t)
> >> >  			return -ENOMEM;
> >> 
> >> Please no, you are hurting the whole kernel because of one odd user.
> >> Please do not make these calls under atomic context.
> >
> > Thanks for your time and suggestions. I will remove the gfp_t
> > parameter of dev_coredumpv in order to show it could not be used in
> > atomic context.
> 
> In a way it would be nice to be able to call dev_coredump from atomic
> contexts, though I don't know how practical it actually is.

Dumping core information from atomic context feels very very wrong to
me.

Why not just not do that?

> Is there any other option? What about adding a gfp_t parameter to
> dev_set_name()? Or is there an alternative for dev_set_name() which
> can be called in atomic contexts?

dev_set_name() should not be called in atomic context as that implies
you are doing a very slow operation with locks disabled, not a good
thing at all.

thanks,

greg k-h
Johannes Berg May 23, 2022, 1:43 p.m. UTC | #4
On Mon, 2022-05-23 at 14:31 +0300, Kalle Valo wrote:
> 
> In a way it would be nice to be able to call dev_coredump from atomic
> contexts, though I don't know how practical it actually is. Is there any
> other option? What about adding a gfp_t parameter to dev_set_name()? Or
> is there an alternative for dev_set_name() which can be called in atomic
> contexts?
> 
> Johannes&Greg, any ideas?

If you need to, I guess you can collect the data into some area and then
provide it to devcoredump later? Not sure it's a good idea though since
collecting data can take a while.

johannes
Eric W. Biederman May 23, 2022, 2:20 p.m. UTC | #5
Duoming Zhou <duoming@zju.edu.cn> writes:

> There are sleep in atomic context bugs when uploading device dump
> data in mwifiex. The root cause is that dev_coredumpv could not
> be used in atomic contexts, because it calls dev_set_name which
> include operations that may sleep. The call tree shows execution
> paths that could lead to bugs:
>
>    (Interrupt context)
> fw_dump_timer_fn
>   mwifiex_upload_device_dump
>     dev_coredumpv(..., GFP_KERNEL)
>       dev_coredumpm()
>         kzalloc(sizeof(*devcd), gfp); //may sleep
>         dev_set_name
>           kobject_set_name_vargs
>             kvasprintf_const(GFP_KERNEL, ...); //may sleep
>             kstrdup(s, GFP_KERNEL); //may sleep
>
> In order to let dev_coredumpv support atomic contexts, this patch
> changes the gfp_t parameter of kvasprintf_const and kstrdup in
> kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
> In order to mitigate bug, this patch changes the gfp_t parameter
> of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.

vmalloc in atomic context?

Not only does dev_coredumpm set a device name dev_coredumpm creates an
entire device to hold the device dump.

My sense is that either dev_coredumpm needs to be rebuilt on a
completely different principle that does not need a device to hold the
coredump (so that it can be called from interrupt context) or that
dev_coredumpm should never be called in an context that can not sleep.


Looking at fw_dump_timer_fn the only purpose of the timer is to trigger
a device dump after a certain amount of time.  So I suspect all that is
needed to fix this issue is to change the type of devdump_timer to
struct delayed_work and use scheduled_delayed_work instead of mod_timer.


Eric

p.s.  I looked at this because there was coredump in the infrastructure
name, and I do some of the work to keep coredumps working.  Device dump
seems like a much better term, and I wished the designer of the api had
used that instead.
Duoming Zhou May 23, 2022, 3:58 p.m. UTC | #6
Hello maintainers,

Thank you for your time and suggestions! 

> > There are sleep in atomic context bugs when uploading device dump
> > data in mwifiex. The root cause is that dev_coredumpv could not
> > be used in atomic contexts, because it calls dev_set_name which
> > include operations that may sleep. The call tree shows execution
> > paths that could lead to bugs:
> >
> >    (Interrupt context)
> > fw_dump_timer_fn
> >   mwifiex_upload_device_dump
> >     dev_coredumpv(..., GFP_KERNEL)
> >       dev_coredumpm()
> >         kzalloc(sizeof(*devcd), gfp); //may sleep
> >         dev_set_name
> >           kobject_set_name_vargs
> >             kvasprintf_const(GFP_KERNEL, ...); //may sleep
> >             kstrdup(s, GFP_KERNEL); //may sleep
> >
> > In order to let dev_coredumpv support atomic contexts, this patch
> > changes the gfp_t parameter of kvasprintf_const and kstrdup in
> > kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
> > In order to mitigate bug, this patch changes the gfp_t parameter
> > of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.
> 
> vmalloc in atomic context?
> 
> Not only does dev_coredumpm set a device name dev_coredumpm creates an
> entire device to hold the device dump.
> 
> My sense is that either dev_coredumpm needs to be rebuilt on a
> completely different principle that does not need a device to hold the
> coredump (so that it can be called from interrupt context) or that
> dev_coredumpm should never be called in an context that can not sleep.

The following solution removes the gfp_t parameter of dev_coredumpv(), 
dev_coredumpm() and dev_coredumpsg() and change the gfp_t parameter of 
kzalloc() in dev_coredumpm() to GFP_KERNEL, in order to show that these 
functions can not be used in atomic context.

What's more, I move the operations that may sleep into a work item and use
schedule_work() to call a kernel thread to do the operations that may sleep.

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index f4d794d6bb8..8535f0bd5df 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -173,15 +173,13 @@ static void devcd_freev(void *data)
  * @dev: the struct device for the crashed device
  * @data: vmalloc data containing the device coredump
  * @datalen: length of the data
- * @gfp: allocation flags
  *
  * This function takes ownership of the vmalloc'ed data and will free
  * it when it is no longer used. See dev_coredumpm() for more information.
  */
-void dev_coredumpv(struct device *dev, void *data, size_t datalen,
-		   gfp_t gfp)
+void dev_coredumpv(struct device *dev, void *data, size_t datalen)
 {
-	dev_coredumpm(dev, NULL, data, datalen, gfp, devcd_readv, devcd_freev);
+	dev_coredumpm(dev, NULL, data, datalen, devcd_readv, devcd_freev);
 }
 EXPORT_SYMBOL_GPL(dev_coredumpv);
 
@@ -236,7 +234,6 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
  * @owner: the module that contains the read/free functions, use %THIS_MODULE
  * @data: data cookie for the @read/@free functions
  * @datalen: length of the data
- * @gfp: allocation flags
  * @read: function to read from the given buffer
  * @free: function to free the given buffer
  *
@@ -246,7 +243,7 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
  * function will be called to free the data.
  */
 void dev_coredumpm(struct device *dev, struct module *owner,
-		   void *data, size_t datalen, gfp_t gfp,
+		   void *data, size_t datalen,
 		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 				   void *data, size_t datalen),
 		   void (*free)(void *data))
@@ -268,7 +265,7 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	if (!try_module_get(owner))
 		goto free;
 
-	devcd = kzalloc(sizeof(*devcd), gfp);
+	devcd = kzalloc(sizeof(*devcd), GFP_KERNEL);
 	if (!devcd)
 		goto put_module;
 
@@ -318,7 +315,6 @@ EXPORT_SYMBOL_GPL(dev_coredumpm);
  * @dev: the struct device for the crashed device
  * @table: the dump data
  * @datalen: length of the data
- * @gfp: allocation flags
  *
  * Creates a new device coredump for the given device. If a previous one hasn't
  * been read yet, the new coredump is discarded. The data lifetime is determined
@@ -326,9 +322,9 @@ EXPORT_SYMBOL_GPL(dev_coredumpm);
  * it will free the data.
  */
 void dev_coredumpsg(struct device *dev, struct scatterlist *table,
-		    size_t datalen, gfp_t gfp)
+		    size_t datalen)
 {
-	dev_coredumpm(dev, NULL, table, datalen, gfp, devcd_read_from_sgtable,
+	dev_coredumpm(dev, NULL, table, datalen, devcd_read_from_sgtable,
 		      devcd_free_sgtable);
 }
 EXPORT_SYMBOL_GPL(dev_coredumpsg);
diff --git a/drivers/bluetooth/btmrvl_sdio.c b/drivers/bluetooth/btmrvl_sdio.c
index b8ef66f89fc..9b9728719db 100644
--- a/drivers/bluetooth/btmrvl_sdio.c
+++ b/drivers/bluetooth/btmrvl_sdio.c
@@ -1515,7 +1515,7 @@ static void btmrvl_sdio_coredump(struct device *dev)
 	/* fw_dump_data will be free in device coredump release function
 	 * after 5 min
 	 */
-	dev_coredumpv(&card->func->dev, fw_dump_data, fw_dump_len, GFP_KERNEL);
+	dev_coredumpv(&card->func->dev, fw_dump_data, fw_dump_len);
 	BT_INFO("== btmrvl firmware dump to /sys/class/devcoredump end");
 }
 
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index f6e91fb432a..fe9229c3216 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1120,7 +1120,7 @@ static void qca_controller_memdump(struct work_struct *work)
 				    qca_memdump->ram_dump_size);
 			memdump_buf = qca_memdump->memdump_buf_head;
 			dev_coredumpv(&hu->serdev->dev, memdump_buf,
-				      qca_memdump->received_dump, GFP_KERNEL);
+				      qca_memdump->received_dump);
 			cancel_delayed_work(&qca->ctrl_memdump_timeout);
 			kfree(qca->qca_memdump);
 			qca->qca_memdump = NULL;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
index f418e0b7577..519fcb234b3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
@@ -225,5 +225,5 @@ void etnaviv_core_dump(struct etnaviv_gem_submit *submit)
 
 	etnaviv_core_dump_header(&iter, ETDUMP_BUF_END, iter.data);
 
-	dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start, GFP_KERNEL);
+	dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start);
 }
diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
index e75b97127c0..a8b93664276 100644
--- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
+++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
@@ -74,7 +74,7 @@ static void _msm_disp_snapshot_work(struct kthread_work *work)
 	 * If there is a codedump pending for the device, the dev_coredumpm()
 	 * will also free new coredump state.
 	 */
-	dev_coredumpm(disp_state->dev, THIS_MODULE, disp_state, 0, GFP_KERNEL,
+	dev_coredumpm(disp_state->dev, THIS_MODULE, disp_state, 0,
 			disp_devcoredump_read, msm_disp_state_free);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index faf0c242874..c93dbd3840f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -317,7 +317,7 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
 	gpu->crashstate = state;
 
 	/* FIXME: Release the crashstate if this errors out? */
-	dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, GFP_KERNEL,
+	dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0,
 		msm_gpu_devcoredump_read, msm_gpu_devcoredump_free);
 }
 #else
diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 877eca12580..db84dfb3fb1 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -49,7 +49,7 @@ static void venus_coredump(struct venus_core *core)
 
 	memcpy(data, mem_va, mem_size);
 	memunmap(mem_va);
-	dev_coredumpv(dev, data, mem_size, GFP_KERNEL);
+	dev_coredumpv(dev, data, mem_size);
 }
 
 static void venus_event_notify(struct venus_core *core, u32 event)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c
index c991b30bc9f..fa520ab7c96 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c
@@ -281,5 +281,5 @@ void mcp251xfd_dump(const struct mcp251xfd_priv *priv)
 	mcp251xfd_dump_end(priv, &iter);
 
 	dev_coredumpv(&priv->spi->dev, iter.start,
-		      iter.data - iter.start, GFP_KERNEL);
+		      iter.data - iter.start);
 }
diff --git a/drivers/net/wireless/ath/ath10k/coredump.c b/drivers/net/wireless/ath/ath10k/coredump.c
index fe6b6f97a91..dc923706992 100644
--- a/drivers/net/wireless/ath/ath10k/coredump.c
+++ b/drivers/net/wireless/ath/ath10k/coredump.c
@@ -1607,7 +1607,7 @@ int ath10k_coredump_submit(struct ath10k *ar)
 		return -ENODATA;
 	}
 
-	dev_coredumpv(ar->dev, dump, le32_to_cpu(dump->len), GFP_KERNEL);
+	dev_coredumpv(ar->dev, dump, le32_to_cpu(dump->len));
 
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/wil6210/wil_crash_dump.c b/drivers/net/wireless/ath/wil6210/wil_crash_dump.c
index 89c12cb2aaa..79299609dd6 100644
--- a/drivers/net/wireless/ath/wil6210/wil_crash_dump.c
+++ b/drivers/net/wireless/ath/wil6210/wil_crash_dump.c
@@ -117,6 +117,6 @@ void wil_fw_core_dump(struct wil6210_priv *wil)
 	/* fw_dump_data will be free in device coredump release function
 	 * after 5 min
 	 */
-	dev_coredumpv(wil_to_dev(wil), fw_dump_data, fw_dump_size, GFP_KERNEL);
+	dev_coredumpv(wil_to_dev(wil), fw_dump_data, fw_dump_size);
 	wil_info(wil, "fw core dumped, size %d bytes\n", fw_dump_size);
 }
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c
index eecf8a38d94..87f3652ef3b 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c
@@ -37,7 +37,7 @@ int brcmf_debug_create_memdump(struct brcmf_bus *bus, const void *data,
 		return err;
 	}
 
-	dev_coredumpv(bus->dev, dump, len + ramsize, GFP_KERNEL);
+	dev_coredumpv(bus->dev, dump, len + ramsize);
 
 	return 0;
 }
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
index abf49022edb..f2f7cf494a8 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
@@ -2601,8 +2601,7 @@ static void iwl_fw_error_dump(struct iwl_fw_runtime *fwrt,
 					     fw_error_dump.trans_ptr->data,
 					     fw_error_dump.trans_ptr->len,
 					     fw_error_dump.fwrt_len);
-		dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len,
-			       GFP_KERNEL);
+		dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len);
 	}
 	vfree(fw_error_dump.fwrt_ptr);
 	vfree(fw_error_dump.trans_ptr);
@@ -2647,8 +2646,7 @@ static void iwl_fw_error_ini_dump(struct iwl_fw_runtime *fwrt,
 					     entry->data, entry->size, offs);
 			offs += entry->size;
 		}
-		dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len,
-			       GFP_KERNEL);
+		dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len);
 	}
 	iwl_dump_ini_list_free(&dump_list);
 }
diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c
index 88c72d1827a..cc3f1121eb9 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t)
 		adapter->if_ops.card_reset(adapter);
 }
 
+static void fw_dump_work(struct work_struct *work)
+{
+	struct mwifiex_adapter *adapter =
+		container_of(work, struct mwifiex_adapter, devdump_work);
+
+	mwifiex_upload_device_dump(adapter);
+}
+
 static void fw_dump_timer_fn(struct timer_list *t)
 {
 	struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer);
 
-	mwifiex_upload_device_dump(adapter);
+	schedule_work(&adapter->devdump_work);
 }
 
 /*
@@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter)
 	adapter->active_scan_triggered = false;
 	timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0);
 	adapter->devdump_len = 0;
+	INIT_WORK(&adapter->devdump_work, fw_dump_work);
 	timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0);
 }
 
@@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter)
 {
 	del_timer(&adapter->wakeup_timer);
 	del_timer_sync(&adapter->devdump_timer);
+	cancel_work_sync(&adapter->devdump_work);
 	mwifiex_cancel_all_pending_cmd(adapter);
 	wake_up_interruptible(&adapter->cmd_wait_q.wait);
 	wake_up_interruptible(&adapter->hs_activate_wait_q);
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ace7371c477..26fef0ab1b0 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1115,8 +1115,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
 	 */
 	mwifiex_dbg(adapter, MSG,
 		    "== mwifiex dump information to /sys/class/devcoredump start\n");
-	dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
-		      GFP_KERNEL);
+	dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len);
 	mwifiex_dbg(adapter, MSG,
 		    "== mwifiex dump information to /sys/class/devcoredump end\n");
 
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 332dd1c8db3..c8ac2f57f18 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -900,6 +900,7 @@ struct mwifiex_adapter {
 	struct work_struct rx_work;
 	struct workqueue_struct *dfs_workqueue;
 	struct work_struct dfs_work;
+	struct work_struct devdump_work;
 	bool rx_work_enabled;
 	bool rx_processing;
 	bool delay_main_work;
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c
index 7d42c5d2dbf..8e28d0107d7 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
@@ -644,6 +644,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv,
 
 upload_dump:
 	del_timer_sync(&adapter->devdump_timer);
+	cancel_work_sync(&adapter->devdump_work);
 	mwifiex_upload_device_dump(adapter);
 }
 
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
index bd687f7de62..5336fe8c668 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
@@ -2421,6 +2421,5 @@ void mt7615_coredump_work(struct work_struct *work)
 
 		dev_kfree_skb(skb);
 	}
-	dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ,
-		      GFP_KERNEL);
+	dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ);
 }
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
index 233998ca485..fec51e460bf 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c
@@ -1619,8 +1619,7 @@ void mt7921_coredump_work(struct work_struct *work)
 	}
 
 	if (dump)
-		dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ,
-			      GFP_KERNEL);
+		dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ);
 
 	mt7921_reset(&dev->mt76);
 }
diff --git a/drivers/net/wireless/realtek/rtw88/main.c b/drivers/net/wireless/realtek/rtw88/main.c
index 8b9899e41b0..2fec240070b 100644
--- a/drivers/net/wireless/realtek/rtw88/main.c
+++ b/drivers/net/wireless/realtek/rtw88/main.c
@@ -413,7 +413,7 @@ static void rtw_fwcd_dump(struct rtw_dev *rtwdev)
 	 * framework. Note that a new dump will be discarded if a previous one
 	 * hasn't been released yet.
 	 */
-	dev_coredumpv(rtwdev->dev, desc->data, desc->size, GFP_KERNEL);
+	dev_coredumpv(rtwdev->dev, desc->data, desc->size);
 }
 
 static void rtw_fwcd_free(struct rtw_dev *rtwdev, bool free_self)
diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c
index af217de75e4..813d87faef6 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -597,7 +597,7 @@ static void q6v5_dump_mba_logs(struct q6v5 *qproc)
 	data = vmalloc(MBA_LOG_SIZE);
 	if (data) {
 		memcpy(data, mba_region, MBA_LOG_SIZE);
-		dev_coredumpv(&rproc->dev, data, MBA_LOG_SIZE, GFP_KERNEL);
+		dev_coredumpv(&rproc->dev, data, MBA_LOG_SIZE);
 	}
 	memunmap(mba_region);
 }
diff --git a/drivers/remoteproc/remoteproc_coredump.c b/drivers/remoteproc/remoteproc_coredump.c
index 4b093420d98..cd55c2abd22 100644
--- a/drivers/remoteproc/remoteproc_coredump.c
+++ b/drivers/remoteproc/remoteproc_coredump.c
@@ -309,7 +309,7 @@ void rproc_coredump(struct rproc *rproc)
 		phdr += elf_size_of_phdr(class);
 	}
 	if (dump_conf == RPROC_COREDUMP_ENABLED) {
-		dev_coredumpv(&rproc->dev, data, data_size, GFP_KERNEL);
+		dev_coredumpv(&rproc->dev, data, data_size);
 		return;
 	}
 
@@ -318,7 +318,7 @@ void rproc_coredump(struct rproc *rproc)
 	dump_state.header = data;
 	init_completion(&dump_state.dump_done);
 
-	dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, GFP_KERNEL,
+	dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size,
 		      rproc_coredump_read, rproc_coredump_free);
 
 	/*
@@ -449,7 +449,7 @@ void rproc_coredump_using_sections(struct rproc *rproc)
 	}
 
 	if (dump_conf == RPROC_COREDUMP_ENABLED) {
-		dev_coredumpv(&rproc->dev, data, data_size, GFP_KERNEL);
+		dev_coredumpv(&rproc->dev, data, data_size);
 		return;
 	}
 
@@ -458,7 +458,7 @@ void rproc_coredump_using_sections(struct rproc *rproc)
 	dump_state.header = data;
 	init_completion(&dump_state.dump_done);
 
-	dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, GFP_KERNEL,
+	dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size,
 		      rproc_coredump_read, rproc_coredump_free);
 
 	/* Wait until the dump is read and free is called. Data is freed
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 22fabdeed29..b41850366bc 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -162,7 +162,7 @@ struct drm_print_iterator {
  *	void makecoredump(...)
  *	{
  *		...
- *		dev_coredumpm(dev, THIS_MODULE, data, 0, GFP_KERNEL,
+ *		dev_coredumpm(dev, THIS_MODULE, data, 0,
  *			coredump_read, ...)
  *	}
  *
diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
index c008169ed2c..c7d840d824c 100644
--- a/include/linux/devcoredump.h
+++ b/include/linux/devcoredump.h
@@ -52,27 +52,26 @@ static inline void _devcd_free_sgtable(struct scatterlist *table)
 
 
 #ifdef CONFIG_DEV_COREDUMP
-void dev_coredumpv(struct device *dev, void *data, size_t datalen,
-		   gfp_t gfp);
+void dev_coredumpv(struct device *dev, void *data, size_t datalen);
 
 void dev_coredumpm(struct device *dev, struct module *owner,
-		   void *data, size_t datalen, gfp_t gfp,
+		   void *data, size_t datalen,
 		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 				   void *data, size_t datalen),
 		   void (*free)(void *data));
 
 void dev_coredumpsg(struct device *dev, struct scatterlist *table,
-		    size_t datalen, gfp_t gfp);
+		    size_t datalen);
 #else
 static inline void dev_coredumpv(struct device *dev, void *data,
-				 size_t datalen, gfp_t gfp)
+				 size_t datalen)
 {
 	vfree(data);
 }
 
 static inline void
 dev_coredumpm(struct device *dev, struct module *owner,
-	      void *data, size_t datalen, gfp_t gfp,
+	      void *data, size_t datalen,
 	      ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			      void *data, size_t datalen),
 	      void (*free)(void *data))
@@ -81,7 +80,7 @@ dev_coredumpm(struct device *dev, struct module *owner,
 }
 
 static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table,
-				  size_t datalen, gfp_t gfp)
+				  size_t datalen)
 {
 	_devcd_free_sgtable(table);
 }
diff --git a/sound/soc/intel/catpt/dsp.c b/sound/soc/intel/catpt/dsp.c
index 346bec00030..d2afe9ff1e3 100644
--- a/sound/soc/intel/catpt/dsp.c
+++ b/sound/soc/intel/catpt/dsp.c
@@ -539,7 +539,7 @@ int catpt_coredump(struct catpt_dev *cdev)
 		pos += CATPT_DMA_REGS_SIZE;
 	}
 
-	dev_coredumpv(cdev->dev, dump, dump_size, GFP_KERNEL);
+	dev_coredumpv(cdev->dev, dump, dump_size);
 
 	return 0;
 }

Do you think this solution is ok? I will try to test it and I'd appreciate you
if you test and give feedback.

Best regards,
Duoming Zhou
Kalle Valo May 23, 2022, 4:27 p.m. UTC | #7
Greg KH <gregkh@linuxfoundation.org> writes:

> On Mon, May 23, 2022 at 02:31:48PM +0300, Kalle Valo wrote:
>> (adding Johannes)
>> 
>> duoming@zju.edu.cn writes:
>> 
>> >> > --- a/lib/kobject.c
>> >> > +++ b/lib/kobject.c
>> >> > @@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
>> >> >  	if (kobj->name && !fmt)
>> >> >  		return 0;
>> >> >  
>> >> > -	s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
>> >> > +	s = kvasprintf_const(GFP_ATOMIC, fmt, vargs);
>> >> >  	if (!s)
>> >> >  		return -ENOMEM;
>> >> >  
>> >> > @@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
>> >> >  	if (strchr(s, '/')) {
>> >> >  		char *t;
>> >> >  
>> >> > -		t = kstrdup(s, GFP_KERNEL);
>> >> > +		t = kstrdup(s, GFP_ATOMIC);
>> >> >  		kfree_const(s);
>> >> >  		if (!t)
>> >> >  			return -ENOMEM;
>> >> 
>> >> Please no, you are hurting the whole kernel because of one odd user.
>> >> Please do not make these calls under atomic context.
>> >
>> > Thanks for your time and suggestions. I will remove the gfp_t
>> > parameter of dev_coredumpv in order to show it could not be used in
>> > atomic context.
>> 
>> In a way it would be nice to be able to call dev_coredump from atomic
>> contexts, though I don't know how practical it actually is.
>
> Dumping core information from atomic context feels very very wrong to
> me.
>
> Why not just not do that?

I was wondering why dev_coredumpm() has the gfp parameter in the first
place. But yeah, removing gfp from devcoredump API altogether sounds
like the best thing to do.
Kalle Valo May 23, 2022, 4:31 p.m. UTC | #8
duoming@zju.edu.cn writes:

> Hello maintainers,
>
> Thank you for your time and suggestions! 
>
>> > There are sleep in atomic context bugs when uploading device dump
>> > data in mwifiex. The root cause is that dev_coredumpv could not
>> > be used in atomic contexts, because it calls dev_set_name which
>> > include operations that may sleep. The call tree shows execution
>> > paths that could lead to bugs:
>> >
>> >    (Interrupt context)
>> > fw_dump_timer_fn
>> >   mwifiex_upload_device_dump
>> >     dev_coredumpv(..., GFP_KERNEL)
>> >       dev_coredumpm()
>> >         kzalloc(sizeof(*devcd), gfp); //may sleep
>> >         dev_set_name
>> >           kobject_set_name_vargs
>> >             kvasprintf_const(GFP_KERNEL, ...); //may sleep
>> >             kstrdup(s, GFP_KERNEL); //may sleep
>> >
>> > In order to let dev_coredumpv support atomic contexts, this patch
>> > changes the gfp_t parameter of kvasprintf_const and kstrdup in
>> > kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
>> > In order to mitigate bug, this patch changes the gfp_t parameter
>> > of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.
>> 
>> vmalloc in atomic context?
>> 
>> Not only does dev_coredumpm set a device name dev_coredumpm creates an
>> entire device to hold the device dump.
>> 
>> My sense is that either dev_coredumpm needs to be rebuilt on a
>> completely different principle that does not need a device to hold the
>> coredump (so that it can be called from interrupt context) or that
>> dev_coredumpm should never be called in an context that can not sleep.
>
> The following solution removes the gfp_t parameter of dev_coredumpv(), 
> dev_coredumpm() and dev_coredumpsg() and change the gfp_t parameter of 
> kzalloc() in dev_coredumpm() to GFP_KERNEL, in order to show that these 
> functions can not be used in atomic context.
>
> What's more, I move the operations that may sleep into a work item and use
> schedule_work() to call a kernel thread to do the operations that may sleep.
>

[...]

> --- a/drivers/net/wireless/marvell/mwifiex/init.c
> +++ b/drivers/net/wireless/marvell/mwifiex/init.c
> @@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t)
>  		adapter->if_ops.card_reset(adapter);
>  }
>  
> +static void fw_dump_work(struct work_struct *work)
> +{
> +	struct mwifiex_adapter *adapter =
> +		container_of(work, struct mwifiex_adapter, devdump_work);
> +
> +	mwifiex_upload_device_dump(adapter);
> +}
> +
>  static void fw_dump_timer_fn(struct timer_list *t)
>  {
>  	struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer);
>  
> -	mwifiex_upload_device_dump(adapter);
> +	schedule_work(&adapter->devdump_work);
>  }
>  
>  /*
> @@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter)
>  	adapter->active_scan_triggered = false;
>  	timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0);
>  	adapter->devdump_len = 0;
> +	INIT_WORK(&adapter->devdump_work, fw_dump_work);
>  	timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0);
>  }
>  
> @@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter)
>  {
>  	del_timer(&adapter->wakeup_timer);
>  	del_timer_sync(&adapter->devdump_timer);
> +	cancel_work_sync(&adapter->devdump_work);
>  	mwifiex_cancel_all_pending_cmd(adapter);
>  	wake_up_interruptible(&adapter->cmd_wait_q.wait);
>  	wake_up_interruptible(&adapter->hs_activate_wait_q);

In this patch please only do the API change in mwifiex. The change to
using a workqueue needs to be in separate patch so it can be properly
tested. I don't want a change like that going to the kernel without
testing on a real device.
Brian Norris May 23, 2022, 5:55 p.m. UTC | #9
+ Johannes (you should check MAINTAINERS; devcoredump has a specified
maintainer)

On Sun, May 22, 2022 at 10:29 PM Duoming Zhou <duoming@zju.edu.cn> wrote:
>
> There are sleep in atomic context bugs when uploading device dump
> data in mwifiex. The root cause is that dev_coredumpv could not
> be used in atomic contexts, because it calls dev_set_name which
> include operations that may sleep. The call tree shows execution
> paths that could lead to bugs:
>
>    (Interrupt context)
> fw_dump_timer_fn
>   mwifiex_upload_device_dump
>     dev_coredumpv(..., GFP_KERNEL)
>       dev_coredumpm()
>         kzalloc(sizeof(*devcd), gfp); //may sleep
>         dev_set_name
>           kobject_set_name_vargs
>             kvasprintf_const(GFP_KERNEL, ...); //may sleep
>             kstrdup(s, GFP_KERNEL); //may sleep

I was only half paying attention to this patch earlier, but I realize
one source of my confusion: you're blaming the fix wrong. This piece
of code was only added for mwifiex's USB support; the SDIO/PCIe
support is totally fine, as we perform the dump from a worker, not a
timer. So, you have the Fixes tag wrong.

> In order to let dev_coredumpv support atomic contexts, this patch
> changes the gfp_t parameter of kvasprintf_const and kstrdup in
> kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
> In order to mitigate bug, this patch changes the gfp_t parameter
> of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.
>
> Fixes: 57670ee882d4 ("mwifiex: device dump support via devcoredump framework")

That's wrong. Should be:

Fixes: f5ecd02a8b20 mwifiex: device dump support for usb interface

> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
> Changes in v3:
>   - Let dev_coredumpv support atomic contexts.
>
>  drivers/net/wireless/marvell/mwifiex/main.c | 2 +-
>  lib/kobject.c                               | 4 ++--

You almost definitely want to split this in two. One to fix
devcoredump to _really_ support the gfp arg (or else to drop it), and
one to start using it appropriately in mwifiex.

>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
> index ace7371c477..258906920a2 100644
> --- a/drivers/net/wireless/marvell/mwifiex/main.c
> +++ b/drivers/net/wireless/marvell/mwifiex/main.c
> @@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
>         mwifiex_dbg(adapter, MSG,
>                     "== mwifiex dump information to /sys/class/devcoredump start\n");
>         dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
> -                     GFP_KERNEL);
> +                     GFP_ATOMIC);

As noted above, PCIe and SDIO support is working just fine. Seems a
bit much to force it to be GFP_ATOMIC in those cases.

Maybe you also need a gfp arg for mwifiex_upload_device_dump()?

Brian

>         mwifiex_dbg(adapter, MSG,
>                     "== mwifiex dump information to /sys/class/devcoredump end\n");
>
Brian Norris May 23, 2022, 7:42 p.m. UTC | #10
(I think people generally agreed on this approach, but please submit a
new series, with separate patches)

On Mon, May 23, 2022 at 12:27 PM <duoming@zju.edu.cn> wrote:
> What's more, I move the operations that may sleep into a work item and use
> schedule_work() to call a kernel thread to do the operations that may sleep.

You end up with a timer that just exists to kick a work item. Eric
suggested you just use a delayed_work, and then you don't need both a
timer and a work struct.

Brian
Duoming Zhou May 24, 2022, 1:54 a.m. UTC | #11
Hello,

On Mon, 23 May 2022 12:42:44 -0700 Brian wrote:

> (I think people generally agreed on this approach, but please submit a
> new series, with separate patches)
> 
> On Mon, May 23, 2022 at 12:27 PM <duoming@zju.edu.cn> wrote:
> > What's more, I move the operations that may sleep into a work item and use
> > schedule_work() to call a kernel thread to do the operations that may sleep.
> 
> You end up with a timer that just exists to kick a work item. Eric
> suggested you just use a delayed_work, and then you don't need both a
> timer and a work struct.

I will submit a new series, one is removing the gfp_t parameter of dev_coredumpv,
another is using it properly in mwifiex(put the dev_coredumpv in the delayed_work).
Thank you for your suggestions!

Best regards,
Duoming Zhou
Duoming Zhou May 24, 2022, 2:08 a.m. UTC | #12
Hello,

On Mon, 23 May 2022 19:31:35 +0300 Kalle Valo wrote:

> >> > There are sleep in atomic context bugs when uploading device dump
> >> > data in mwifiex. The root cause is that dev_coredumpv could not
> >> > be used in atomic contexts, because it calls dev_set_name which
> >> > include operations that may sleep. The call tree shows execution
> >> > paths that could lead to bugs:
> >> >
> >> >    (Interrupt context)
> >> > fw_dump_timer_fn
> >> >   mwifiex_upload_device_dump
> >> >     dev_coredumpv(..., GFP_KERNEL)
> >> >       dev_coredumpm()
> >> >         kzalloc(sizeof(*devcd), gfp); //may sleep
> >> >         dev_set_name
> >> >           kobject_set_name_vargs
> >> >             kvasprintf_const(GFP_KERNEL, ...); //may sleep
> >> >             kstrdup(s, GFP_KERNEL); //may sleep
> >> >
> >> > In order to let dev_coredumpv support atomic contexts, this patch
> >> > changes the gfp_t parameter of kvasprintf_const and kstrdup in
> >> > kobject_set_name_vargs from GFP_KERNEL to GFP_ATOMIC. What's more,
> >> > In order to mitigate bug, this patch changes the gfp_t parameter
> >> > of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC.
> >> 
> >> vmalloc in atomic context?
> >> 
> >> Not only does dev_coredumpm set a device name dev_coredumpm creates an
> >> entire device to hold the device dump.
> >> 
> >> My sense is that either dev_coredumpm needs to be rebuilt on a
> >> completely different principle that does not need a device to hold the
> >> coredump (so that it can be called from interrupt context) or that
> >> dev_coredumpm should never be called in an context that can not sleep.
> >
> > The following solution removes the gfp_t parameter of dev_coredumpv(), 
> > dev_coredumpm() and dev_coredumpsg() and change the gfp_t parameter of 
> > kzalloc() in dev_coredumpm() to GFP_KERNEL, in order to show that these 
> > functions can not be used in atomic context.
> >
> > What's more, I move the operations that may sleep into a work item and use
> > schedule_work() to call a kernel thread to do the operations that may sleep.
> >
> 
> [...]
> 
> > --- a/drivers/net/wireless/marvell/mwifiex/init.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/init.c
> > @@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t)
> >  		adapter->if_ops.card_reset(adapter);
> >  }
> >  
> > +static void fw_dump_work(struct work_struct *work)
> > +{
> > +	struct mwifiex_adapter *adapter =
> > +		container_of(work, struct mwifiex_adapter, devdump_work);
> > +
> > +	mwifiex_upload_device_dump(adapter);
> > +}
> > +
> >  static void fw_dump_timer_fn(struct timer_list *t)
> >  {
> >  	struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer);
> >  
> > -	mwifiex_upload_device_dump(adapter);
> > +	schedule_work(&adapter->devdump_work);
> >  }
> >  
> >  /*
> > @@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter)
> >  	adapter->active_scan_triggered = false;
> >  	timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0);
> >  	adapter->devdump_len = 0;
> > +	INIT_WORK(&adapter->devdump_work, fw_dump_work);
> >  	timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0);
> >  }
> >  
> > @@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter)
> >  {
> >  	del_timer(&adapter->wakeup_timer);
> >  	del_timer_sync(&adapter->devdump_timer);
> > +	cancel_work_sync(&adapter->devdump_work);
> >  	mwifiex_cancel_all_pending_cmd(adapter);
> >  	wake_up_interruptible(&adapter->cmd_wait_q.wait);
> >  	wake_up_interruptible(&adapter->hs_activate_wait_q);
> 
> In this patch please only do the API change in mwifiex. The change to
> using a workqueue needs to be in separate patch so it can be properly
> tested. I don't want a change like that going to the kernel without
> testing on a real device.

Thank you for your suggestions! I will only do the API change in mwifies and
keep other places that call dev_coredumpv remain unchanged. I will test the
new separate patches on real Marvell wifi chip these days.

Best regards,
Duoming Zhou
diff mbox series

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ace7371c477..258906920a2 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1116,7 +1116,7 @@  void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
 	mwifiex_dbg(adapter, MSG,
 		    "== mwifiex dump information to /sys/class/devcoredump start\n");
 	dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
-		      GFP_KERNEL);
+		      GFP_ATOMIC);
 	mwifiex_dbg(adapter, MSG,
 		    "== mwifiex dump information to /sys/class/devcoredump end\n");
 
diff --git a/lib/kobject.c b/lib/kobject.c
index 5f0e71ab292..7672c54944c 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -254,7 +254,7 @@  int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
 	if (kobj->name && !fmt)
 		return 0;
 
-	s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
+	s = kvasprintf_const(GFP_ATOMIC, fmt, vargs);
 	if (!s)
 		return -ENOMEM;
 
@@ -267,7 +267,7 @@  int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
 	if (strchr(s, '/')) {
 		char *t;
 
-		t = kstrdup(s, GFP_KERNEL);
+		t = kstrdup(s, GFP_ATOMIC);
 		kfree_const(s);
 		if (!t)
 			return -ENOMEM;