diff mbox series

[3/8] bdi: add a ->dev_name field to struct backing_dev_info

Message ID 20200416165453.1080463-4-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [1/8] bdi: move bdi_dev_name out of line | expand

Commit Message

Christoph Hellwig April 16, 2020, 4:54 p.m. UTC
Cache a copy of the name for the life time of the backing_dev_info
structure so that we can reference it even after unregistering.

Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears")
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/backing-dev-defs.h | 1 +
 mm/backing-dev.c                 | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

Comments

Jan Kara April 17, 2020, 8:59 a.m. UTC | #1
On Thu 16-04-20 18:54:48, Christoph Hellwig wrote:
> Cache a copy of the name for the life time of the backing_dev_info
> structure so that we can reference it even after unregistering.
> 
> Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears")
> Reported-by: Yufen Yu <yuyufen@huawei.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

...

> @@ -938,7 +938,8 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
>  	if (bdi->dev)	/* The driver needs to use separate queues per device */
>  		return 0;
>  
> -	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
> +	vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args);
> +	dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name);
>  	if (IS_ERR(dev))
>  		return PTR_ERR(dev);
>  

This can have a sideeffect not only bdi->dev_name will be truncated to 64
chars (which generally doesn't matter) but possibly also kobject name will
be truncated in the same way.  Which may have user visible effects. E.g.
for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
other way around - i.e., let device_create_vargs() create the device name
and then copy to bdi->dev_name whatever fits?

								Honza
Christoph Hellwig April 17, 2020, 1:01 p.m. UTC | #2
On Fri, Apr 17, 2020 at 10:59:09AM +0200, Jan Kara wrote:
> > -	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
> > +	vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args);
> > +	dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name);
> >  	if (IS_ERR(dev))
> >  		return PTR_ERR(dev);
> >  
> 
> This can have a sideeffect not only bdi->dev_name will be truncated to 64
> chars (which generally doesn't matter) but possibly also kobject name will
> be truncated in the same way.  Which may have user visible effects. E.g.
> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
> other way around - i.e., let device_create_vargs() create the device name
> and then copy to bdi->dev_name whatever fits?

I think having them mismatch is worse, as the kobject name is what
people look for.  Hans, do you know what fc->source typicall contains
and if there is much of a problem if it gets truncated/  Can we switch
to something else that is guranteed to be 64 charaters or less for the
bdi name?
Bart Van Assche April 18, 2020, 3:40 p.m. UTC | #3
On 2020-04-17 01:59, Jan Kara wrote:
> On Thu 16-04-20 18:54:48, Christoph Hellwig wrote:
>> @@ -938,7 +938,8 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
>>  	if (bdi->dev)	/* The driver needs to use separate queues per device */
>>  		return 0;
>>  
>> -	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
>> +	vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args);
>> +	dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name);
>>  	if (IS_ERR(dev))
>>  		return PTR_ERR(dev);
>>  
> 
> This can have a sideeffect not only bdi->dev_name will be truncated to 64
> chars (which generally doesn't matter) but possibly also kobject name will
> be truncated in the same way.  Which may have user visible effects. E.g.
> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
> other way around - i.e., let device_create_vargs() create the device name
> and then copy to bdi->dev_name whatever fits?

How about using kvasprintf() instead of vsnprintf()?

Thanks,

Bart.
Christoph Hellwig April 19, 2020, 7:58 a.m. UTC | #4
On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote:
> > This can have a sideeffect not only bdi->dev_name will be truncated to 64
> > chars (which generally doesn't matter) but possibly also kobject name will
> > be truncated in the same way.  Which may have user visible effects. E.g.
> > for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
> > other way around - i.e., let device_create_vargs() create the device name
> > and then copy to bdi->dev_name whatever fits?
> 
> How about using kvasprintf() instead of vsnprintf()?

That is what v1 did, see the thread in response to that on why it isn't
a good idea.
Bart Van Assche April 19, 2020, 3:29 p.m. UTC | #5
On 4/19/20 12:58 AM, Christoph Hellwig wrote:
> On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote:
>>> This can have a sideeffect not only bdi->dev_name will be truncated to 64
>>> chars (which generally doesn't matter) but possibly also kobject name will
>>> be truncated in the same way.  Which may have user visible effects. E.g.
>>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
>>> other way around - i.e., let device_create_vargs() create the device name
>>> and then copy to bdi->dev_name whatever fits?
>>
>> How about using kvasprintf() instead of vsnprintf()?
> 
> That is what v1 did, see the thread in response to that on why it isn't
> a good idea.

Are you perhaps referring to patch "[PATCH 3/8] bdi: add a ->dev_name 
field to struct backing_dev_info" 
(https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) 
and also to the replies to that patch? This is what I found in the 
replies: "When driver try to to re-register bdi but without 
release_bdi(), the old dev_name will be cover directly by the newer in 
bdi_register_va(). So, I am not sure whether it can cause memory leak 
for bdi->dev_name."

Has it been considered to avoid that leak by freeing bdi->dev_name from 
unregister_bdi(), e.g. as follows?

void bdi_unregister(struct backing_dev_info *bdi)
{
+	char *dev_name;

	/* make sure nobody finds us on the bdi_list anymore */
	bdi_remove_from_list(bdi);
	wb_shutdown(&bdi->wb);
	cgwb_bdi_unregister(bdi);

	if (bdi->dev) {
		bdi_debug_unregister(bdi);
		device_unregister(bdi->dev);
		bdi->dev = NULL;
+		dev_name = bdi->dev_name;
+		bdi->dev_name = "(unregistered)";
+		kfree(dev_name);
	}

	if (bdi->owner) {
		put_device(bdi->owner);
		bdi->owner = NULL;
	}
}

Thanks,

Bart.
Christoph Hellwig April 19, 2020, 4:06 p.m. UTC | #6
On Sun, Apr 19, 2020 at 08:29:21AM -0700, Bart Van Assche wrote:
> On 4/19/20 12:58 AM, Christoph Hellwig wrote:
>> On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote:
>>>> This can have a sideeffect not only bdi->dev_name will be truncated to 64
>>>> chars (which generally doesn't matter) but possibly also kobject name will
>>>> be truncated in the same way.  Which may have user visible effects. E.g.
>>>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
>>>> other way around - i.e., let device_create_vargs() create the device name
>>>> and then copy to bdi->dev_name whatever fits?
>>>
>>> How about using kvasprintf() instead of vsnprintf()?
>>
>> That is what v1 did, see the thread in response to that on why it isn't
>> a good idea.
>
> Are you perhaps referring to patch "[PATCH 3/8] bdi: add a ->dev_name field 
> to struct backing_dev_info" 
> (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) 
> and also to the replies to that patch? This is what I found in the replies: 
> "When driver try to to re-register bdi but without release_bdi(), the old 
> dev_name will be cover directly by the newer in bdi_register_va(). So, I am 
> not sure whether it can cause memory leak for bdi->dev_name."
>
> Has it been considered to avoid that leak by freeing bdi->dev_name from 
> unregister_bdi(), e.g. as follows?

We'd need some protection against concurrent accesses as unregister_bdi
can race with them.  But with RCU that could be handled, so let me try
that.
Christoph Hellwig April 20, 2020, 7:48 a.m. UTC | #7
On Sun, Apr 19, 2020 at 06:06:51PM +0200, Christoph Hellwig wrote:
> > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) 
> > and also to the replies to that patch? This is what I found in the replies: 
> > "When driver try to to re-register bdi but without release_bdi(), the old 
> > dev_name will be cover directly by the newer in bdi_register_va(). So, I am 
> > not sure whether it can cause memory leak for bdi->dev_name."
> >
> > Has it been considered to avoid that leak by freeing bdi->dev_name from 
> > unregister_bdi(), e.g. as follows?
> 
> We'd need some protection against concurrent accesses as unregister_bdi
> can race with them.  But with RCU that could be handled, so let me try
> that.

I looked into it, and while it seems doable I think this goes in the
wrong direction as it pushed the RCU knowledge into the callers.  I'd
rather get something like this series in ASAP, and then for 5.8 or 5.9
move the bdi pointer to the gendisk and stop re-registering it and thus
solve the problems root cause for real.
Jan Kara April 20, 2020, 9:49 a.m. UTC | #8
On Sun 19-04-20 18:06:51, Christoph Hellwig wrote:
> On Sun, Apr 19, 2020 at 08:29:21AM -0700, Bart Van Assche wrote:
> > On 4/19/20 12:58 AM, Christoph Hellwig wrote:
> >> On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote:
> >>>> This can have a sideeffect not only bdi->dev_name will be truncated to 64
> >>>> chars (which generally doesn't matter) but possibly also kobject name will
> >>>> be truncated in the same way.  Which may have user visible effects. E.g.
> >>>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
> >>>> other way around - i.e., let device_create_vargs() create the device name
> >>>> and then copy to bdi->dev_name whatever fits?
> >>>
> >>> How about using kvasprintf() instead of vsnprintf()?
> >>
> >> That is what v1 did, see the thread in response to that on why it isn't
> >> a good idea.
> >
> > Are you perhaps referring to patch "[PATCH 3/8] bdi: add a ->dev_name field 
> > to struct backing_dev_info" 
> > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) 
> > and also to the replies to that patch? This is what I found in the replies: 
> > "When driver try to to re-register bdi but without release_bdi(), the old 
> > dev_name will be cover directly by the newer in bdi_register_va(). So, I am 
> > not sure whether it can cause memory leak for bdi->dev_name."
> >
> > Has it been considered to avoid that leak by freeing bdi->dev_name from 
> > unregister_bdi(), e.g. as follows?
> 
> We'd need some protection against concurrent accesses as unregister_bdi
> can race with them.  But with RCU that could be handled, so let me try
> that.

Yeah, that's what Yufen tried in his series some time ago and what I think
you personally didn't like :).

								Honza
Jan Kara April 20, 2020, 9:52 a.m. UTC | #9
On Mon 20-04-20 09:48:01, Christoph Hellwig wrote:
> On Sun, Apr 19, 2020 at 06:06:51PM +0200, Christoph Hellwig wrote:
> > > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) 
> > > and also to the replies to that patch? This is what I found in the replies: 
> > > "When driver try to to re-register bdi but without release_bdi(), the old 
> > > dev_name will be cover directly by the newer in bdi_register_va(). So, I am 
> > > not sure whether it can cause memory leak for bdi->dev_name."
> > >
> > > Has it been considered to avoid that leak by freeing bdi->dev_name from 
> > > unregister_bdi(), e.g. as follows?
> > 
> > We'd need some protection against concurrent accesses as unregister_bdi
> > can race with them.  But with RCU that could be handled, so let me try
> > that.
> 
> I looked into it, and while it seems doable I think this goes in the
> wrong direction as it pushed the RCU knowledge into the callers.  I'd
> rather get something like this series in ASAP, and then for 5.8 or 5.9
> move the bdi pointer to the gendisk and stop re-registering it and thus
> solve the problems root cause for real.

Yeah, if it could be done it would be a nice solution. Because
re-registering of BDIs is a long-term source of troubles...

								Honza
Hans de Goede April 20, 2020, 11:41 a.m. UTC | #10
Hi,

<A lot of context has been trimmed here before I got added to the Cc, so
  I'm assuming that we are talking about the vboxsf code here.>

On 4/17/20 3:01 PM, Christoph Hellwig wrote:
> On Fri, Apr 17, 2020 at 10:59:09AM +0200, Jan Kara wrote:
>>> -	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
>>> +	vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args);
>>> +	dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name);
>>>   	if (IS_ERR(dev))
>>>   		return PTR_ERR(dev);
>>>   
>>
>> This can have a sideeffect not only bdi->dev_name will be truncated to 64
>> chars (which generally doesn't matter) but possibly also kobject name will
>> be truncated in the same way.  Which may have user visible effects. E.g.
>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the
>> other way around - i.e., let device_create_vargs() create the device name
>> and then copy to bdi->dev_name whatever fits?
> 
> I think having them mismatch is worse, as the kobject name is what
> people look for.  Hans, do you know what fc->source typicall contains
> and if there is much of a problem if it gets truncated/  Can we switch
> to something else that is guranteed to be 64 charaters or less for the
> bdi name?

It contains the name the user has given to the shared-folder when
exporting it from the host/hypervisor. Typically this will be the
last element of the directory path, e.g. if I export /home/hans/projects/linux
then the default/suggested share name and this the source name to pass to
the host when mounting the shared-folder will be "linux". But the user can
put anything there.

AFAICT for vboxsf the bdi-name can be anything as long as it is unique, hence
the "vboxsf-" prefix to make this unique vs other block-devices and the
".%d" postfix is necessary because the same export can be mounted multiple
times (without using bind mounts), see:
https://github.com/jwrdegoede/vboxsf/issues/3

The presence of the source inside the bdi-name is only for informational
purposes really, so truncating that should be fine, maybe switch to:

"vboxsf%d-%s" as format string and swap the sbi->bdi_id and fc->source
in the args, then if we truncate anything it will be the source (which
as said is only there for informational purposes) and the name will
still be guaranteed to be unique.

Regards,

Hans
Christoph Hellwig April 20, 2020, 11:58 a.m. UTC | #11
On Mon, Apr 20, 2020 at 01:41:57PM +0200, Hans de Goede wrote:
> AFAICT for vboxsf the bdi-name can be anything as long as it is unique, hence
> the "vboxsf-" prefix to make this unique vs other block-devices and the
> ".%d" postfix is necessary because the same export can be mounted multiple
> times (without using bind mounts), see:
> https://github.com/jwrdegoede/vboxsf/issues/3

Shouldn't vboxsf switch to get_tree_single instead of get_tree_nodev?
Having two independent dentry trees for a single actual file system
can be pretty dangerous.

>
> The presence of the source inside the bdi-name is only for informational
> purposes really, so truncating that should be fine, maybe switch to:
>
> "vboxsf%d-%s" as format string and swap the sbi->bdi_id and fc->source
> in the args, then if we truncate anything it will be the source (which
> as said is only there for informational purposes) and the name will
> still be guaranteed to be unique.

Can we just switch to vboxsf%d where %d іs a simple monotonically
incrementing count?  That is what various other file systems (e.g. ceph)
do.
Hans de Goede April 21, 2020, 12:42 p.m. UTC | #12
Hi,

On 4/20/20 1:58 PM, Christoph Hellwig wrote:
> On Mon, Apr 20, 2020 at 01:41:57PM +0200, Hans de Goede wrote:
>> AFAICT for vboxsf the bdi-name can be anything as long as it is unique, hence
>> the "vboxsf-" prefix to make this unique vs other block-devices and the
>> ".%d" postfix is necessary because the same export can be mounted multiple
>> times (without using bind mounts), see:
>> https://github.com/jwrdegoede/vboxsf/issues/3
> 
> Shouldn't vboxsf switch to get_tree_single instead of get_tree_nodev?
> Having two independent dentry trees for a single actual file system
> can be pretty dangerous.

That is a good point I will look into this.

> 
>>
>> The presence of the source inside the bdi-name is only for informational
>> purposes really, so truncating that should be fine, maybe switch to:
>>
>> "vboxsf%d-%s" as format string and swap the sbi->bdi_id and fc->source
>> in the args, then if we truncate anything it will be the source (which
>> as said is only there for informational purposes) and the name will
>> still be guaranteed to be unique.
> 
> Can we just switch to vboxsf%d where %d іs a simple monotonically
> incrementing count?  That is what various other file systems (e.g. ceph)
> do.

Yes that is fine with me.

Regards,

Hans
diff mbox series

Patch

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 4fc87dee005a..2849bdbb3acb 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -220,6 +220,7 @@  struct backing_dev_info {
 	wait_queue_head_t wb_waitq;
 
 	struct device *dev;
+	char dev_name[64];
 	struct device *owner;
 
 	struct timer_list laptop_mode_wb_timer;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c2c44c89ee5d..efc5b83acd2d 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -938,7 +938,8 @@  int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
 	if (bdi->dev)	/* The driver needs to use separate queues per device */
 		return 0;
 
-	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
+	vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args);
+	dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name);
 	if (IS_ERR(dev))
 		return PTR_ERR(dev);
 
@@ -1047,7 +1048,7 @@  const char *bdi_dev_name(struct backing_dev_info *bdi)
 {
 	if (!bdi || !bdi->dev)
 		return bdi_unknown_name;
-	return dev_name(bdi->dev);
+	return bdi->dev_name;
 }
 EXPORT_SYMBOL_GPL(bdi_dev_name);