Message ID | 20200416165453.1080463-4-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/8] bdi: move bdi_dev_name out of line | expand |
On Thu 16-04-20 18:54:48, Christoph Hellwig wrote: > Cache a copy of the name for the life time of the backing_dev_info > structure so that we can reference it even after unregistering. > > Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears") > Reported-by: Yufen Yu <yuyufen@huawei.com> > Signed-off-by: Christoph Hellwig <hch@lst.de> ... > @@ -938,7 +938,8 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args) > if (bdi->dev) /* The driver needs to use separate queues per device */ > return 0; > > - dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); > + vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args); > + dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name); > if (IS_ERR(dev)) > return PTR_ERR(dev); > This can have a sideeffect not only bdi->dev_name will be truncated to 64 chars (which generally doesn't matter) but possibly also kobject name will be truncated in the same way. Which may have user visible effects. E.g. for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the other way around - i.e., let device_create_vargs() create the device name and then copy to bdi->dev_name whatever fits? Honza
On Fri, Apr 17, 2020 at 10:59:09AM +0200, Jan Kara wrote: > > - dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); > > + vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args); > > + dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name); > > if (IS_ERR(dev)) > > return PTR_ERR(dev); > > > > This can have a sideeffect not only bdi->dev_name will be truncated to 64 > chars (which generally doesn't matter) but possibly also kobject name will > be truncated in the same way. Which may have user visible effects. E.g. > for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the > other way around - i.e., let device_create_vargs() create the device name > and then copy to bdi->dev_name whatever fits? I think having them mismatch is worse, as the kobject name is what people look for. Hans, do you know what fc->source typicall contains and if there is much of a problem if it gets truncated/ Can we switch to something else that is guranteed to be 64 charaters or less for the bdi name?
On 2020-04-17 01:59, Jan Kara wrote: > On Thu 16-04-20 18:54:48, Christoph Hellwig wrote: >> @@ -938,7 +938,8 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args) >> if (bdi->dev) /* The driver needs to use separate queues per device */ >> return 0; >> >> - dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); >> + vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args); >> + dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name); >> if (IS_ERR(dev)) >> return PTR_ERR(dev); >> > > This can have a sideeffect not only bdi->dev_name will be truncated to 64 > chars (which generally doesn't matter) but possibly also kobject name will > be truncated in the same way. Which may have user visible effects. E.g. > for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the > other way around - i.e., let device_create_vargs() create the device name > and then copy to bdi->dev_name whatever fits? How about using kvasprintf() instead of vsnprintf()? Thanks, Bart.
On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote: > > This can have a sideeffect not only bdi->dev_name will be truncated to 64 > > chars (which generally doesn't matter) but possibly also kobject name will > > be truncated in the same way. Which may have user visible effects. E.g. > > for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the > > other way around - i.e., let device_create_vargs() create the device name > > and then copy to bdi->dev_name whatever fits? > > How about using kvasprintf() instead of vsnprintf()? That is what v1 did, see the thread in response to that on why it isn't a good idea.
On 4/19/20 12:58 AM, Christoph Hellwig wrote: > On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote: >>> This can have a sideeffect not only bdi->dev_name will be truncated to 64 >>> chars (which generally doesn't matter) but possibly also kobject name will >>> be truncated in the same way. Which may have user visible effects. E.g. >>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the >>> other way around - i.e., let device_create_vargs() create the device name >>> and then copy to bdi->dev_name whatever fits? >> >> How about using kvasprintf() instead of vsnprintf()? > > That is what v1 did, see the thread in response to that on why it isn't > a good idea. Are you perhaps referring to patch "[PATCH 3/8] bdi: add a ->dev_name field to struct backing_dev_info" (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) and also to the replies to that patch? This is what I found in the replies: "When driver try to to re-register bdi but without release_bdi(), the old dev_name will be cover directly by the newer in bdi_register_va(). So, I am not sure whether it can cause memory leak for bdi->dev_name." Has it been considered to avoid that leak by freeing bdi->dev_name from unregister_bdi(), e.g. as follows? void bdi_unregister(struct backing_dev_info *bdi) { + char *dev_name; /* make sure nobody finds us on the bdi_list anymore */ bdi_remove_from_list(bdi); wb_shutdown(&bdi->wb); cgwb_bdi_unregister(bdi); if (bdi->dev) { bdi_debug_unregister(bdi); device_unregister(bdi->dev); bdi->dev = NULL; + dev_name = bdi->dev_name; + bdi->dev_name = "(unregistered)"; + kfree(dev_name); } if (bdi->owner) { put_device(bdi->owner); bdi->owner = NULL; } } Thanks, Bart.
On Sun, Apr 19, 2020 at 08:29:21AM -0700, Bart Van Assche wrote: > On 4/19/20 12:58 AM, Christoph Hellwig wrote: >> On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote: >>>> This can have a sideeffect not only bdi->dev_name will be truncated to 64 >>>> chars (which generally doesn't matter) but possibly also kobject name will >>>> be truncated in the same way. Which may have user visible effects. E.g. >>>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the >>>> other way around - i.e., let device_create_vargs() create the device name >>>> and then copy to bdi->dev_name whatever fits? >>> >>> How about using kvasprintf() instead of vsnprintf()? >> >> That is what v1 did, see the thread in response to that on why it isn't >> a good idea. > > Are you perhaps referring to patch "[PATCH 3/8] bdi: add a ->dev_name field > to struct backing_dev_info" > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) > and also to the replies to that patch? This is what I found in the replies: > "When driver try to to re-register bdi but without release_bdi(), the old > dev_name will be cover directly by the newer in bdi_register_va(). So, I am > not sure whether it can cause memory leak for bdi->dev_name." > > Has it been considered to avoid that leak by freeing bdi->dev_name from > unregister_bdi(), e.g. as follows? We'd need some protection against concurrent accesses as unregister_bdi can race with them. But with RCU that could be handled, so let me try that.
On Sun, Apr 19, 2020 at 06:06:51PM +0200, Christoph Hellwig wrote: > > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) > > and also to the replies to that patch? This is what I found in the replies: > > "When driver try to to re-register bdi but without release_bdi(), the old > > dev_name will be cover directly by the newer in bdi_register_va(). So, I am > > not sure whether it can cause memory leak for bdi->dev_name." > > > > Has it been considered to avoid that leak by freeing bdi->dev_name from > > unregister_bdi(), e.g. as follows? > > We'd need some protection against concurrent accesses as unregister_bdi > can race with them. But with RCU that could be handled, so let me try > that. I looked into it, and while it seems doable I think this goes in the wrong direction as it pushed the RCU knowledge into the callers. I'd rather get something like this series in ASAP, and then for 5.8 or 5.9 move the bdi pointer to the gendisk and stop re-registering it and thus solve the problems root cause for real.
On Sun 19-04-20 18:06:51, Christoph Hellwig wrote: > On Sun, Apr 19, 2020 at 08:29:21AM -0700, Bart Van Assche wrote: > > On 4/19/20 12:58 AM, Christoph Hellwig wrote: > >> On Sat, Apr 18, 2020 at 08:40:20AM -0700, Bart Van Assche wrote: > >>>> This can have a sideeffect not only bdi->dev_name will be truncated to 64 > >>>> chars (which generally doesn't matter) but possibly also kobject name will > >>>> be truncated in the same way. Which may have user visible effects. E.g. > >>>> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the > >>>> other way around - i.e., let device_create_vargs() create the device name > >>>> and then copy to bdi->dev_name whatever fits? > >>> > >>> How about using kvasprintf() instead of vsnprintf()? > >> > >> That is what v1 did, see the thread in response to that on why it isn't > >> a good idea. > > > > Are you perhaps referring to patch "[PATCH 3/8] bdi: add a ->dev_name field > > to struct backing_dev_info" > > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) > > and also to the replies to that patch? This is what I found in the replies: > > "When driver try to to re-register bdi but without release_bdi(), the old > > dev_name will be cover directly by the newer in bdi_register_va(). So, I am > > not sure whether it can cause memory leak for bdi->dev_name." > > > > Has it been considered to avoid that leak by freeing bdi->dev_name from > > unregister_bdi(), e.g. as follows? > > We'd need some protection against concurrent accesses as unregister_bdi > can race with them. But with RCU that could be handled, so let me try > that. Yeah, that's what Yufen tried in his series some time ago and what I think you personally didn't like :). Honza
On Mon 20-04-20 09:48:01, Christoph Hellwig wrote: > On Sun, Apr 19, 2020 at 06:06:51PM +0200, Christoph Hellwig wrote: > > > (https://lore.kernel.org/linux-block/20200416071519.807660-4-hch@lst.de/) > > > and also to the replies to that patch? This is what I found in the replies: > > > "When driver try to to re-register bdi but without release_bdi(), the old > > > dev_name will be cover directly by the newer in bdi_register_va(). So, I am > > > not sure whether it can cause memory leak for bdi->dev_name." > > > > > > Has it been considered to avoid that leak by freeing bdi->dev_name from > > > unregister_bdi(), e.g. as follows? > > > > We'd need some protection against concurrent accesses as unregister_bdi > > can race with them. But with RCU that could be handled, so let me try > > that. > > I looked into it, and while it seems doable I think this goes in the > wrong direction as it pushed the RCU knowledge into the callers. I'd > rather get something like this series in ASAP, and then for 5.8 or 5.9 > move the bdi pointer to the gendisk and stop re-registering it and thus > solve the problems root cause for real. Yeah, if it could be done it would be a nice solution. Because re-registering of BDIs is a long-term source of troubles... Honza
Hi, <A lot of context has been trimmed here before I got added to the Cc, so I'm assuming that we are talking about the vboxsf code here.> On 4/17/20 3:01 PM, Christoph Hellwig wrote: > On Fri, Apr 17, 2020 at 10:59:09AM +0200, Jan Kara wrote: >>> - dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); >>> + vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args); >>> + dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name); >>> if (IS_ERR(dev)) >>> return PTR_ERR(dev); >>> >> >> This can have a sideeffect not only bdi->dev_name will be truncated to 64 >> chars (which generally doesn't matter) but possibly also kobject name will >> be truncated in the same way. Which may have user visible effects. E.g. >> for fs/vboxsf 64 chars need not be enough. So shouldn't we rather do it the >> other way around - i.e., let device_create_vargs() create the device name >> and then copy to bdi->dev_name whatever fits? > > I think having them mismatch is worse, as the kobject name is what > people look for. Hans, do you know what fc->source typicall contains > and if there is much of a problem if it gets truncated/ Can we switch > to something else that is guranteed to be 64 charaters or less for the > bdi name? It contains the name the user has given to the shared-folder when exporting it from the host/hypervisor. Typically this will be the last element of the directory path, e.g. if I export /home/hans/projects/linux then the default/suggested share name and this the source name to pass to the host when mounting the shared-folder will be "linux". But the user can put anything there. AFAICT for vboxsf the bdi-name can be anything as long as it is unique, hence the "vboxsf-" prefix to make this unique vs other block-devices and the ".%d" postfix is necessary because the same export can be mounted multiple times (without using bind mounts), see: https://github.com/jwrdegoede/vboxsf/issues/3 The presence of the source inside the bdi-name is only for informational purposes really, so truncating that should be fine, maybe switch to: "vboxsf%d-%s" as format string and swap the sbi->bdi_id and fc->source in the args, then if we truncate anything it will be the source (which as said is only there for informational purposes) and the name will still be guaranteed to be unique. Regards, Hans
On Mon, Apr 20, 2020 at 01:41:57PM +0200, Hans de Goede wrote: > AFAICT for vboxsf the bdi-name can be anything as long as it is unique, hence > the "vboxsf-" prefix to make this unique vs other block-devices and the > ".%d" postfix is necessary because the same export can be mounted multiple > times (without using bind mounts), see: > https://github.com/jwrdegoede/vboxsf/issues/3 Shouldn't vboxsf switch to get_tree_single instead of get_tree_nodev? Having two independent dentry trees for a single actual file system can be pretty dangerous. > > The presence of the source inside the bdi-name is only for informational > purposes really, so truncating that should be fine, maybe switch to: > > "vboxsf%d-%s" as format string and swap the sbi->bdi_id and fc->source > in the args, then if we truncate anything it will be the source (which > as said is only there for informational purposes) and the name will > still be guaranteed to be unique. Can we just switch to vboxsf%d where %d іs a simple monotonically incrementing count? That is what various other file systems (e.g. ceph) do.
Hi, On 4/20/20 1:58 PM, Christoph Hellwig wrote: > On Mon, Apr 20, 2020 at 01:41:57PM +0200, Hans de Goede wrote: >> AFAICT for vboxsf the bdi-name can be anything as long as it is unique, hence >> the "vboxsf-" prefix to make this unique vs other block-devices and the >> ".%d" postfix is necessary because the same export can be mounted multiple >> times (without using bind mounts), see: >> https://github.com/jwrdegoede/vboxsf/issues/3 > > Shouldn't vboxsf switch to get_tree_single instead of get_tree_nodev? > Having two independent dentry trees for a single actual file system > can be pretty dangerous. That is a good point I will look into this. > >> >> The presence of the source inside the bdi-name is only for informational >> purposes really, so truncating that should be fine, maybe switch to: >> >> "vboxsf%d-%s" as format string and swap the sbi->bdi_id and fc->source >> in the args, then if we truncate anything it will be the source (which >> as said is only there for informational purposes) and the name will >> still be guaranteed to be unique. > > Can we just switch to vboxsf%d where %d іs a simple monotonically > incrementing count? That is what various other file systems (e.g. ceph) > do. Yes that is fine with me. Regards, Hans
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 4fc87dee005a..2849bdbb3acb 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -220,6 +220,7 @@ struct backing_dev_info { wait_queue_head_t wb_waitq; struct device *dev; + char dev_name[64]; struct device *owner; struct timer_list laptop_mode_wb_timer; diff --git a/mm/backing-dev.c b/mm/backing-dev.c index c2c44c89ee5d..efc5b83acd2d 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -938,7 +938,8 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args) if (bdi->dev) /* The driver needs to use separate queues per device */ return 0; - dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args); + vsnprintf(bdi->dev_name, sizeof(bdi->dev_name), fmt, args); + dev = device_create(bdi_class, NULL, MKDEV(0, 0), bdi, bdi->dev_name); if (IS_ERR(dev)) return PTR_ERR(dev); @@ -1047,7 +1048,7 @@ const char *bdi_dev_name(struct backing_dev_info *bdi) { if (!bdi || !bdi->dev) return bdi_unknown_name; - return dev_name(bdi->dev); + return bdi->dev_name; } EXPORT_SYMBOL_GPL(bdi_dev_name);
Cache a copy of the name for the life time of the backing_dev_info structure so that we can reference it even after unregistering. Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears") Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> --- include/linux/backing-dev-defs.h | 1 + mm/backing-dev.c | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-)