Message ID | 20230530203116.2008-17-demi@invisiblethingslab.com (mailing list archive) |
---|---|
State | New, archived |
Series | Diskseq support in loop, device-mapper, and blkback |
On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> Set "opened" to "0" before the hotplug script is called. Once the
> device node has been opened, set "opened" to "1".
>
> "opened" is used exclusively by userspace. It serves two purposes:
>
> 1. It tells userspace that the diskseq Xenstore entry is supported.
>
> 2. It tells userspace that it can wait for "opened" to be set to 1.
>    Once "opened" is 1, blkback has a reference to the device, so
>    userspace doesn't need to keep one.
>
> Together, these changes allow userspace to use block devices with
> delete-on-close behavior, such as loop devices with the autoclear flag
> set or device-mapper devices with the deferred-remove flag set.

There was some work in the past to allow reloading blkback as a
module, it's clear that using delete-on-close won't work if attempting
to reload blkback.

Isn't there some existing way to check whether a device is opened?
(stat syscall maybe?).

I would like to avoid adding more xenstore blkback state if such
information can be fetched from other methods.

> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> ---
>  drivers/block/xen-blkback/xenbus.c | 35 ++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 9c3eb148fbd802c74e626c3d7bcd69dcb09bd921..519a78aa9073d1faa1dce5c1b36e95ae58da534b 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -3,6 +3,20 @@
>      Copyright (C) 2005 Rusty Russell <rusty@rustcorp.com.au>
>      Copyright (C) 2005 XenSource Ltd
>
> +In addition to the Xenstore nodes required by the Xen block device
> +specification, this implementation of blkback uses a new Xenstore
> +node: "opened". blkback sets "opened" to "0" before the hotplug script
> +is called. Once the device node has been opened, blkback sets "opened"
> +to "1".
> +
> +"opened" is read exclusively by userspace. It serves two purposes:
> +
> +1. It tells userspace that diskseq@major:minor syntax for "physical-device" is
> +   supported.
> +
> +2. It tells userspace that it can wait for "opened" to be set to 1 after writing
> +   "physical-device". Once "opened" is 1, blkback has a reference to the
> +   device, so userspace doesn't need to keep one.
>
>  */
>
> @@ -699,6 +713,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
>          if (err)
>                  pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
>
> +        /*
> +         * This informs userspace that the "opened" node will be set to "1" when
> +         * the device has been opened successfully.
> +         */
> +        err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
> +        if (err)
> +                goto fail;
> +

You would need to set "opened" before registering the xenstore backend
watch AFAICT, or else it could be racy.

Thanks, Roger.

On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > Set "opened" to "0" before the hotplug script is called. Once the
> > device node has been opened, set "opened" to "1".
> >
> > "opened" is used exclusively by userspace. It serves two purposes:
> >
> > 1. It tells userspace that the diskseq Xenstore entry is supported.
> >
> > 2. It tells userspace that it can wait for "opened" to be set to 1.
> >    Once "opened" is 1, blkback has a reference to the device, so
> >    userspace doesn't need to keep one.
> >
> > Together, these changes allow userspace to use block devices with
> > delete-on-close behavior, such as loop devices with the autoclear flag
> > set or device-mapper devices with the deferred-remove flag set.
>
> There was some work in the past to allow reloading blkback as a
> module, it's clear that using delete-on-close won't work if attempting
> to reload blkback.

Should blkback stop itself from being unloaded if delete-on-close is in
use?

> Isn't there some existing way to check whether a device is opened?
> (stat syscall maybe?).

Knowing that the device has been opened isn’t enough. The block script
needs to be able to wait for blkback (and not something else) to open
the device. Otherwise it will be confused if the device is opened by
e.g. udev.

> I would like to avoid adding more xenstore blkback state if such
> information can be fetched from other methods.

I don’t think it can be, unless the information is passed via a
completely different method. Maybe netlink(7) or ioctl(2)? Arguably
this information should not be stored in Xenstore at all, as it exposes
backend implementation details to the frontend.

> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > index 9c3eb148fbd802c74e626c3d7bcd69dcb09bd921..519a78aa9073d1faa1dce5c1b36e95ae58da534b 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -699,6 +713,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
> >          if (err)
> >                  pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
> >
> > +        /*
> > +         * This informs userspace that the "opened" node will be set to "1" when
> > +         * the device has been opened successfully.
> > +         */
> > +        err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
> > +        if (err)
> > +                goto fail;
> > +
>
> You would need to set "opened" before registering the xenstore backend
> watch AFAICT, or else it could be racy.

Will fix in the next version.

On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > Set "opened" to "0" before the hotplug script is called. Once the
> > > device node has been opened, set "opened" to "1".
> > >
> > > "opened" is used exclusively by userspace. It serves two purposes:
> > >
> > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > >
> > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > >    Once "opened" is 1, blkback has a reference to the device, so
> > >    userspace doesn't need to keep one.
> > >
> > > Together, these changes allow userspace to use block devices with
> > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > set or device-mapper devices with the deferred-remove flag set.
> >
> > There was some work in the past to allow reloading blkback as a
> > module, it's clear that using delete-on-close won't work if attempting
> > to reload blkback.
>
> Should blkback stop itself from being unloaded if delete-on-close is in
> use?

Hm, maybe. I guess that's the best we can do right now.

> > Isn't there some existing way to check whether a device is opened?
> > (stat syscall maybe?).
>
> Knowing that the device has been opened isn’t enough. The block script
> needs to be able to wait for blkback (and not something else) to open
> the device. Otherwise it will be confused if the device is opened by
> e.g. udev.

Urg, no, the block script cannot wait indefinitely for blkback to open
the device, as it has an execution timeout. blkback is free to only
open the device upon guest frontend connection, and that (when using
libxl) requires the hotplug scripts execution to be finished so the
guest can be started.

> > I would like to avoid adding more xenstore blkback state if such
> > information can be fetched from other methods.
>
> I don’t think it can be, unless the information is passed via a
> completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> this information should not be stored in Xenstore at all, as it exposes
> backend implementation details to the frontend.

Could you maybe use sysfs for this information?

We have all sorts of crap in xenstore, but it would be best if we can
see about placing stuff like this in another interface.

Thanks, Roger.

On Wed, Jun 07, 2023 at 10:44:48AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> > On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > > Set "opened" to "0" before the hotplug script is called. Once the
> > > > device node has been opened, set "opened" to "1".
> > > >
> > > > "opened" is used exclusively by userspace. It serves two purposes:
> > > >
> > > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > > >
> > > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > > >    Once "opened" is 1, blkback has a reference to the device, so
> > > >    userspace doesn't need to keep one.
> > > >
> > > > Together, these changes allow userspace to use block devices with
> > > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > > set or device-mapper devices with the deferred-remove flag set.
> > >
> > > There was some work in the past to allow reloading blkback as a
> > > module, it's clear that using delete-on-close won't work if attempting
> > > to reload blkback.
> >
> > Should blkback stop itself from being unloaded if delete-on-close is in
> > use?
>
> Hm, maybe. I guess that's the best we can do right now.

I’ll implement this.

> > > Isn't there some existing way to check whether a device is opened?
> > > (stat syscall maybe?).
> >
> > Knowing that the device has been opened isn’t enough. The block script
> > needs to be able to wait for blkback (and not something else) to open
> > the device. Otherwise it will be confused if the device is opened by
> > e.g. udev.
>
> Urg, no, the block script cannot wait indefinitely for blkback to open
> the device, as it has an execution timeout. blkback is free to only
> open the device upon guest frontend connection, and that (when using
> libxl) requires the hotplug scripts execution to be finished so the
> guest can be started.

I’m a bit confused here. My understanding is that blkdev_get_by_dev()
already opens the device, and that happens in the xenstore watch
handler. I have tested this with delete-on-close device-mapper devices,
and it does work.

> > > I would like to avoid adding more xenstore blkback state if such
> > > information can be fetched from other methods.
> >
> > I don’t think it can be, unless the information is passed via a
> > completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> > this information should not be stored in Xenstore at all, as it exposes
> > backend implementation details to the frontend.
>
> Could you maybe use sysfs for this information?

Probably? This would involve adding a new file in sysfs.

> We have all sorts of crap in xenstore, but it would be best if we can
> see about placing stuff like this in another interface.

Fair.

> Thanks, Roger.

On Wed, Jun 07, 2023 at 12:29:26PM -0400, Demi Marie Obenour wrote:
> On Wed, Jun 07, 2023 at 10:44:48AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> > > On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > > > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > > > Set "opened" to "0" before the hotplug script is called. Once the
> > > > > device node has been opened, set "opened" to "1".
> > > > >
> > > > > "opened" is used exclusively by userspace. It serves two purposes:
> > > > >
> > > > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > > > >
> > > > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > > > >    Once "opened" is 1, blkback has a reference to the device, so
> > > > >    userspace doesn't need to keep one.
> > > > >
> > > > > Together, these changes allow userspace to use block devices with
> > > > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > > > set or device-mapper devices with the deferred-remove flag set.
> > > >
> > > > There was some work in the past to allow reloading blkback as a
> > > > module, it's clear that using delete-on-close won't work if attempting
> > > > to reload blkback.
> > >
> > > Should blkback stop itself from being unloaded if delete-on-close is in
> > > use?
> >
> > Hm, maybe. I guess that's the best we can do right now.
>
> I’ll implement this.

Let's make this a separate patch.

> > > > Isn't there some existing way to check whether a device is opened?
> > > > (stat syscall maybe?).
> > >
> > > Knowing that the device has been opened isn’t enough. The block script
> > > needs to be able to wait for blkback (and not something else) to open
> > > the device. Otherwise it will be confused if the device is opened by
> > > e.g. udev.
> >
> > Urg, no, the block script cannot wait indefinitely for blkback to open
> > the device, as it has an execution timeout. blkback is free to only
> > open the device upon guest frontend connection, and that (when using
> > libxl) requires the hotplug scripts execution to be finished so the
> > guest can be started.
>
> I’m a bit confused here. My understanding is that blkdev_get_by_dev()
> already opens the device, and that happens in the xenstore watch
> handler. I have tested this with delete-on-close device-mapper devices,
> and it does work.

Right, but on a very contended system there's no guarantee of when
blkback will pick up the update to "physical-device" and open the
device. So far the block script only writes the physical-device node
and exits. With the proposed change the block script will also wait
for blkback to react to the physical-device write, hence making VM
creation slower.

> > > > I would like to avoid adding more xenstore blkback state if such
> > > > information can be fetched from other methods.
> > >
> > > I don’t think it can be, unless the information is passed via a
> > > completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> > > this information should not be stored in Xenstore at all, as it exposes
> > > backend implementation details to the frontend.
> >
> > Could you maybe use sysfs for this information?
>
> Probably? This would involve adding a new file in sysfs.
>
> > We have all sorts of crap in xenstore, but it would be best if we can
> > see about placing stuff like this in another interface.
>
> Fair.

Let's see if that's a suitable approach, and we can avoid having to
add an extra node to xenstore.

Thanks, Roger.

On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> Set "opened" to "0" before the hotplug script is called. Once the
> device node has been opened, set "opened" to "1".
>
> "opened" is used exclusively by userspace. It serves two purposes:
>
> 1. It tells userspace that the diskseq Xenstore entry is supported.
>
> 2. It tells userspace that it can wait for "opened" to be set to 1.
>    Once "opened" is 1, blkback has a reference to the device, so
>    userspace doesn't need to keep one.
>
> Together, these changes allow userspace to use block devices with
> delete-on-close behavior, such as loop devices with the autoclear flag
> set or device-mapper devices with the deferred-remove flag set.

Now that I think a bit more about this, how are you planning to handle
reboot with such devices? It's fine for loop (because those get
instantiated by the block script), but likely not with other block
devices, as on reboot the toolstack will find the block device is
gone.

I guess the delete-on-close is only intended to be used for loop
devices? (or in general block devices that are instantiated by the
block script itself)

Thanks, Roger.

On Thu, Jun 08, 2023 at 11:11:44AM +0200, Roger Pau Monné wrote:
> On Wed, Jun 07, 2023 at 12:29:26PM -0400, Demi Marie Obenour wrote:
> > On Wed, Jun 07, 2023 at 10:44:48AM +0200, Roger Pau Monné wrote:
> > > On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> > > > On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > > > > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > > > > Set "opened" to "0" before the hotplug script is called. Once the
> > > > > > device node has been opened, set "opened" to "1".
> > > > > >
> > > > > > "opened" is used exclusively by userspace. It serves two purposes:
> > > > > >
> > > > > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > > > > >
> > > > > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > > > > >    Once "opened" is 1, blkback has a reference to the device, so
> > > > > >    userspace doesn't need to keep one.
> > > > > >
> > > > > > Together, these changes allow userspace to use block devices with
> > > > > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > > > > set or device-mapper devices with the deferred-remove flag set.
> > > > >
> > > > > There was some work in the past to allow reloading blkback as a
> > > > > module, it's clear that using delete-on-close won't work if attempting
> > > > > to reload blkback.
> > > >
> > > > Should blkback stop itself from being unloaded if delete-on-close is in
> > > > use?
> > >
> > > Hm, maybe. I guess that's the best we can do right now.
> >
> > I’ll implement this.
>
> Let's make this a separate patch.

Good idea.

> > > > > Isn't there some existing way to check whether a device is opened?
> > > > > (stat syscall maybe?).
> > > >
> > > > Knowing that the device has been opened isn’t enough. The block script
> > > > needs to be able to wait for blkback (and not something else) to open
> > > > the device. Otherwise it will be confused if the device is opened by
> > > > e.g. udev.
> > >
> > > Urg, no, the block script cannot wait indefinitely for blkback to open
> > > the device, as it has an execution timeout. blkback is free to only
> > > open the device upon guest frontend connection, and that (when using
> > > libxl) requires the hotplug scripts execution to be finished so the
> > > guest can be started.
> >
> > I’m a bit confused here. My understanding is that blkdev_get_by_dev()
> > already opens the device, and that happens in the xenstore watch
> > handler. I have tested this with delete-on-close device-mapper devices,
> > and it does work.
>
> Right, but on a very contended system there's no guarantee of when
> blkback will pick up the update to "physical-device" and open the
> device, so far the block script only writes the physical-device node
> and exits. With the proposed change the block script will also wait
> for blkback to react to the physical-device write, hence making VM
> creation slower.

Only block scripts that choose to wait for device open suffer this
performance penalty. My current plan is to only do so for
delete-on-close devices which are managed by the block script itself.
Other devices will not suffer a performance hit.

In the long term, I would like to solve this problem entirely by using
an ioctl to configure blkback. The ioctl would take a file descriptor
argument, avoiding the need for a round-trip through xenstore. This
also solves a security annoyance with the current design, which is that
the device is opened by a kernel thread and so the security context of
whoever requested the device to be opened is lost.

> > > > > I would like to avoid adding more xenstore blkback state if such
> > > > > information can be fetched from other methods.
> > > >
> > > > I don’t think it can be, unless the information is passed via a
> > > > completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> > > > this information should not be stored in Xenstore at all, as it exposes
> > > > backend implementation details to the frontend.
> > >
> > > Could you maybe use sysfs for this information?
> >
> > Probably? This would involve adding a new file in sysfs.
> >
> > > We have all sorts of crap in xenstore, but it would be best if we can
> > > see about placing stuff like this in another interface.
> >
> > Fair.
>
> Let's see if that's a suitable approach, and we can avoid having to
> add an extra node to xenstore.

I thought about this some more and realized that in Qubes OS, we might
want to include the diskseq in the information dom0 gets about each
exported block device. This would allow dom0 to write the xenstore node
itself, but it would require some way for dom0 to be informed about
blkback having this feature.

On Thu, Jun 08, 2023 at 12:08:55PM +0200, Roger Pau Monné wrote:
> On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > Set "opened" to "0" before the hotplug script is called. Once the
> > device node has been opened, set "opened" to "1".
> >
> > "opened" is used exclusively by userspace. It serves two purposes:
> >
> > 1. It tells userspace that the diskseq Xenstore entry is supported.
> >
> > 2. It tells userspace that it can wait for "opened" to be set to 1.
> >    Once "opened" is 1, blkback has a reference to the device, so
> >    userspace doesn't need to keep one.
> >
> > Together, these changes allow userspace to use block devices with
> > delete-on-close behavior, such as loop devices with the autoclear flag
> > set or device-mapper devices with the deferred-remove flag set.
>
> Now that I think a bit more about this, how are you planning to handle
> reboot with such devices? It's fine for loop (because those get
> instantiated by the block script), but likely not with other block
> devices, as on reboot the toolstack will find the block device is
> gone.
>
> I guess the delete-on-close is only intended to be used for loop
> devices? (or in general block devices that are instantiated by the
> block script itself)

You understand correctly.

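For readers unfamiliar with the loop autoclear flag discussed above, here is a minimal userspace sketch of how a block script helper might mark a loop device as delete-on-close. The helper name loop_set_autoclear is hypothetical and not part of this series; the ioctls and the LO_FLAGS_AUTOCLEAR flag come from <linux/loop.h>. It assumes the caller already holds an open file descriptor for a configured loop device.

```c
#include <errno.h>
#include <linux/loop.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: set LO_FLAGS_AUTOCLEAR on an already-configured
 * loop device so the kernel tears it down once the last opener closes it.
 * Returns 0 on success or a negative errno value on failure.
 */
static int loop_set_autoclear(int loop_fd)
{
    struct loop_info64 info;

    /* Read the current status so the other fields are preserved. */
    if (ioctl(loop_fd, LOOP_GET_STATUS64, &info) < 0)
        return -errno;

    info.lo_flags |= LO_FLAGS_AUTOCLEAR;

    if (ioctl(loop_fd, LOOP_SET_STATUS64, &info) < 0)
        return -errno;

    return 0;
}
```

With the flag set, the script must keep its own descriptor open until something else (here, blkback) holds a reference, which is the gap the "opened" node is meant to close.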
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 9c3eb148fbd802c74e626c3d7bcd69dcb09bd921..519a78aa9073d1faa1dce5c1b36e95ae58da534b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -3,6 +3,20 @@
     Copyright (C) 2005 Rusty Russell <rusty@rustcorp.com.au>
     Copyright (C) 2005 XenSource Ltd
 
+In addition to the Xenstore nodes required by the Xen block device
+specification, this implementation of blkback uses a new Xenstore
+node: "opened". blkback sets "opened" to "0" before the hotplug script
+is called. Once the device node has been opened, blkback sets "opened"
+to "1".
+
+"opened" is read exclusively by userspace. It serves two purposes:
+
+1. It tells userspace that diskseq@major:minor syntax for "physical-device" is
+   supported.
+
+2. It tells userspace that it can wait for "opened" to be set to 1 after writing
+   "physical-device". Once "opened" is 1, blkback has a reference to the
+   device, so userspace doesn't need to keep one.
 
 */
 
@@ -699,6 +713,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
         if (err)
                 pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
 
+        /*
+         * This informs userspace that the "opened" node will be set to "1" when
+         * the device has been opened successfully.
+         */
+        err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
+        if (err)
+                goto fail;
+
         err = xenbus_switch_state(dev, XenbusStateInitWait);
         if (err)
                 goto fail;
@@ -826,6 +848,19 @@ static void backend_changed(struct xenbus_watch *watch,
                 goto fail;
         }
 
+        /*
+         * Tell userspace that the device has been opened and that blkback has a
+         * reference to it. Userspace can then close the device or mark it as
+         * delete-on-close, knowing that blkback will keep the device open as
+         * long as necessary.
+         */
+        err = xenbus_write(XBT_NIL, dev->nodename, "opened", "1");
+        if (err) {
+                xenbus_dev_fatal(dev, err, "%s: notifying userspace device has been opened",
+                                 dev->nodename);
+                goto free_vbd;
+        }
+
         err = xenvbd_sysfs_addif(dev);
         if (err) {
                 xenbus_dev_fatal(dev, err, "creating sysfs entries");

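The header comment in the patch mentions a diskseq@major:minor syntax for "physical-device" next to the existing major:minor form. The sketch below shows one way a parser could accept both. It is an illustration only: the function and struct names are hypothetical, and it assumes the diskseq field is hexadecimal like the existing major and minor fields, which this patch does not state.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result type; not taken from the patch. */
struct physdev {
    unsigned long long diskseq; /* valid only if has_diskseq is set */
    unsigned int dev_major;
    unsigned int dev_minor;
    int has_diskseq;
};

/*
 * Parse either "diskseq@major:minor" or the legacy "major:minor".
 * ASSUMPTION: all fields are hexadecimal. Returns 0 on success and
 * -1 if the string matches neither form or has trailing characters.
 */
static int parse_physical_device(const char *s, struct physdev *out)
{
    char trailing;

    assert(s != NULL && out != NULL);

    /* The extra %c catches trailing garbage: it must fail to match. */
    if (sscanf(s, "%llx@%x:%x%c", &out->diskseq, &out->dev_major,
               &out->dev_minor, &trailing) == 3) {
        out->has_diskseq = 1;
        return 0;
    }

    if (sscanf(s, "%x:%x%c", &out->dev_major, &out->dev_minor,
               &trailing) == 2) {
        out->diskseq = 0;
        out->has_diskseq = 0;
        return 0;
    }

    return -1;
}
```

Trying the longer form first keeps the legacy form working unchanged, so a toolstack that never writes a diskseq is unaffected.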
Set "opened" to "0" before the hotplug script is called. Once the
device node has been opened, set "opened" to "1".

"opened" is used exclusively by userspace. It serves two purposes:

1. It tells userspace that the diskseq Xenstore entry is supported.

2. It tells userspace that it can wait for "opened" to be set to 1.
   Once "opened" is 1, blkback has a reference to the device, so
   userspace doesn't need to keep one.

Together, these changes allow userspace to use block devices with
delete-on-close behavior, such as loop devices with the autoclear flag
set or device-mapper devices with the deferred-remove flag set.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 drivers/block/xen-blkback/xenbus.c | 35 ++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
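The userspace side of this protocol could look roughly like the sketch below, which also bounds the wait because the block script has an execution timeout, as raised in review. Everything here is hypothetical: node_reader stands in for whatever reads a backend Xenstore node (with libxenstore it could wrap xs_read()), and fake_backend and fake_read exist only so the sketch runs without Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: current value of a backend node, or NULL if absent. */
typedef const char *(*node_reader)(void *ctx, const char *node);

enum wait_result {
    WAIT_OPENED,      /* "opened" is "1": blkback holds a reference */
    WAIT_UNSUPPORTED, /* node absent: this blkback lacks the feature */
    WAIT_TIMED_OUT,   /* still "0" after max_polls reads */
};

/* Poll "opened" at most max_polls times instead of waiting forever. */
static enum wait_result wait_for_opened(node_reader read_node, void *ctx,
                                        unsigned int max_polls)
{
    assert(read_node != NULL);

    for (unsigned int i = 0; i < max_polls; i++) {
        const char *val = read_node(ctx, "opened");

        if (val == NULL)
            return WAIT_UNSUPPORTED;
        if (strcmp(val, "1") == 0)
            return WAIT_OPENED;
        /* A real caller would block on a Xenstore watch here, not spin. */
    }
    return WAIT_TIMED_OUT;
}

/* Stand-in backend for experimentation: reports "0" for a few reads. */
struct fake_backend {
    int supported;
    unsigned int polls_until_open;
};

static const char *fake_read(void *ctx, const char *node)
{
    struct fake_backend *b = ctx;

    (void)node;
    if (!b->supported)
        return NULL;
    if (b->polls_until_open == 0)
        return "1";
    b->polls_until_open--;
    return "0";
}
```

Only after WAIT_OPENED would the caller close its own descriptor for a delete-on-close device; on WAIT_UNSUPPORTED or WAIT_TIMED_OUT it would need to keep the device alive some other way.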