diff mbox series

[11/12] hw/xen: automatically assign device index to block devices

Message ID 20231016151909.22133-12-dwmw2@infradead.org (mailing list archive)
State New, archived
Headers show
Series Get Xen PV shim running in qemu | expand

Commit Message

David Woodhouse Oct. 16, 2023, 3:19 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

There's no need to force the user to assign a vdev. We can automatically
assign one, starting at xvda and searching until we find the first disk
name that's unused.

This means we can now allow '-drive if=xen,file=xxx' to work without an
explicit separate -driver argument, just like if=virtio.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 blockdev.c           | 15 ++++++++++++---
 hw/block/xen-block.c | 25 +++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 3 deletions(-)

Comments

Kevin Wolf Oct. 17, 2023, 10:21 a.m. UTC | #1
Am 16.10.2023 um 17:19 hat David Woodhouse geschrieben:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> There's no need to force the user to assign a vdev. We can automatically
> assign one, starting at xvda and searching until we find the first disk
> name that's unused.
> 
> This means we can now allow '-drive if=xen,file=xxx' to work without an
> explicit separate -driver argument, just like if=virtio.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

> @@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp)
>      XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
>      XenBlockVdev *vdev = &blockdev->props.vdev;
>  
> +    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
> +        char name[11];
> +        int disk = 0;
> +        unsigned long idx;
> +
> +        /* Find an unoccupied device name */
> +        while (disk < (1 << 20)) {

I like your optimism that we can handle a million disks. :-)

I haven't reviewed the Xen part in detail, but the patch looks fine on
the block layer side.

Acked-by: Kevin Wolf <kwolf@redhat.com>
David Woodhouse Oct. 17, 2023, 6:02 p.m. UTC | #2
On Tue, 2023-10-17 at 12:21 +0200, Kevin Wolf wrote:
> Am 16.10.2023 um 17:19 hat David Woodhouse geschrieben:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > There's no need to force the user to assign a vdev. We can automatically
> > assign one, starting at xvda and searching until we find the first disk
> > name that's unused.
> > 
> > This means we can now allow '-drive if=xen,file=xxx' to work without an
> > explicit separate -driver argument, just like if=virtio.
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> > @@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp)
> >      XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
> >      XenBlockVdev *vdev = &blockdev->props.vdev;
> >  
> > +    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
> > +        char name[11];
> > +        int disk = 0;
> > +        unsigned long idx;
> > +
> > +        /* Find an unoccupied device name */
> > +        while (disk < (1 << 20)) {
> 
> I like your optimism that we can handle a million disks. :-)

Heh, yeah. Given that there *is* a limit, setting it lower seemed a bit
arbitrary.

For consoles I picked 100 instead of letting it go all the way to
INT_MAX, and in a patch set soon to be posted I'll do the same for the
xen-net-device as I convert it.

Even with a limit of 100, having that many devices *WITHOUT SPECIFYING
WHICH ONE IS WHICH* seems a bit many!

FWIW I've changed it to check for the existence of the *frontend* nodes
(the ones which are visible to the guest). Currently it checks for the
backend nodes, but those might be in different places.

> I haven't reviewed the Xen part in detail, but the patch looks fine on
> the block layer side.
> 
> Acked-by: Kevin Wolf <kwolf@redhat.com>

Thanks.
Igor Mammedov Oct. 18, 2023, 7:32 a.m. UTC | #3
On Mon, 16 Oct 2023 16:19:08 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> From: David Woodhouse <dwmw@amazon.co.uk>
> 

is this index a user (guest) visible?

> There's no need to force the user to assign a vdev. We can automatically
> assign one, starting at xvda and searching until we find the first disk
> name that's unused.
> 
> This means we can now allow '-drive if=xen,file=xxx' to work without an
> explicit separate -driver argument, just like if=virtio.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  blockdev.c           | 15 ++++++++++++---
>  hw/block/xen-block.c | 25 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 325b7a3bef..9dec4c9c74 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -255,13 +255,13 @@ void drive_check_orphaned(void)
>           * Ignore default drives, because we create certain default
>           * drives unconditionally, then leave them unclaimed.  Not the
>           * users fault.
> -         * Ignore IF_VIRTIO, because it gets desugared into -device,
> -         * so we can leave failing to -device.
> +         * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into
> +         * -device, so we can leave failing to -device.
>           * Ignore IF_NONE, because leaving unclaimed IF_NONE remains
>           * available for device_add is a feature.
>           */
>          if (dinfo->is_default || dinfo->type == IF_VIRTIO
> -            || dinfo->type == IF_NONE) {
> +            || dinfo->type == IF_XEN || dinfo->type == IF_NONE) {
>              continue;
>          }
>          if (!blk_get_attached_dev(blk)) {
> @@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type,
>          qemu_opt_set(devopts, "driver", "virtio-blk", &error_abort);
>          qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
>                       &error_abort);
> +    } else if (type == IF_XEN) {
> +        QemuOpts *devopts;
> +        devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
> +                                   &error_abort);
> +        qemu_opt_set(devopts, "driver",
> +                     (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk",
> +                     &error_abort);
> +        qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
> +                     &error_abort);
>      }
>  
>      filename = qemu_opt_get(legacy_opts, "file");
> diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> index 9262338535..20fa783cbe 100644
> --- a/hw/block/xen-block.c
> +++ b/hw/block/xen-block.c
> @@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp)
>      XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
>      XenBlockVdev *vdev = &blockdev->props.vdev;
>  
> +    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
> +        char name[11];
> +        int disk = 0;
> +        unsigned long idx;
> +
> +        /* Find an unoccupied device name */
> +        while (disk < (1 << 20)) {
> +            if (disk < (1 << 4)) {
> +                idx = (202 << 8) | (disk << 4);
> +            } else {
> +                idx = (1 << 28) | (disk << 8);
> +            }
> +            snprintf(name, sizeof(name), "%lu", idx);
> +            if (!xen_backend_exists("qdisk", name)) {
> +                vdev->type = XEN_BLOCK_VDEV_TYPE_XVD;
> +                vdev->partition = 0;
> +                vdev->disk = disk;
> +                vdev->number = idx;
> +                return g_strdup(name);
> +            }
> +            disk++;
> +        }
> +        error_setg(errp, "cannot find device vdev for block device");
> +        return NULL;
> +    }
>      return g_strdup_printf("%lu", vdev->number);
>  }
>
David Woodhouse Oct. 18, 2023, 8:32 a.m. UTC | #4
On Wed, 2023-10-18 at 09:32 +0200, Igor Mammedov wrote:
> On Mon, 16 Oct 2023 16:19:08 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> 
> is this index a user (guest) visible?

Yes. It defines what block device (e.g. /dev/xvda) the disk appears as
in the guest. In the common case, it literally encodes the Linux
major/minor numbers. So xvda (major 202) is 0xca00, xvdb is 0xca10 etc.

Previously we had to explicitly set it for each disk in Qemu:

  -drive file=disk1.img,id=drv1 -device xen-disk,drive=drv1,vdev=xvda
  -drive file=disk2.img,id=drv2 -device xen-disk,drive=drv2,vdev=xvdb

Now we can just do

  -drive file=disk1.img,if=xen -drive file-disk2.img,if=xen

(We could go further and make if=xen the default for Xen guests too,
but that doesn't work right now because xen-block will barf on the
default provided CD-ROM when it's empty. It doesn't handle empty
devices. And if I work around that, then `-hda disk1.img` would work on
the command line... but would make it appear as /dev/xvda instead of
/dev/hda, and I don't know how I feel about that.)

[root@localhost ~]# xenstore-ls  -f device/vbd
device/vbd = ""
device/vbd/51712 = ""
device/vbd/51712/backend = "/local/domain/0/backend/qdisk/1/51712"
device/vbd/51712/backend-id = "0"
device/vbd/51712/device-type = "disk"
device/vbd/51712/event-channel = "8"
device/vbd/51712/feature-persistent = "1"
device/vbd/51712/protocol = "x86_64-abi"
device/vbd/51712/ring-ref = "8"
device/vbd/51712/state = "4"
device/vbd/51712/virtual-device = "51712"

> 
> > There's no need to force the user to assign a vdev. We can automatically
> > assign one, starting at xvda and searching until we find the first disk
> > name that's unused.
> > 
> > This means we can now allow '-drive if=xen,file=xxx' to work without an
> > explicit separate -driver argument, just like if=virtio.
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > ---
> >  blockdev.c           | 15 ++++++++++++---
> >  hw/block/xen-block.c | 25 +++++++++++++++++++++++++
> >  2 files changed, 37 insertions(+), 3 deletions(-)
> > 
> > diff --git a/blockdev.c b/blockdev.c
> > index 325b7a3bef..9dec4c9c74 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -255,13 +255,13 @@ void drive_check_orphaned(void)
> >           * Ignore default drives, because we create certain default
> >           * drives unconditionally, then leave them unclaimed.  Not the
> >           * users fault.
> > -         * Ignore IF_VIRTIO, because it gets desugared into -device,
> > -         * so we can leave failing to -device.
> > +         * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into
> > +         * -device, so we can leave failing to -device.
> >           * Ignore IF_NONE, because leaving unclaimed IF_NONE remains
> >           * available for device_add is a feature.
> >           */
> >          if (dinfo->is_default || dinfo->type == IF_VIRTIO
> > -            || dinfo->type == IF_NONE) {
> > +            || dinfo->type == IF_XEN || dinfo->type == IF_NONE) {
> >              continue;
> >          }
> >          if (!blk_get_attached_dev(blk)) {
> > @@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type,
> >          qemu_opt_set(devopts, "driver", "virtio-blk", &error_abort);
> >          qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
> >                       &error_abort);
> > +    } else if (type == IF_XEN) {
> > +        QemuOpts *devopts;
> > +        devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
> > +                                   &error_abort);
> > +        qemu_opt_set(devopts, "driver",
> > +                     (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk",
> > +                     &error_abort);
> > +        qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
> > +                     &error_abort);
> >      }
> >  
> >      filename = qemu_opt_get(legacy_opts, "file");
> > diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> > index 9262338535..20fa783cbe 100644
> > --- a/hw/block/xen-block.c
> > +++ b/hw/block/xen-block.c
> > @@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp)
> >      XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
> >      XenBlockVdev *vdev = &blockdev->props.vdev;
> >  
> > +    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
> > +        char name[11];
> > +        int disk = 0;
> > +        unsigned long idx;
> > +
> > +        /* Find an unoccupied device name */
> > +        while (disk < (1 << 20)) {
> > +            if (disk < (1 << 4)) {
> > +                idx = (202 << 8) | (disk << 4);
> > +            } else {
> > +                idx = (1 << 28) | (disk << 8);
> > +            }
> > +            snprintf(name, sizeof(name), "%lu", idx);
> > +            if (!xen_backend_exists("qdisk", name)) {
> > +                vdev->type = XEN_BLOCK_VDEV_TYPE_XVD;
> > +                vdev->partition = 0;
> > +                vdev->disk = disk;
> > +                vdev->number = idx;
> > +                return g_strdup(name);
> > +            }
> > +            disk++;
> > +        }
> > +        error_setg(errp, "cannot find device vdev for block device");
> > +        return NULL;
> > +    }
> >      return g_strdup_printf("%lu", vdev->number);
> >  }
> >  
> 
>
Kevin Wolf Oct. 18, 2023, 8:52 a.m. UTC | #5
Am 16.10.2023 um 17:19 hat David Woodhouse geschrieben:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> There's no need to force the user to assign a vdev. We can automatically
> assign one, starting at xvda and searching until we find the first disk
> name that's unused.
> 
> This means we can now allow '-drive if=xen,file=xxx' to work without an
> explicit separate -driver argument, just like if=virtio.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Actually, how does this play together with xen_config_dev_blk()? This
looks like it tried to implement a very similar thing (which is IF_XEN
even already existed).

Are we now trying to attach each if=xen disk twice in the 'xenpv'
machine? Or if something prevents this, is it dead code?

I think in both cases, we would want to delete that function and remove
the loop that calls it in xen_init_pv()?

Kevin
David Woodhouse Oct. 18, 2023, 10:52 a.m. UTC | #6
On Wed, 2023-10-18 at 10:52 +0200, Kevin Wolf wrote:
> Am 16.10.2023 um 17:19 hat David Woodhouse geschrieben:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > There's no need to force the user to assign a vdev. We can automatically
> > assign one, starting at xvda and searching until we find the first disk
> > name that's unused.
> > 
> > This means we can now allow '-drive if=xen,file=xxx' to work without an
> > explicit separate -driver argument, just like if=virtio.
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> Actually, how does this play together with xen_config_dev_blk()? This
> looks like it tried to implement a very similar thing (which is IF_XEN
> even already existed).

Ah yes, thanks for spotting that! I hadn't been looking at the xenfv

> Are we now trying to attach each if=xen disk twice in the 'xenpv'
> machine? Or if something prevents this, is it dead code.

I suspect we end up creating them twice (and probably thus failing to
lock the backing file).

The old Xen PV device drivers in Qemu, before Paul's XenDevice
conversion, were purely reactive. If they see nodes in XenStore
describing a backend which they should implement, they'd do so.

The code in xen_config_dev_blk() is *creating* those nodes for the
frontend (guest) and backend (host/qemu) to see, which prompts the
xen-block and xen-net drivers into action.

In the new XenDevice model, we can just instantiate a device (e.g.
qdev_new(TYPE_XEN_DISK_DEVICE) and its realize() method will create the
corresponding XenStore nodes (with some help from the generic XenBus
code for the common ones).

The new XenDevice/XenBus model will *also* react to nodes which it
discovers in XenStore, and instantiate the corresponding devices.

So I suspect what'll happen is xen_config_dev_blk() will create the
nodes starting at xvda (int vdev = 202 * 256 + 16 * disk->unit), and
later my new code will create a new set starting from xvdb or wherever
the next free one is.

> I think in both cases, we would want to delete that function and remove
> the loop that calls it in xen_init_pv()?

Yes, I think so. The xen_config_dev_blk() one can just die, as can
xen_config_dev_console() which isn't called from anywhere anyway.

The vkbd/vfb ones can stay until/unless those drivers are converted to
the new XenDevice model.

And xen_config_dev_nic() probably just needs to loop doing the same as
I did in pc_init_nic() in
https://lore.kernel.org/qemu-devel/20231017182545.97973-5-dwmw2@infradead.org/T/#u

+        if (xen_bus && (!nd->model || g_str_equal(model, "xen-net-device"))) {
+            DeviceState *dev = qdev_new("xen-net-device");
+            qdev_set_nic_properties(dev, nd);
+            qdev_realize_and_unref(dev, xen_bus, &error_fatal);


... but this just reinforces what I said there about "if
qmp_device_add() can find the damn bus and do this right, why do we
have to litter it through platform code?"
David Woodhouse Oct. 18, 2023, 11:13 p.m. UTC | #7
On Wed, 2023-10-18 at 10:52 +0200, Kevin Wolf wrote:
> Am 16.10.2023 um 17:19 hat David Woodhouse geschrieben:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > There's no need to force the user to assign a vdev. We can automatically
> > assign one, starting at xvda and searching until we find the first disk
> > name that's unused.
> > 
> > This means we can now allow '-drive if=xen,file=xxx' to work without an
> > explicit separate -driver argument, just like if=virtio.
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> Actually, how does this play together with xen_config_dev_blk()? This
> looks like it tried to implement a very similar thing (which is IF_XEN
> even already existed).
> 
> Are we now trying to attach each if=xen disk twice in the 'xenpv'
> machine? Or if something prevents this, is it dead code?
> 
> I think in both cases, we would want to delete that function and remove
> the loop that calls it in xen_init_pv()?

I tested this with an xl config which looks a bit like this...

disk = [ "backendtype=qdisk,format=qcow2,vdev=xvda,access=rw,target=/var/lib/libvirt/images/fedora28.qcow2" ]
device_model_override = "/home/dwmw2/git/qemu/build/qemu-system-x86_64"
device_model_version = "qemu-xen"
device_model_args = [ "-trace","xen*","-chardev","pty,id=mon","-mon","mon","-drive","file=/var/lib/libvirt/images/solaris11.qcow2,format=qcow2,if=xen","-nic","user,model=xen" ]

The code in xen_config_dev_blk() scribbles over the disk that the
toolstack has configured for me, and adds that qcow2 file from the
'-drive' option on the command line, but in *raw* mode.

Then the new xen-disk support kicks in and finds a free vdev, and adds
the -drive properly /dev/xvdb (as qcow2).

So v2 of this patch will just rip out xen_config_dev_blk() as you
suggest.
Kevin Wolf Oct. 19, 2023, 11:21 a.m. UTC | #8
Am 18.10.2023 um 12:52 hat David Woodhouse geschrieben:
> > Actually, how does this play together with xen_config_dev_blk()? This
> > looks like it tried to implement a very similar thing (which is IF_XEN
> > even already existed).
> 
> Ah yes, thanks for spotting that! I hadn't been looking at the xenfv
> 
> > Are we now trying to attach each if=xen disk twice in the 'xenpv'
> > machine? Or if something prevents this, is it dead code.
> 
> I suspect we end up creating them twice (and probably thus failing to
> lock the backing file).
>
> [...]
>
> ... but this just reinforces what I said there about "if
> qmp_device_add() can find the damn bus and do this right, why do we
> have to litter it through platform code?"

Indeed, if you can do -device, it's always the best option. For block
devices not the least because it gives you a way to use -blockdev with
it. I'm happy whenever I see a drive_get() call go away.

Kevin
David Woodhouse Oct. 20, 2023, 5:47 p.m. UTC | #9
On Wed, 2023-10-18 at 11:52 +0100, David Woodhouse wrote:
> 
> And xen_config_dev_nic() probably just needs to loop doing the same
> as
> I did in pc_init_nic() in
> https://lore.kernel.org/qemu-devel/20231017182545.97973-5-dwmw2@infradead.org/T/#u
> 
> +        if (xen_bus && (!nd->model || g_str_equal(model, "xen-net-
> device"))) {
> +            DeviceState *dev = qdev_new("xen-net-device");
> +            qdev_set_nic_properties(dev, nd);
> +            qdev_realize_and_unref(dev, xen_bus, &error_fatal);
> 
> 
> ... but this just reinforces what I said there about "if
> qmp_device_add() can find the damn bus and do this right, why do we
> have to litter it through platform code?"

I had a look through the network setup.

There are a bunch of platforms adding specific devices to their own
internal system bus, which often use nd_table[] directly. Sometimes
they do so whether it's been set up or now.

They can mostly be divided into two camps. Some of them will create
their NIC anyway, and will use a matching -nic configuration if it
exists. Others will only create their NIC if a corresponding -nic
configuration does exist.

This is fairly random, and perhaps platforms should be more consistent,
but it's best to avoid user-visible changes as much as possible while
doing the cleanup, so I've kept them as they were.

I've created qemu_configure_nic_device() and qemu_create_nic_device()
functions for those two use cases respectively, and most code which
directly accesses nd_table[] can be converted to those fairly simply:
https://git.infradead.org/users/dwmw2/qemu.git/commitdiff/7b4fb6fc10a4

It means I can throw away the horrid parts of -nic support for the Xen
network device, which were in the pc and xenfv platforms, and replace
it with a trivial loop in xenbus_init():

+    /* This is a bus-generic "configure all NICs on this bus type" */
+    while ((qnic = qemu_create_nic_device("xen-net-device", true, "xen"))) {
+            qdev_realize_and_unref(qnic, bus, &error_fatal);

Other than that one (which is cheating because there's only one type of
network device that can be instantiated on the XenBus), the only
remaining case is PCI. Most platforms just iterate over the -nic
configurations adding devices to a PCI bus of the platform's choice.
In some cases there's a special case, the first one goes at a specific
devfn and/or on a different bus.

There was a little more variation here... for example fuloong2e would
put the first nic (nd_table[0]) in slot 7 only if it's an rtl8139 (if
the model is unspecified). But PReP would put the first PCI NIC in slot
3 *regardless* of whether it's the expected PCNET or not.

I didn't faithfully preserve the behaviour there, because I don't think
it matters. They mostly look like this now (e.g. hw/sh4/r2d):

+    nd = qemu_find_nic_info(mc->default_nic, true, NULL);
+    if (nd) {
+        pci_nic_init_nofail(nd, pci_bus, mc->default_nic, "2");
+    }
+    pci_init_nic_devices(pci_bus, mc->default_nic);

So they'll take the first NIC configuration which is of the expected
model (again, or unspecified model) and place that in the special slot,
and then put the rest of the devices wherever they land.

For the change in behaviour to *matter*, the user would have to
explicitly specify a NIC of the *non-default* type first, and then a
NIC of the default type. My new code will put the default-type NIC in
the "right place", while the old code mostly wouldn't. I think that's
an OK change to make.

My plan is to *remember* which NIC models are used for lookups during
the board in, so that qemu_show_nic_devices() can print them all at the
end.

Current WIP if anyone wants to take a quick look at
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-net
but the weekend arrived quicker than I'd hoped, so I haven't quite got
it into shape to post yet. Hopefully it makes sense as an approach
though?
Igor Mammedov Oct. 23, 2023, 9:30 a.m. UTC | #10
On Wed, 18 Oct 2023 09:32:47 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> On Wed, 2023-10-18 at 09:32 +0200, Igor Mammedov wrote:
> > On Mon, 16 Oct 2023 16:19:08 +0100
> > David Woodhouse <dwmw2@infradead.org> wrote:
> >   
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> > >   
> > 
> > is this index a user (guest) visible?  
> 
> Yes. It defines what block device (e.g. /dev/xvda) the disk appears as
> in the guest. In the common case, it literally encodes the Linux
> major/minor numbers. So xvda (major 202) is 0xca00, xvdb is 0xca10 etc.

that makes 'index' an implicit ABI and a subject to versioning
when the way it's assigned changes (i.e. one has to use versioned
machine types to keep older versions working the they used to).

From what I remember it's discouraged to make QEMU invent
various IDs that are part of ABI (guest or mgmt side).
Instead it's preferred for mgmt side/user to provide that explicitly.

Basically you are trading off manageability/simplicity at QEMU
level with CLI usability for human user.
I don't care much as long as it is hidden within xen code base,
but maybe libvirt does.

> Previously we had to explicitly set it for each disk in Qemu:
> 
>   -drive file=disk1.img,id=drv1 -device xen-disk,drive=drv1,vdev=xvda
>   -drive file=disk2.img,id=drv2 -device xen-disk,drive=drv2,vdev=xvdb
> 
> Now we can just do
> 
>   -drive file=disk1.img,if=xen -drive file-disk2.img,if=xen
>
> (We could go further and make if=xen the default for Xen guests too,
> but that doesn't work right now because xen-block will barf on the
> default provided CD-ROM when it's empty. It doesn't handle empty
> devices. And if I work around that, then `-hda disk1.img` would work on
> the command line... but would make it appear as /dev/xvda instead of
> /dev/hda, and I don't know how I feel about that.)
>
> [root@localhost ~]# xenstore-ls  -f device/vbd
> device/vbd = ""
> device/vbd/51712 = ""
> device/vbd/51712/backend = "/local/domain/0/backend/qdisk/1/51712"
> device/vbd/51712/backend-id = "0"
> device/vbd/51712/device-type = "disk"
> device/vbd/51712/event-channel = "8"
> device/vbd/51712/feature-persistent = "1"
> device/vbd/51712/protocol = "x86_64-abi"
> device/vbd/51712/ring-ref = "8"
> device/vbd/51712/state = "4"
> device/vbd/51712/virtual-device = "51712"
> 
> >   
> > > There's no need to force the user to assign a vdev. We can automatically
> > > assign one, starting at xvda and searching until we find the first disk
> > > name that's unused.
> > > 
> > > This means we can now allow '-drive if=xen,file=xxx' to work without an
> > > explicit separate -driver argument, just like if=virtio.
> > > 
> > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > > ---
> > >  blockdev.c           | 15 ++++++++++++---
> > >  hw/block/xen-block.c | 25 +++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/blockdev.c b/blockdev.c
> > > index 325b7a3bef..9dec4c9c74 100644
> > > --- a/blockdev.c
> > > +++ b/blockdev.c
> > > @@ -255,13 +255,13 @@ void drive_check_orphaned(void)
> > >           * Ignore default drives, because we create certain default
> > >           * drives unconditionally, then leave them unclaimed.  Not the
> > >           * users fault.
> > > -         * Ignore IF_VIRTIO, because it gets desugared into -device,
> > > -         * so we can leave failing to -device.
> > > +         * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into
> > > +         * -device, so we can leave failing to -device.
> > >           * Ignore IF_NONE, because leaving unclaimed IF_NONE remains
> > >           * available for device_add is a feature.
> > >           */
> > >          if (dinfo->is_default || dinfo->type == IF_VIRTIO
> > > -            || dinfo->type == IF_NONE) {
> > > +            || dinfo->type == IF_XEN || dinfo->type == IF_NONE) {
> > >              continue;
> > >          }
> > >          if (!blk_get_attached_dev(blk)) {
> > > @@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type,
> > >          qemu_opt_set(devopts, "driver", "virtio-blk", &error_abort);
> > >          qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
> > >                       &error_abort);
> > > +    } else if (type == IF_XEN) {
> > > +        QemuOpts *devopts;
> > > +        devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
> > > +                                   &error_abort);
> > > +        qemu_opt_set(devopts, "driver",
> > > +                     (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk",
> > > +                     &error_abort);
> > > +        qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
> > > +                     &error_abort);
> > >      }
> > >  
> > >      filename = qemu_opt_get(legacy_opts, "file");
> > > diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> > > index 9262338535..20fa783cbe 100644
> > > --- a/hw/block/xen-block.c
> > > +++ b/hw/block/xen-block.c
> > > @@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp)
> > >      XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
> > >      XenBlockVdev *vdev = &blockdev->props.vdev;
> > >  
> > > +    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
> > > +        char name[11];
> > > +        int disk = 0;
> > > +        unsigned long idx;
> > > +
> > > +        /* Find an unoccupied device name */
> > > +        while (disk < (1 << 20)) {
> > > +            if (disk < (1 << 4)) {
> > > +                idx = (202 << 8) | (disk << 4);
> > > +            } else {
> > > +                idx = (1 << 28) | (disk << 8);
> > > +            }
> > > +            snprintf(name, sizeof(name), "%lu", idx);
> > > +            if (!xen_backend_exists("qdisk", name)) {
> > > +                vdev->type = XEN_BLOCK_VDEV_TYPE_XVD;
> > > +                vdev->partition = 0;
> > > +                vdev->disk = disk;
> > > +                vdev->number = idx;
> > > +                return g_strdup(name);
> > > +            }
> > > +            disk++;
> > > +        }
> > > +        error_setg(errp, "cannot find device vdev for block device");
> > > +        return NULL;
> > > +    }
> > >      return g_strdup_printf("%lu", vdev->number);
> > >  }
> > >    
> > 
> >   
>
David Woodhouse Oct. 23, 2023, 9:42 a.m. UTC | #11
On 23 October 2023 10:30:02 BST, Igor Mammedov <imammedo@redhat.com> wrote:
>On Wed, 18 Oct 2023 09:32:47 +0100
>David Woodhouse <dwmw2@infradead.org> wrote:
>
>> On Wed, 2023-10-18 at 09:32 +0200, Igor Mammedov wrote:
>> > On Mon, 16 Oct 2023 16:19:08 +0100
>> > David Woodhouse <dwmw2@infradead.org> wrote:
>> >   
>> > > From: David Woodhouse <dwmw@amazon.co.uk>
>> > >   
>> > 
>> > is this index a user (guest) visible?  
>> 
>> Yes. It defines what block device (e.g. /dev/xvda) the disk appears as
>> in the guest. In the common case, it literally encodes the Linux
>> major/minor numbers. So xvda (major 202) is 0xca00, xvdb is 0xca10 etc.
>
>that makes 'index' an implicit ABI and a subject to versioning
>when the way it's assigned changes (i.e. one has to use versioned
>machine types to keep older versions working the they used to).
>
>From what I remember it's discouraged to make QEMU invent
>various IDs that are part of ABI (guest or mgmt side).
>Instead it's preferred for mgmt side/user to provide that explicitly.
>
>Basically you are trading off manageability/simplicity at QEMU
>level with CLI usability for human user.
>I don't care much as long as it is hidden within xen code base,
>but maybe libvirt does.

Well, it can still be set explicitly. So not so much a "trade-off" as adding the option for the user to choose the simple way.

Yes, in a way it's an ABI, just like the dynamic assignment of PCI devfn for network devices added with "-nic". And I think also for virtio block devices too? And for the ISA ne2000.

But it seems unlikely that we'll ever really want to change "the first one is xvda, the second is xvdb...."
Kevin Wolf Oct. 23, 2023, 1:45 p.m. UTC | #12
Am 23.10.2023 um 11:30 hat Igor Mammedov geschrieben:
> On Wed, 18 Oct 2023 09:32:47 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > On Wed, 2023-10-18 at 09:32 +0200, Igor Mammedov wrote:
> > > On Mon, 16 Oct 2023 16:19:08 +0100
> > > David Woodhouse <dwmw2@infradead.org> wrote:
> > >   
> > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > >   
> > > 
> > > is this index a user (guest) visible?  
> > 
> > Yes. It defines what block device (e.g. /dev/xvda) the disk appears as
> > in the guest. In the common case, it literally encodes the Linux
> > major/minor numbers. So xvda (major 202) is 0xca00, xvdb is 0xca10 etc.
> 
> that makes 'index' an implicit ABI and a subject to versioning
> when the way it's assigned changes (i.e. one has to use versioned
> machine types to keep older versions working the they used to).
> 
> From what I remember it's discouraged to make QEMU invent
> various IDs that are part of ABI (guest or mgmt side).
> Instead it's preferred for mgmt side/user to provide that explicitly.
> 
> Basically you are trading off manageability/simplicity at QEMU
> level with CLI usability for human user.
> I don't care much as long as it is hidden within xen code base,
> but maybe libvirt does.

-drive is mostly a convenience option for human users anyway. Management
tools should use a combination of -blockdev and -device.

Kevin
diff mbox series

Patch

diff --git a/blockdev.c b/blockdev.c
index 325b7a3bef..9dec4c9c74 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -255,13 +255,13 @@  void drive_check_orphaned(void)
          * Ignore default drives, because we create certain default
          * drives unconditionally, then leave them unclaimed.  Not the
          * users fault.
-         * Ignore IF_VIRTIO, because it gets desugared into -device,
-         * so we can leave failing to -device.
+         * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into
+         * -device, so we can leave failing to -device.
          * Ignore IF_NONE, because leaving unclaimed IF_NONE remains
          * available for device_add is a feature.
          */
         if (dinfo->is_default || dinfo->type == IF_VIRTIO
-            || dinfo->type == IF_NONE) {
+            || dinfo->type == IF_XEN || dinfo->type == IF_NONE) {
             continue;
         }
         if (!blk_get_attached_dev(blk)) {
@@ -977,6 +977,15 @@  DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type,
         qemu_opt_set(devopts, "driver", "virtio-blk", &error_abort);
         qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
                      &error_abort);
+    } else if (type == IF_XEN) {
+        QemuOpts *devopts;
+        devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
+                                   &error_abort);
+        qemu_opt_set(devopts, "driver",
+                     (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk",
+                     &error_abort);
+        qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
+                     &error_abort);
     }
 
     filename = qemu_opt_get(legacy_opts, "file");
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 9262338535..20fa783cbe 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -34,6 +34,31 @@  static char *xen_block_get_name(XenDevice *xendev, Error **errp)
     XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
     XenBlockVdev *vdev = &blockdev->props.vdev;
 
+    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
+        char name[11];
+        int disk = 0;
+        unsigned long idx;
+
+        /* Find an unoccupied device name */
+        while (disk < (1 << 20)) {
+            if (disk < (1 << 4)) {
+                idx = (202 << 8) | (disk << 4);
+            } else {
+                idx = (1 << 28) | (disk << 8);
+            }
+            snprintf(name, sizeof(name), "%lu", idx);
+            if (!xen_backend_exists("qdisk", name)) {
+                vdev->type = XEN_BLOCK_VDEV_TYPE_XVD;
+                vdev->partition = 0;
+                vdev->disk = disk;
+                vdev->number = idx;
+                return g_strdup(name);
+            }
+            disk++;
+        }
+        error_setg(errp, "cannot find device vdev for block device");
+        return NULL;
+    }
     return g_strdup_printf("%lu", vdev->number);
 }