diff mbox series

[v6] xen/pt: reserve PCI slot 2 for Intel igd-passthru

Message ID 830263507e8f1a24a94f81909d5102c4b204e938.1672615492.git.brchuckz@aol.com (mailing list archive)
State Superseded
Headers show
Series [v6] xen/pt: reserve PCI slot 2 for Intel igd-passthru | expand

Commit Message

Chuck Zmudzinski Jan. 1, 2023, 11:52 p.m. UTC
Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
as noted in docs/igd-assign.txt in the Qemu source code.

Currently, when the xl toolstack is used to configure a Xen HVM guest with
Intel IGD passthrough to the guest with the Qemu upstream device model,
a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
a different slot. This problem often prevents the guest from booting.

The only available workaround is not good: Configure Xen HVM guests to use
the old and no longer maintained Qemu traditional device model available
from xenbits.xen.org which does reserve slot 2 for the Intel IGD.

To implement this feature in the Qemu upstream device model for Xen HVM
guests, introduce the following new functions, types, and macros:

* XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
* XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
* typedef XenPTQdevRealize function pointer
* XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
* xen_igd_reserve_slot and xen_igd_clear_slot functions

The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
the xl toolstack with the gfx_passthru option enabled, which sets the
igd-passthru=on option to Qemu for the Xen HVM machine type.

The new xen_igd_reserve_slot function also needs to be implemented in
hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
in which case it does nothing.

The new xen_igd_clear_slot function overrides qdev->realize of the parent
PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
created in hw/i386/pc_piix.c for the case when igd-passthru=on.

Move the call to xen_host_pci_device_get, and the associated error
handling, from xen_pt_realize to the new xen_igd_clear_slot function to
initialize the device class and vendor values which enables the checks for
the Intel IGD to succeed. The verification that the host device is an
Intel IGD to be passed through is done by checking the domain, bus, slot,
and function values as well as by checking that gfx_passthru is enabled,
the device class is VGA, and the device vendor in Intel.

Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
---
Notes that might be helpful to reviewers of patched code in hw/xen:

The new functions and types are based on recommendations from Qemu docs:
https://qemu.readthedocs.io/en/latest/devel/qom.html

Notes that might be helpful to reviewers of patched code in hw/i386:

The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
not affect builds that do not have CONFIG_XEN defined.

xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
existing function that is only true when Qemu is built with
xen-pci-passthrough enabled and the administrator has configured the Xen
HVM guest with Qemu's igd-passthru=on option.

v2: Remove From: <email address> tag at top of commit message

v3: Changed the test for the Intel IGD in xen_igd_clear_slot:

    if (is_igd_vga_passthrough(&s->real_device) &&
        (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {

    is changed to

    if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
        && (s->hostaddr.function == 0)) {

    I hoped that I could use the test in v2, since it matches the
    other tests for the Intel IGD in Qemu and Xen, but those tests
    do not work because the necessary data structures are not set with
    their values yet. So instead use the test that the administrator
    has enabled gfx_passthru and the device address on the host is
    02.0. This test does detect the Intel IGD correctly.

v4: Use brchuckz@aol.com instead of brchuckz@netscape.net for the author's
    email address to match the address used by the same author in commits
    be9c61da and c0e86b76
    
    Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc

v5: The patch of xen_pt.c was re-worked to allow a more consistent test
    for the Intel IGD that uses the same criteria as in other places.
    This involved moving the call to xen_host_pci_device_get from
    xen_pt_realize to xen_igd_clear_slot and updating the checks for the
    Intel IGD in xen_igd_clear_slot:
    
    if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
        && (s->hostaddr.function == 0)) {

    is changed to

    if (is_igd_vga_passthrough(&s->real_device) &&
        s->real_device.domain == 0 && s->real_device.bus == 0 &&
        s->real_device.dev == 2 && s->real_device.func == 0 &&
        s->real_device.vendor_id == PCI_VENDOR_ID_INTEL) {

    Added an explanation for the move of xen_host_pci_device_get from
    xen_pt_realize to xen_igd_clear_slot to the commit message.

    Rebase.

v6: Fix logging by removing these lines from the move from xen_pt_realize
    to xen_igd_clear_slot that was done in v5:

    XEN_PT_LOG(d, "Assigning real physical device %02x:%02x.%d"
               " to devfn 0x%x\n",
               s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
               s->dev.devfn);

    This log needs to be in xen_pt_realize because s->dev.devfn is not
    set yet in xen_igd_clear_slot.

    Sorry for the extra noise.

 hw/i386/pc_piix.c    |  3 +++
 hw/xen/xen_pt.c      | 46 +++++++++++++++++++++++++++++++++++---------
 hw/xen/xen_pt.h      | 16 +++++++++++++++
 hw/xen/xen_pt_stub.c |  4 ++++
 4 files changed, 60 insertions(+), 9 deletions(-)

Comments

Michael S. Tsirkin Jan. 2, 2023, 5:46 p.m. UTC | #1
On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> as noted in docs/igd-assign.txt in the Qemu source code.
> 
> Currently, when the xl toolstack is used to configure a Xen HVM guest with
> Intel IGD passthrough to the guest with the Qemu upstream device model,
> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> a different slot. This problem often prevents the guest from booting.
> 
> The only available workaround is not good: Configure Xen HVM guests to use
> the old and no longer maintained Qemu traditional device model available
> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> 
> To implement this feature in the Qemu upstream device model for Xen HVM
> guests, introduce the following new functions, types, and macros:
> 
> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> * typedef XenPTQdevRealize function pointer
> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> * xen_igd_reserve_slot and xen_igd_clear_slot functions
> 
> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> the xl toolstack with the gfx_passthru option enabled, which sets the
> igd-passthru=on option to Qemu for the Xen HVM machine type.
> 
> The new xen_igd_reserve_slot function also needs to be implemented in
> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> in which case it does nothing.
> 
> The new xen_igd_clear_slot function overrides qdev->realize of the parent
> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> 
> Move the call to xen_host_pci_device_get, and the associated error
> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> initialize the device class and vendor values which enables the checks for
> the Intel IGD to succeed. The verification that the host device is an
> Intel IGD to be passed through is done by checking the domain, bus, slot,
> and function values as well as by checking that gfx_passthru is enabled,
> the device class is VGA, and the device vendor in Intel.
> 
> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>

I'm not sure why is the issue xen specific. Can you explain?
Doesn't it affect kvm too?

> ---
> Notes that might be helpful to reviewers of patched code in hw/xen:
> 
> The new functions and types are based on recommendations from Qemu docs:
> https://qemu.readthedocs.io/en/latest/devel/qom.html
> 
> Notes that might be helpful to reviewers of patched code in hw/i386:
> 
> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
> not affect builds that do not have CONFIG_XEN defined.
> 
> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
> existing function that is only true when Qemu is built with
> xen-pci-passthrough enabled and the administrator has configured the Xen
> HVM guest with Qemu's igd-passthru=on option.
> 
> v2: Remove From: <email address> tag at top of commit message
> 
> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
> 
>     if (is_igd_vga_passthrough(&s->real_device) &&
>         (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
> 
>     is changed to
> 
>     if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>         && (s->hostaddr.function == 0)) {
> 
>     I hoped that I could use the test in v2, since it matches the
>     other tests for the Intel IGD in Qemu and Xen, but those tests
>     do not work because the necessary data structures are not set with
>     their values yet. So instead use the test that the administrator
>     has enabled gfx_passthru and the device address on the host is
>     02.0. This test does detect the Intel IGD correctly.
> 
> v4: Use brchuckz@aol.com instead of brchuckz@netscape.net for the author's
>     email address to match the address used by the same author in commits
>     be9c61da and c0e86b76
>     
>     Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
> 
> v5: The patch of xen_pt.c was re-worked to allow a more consistent test
>     for the Intel IGD that uses the same criteria as in other places.
>     This involved moving the call to xen_host_pci_device_get from
>     xen_pt_realize to xen_igd_clear_slot and updating the checks for the
>     Intel IGD in xen_igd_clear_slot:
>     
>     if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>         && (s->hostaddr.function == 0)) {
> 
>     is changed to
> 
>     if (is_igd_vga_passthrough(&s->real_device) &&
>         s->real_device.domain == 0 && s->real_device.bus == 0 &&
>         s->real_device.dev == 2 && s->real_device.func == 0 &&
>         s->real_device.vendor_id == PCI_VENDOR_ID_INTEL) {
> 
>     Added an explanation for the move of xen_host_pci_device_get from
>     xen_pt_realize to xen_igd_clear_slot to the commit message.
> 
>     Rebase.
> 
> v6: Fix logging by removing these lines from the move from xen_pt_realize
>     to xen_igd_clear_slot that was done in v5:
> 
>     XEN_PT_LOG(d, "Assigning real physical device %02x:%02x.%d"
>                " to devfn 0x%x\n",
>                s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
>                s->dev.devfn);
> 
>     This log needs to be in xen_pt_realize because s->dev.devfn is not
>     set yet in xen_igd_clear_slot.
> 
>     Sorry for the extra noise.
> 
>  hw/i386/pc_piix.c    |  3 +++
>  hw/xen/xen_pt.c      | 46 +++++++++++++++++++++++++++++++++++---------
>  hw/xen/xen_pt.h      | 16 +++++++++++++++
>  hw/xen/xen_pt_stub.c |  4 ++++
>  4 files changed, 60 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index b48047f50c..bc5efa4f59 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -405,6 +405,9 @@ static void pc_xen_hvm_init(MachineState *machine)
>      }
>  
>      pc_xen_hvm_init_pci(machine);
> +    if (xen_igd_gfx_pt_enabled()) {
> +        xen_igd_reserve_slot(pcms->bus);
> +    }
>      pci_create_simple(pcms->bus, -1, "xen-platform");
>  }
>  #endif
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index 0ec7e52183..7fae1e7a6f 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -780,15 +780,6 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
>                 s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
>                 s->dev.devfn);
>  
> -    xen_host_pci_device_get(&s->real_device,
> -                            s->hostaddr.domain, s->hostaddr.bus,
> -                            s->hostaddr.slot, s->hostaddr.function,
> -                            errp);
> -    if (*errp) {
> -        error_append_hint(errp, "Failed to \"open\" the real pci device");
> -        return;
> -    }
> -
>      s->is_virtfn = s->real_device.is_virtfn;
>      if (s->is_virtfn) {
>          XEN_PT_LOG(d, "%04x:%02x:%02x.%d is a SR-IOV Virtual Function\n",
> @@ -950,11 +941,47 @@ static void xen_pci_passthrough_instance_init(Object *obj)
>      PCI_DEVICE(obj)->cap_present |= QEMU_PCI_CAP_EXPRESS;
>  }
>  
> +void xen_igd_reserve_slot(PCIBus *pci_bus)
> +{
> +    XEN_PT_LOG(0, "Reserving PCI slot 2 for IGD\n");
> +    pci_bus->slot_reserved_mask |= XEN_PCI_IGD_SLOT_MASK;
> +}
> +
> +static void xen_igd_clear_slot(DeviceState *qdev, Error **errp)
> +{
> +    ERRP_GUARD();
> +    PCIDevice *pci_dev = (PCIDevice *)qdev;
> +    XenPCIPassthroughState *s = XEN_PT_DEVICE(pci_dev);
> +    XenPTDeviceClass *xpdc = XEN_PT_DEVICE_GET_CLASS(s);
> +    PCIBus *pci_bus = pci_get_bus(pci_dev);
> +
> +    xen_host_pci_device_get(&s->real_device,
> +                            s->hostaddr.domain, s->hostaddr.bus,
> +                            s->hostaddr.slot, s->hostaddr.function,
> +                            errp);
> +    if (*errp) {
> +        error_append_hint(errp, "Failed to \"open\" the real pci device");
> +        return;
> +    }
> +
> +    if (is_igd_vga_passthrough(&s->real_device) &&
> +        s->real_device.domain == 0 && s->real_device.bus == 0 &&
> +        s->real_device.dev == 2 && s->real_device.func == 0 &&
> +        s->real_device.vendor_id == PCI_VENDOR_ID_INTEL) {
> +        pci_bus->slot_reserved_mask &= ~XEN_PCI_IGD_SLOT_MASK;
> +        XEN_PT_LOG(pci_dev, "Intel IGD found, using slot 2\n");
> +    }
> +    xpdc->pci_qdev_realize(qdev, errp);
> +}
> +
>  static void xen_pci_passthrough_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
> +    XenPTDeviceClass *xpdc = XEN_PT_DEVICE_CLASS(klass);
> +    xpdc->pci_qdev_realize = dc->realize;
> +    dc->realize = xen_igd_clear_slot;
>      k->realize = xen_pt_realize;
>      k->exit = xen_pt_unregister_device;
>      k->config_read = xen_pt_pci_read_config;
> @@ -977,6 +1004,7 @@ static const TypeInfo xen_pci_passthrough_info = {
>      .instance_size = sizeof(XenPCIPassthroughState),
>      .instance_finalize = xen_pci_passthrough_finalize,
>      .class_init = xen_pci_passthrough_class_init,
> +    .class_size = sizeof(XenPTDeviceClass),
>      .instance_init = xen_pci_passthrough_instance_init,
>      .interfaces = (InterfaceInfo[]) {
>          { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
> index e7c4316a7d..40b31b5263 100644
> --- a/hw/xen/xen_pt.h
> +++ b/hw/xen/xen_pt.h
> @@ -3,6 +3,7 @@
>  
>  #include "hw/xen/xen_common.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bus.h"
>  #include "xen-host-pci-device.h"
>  #include "qom/object.h"
>  
> @@ -41,7 +42,20 @@ typedef struct XenPTReg XenPTReg;
>  #define TYPE_XEN_PT_DEVICE "xen-pci-passthrough"
>  OBJECT_DECLARE_SIMPLE_TYPE(XenPCIPassthroughState, XEN_PT_DEVICE)
>  
> +#define XEN_PT_DEVICE_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(XenPTDeviceClass, klass, TYPE_XEN_PT_DEVICE)
> +#define XEN_PT_DEVICE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(XenPTDeviceClass, obj, TYPE_XEN_PT_DEVICE)
> +
> +typedef void (*XenPTQdevRealize)(DeviceState *qdev, Error **errp);
> +
> +typedef struct XenPTDeviceClass {
> +    PCIDeviceClass parent_class;
> +    XenPTQdevRealize pci_qdev_realize;
> +} XenPTDeviceClass;
> +
>  uint32_t igd_read_opregion(XenPCIPassthroughState *s);
> +void xen_igd_reserve_slot(PCIBus *pci_bus);
>  void igd_write_opregion(XenPCIPassthroughState *s, uint32_t val);
>  void xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
>                                             XenHostPCIDevice *dev);
> @@ -76,6 +90,8 @@ typedef int (*xen_pt_conf_byte_read)
>  
>  #define XEN_PCI_INTEL_OPREGION 0xfc
>  
> +#define XEN_PCI_IGD_SLOT_MASK 0x4UL /* Intel IGD slot_reserved_mask */
> +
>  typedef enum {
>      XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
>      XEN_PT_GRP_TYPE_EMU,            /* emul reg group */
> diff --git a/hw/xen/xen_pt_stub.c b/hw/xen/xen_pt_stub.c
> index 2d8cac8d54..5c108446a8 100644
> --- a/hw/xen/xen_pt_stub.c
> +++ b/hw/xen/xen_pt_stub.c
> @@ -20,3 +20,7 @@ void xen_igd_gfx_pt_set(bool value, Error **errp)
>          error_setg(errp, "Xen PCI passthrough support not built in");
>      }
>  }
> +
> +void xen_igd_reserve_slot(PCIBus *pci_bus)
> +{
> +}
> -- 
> 2.39.0
Chuck Zmudzinski Jan. 2, 2023, 11:10 p.m. UTC | #2
On 1/2/23 12:46 PM, Michael S. Tsirkin wrote:
> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > as noted in docs/igd-assign.txt in the Qemu source code.
> > 
> > Currently, when the xl toolstack is used to configure a Xen HVM guest with
> > Intel IGD passthrough to the guest with the Qemu upstream device model,
> > a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> > a different slot. This problem often prevents the guest from booting.
> > 
> > The only available workaround is not good: Configure Xen HVM guests to use
> > the old and no longer maintained Qemu traditional device model available
> > from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> > 
> > To implement this feature in the Qemu upstream device model for Xen HVM
> > guests, introduce the following new functions, types, and macros:
> > 
> > * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> > * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> > * typedef XenPTQdevRealize function pointer
> > * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> > * xen_igd_reserve_slot and xen_igd_clear_slot functions
> > 
> > The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> > member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> > the xl toolstack with the gfx_passthru option enabled, which sets the
> > igd-passthru=on option to Qemu for the Xen HVM machine type.
> > 
> > The new xen_igd_reserve_slot function also needs to be implemented in
> > hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> > when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> > in which case it does nothing.
> > 
> > The new xen_igd_clear_slot function overrides qdev->realize of the parent
> > PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> > since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> > created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> > 
> > Move the call to xen_host_pci_device_get, and the associated error
> > handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> > initialize the device class and vendor values which enables the checks for
> > the Intel IGD to succeed. The verification that the host device is an
> > Intel IGD to be passed through is done by checking the domain, bus, slot,
> > and function values as well as by checking that gfx_passthru is enabled,
> > the device class is VGA, and the device vendor in Intel.
> > 
> > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>
> I'm not sure why is the issue xen specific. Can you explain?
> Doesn't it affect kvm too?

Recall from docs/igd-assign.txt that there are two modes for
igd passthrough: legacy and upt, and the igd needs to be
at slot 2 only when using legacy mode which gives one
single guest exclusive access to the Intel igd.

It's only xen specific insofar as xen does not have support
for the upt mode so xen must use legacy mode which
requires the igd to be at slot 2. I am not an expert with
kvm, but if I understand correctly, with kvm one can use
the upt mode with the Intel i915 kvmgt kernel module
and in that case the guest will see a virtual Intel gpu
that can be at any arbitrary slot when using kvmgt, and
also, in that case, more than one guest can access the
igd through the kvmgt kernel module.

Again, I am not an expert and do not have as much
experience with kvm, but if I understand correctly it is
possible to use the legacy mode with kvm and I think you
are correct that if one uses kvm in legacy mode and without
using the Intel i915 kvmgt kernel module, then it would be
necessary to reserve slot 2 for the igd on kvm.

Your question makes me curious, and I have not been able
to determine if anyone has tried igd passthrough using
legacy mode on kvm with recent versions of linux and qemu.
I will try reproducing the problem on kvm in legacy mode with
current versions of linux and qemu and report my findings.
With kvm, there might be enough flexibility to specify the
slot number for every pci device in the guest. Such a
capability is not available using the xenlight toolstack
for managing xen guests, so I have been using this patch
to ensure that the Intel igd is at slot 2 with xen guests
created by the xenlight toolstack.

The patch as is will only fix the problem on xen, so if the
problem exists on kvm also, I agree that the patch should
be modified to also fix it on kvm.

Chuck
Alex Williamson Jan. 3, 2023, 3:14 p.m. UTC | #3
On Mon, 2 Jan 2023 18:10:24 -0500
Chuck Zmudzinski <brchuckz@netscape.net> wrote:

> On 1/2/23 12:46 PM, Michael S. Tsirkin wrote:
> > On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:  
> > > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > > as noted in docs/igd-assign.txt in the Qemu source code.
> > > 
> > > Currently, when the xl toolstack is used to configure a Xen HVM guest with
> > > Intel IGD passthrough to the guest with the Qemu upstream device model,
> > > a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> > > a different slot. This problem often prevents the guest from booting.
> > > 
> > > The only available workaround is not good: Configure Xen HVM guests to use
> > > the old and no longer maintained Qemu traditional device model available
> > > from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> > > 
> > > To implement this feature in the Qemu upstream device model for Xen HVM
> > > guests, introduce the following new functions, types, and macros:
> > > 
> > > * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> > > * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> > > * typedef XenPTQdevRealize function pointer
> > > * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> > > * xen_igd_reserve_slot and xen_igd_clear_slot functions
> > > 
> > > The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> > > member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> > > the xl toolstack with the gfx_passthru option enabled, which sets the
> > > igd-passthru=on option to Qemu for the Xen HVM machine type.
> > > 
> > > The new xen_igd_reserve_slot function also needs to be implemented in
> > > hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> > > when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> > > in which case it does nothing.
> > > 
> > > The new xen_igd_clear_slot function overrides qdev->realize of the parent
> > > PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> > > since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> > > created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> > > 
> > > Move the call to xen_host_pci_device_get, and the associated error
> > > handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> > > initialize the device class and vendor values which enables the checks for
> > > the Intel IGD to succeed. The verification that the host device is an
> > > Intel IGD to be passed through is done by checking the domain, bus, slot,
> > > and function values as well as by checking that gfx_passthru is enabled,
> > > the device class is VGA, and the device vendor in Intel.
> > > 
> > > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>  
> >
> > I'm not sure why is the issue xen specific. Can you explain?
> > Doesn't it affect kvm too?  
> 
> Recall from docs/igd-assign.txt that there are two modes for
> igd passthrough: legacy and upt, and the igd needs to be
> at slot 2 only when using legacy mode which gives one
> single guest exclusive access to the Intel igd.
> 
> It's only xen specific insofar as xen does not have support
> for the upt mode so xen must use legacy mode which
> requires the igd to be at slot 2. I am not an expert with

UPT mode never fully materialized for direct assignment, the folks at
Intel championing this scenario left.

> kvm, but if I understand correctly, with kvm one can use
> the upt mode with the Intel i915 kvmgt kernel module
> and in that case the guest will see a virtual Intel gpu
> that can be at any arbitrary slot when using kvmgt, and
> also, in that case, more than one guest can access the
> igd through the kvmgt kernel module.

This is true, IIRC an Intel vGPU does not need to be in slot 2.

> Again, I am not an expert and do not have as much
> experience with kvm, but if I understand correctly it is
> possible to use the legacy mode with kvm and I think you
> are correct that if one uses kvm in legacy mode and without
> using the Intel i915 kvmgt kernel module, then it would be
> necessary to reserve slot 2 for the igd on kvm.

It's necessary to configure the assigned IGD at slot 2 to make it
functional, yes, but I don't really understand this notion of
"reserving" slot 2.  If something occupies address 00:02.0 in the
config, it's the user's or management tool's responsibility to move it
to make this configuration functional.  Why does QEMU need to play a
part in reserving this bus address.  IGD devices are not generally
hot-pluggable either, so it doesn't seem we need to reserve an address
in case an IGD device is added dynamically later.
 
> Your question makes me curious, and I have not been able
> to determine if anyone has tried igd passthrough using
> legacy mode on kvm with recent versions of linux and qemu.

Yes, it works.

> I will try reproducing the problem on kvm in legacy mode with
> current versions of linux and qemu and report my findings.
> With kvm, there might be enough flexibility to specify the
> slot number for every pci device in the guest. Such a

I think this is always the recommendation, libvirt will do this by
default in order to make sure the configuration is reproducible.  This
is what we generally rely on for kvm/vfio IGD assignment to place the
GPU at the correct address.

> capability is not available using the xenlight toolstack
> for managing xen guests, so I have been using this patch
> to ensure that the Intel igd is at slot 2 with xen guests
> created by the xenlight toolstack.

Seems like a deficiency in xenlight.  I'm not sure why QEMU should take
on this burden to support support tool stacks that lack such basic
features.
 
> The patch as is will only fix the problem on xen, so if the
> problem exists on kvm also, I agree that the patch should
> be modified to also fix it on kvm.

AFAICT, it's not a problem on kvm/vfio because we generally make use of
invocations that specify bus addresses for each device by default,
making this a configuration requirement for the user or management tool
stack.  Thanks,

Alex
Chuck Zmudzinski Jan. 3, 2023, 6:38 p.m. UTC | #4
On 1/2/2023 12:46 PM, Michael S. Tsirkin wrote:
> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > as noted in docs/igd-assign.txt in the Qemu source code.
> > 
> > Currently, when the xl toolstack is used to configure a Xen HVM guest with
> > Intel IGD passthrough to the guest with the Qemu upstream device model,
> > a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> > a different slot. This problem often prevents the guest from booting.
> > 
> > The only available workaround is not good: Configure Xen HVM guests to use
> > the old and no longer maintained Qemu traditional device model available
> > from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> > 
> > To implement this feature in the Qemu upstream device model for Xen HVM
> > guests, introduce the following new functions, types, and macros:
> > 
> > * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> > * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> > * typedef XenPTQdevRealize function pointer
> > * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> > * xen_igd_reserve_slot and xen_igd_clear_slot functions
> > 
> > The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> > member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> > the xl toolstack with the gfx_passthru option enabled, which sets the
> > igd-passthru=on option to Qemu for the Xen HVM machine type.
> > 
> > The new xen_igd_reserve_slot function also needs to be implemented in
> > hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> > when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> > in which case it does nothing.
> > 
> > The new xen_igd_clear_slot function overrides qdev->realize of the parent
> > PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> > since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> > created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> > 
> > Move the call to xen_host_pci_device_get, and the associated error
> > handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> > initialize the device class and vendor values which enables the checks for
> > the Intel IGD to succeed. The verification that the host device is an
> > Intel IGD to be passed through is done by checking the domain, bus, slot,
> > and function values as well as by checking that gfx_passthru is enabled,
> > the device class is VGA, and the device vendor in Intel.
> > 
> > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>
> I'm not sure why is the issue xen specific. Can you explain?
> Doesn't it affect kvm too?

Yes, it does, and of course this only applies to using the igd in a
guest in legacy mode as described in docs/igd-assign.txt.

Searching the web, I found this successful report of legacy
igd passthrough using kvm:

https://www.reddit.com/r/VFIO/comments/i9dbyp/this_is_how_i_managed_to_passthrough_my_igd/

That user posted the virtual machine xml on pastebin:

https://pastebin.com/vYf3a1gz

For reference, details of my configuration of legacy igd passthrough on
xen are available on the xenproject wiki:

https://wiki.xenproject.org/wiki/Xen_VGA_Passthrough_Tested_Adapters#Intel_display_adapters

As I expected, with kvm, it is possible to specify the slot number
of every pci device in the guest (as well as domain, bus, and function)
in the xml configuration, but this is not easy to do with xen's
xenlight (libxl) toolstack. That is why this patch is specific to xen.

To further explain this:

On xen, the xl.cfg guest configuration file does not allow the
administrator to specify the slot number of the xen platform
pci device, or of the emulated network device and disk controller,
and one of these devices will grab slot 2 without this patch to
qemu, making it impossible to have the passed through igd at
slot 2 on xen without patching qemu.

Another way to solve this problem on xen is to extend libxl so the
administrator can specify the slot number of the emulated qemu
pci devices, or possibly by using the xl.cfg
device_model_args_hvm=[ "ARG", "ARG", ...] settings which might
allow the administrator to control the slot number of the emulated
qemu pci devices, and I tried that without success.

This solution of patching qemu to reserve slot 2 for the intel igd when
the qemu igd-passthru=on option for the xenfv machine type is set is
a more simple solution to the problem on xen than trying to manually
set all the slot numbers using the device_model_args_hvm option in
xl.cfg.

I think kvm users who desire this feature of legacy igd passthrough
would benefit from something like the qemu igd-passthru=on option
which, as far as I know, only applies to the xenfv machine type that is
enabled with the gfx_passthru setting in the xl.cfg configuration file.
Such an option for kvm could allow for qemu to take care of all the
details of configuring the vm correctly for igd legacy passthrough
on kvm instead of requiring the administrator to manually specify
all the settings correctly in the xml configuration file.

I think making igd legacy passthrough easier to configure on kvm
would be a useful patch, but it is beyond the scope of what this patch
is trying to accomplish.

Chuck
Chuck Zmudzinski Jan. 3, 2023, 9:50 p.m. UTC | #5
On 1/3/2023 10:14 AM, Alex Williamson wrote:
> On Mon, 2 Jan 2023 18:10:24 -0500
> Chuck Zmudzinski <brchuckz@netscape.net> wrote:
>
> > On 1/2/23 12:46 PM, Michael S. Tsirkin wrote:
> > > On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:  
> > > > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > > > as noted in docs/igd-assign.txt in the Qemu source code.
> > > > ... 
> > > > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>  
> > >
> > > I'm not sure why is the issue xen specific. Can you explain?
> > > Doesn't it affect kvm too?  
> > 
> > Recall from docs/igd-assign.txt that there are two modes for
> > igd passthrough: legacy and upt, and the igd needs to be
> > at slot 2 only when using legacy mode which gives one
> > single guest exclusive access to the Intel igd.
> > 
> > It's only xen specific insofar as xen does not have support
> > for the upt mode so xen must use legacy mode which
> > requires the igd to be at slot 2. I am not an expert with
>
> UPT mode never fully materialized for direct assignment, the folks at
> Intel championing this scenario left.

Thanks for clarifying that for me.

>
> > kvm, but if I understand correctly, with kvm one can use
> > the upt mode with the Intel i915 kvmgt kernel module
> > and in that case the guest will see a virtual Intel gpu
> > that can be at any arbitrary slot when using kvmgt, and
> > also, in that case, more than one guest can access the
> > igd through the kvmgt kernel module.
>
> This is true, IIRC an Intel vGPU does not need to be in slot 2.
>
> > Again, I am not an expert and do not have as much
> > experience with kvm, but if I understand correctly it is
> > possible to use the legacy mode with kvm and I think you
> > are correct that if one uses kvm in legacy mode and without
> > using the Intel i915 kvmgt kernel module, then it would be
> > necessary to reserve slot 2 for the igd on kvm.
>
> It's necessary to configure the assigned IGD at slot 2 to make it
> functional, yes, but I don't really understand this notion of
> "reserving" slot 2.  If something occupies address 00:02.0 in the
> config, it's the user's or management tool's responsibility to move it
> to make this configuration functional.  Why does QEMU need to play a
> part in reserving this bus address.  IGD devices are not generally
> hot-pluggable either, so it doesn't seem we need to reserve an address
> in case an IGD device is added dynamically later.

As I said in earlier message in this thread, the xenlight toolstack (libxl) that is
provided as the default toolstack for building xen guests with pci passthrough
is not the most flexible management tool, and that is why, in the case of xen,
it is simpler to reserve slot 2 while qemu assigns the slot addresses of the
qemu emulated pci devices so that the igd can use slot 2. IIRC, In hw/pci/pci.c,
once the slot value is assigned, it is constant and cannot be changed later on
by a management tool.

>  
> > Your question makes me curious, and I have not been able
> > to determine if anyone has tried igd passthrough using
> > legacy mode on kvm with recent versions of linux and qemu.
>
> Yes, it works.
>
> > I will try reproducing the problem on kvm in legacy mode with
> > current versions of linux and qemu and report my findings.
> > With kvm, there might be enough flexibility to specify the
> > slot number for every pci device in the guest. Such a
>
> I think this is always the recommendation, libvirt will do this by
> default in order to make sure the configuration is reproducible.  This
> is what we generally rely on for kvm/vfio IGD assignment to place the
> GPU at the correct address.
>
> > capability is not available using the xenlight toolstack
> > for managing xen guests, so I have been using this patch
> > to ensure that the Intel igd is at slot 2 with xen guests
> > created by the xenlight toolstack.
>
> Seems like a deficiency in xenlight.  I'm not sure why QEMU should take
> on this burden to support support tool stacks that lack such basic
> features.

So you would prefer to patch xenlight (libxl) to make it flexible enough to properly
handle this case of legacy igd passthrough.

>  
> > The patch as is will only fix the problem on xen, so if the
> > problem exists on kvm also, I agree that the patch should
> > be modified to also fix it on kvm.
>
> AFAICT, it's not a problem on kvm/vfio because we generally make use of
> invocations that specify bus addresses for each device by default,
> making this a configuration requirement for the user or management tool
> stack.  Thanks,

Unfortunately, and as I mentioned in an earlier message on this thread,
the xenlight management tool stack (libxl) is not so flexible and does not
make it so easy for the administrator to specify the bus address of each
device, and that is why either this patch is needed for igd legacy passtrhough
on xen, or the libxl management tool needs to be patched so it is flexible
enough to enable the slot addresses to be assigned correctly using
that tool instead of relying on qemu to reserve slot 2 for the igd.

If there is a consensus that this should be fixed in libxl instead of in qemu,
I will work on a patch to libxl, but I will wait a while for some feedback from
the xen people before I do that.

Thanks for your feedback,

Chuck
Chuck Zmudzinski Jan. 3, 2023, 10:58 p.m. UTC | #6
On 1/3/2023 4:50 PM, Chuck Zmudzinski wrote:
> On 1/3/2023 10:14 AM, Alex Williamson wrote:
> > On Mon, 2 Jan 2023 18:10:24 -0500
> > Chuck Zmudzinski <brchuckz@netscape.net> wrote:
> >
> > > On 1/2/23 12:46 PM, Michael S. Tsirkin wrote:
> > > > On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:  
> > > > > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > > > > as noted in docs/igd-assign.txt in the Qemu source code.
> > > > > ... 
> > > > > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>  
> > > >
> > > > I'm not sure why is the issue xen specific. Can you explain?
> > > > Doesn't it affect kvm too?  
> > > 
> > > Recall from docs/igd-assign.txt that there are two modes for
> > > igd passthrough: legacy and upt, and the igd needs to be
> > > at slot 2 only when using legacy mode which gives one
> > > single guest exclusive access to the Intel igd.
> > > 
> > > It's only xen specific insofar as xen does not have support
> > > for the upt mode so xen must use legacy mode which
> > > requires the igd to be at slot 2. I am not an expert with
> >
> > UPT mode never fully materialized for direct assignment, the folks at
> > Intel championing this scenario left.
>
> Thanks for clarifying that for me.
>
> >
> > > kvm, but if I understand correctly, with kvm one can use
> > > the upt mode with the Intel i915 kvmgt kernel module
> > > and in that case the guest will see a virtual Intel gpu
> > > that can be at any arbitrary slot when using kvmgt, and
> > > also, in that case, more than one guest can access the
> > > igd through the kvmgt kernel module.
> >
> > This is true, IIRC an Intel vGPU does not need to be in slot 2.
> >
> > > Again, I am not an expert and do not have as much
> > > experience with kvm, but if I understand correctly it is
> > > possible to use the legacy mode with kvm and I think you
> > > are correct that if one uses kvm in legacy mode and without
> > > using the Intel i915 kvmgt kernel module, then it would be
> > > necessary to reserve slot 2 for the igd on kvm.
> >
> > It's necessary to configure the assigned IGD at slot 2 to make it
> > functional, yes, but I don't really understand this notion of
> > "reserving" slot 2.  If something occupies address 00:02.0 in the
> > config, it's the user's or management tool's responsibility to move it
> > to make this configuration functional.  Why does QEMU need to play a
> > part in reserving this bus address.  IGD devices are not generally
> > hot-pluggable either, so it doesn't seem we need to reserve an address
> > in case an IGD device is added dynamically later.
>
> As I said in earlier message in this thread, the xenlight toolstack (libxl) that is
> provided as the default toolstack for building xen guests with pci passthrough
> is not the most flexible management tool, and that is why, in the case of xen,
> it is simpler to reserve slot 2 while qemu assigns the slot addresses of the
> qemu emulated pci devices so that the igd can use slot 2. IIRC, In hw/pci/pci.c,
> once the slot value is assigned, it is constant and cannot be changed later on
> by a management tool.
>
> >  
> > > Your question makes me curious, and I have not been able
> > > to determine if anyone has tried igd passthrough using
> > > legacy mode on kvm with recent versions of linux and qemu.
> >
> > Yes, it works.
> >
> > > I will try reproducing the problem on kvm in legacy mode with
> > > current versions of linux and qemu and report my findings.
> > > With kvm, there might be enough flexibility to specify the
> > > slot number for every pci device in the guest. Such a
> >
> > I think this is always the recommendation, libvirt will do this by
> > default in order to make sure the configuration is reproducible.  This
> > is what we generally rely on for kvm/vfio IGD assignment to place the
> > GPU at the correct address.
> >
> > > capability is not available using the xenlight toolstack
> > > for managing xen guests, so I have been using this patch
> > > to ensure that the Intel igd is at slot 2 with xen guests
> > > created by the xenlight toolstack.
> >
> > Seems like a deficiency in xenlight.  I'm not sure why QEMU should take
> > on this burden to support support tool stacks that lack such basic
> > features.
>
> So you would prefer to patch xenlight (libxl) to make it flexible enough to properly
> handle this case of legacy igd passthrough.
>
> >  
> > > The patch as is will only fix the problem on xen, so if the
> > > problem exists on kvm also, I agree that the patch should
> > > be modified to also fix it on kvm.
> >
> > AFAICT, it's not a problem on kvm/vfio because we generally make use of
> > invocations that specify bus addresses for each device by default,
> > making this a configuration requirement for the user or management tool
> > stack.  Thanks,
>
> Unfortunately, and as I mentioned in an earlier message on this thread,
> the xenlight management tool stack (libxl) is not so flexible and does not
> make it so easy for the administrator to specify the bus address of each
> device, and that is why either this patch is needed for igd legacy passtrhough
> on xen, or the libxl management tool needs to be patched so it is flexible
> enough to enable the slot addresses to be assigned correctly using
> that tool instead of relying on qemu to reserve slot 2 for the igd.
>
> If there is a consensus that this should be fixed in libxl instead of in qemu,
> I will work on a patch to libxl, but I will wait a while for some feedback from
> the xen people (Anthony, Paul) before I do that.

Hello Anthony and Paul,

I am requesting your feedback to Alex Williamson's suggestion that this
problem with assigning the correct slot address to the igd on xen should
be fixed in libxl instead of in qemu.

It seems to me that the xen folks and the kvm folks have two different
philosophies regarding how a tool stack should be designed. kvm/libvirt
provides much greater flexibility in configuring the guest which puts
the burden on the administrator to set all the options correctly for
a given feature set, while xen/xenlight does not provide so much
flexibility and tries to automatically configure the guest based on
a high-level feature option such as the igd-passthrough=on option that
is available for xen guests using qemu but not for kvm guests using
qemu.

What do you think? Should libxl be patched instead of fixing the problem
with this patch to qemu, which is contrary to Alex's suggestion?

Thanks in advance,

Chuck
Chuck Zmudzinski Jan. 4, 2023, 8:47 p.m. UTC | #7
On 1/3/23 10:14 AM, Alex Williamson wrote:

> 
> It's necessary to configure the assigned IGD at slot 2 to make it
> functional, yes, but I don't really understand this notion of
> "reserving" slot 2.  If something occupies address 00:02.0 in the
> config, it's the user's or management tool's responsibility to move it
> to make this configuration functional.  Why does QEMU need to play a
> part in reserving this bus address.  IGD devices are not generally
> hot-pluggable either, so it doesn't seem we need to reserve an address
> in case an IGD device is added dynamically later.

The capability to reserve a bus address for a quirky device need not
be limited to the case of hotplugged or dynamically added devices. The
igd is a quirky device, and its presence in an emulated system like
qemu requires special handling. The slot_reserved_mask member of PCIBus
is also well-suited to the case of quirky device like Intel the igd that
needs to be at slot 2. Just because it is not dynamically added later
does not change the fact that it needs special handling at its initial
configuration when the guest is being created.

>  

Here's the problem that answers Michael's question why this patch is
specific to xen:

---snip---
#ifdef CONFIG_XEN

...

static void pc_xen_hvm_init(MachineState *machine)
{
    PCMachineState *pcms = PC_MACHINE(machine);

    if (!xen_enabled()) {
        error_report("xenfv machine requires the xen accelerator");
        exit(1);
    }

    pc_xen_hvm_init_pci(machine);
    pci_create_simple(pcms->bus, -1, "xen-platform");
}
#endif
---snip---

This code is from hw/i386/pc_piix.c. Note the call to
pci_create_simple to create the xen platform pci device,
which has -1 as the second argument. That -1 tells
pci_create_simple to autoconfigure the pci bdf address.

It is *hard-coded* that way. That means no toolstack or
management tool can change it. And what is hard-coded here
is that the xen platform device will occupy slot 2, preventing
the Intel igd or any other device from occupying slot 2.

So, even if xen developers wanted to create a version of the
libxl that is flexible enough to allow the xen platform device
to be at a different slot, they could not without patching
qemu to at least change that -1 to an initialization variable
that can be read from a qemu command line option that libxl
could configure.

So, why not just accept this patch as the best way to deal
with a xen-specific problem and fix it in a way that uses
the xen/libxl philosophy of autoconfiguring things as much as
possible except in cases of quirky devices like the Intel igd
in which case the existing slot_reserved_mask member of PCIBus
is very useful to accommodate the quirky igd device?

IMHO, trying to impose the kvm/libvirt philosophy of having
a very configurable toolstack on the xen/xenlight philosophy
of autoconfiguring things that can be autoconfigured and
using higher-level configuration options like igd-passthrough=on
to tweak how autoconfiguration is done in a way that is compatible
with quirky devices like the Intel igd is like trying to put
a square peg into a round hole. Actually, qemu with its qom is
able to accommodate both approaches to the design of a toolstack,
and each vendor or project that depends on qemu should be free to
use the approach it prefers.

Just my two cents, FWIW.

Kind regards,

Chuck
Chuck Zmudzinski Jan. 5, 2023, 9:20 p.m. UTC | #8
On 1/4/23 3:47 PM, Chuck Zmudzinski wrote:
> On 1/3/23 10:14 AM, Alex Williamson wrote:
> 
>> 
>> It's necessary to configure the assigned IGD at slot 2 to make it
>> functional, yes, but I don't really understand this notion of
>> "reserving" slot 2.  If something occupies address 00:02.0 in the
>> config, it's the user's or management tool's responsibility to move it
>> to make this configuration functional.  Why does QEMU need to play a
>> part in reserving this bus address.  IGD devices are not generally
>> hot-pluggable either, so it doesn't seem we need to reserve an address
>> in case an IGD device is added dynamically later.
> 
> The capability to reserve a bus address for a quirky device need not
> be limited to the case of hotplugged or dynamically added devices. The
> igd is a quirky device, and its presence in an emulated system like
> qemu requires special handling. The slot_reserved_mask member of PCIBus
> is also well-suited to the case of quirky device like Intel the igd that
> needs to be at slot 2. Just because it is not dynamically added later
> does not change the fact that it needs special handling at its initial
> configuration when the guest is being created.
> 
>>  
> 
> Here's the problem that answers Michael's question why this patch is
> specific to xen:
> 
> ---snip---
> #ifdef CONFIG_XEN
> 
> ...
> 
> static void pc_xen_hvm_init(MachineState *machine)
> {
>     PCMachineState *pcms = PC_MACHINE(machine);
> 
>     if (!xen_enabled()) {
>         error_report("xenfv machine requires the xen accelerator");
>         exit(1);
>     }
> 
>     pc_xen_hvm_init_pci(machine);
>     pci_create_simple(pcms->bus, -1, "xen-platform");
> }
> #endif
> ---snip---
> 
> This code is from hw/i386/pc_piix.c. Note the call to
> pci_create_simple to create the xen platform pci device,
> which has -1 as the second argument. That -1 tells
> pci_create_simple to autoconfigure the pci bdf address.
> 
> It is *hard-coded* that way. That means no toolstack or
> management tool can change it. And what is hard-coded here
> is that the xen platform device will occupy slot 2, preventing
> the Intel igd or any other device from occupying slot 2.
> 

Actually, today I discovered it is possible to workaround
this issue with libxl that appears to hard-code the xen
platform device to slot 2.

The code referenced here that effectively hard codes the xen
platform device to slot 2 is only executed with the
'-machine xenfv' qemu option, which is the default that
libxl uses for hvm guests, but this behavior can be changed
by setting xen_platform_pci = '0' in the xl guest configuration
file, in which case libxl configures qemu with the
'-machine pc,accel=xen' option instead, and then one can add
the xen platform device at a different slot by adding, for
example,

device_model_args = ['-device','xen-platform,addr=03']

to the xl guest configuration file which adds the device at
slot 3 instead of slot 2.

This approach is an ugly workaround which has other side effects,
such as by having -machine as pc,accel=xen instead of xenfv, the
machine type is not exactly the same because the xenfv machine
type is based on a much older version of qemu's i440fx emulation
(qemu version 3.1 IIRC), so with this workaround there could be
some unintended side effects, although I just tested this
workaround and it does seem to work, but I have not been using
this approach to the problem for very long.

Also, this approach requires setting type=vif in the vif network
setting of the xl guest configuration to suppress the creation of
the qemu emulated network device that would grab slot 2 if it is
created by the ordinary libxl network configuration settings and,
for guests that do not have the xen pv network drivers installed,
this also would require manually building the qemu command line
using device_model_args to allow adding an emulated qemu network
device at a different slot and probably also manually configuring
the correct networking scripts outside of the normal xen networking
scripts, because the ordinary xl guest configuration options do not
have a setting for assigning an emulated qemu network device to a
specific slot, and the device would otherwise grab slot 2.

So,it is possible to configure the guest to have the Intel igd at slot
2 without patching either libxl or qemu, but it is a real PITA with
some possible unintended side effects, and that is why I think patching
qemu to reserve slot 2 is a much better solution to the problem.

Kind regards,

Chuck
Anthony PERARD Jan. 6, 2023, 10:52 a.m. UTC | #9
On Tue, Jan 03, 2023 at 05:58:01PM -0500, Chuck Zmudzinski wrote:
> Hello Anthony and Paul,

Hi Chuck,

> I am requesting your feedback to Alex Williamson's suggestion that this
> problem with assigning the correct slot address to the igd on xen should
> be fixed in libxl instead of in qemu.
> 
> It seems to me that the xen folks and the kvm folks have two different
> philosophies regarding how a tool stack should be designed. kvm/libvirt
> provides much greater flexibility in configuring the guest which puts
> the burden on the administrator to set all the options correctly for
> a given feature set, while xen/xenlight does not provide so much
> flexibility and tries to automatically configure the guest based on
> a high-level feature option such as the igd-passthrough=on option that
> is available for xen guests using qemu but not for kvm guests using
> qemu.
> 
> What do you think? Should libxl be patched instead of fixing the problem
> with this patch to qemu, which is contrary to Alex's suggestion?

I do think that libxl should be able to deal with having to put a
graphic card on slot 2. QEMU already provides every API necessary for a
toolstack to be able to start a Xen guest with all the PCI card in the
right slot. But it would just be a bit more complicated to implement in
libxl.

At the moment, libxl makes use of the QEMU machine 'xenfv', libxl should
instead start to use the 'pc' machine and add the "xen-platform" pci
device. (libxl already uses 'pc' when the "xen-platform" pci card isn't
needed.) Also probably add the other pci devices to specific slot to be
able to add the passthrough graphic card at the right slot.

Next is to deal with migration when using the 'pc' machine, as it's just
an alias to a specific version of the machine. We need to use the same
machine on the receiving end, that is start with e.g. "pc-i440fx-7.1" if
'pc' was an alias for it at guest creation.


I wonder if we can already avoid to patch the 'xenfv' machine with some
xl config:
    # avoid 'xenfv' machine and use 'pc' instead
    xen_platform_pci=0
    # add xen-platform pci device back
    device_model_args_hvm = [
        "-device", "xen-platform,addr=3",
    ]
But there's probably another device which is going to be auto-assigned
to slot 2.


If you feel like dealing with the technical dept in libxl, that is to
stop using 'xenfv' and use 'pc' instead, then go for it, I can help with
that. Otherwise, if the patch to QEMU only changes the behavior of the
'xenfv' machine then I think I would be ok with it.

I'll do a review of that QEMU patch in another email.

Cheers,
Michael S. Tsirkin Jan. 6, 2023, 1 p.m. UTC | #10
On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> as noted in docs/igd-assign.txt in the Qemu source code.
> 
> Currently, when the xl toolstack is used to configure a Xen HVM guest with
> Intel IGD passthrough to the guest with the Qemu upstream device model,
> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> a different slot. This problem often prevents the guest from booting.
> 
> The only available workaround is not good: Configure Xen HVM guests to use
> the old and no longer maintained Qemu traditional device model available
> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> 
> To implement this feature in the Qemu upstream device model for Xen HVM
> guests, introduce the following new functions, types, and macros:
> 
> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> * typedef XenPTQdevRealize function pointer
> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> * xen_igd_reserve_slot and xen_igd_clear_slot functions
> 
> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> the xl toolstack with the gfx_passthru option enabled, which sets the
> igd-passthru=on option to Qemu for the Xen HVM machine type.
> 
> The new xen_igd_reserve_slot function also needs to be implemented in
> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> in which case it does nothing.
> 
> The new xen_igd_clear_slot function overrides qdev->realize of the parent
> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> 
> Move the call to xen_host_pci_device_get, and the associated error
> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> initialize the device class and vendor values which enables the checks for
> the Intel IGD to succeed. The verification that the host device is an
> Intel IGD to be passed through is done by checking the domain, bus, slot,
> and function values as well as by checking that gfx_passthru is enabled,
> the device class is VGA, and the device vendor in Intel.
> 
> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>

If we are to make changes, something that might be generally
useful is a new mask along the lines of slot_reserved_mask,
however
	- only affecting auto-allocated addresses
	- controllable from a command line property

this way one could say "don't allocate any devices to slot 2"
and later "put igd device in slot 2".
And, xenpv machine could set defaults for these using the
compat machinery.


> ---
> Notes that might be helpful to reviewers of patched code in hw/xen:
> 
> The new functions and types are based on recommendations from Qemu docs:
> https://qemu.readthedocs.io/en/latest/devel/qom.html
> 
> Notes that might be helpful to reviewers of patched code in hw/i386:
> 
> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
> not affect builds that do not have CONFIG_XEN defined.
> 
> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
> existing function that is only true when Qemu is built with
> xen-pci-passthrough enabled and the administrator has configured the Xen
> HVM guest with Qemu's igd-passthru=on option.
> 
> v2: Remove From: <email address> tag at top of commit message
> 
> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
> 
>     if (is_igd_vga_passthrough(&s->real_device) &&
>         (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
> 
>     is changed to
> 
>     if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>         && (s->hostaddr.function == 0)) {
> 
>     I hoped that I could use the test in v2, since it matches the
>     other tests for the Intel IGD in Qemu and Xen, but those tests
>     do not work because the necessary data structures are not set with
>     their values yet. So instead use the test that the administrator
>     has enabled gfx_passthru and the device address on the host is
>     02.0. This test does detect the Intel IGD correctly.
> 
> v4: Use brchuckz@aol.com instead of brchuckz@netscape.net for the author's
>     email address to match the address used by the same author in commits
>     be9c61da and c0e86b76
>     
>     Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
> 
> v5: The patch of xen_pt.c was re-worked to allow a more consistent test
>     for the Intel IGD that uses the same criteria as in other places.
>     This involved moving the call to xen_host_pci_device_get from
>     xen_pt_realize to xen_igd_clear_slot and updating the checks for the
>     Intel IGD in xen_igd_clear_slot:
>     
>     if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>         && (s->hostaddr.function == 0)) {
> 
>     is changed to
> 
>     if (is_igd_vga_passthrough(&s->real_device) &&
>         s->real_device.domain == 0 && s->real_device.bus == 0 &&
>         s->real_device.dev == 2 && s->real_device.func == 0 &&
>         s->real_device.vendor_id == PCI_VENDOR_ID_INTEL) {
> 
>     Added an explanation for the move of xen_host_pci_device_get from
>     xen_pt_realize to xen_igd_clear_slot to the commit message.
> 
>     Rebase.
> 
> v6: Fix logging by removing these lines from the move from xen_pt_realize
>     to xen_igd_clear_slot that was done in v5:
> 
>     XEN_PT_LOG(d, "Assigning real physical device %02x:%02x.%d"
>                " to devfn 0x%x\n",
>                s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
>                s->dev.devfn);
> 
>     This log needs to be in xen_pt_realize because s->dev.devfn is not
>     set yet in xen_igd_clear_slot.
> 
>     Sorry for the extra noise.
> 
>  hw/i386/pc_piix.c    |  3 +++
>  hw/xen/xen_pt.c      | 46 +++++++++++++++++++++++++++++++++++---------
>  hw/xen/xen_pt.h      | 16 +++++++++++++++
>  hw/xen/xen_pt_stub.c |  4 ++++
>  4 files changed, 60 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index b48047f50c..bc5efa4f59 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -405,6 +405,9 @@ static void pc_xen_hvm_init(MachineState *machine)
>      }
>  
>      pc_xen_hvm_init_pci(machine);
> +    if (xen_igd_gfx_pt_enabled()) {
> +        xen_igd_reserve_slot(pcms->bus);
> +    }
>      pci_create_simple(pcms->bus, -1, "xen-platform");
>  }
>  #endif
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index 0ec7e52183..7fae1e7a6f 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -780,15 +780,6 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
>                 s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
>                 s->dev.devfn);
>  
> -    xen_host_pci_device_get(&s->real_device,
> -                            s->hostaddr.domain, s->hostaddr.bus,
> -                            s->hostaddr.slot, s->hostaddr.function,
> -                            errp);
> -    if (*errp) {
> -        error_append_hint(errp, "Failed to \"open\" the real pci device");
> -        return;
> -    }
> -
>      s->is_virtfn = s->real_device.is_virtfn;
>      if (s->is_virtfn) {
>          XEN_PT_LOG(d, "%04x:%02x:%02x.%d is a SR-IOV Virtual Function\n",
> @@ -950,11 +941,47 @@ static void xen_pci_passthrough_instance_init(Object *obj)
>      PCI_DEVICE(obj)->cap_present |= QEMU_PCI_CAP_EXPRESS;
>  }
>  
> +void xen_igd_reserve_slot(PCIBus *pci_bus)
> +{
> +    XEN_PT_LOG(0, "Reserving PCI slot 2 for IGD\n");
> +    pci_bus->slot_reserved_mask |= XEN_PCI_IGD_SLOT_MASK;
> +}
> +
> +static void xen_igd_clear_slot(DeviceState *qdev, Error **errp)
> +{
> +    ERRP_GUARD();
> +    PCIDevice *pci_dev = (PCIDevice *)qdev;
> +    XenPCIPassthroughState *s = XEN_PT_DEVICE(pci_dev);
> +    XenPTDeviceClass *xpdc = XEN_PT_DEVICE_GET_CLASS(s);
> +    PCIBus *pci_bus = pci_get_bus(pci_dev);
> +
> +    xen_host_pci_device_get(&s->real_device,
> +                            s->hostaddr.domain, s->hostaddr.bus,
> +                            s->hostaddr.slot, s->hostaddr.function,
> +                            errp);
> +    if (*errp) {
> +        error_append_hint(errp, "Failed to \"open\" the real pci device");
> +        return;
> +    }
> +
> +    if (is_igd_vga_passthrough(&s->real_device) &&
> +        s->real_device.domain == 0 && s->real_device.bus == 0 &&
> +        s->real_device.dev == 2 && s->real_device.func == 0 &&
> +        s->real_device.vendor_id == PCI_VENDOR_ID_INTEL) {
> +        pci_bus->slot_reserved_mask &= ~XEN_PCI_IGD_SLOT_MASK;
> +        XEN_PT_LOG(pci_dev, "Intel IGD found, using slot 2\n");
> +    }
> +    xpdc->pci_qdev_realize(qdev, errp);
> +}
> +
>  static void xen_pci_passthrough_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
> +    XenPTDeviceClass *xpdc = XEN_PT_DEVICE_CLASS(klass);
> +    xpdc->pci_qdev_realize = dc->realize;
> +    dc->realize = xen_igd_clear_slot;
>      k->realize = xen_pt_realize;
>      k->exit = xen_pt_unregister_device;
>      k->config_read = xen_pt_pci_read_config;
> @@ -977,6 +1004,7 @@ static const TypeInfo xen_pci_passthrough_info = {
>      .instance_size = sizeof(XenPCIPassthroughState),
>      .instance_finalize = xen_pci_passthrough_finalize,
>      .class_init = xen_pci_passthrough_class_init,
> +    .class_size = sizeof(XenPTDeviceClass),
>      .instance_init = xen_pci_passthrough_instance_init,
>      .interfaces = (InterfaceInfo[]) {
>          { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
> index e7c4316a7d..40b31b5263 100644
> --- a/hw/xen/xen_pt.h
> +++ b/hw/xen/xen_pt.h
> @@ -3,6 +3,7 @@
>  
>  #include "hw/xen/xen_common.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bus.h"
>  #include "xen-host-pci-device.h"
>  #include "qom/object.h"
>  
> @@ -41,7 +42,20 @@ typedef struct XenPTReg XenPTReg;
>  #define TYPE_XEN_PT_DEVICE "xen-pci-passthrough"
>  OBJECT_DECLARE_SIMPLE_TYPE(XenPCIPassthroughState, XEN_PT_DEVICE)
>  
> +#define XEN_PT_DEVICE_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(XenPTDeviceClass, klass, TYPE_XEN_PT_DEVICE)
> +#define XEN_PT_DEVICE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(XenPTDeviceClass, obj, TYPE_XEN_PT_DEVICE)
> +
> +typedef void (*XenPTQdevRealize)(DeviceState *qdev, Error **errp);
> +
> +typedef struct XenPTDeviceClass {
> +    PCIDeviceClass parent_class;
> +    XenPTQdevRealize pci_qdev_realize;
> +} XenPTDeviceClass;
> +
>  uint32_t igd_read_opregion(XenPCIPassthroughState *s);
> +void xen_igd_reserve_slot(PCIBus *pci_bus);
>  void igd_write_opregion(XenPCIPassthroughState *s, uint32_t val);
>  void xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
>                                             XenHostPCIDevice *dev);
> @@ -76,6 +90,8 @@ typedef int (*xen_pt_conf_byte_read)
>  
>  #define XEN_PCI_INTEL_OPREGION 0xfc
>  
> +#define XEN_PCI_IGD_SLOT_MASK 0x4UL /* Intel IGD slot_reserved_mask */
> +
>  typedef enum {
>      XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
>      XEN_PT_GRP_TYPE_EMU,            /* emul reg group */
> diff --git a/hw/xen/xen_pt_stub.c b/hw/xen/xen_pt_stub.c
> index 2d8cac8d54..5c108446a8 100644
> --- a/hw/xen/xen_pt_stub.c
> +++ b/hw/xen/xen_pt_stub.c
> @@ -20,3 +20,7 @@ void xen_igd_gfx_pt_set(bool value, Error **errp)
>          error_setg(errp, "Xen PCI passthrough support not built in");
>      }
>  }
> +
> +void xen_igd_reserve_slot(PCIBus *pci_bus)
> +{
> +}
> -- 
> 2.39.0
Chuck Zmudzinski Jan. 6, 2023, 2:02 p.m. UTC | #11
On 1/6/23 5:52 AM, Anthony PERARD wrote:
> On Tue, Jan 03, 2023 at 05:58:01PM -0500, Chuck Zmudzinski wrote:
>> Hello Anthony and Paul,
> 
> Hi Chuck,
> 
>> I am requesting your feedback to Alex Williamson's suggestion that this
>> problem with assigning the correct slot address to the igd on xen should
>> be fixed in libxl instead of in qemu.
>> 
>> It seems to me that the xen folks and the kvm folks have two different
>> philosophies regarding how a tool stack should be designed. kvm/libvirt
>> provides much greater flexibility in configuring the guest which puts
>> the burden on the administrator to set all the options correctly for
>> a given feature set, while xen/xenlight does not provide so much
>> flexibility and tries to automatically configure the guest based on
>> a high-level feature option such as the igd-passthrough=on option that
>> is available for xen guests using qemu but not for kvm guests using
>> qemu.
>> 
>> What do you think? Should libxl be patched instead of fixing the problem
>> with this patch to qemu, which is contrary to Alex's suggestion?
> 
> I do think that libxl should be able to deal with having to put a
> graphic card on slot 2. QEMU already provides every API necessary for a
> toolstack to be able to start a Xen guest with all the PCI card in the
> right slot. But it would just be a bit more complicated to implement in
> libxl.
> 
> At the moment, libxl makes use of the QEMU machine 'xenfv', libxl should
> instead start to use the 'pc' machine and add the "xen-platform" pci
> device. (libxl already uses 'pc' when the "xen-platform" pci card isn't
> needed.) Also probably add the other pci devices to specific slot to be
> able to add the passthrough graphic card at the right slot.
> 
> Next is to deal with migration when using the 'pc' machine, as it's just
> an alias to a specific version of the machine. We need to use the same
> machine on the receiving end, that is start with e.g. "pc-i440fx-7.1" if
> 'pc' was an alias for it at guest creation.
> 
> 
> I wonder if we can already avoid to patch the 'xenfv' machine with some
> xl config:
>     # avoid 'xenfv' machine and use 'pc' instead
>     xen_platform_pci=0
>     # add xen-platform pci device back
>     device_model_args_hvm = [
>         "-device", "xen-platform,addr=3",
>     ]
> But there's probably another device which is going to be auto-assigned
> to slot 2.
> 
> 
> If you feel like dealing with the technical dept in libxl, that is to
> stop using 'xenfv' and use 'pc' instead, then go for it, I can help with
> that. Otherwise, if the patch to QEMU only changes the behavior of the
> 'xenfv' machine then I think I would be ok with it.
> 
> I'll do a review of that QEMU patch in another email.
> 
> Cheers,
> 

Hello Anthony,

Thanks for responding!

The first part of my v6 of the patch only affects the xenfv
machine. Guests created with the pc machine type will not call
the new function that reserves slot 2 for the igd because that
function is only called when the machine type is xenfv (or xenfv-4.2).
But the new functions I added to configure the TYPE_XEN_PT_DEVICE
when igd-passthru=on will be called to check if the device is an
Intel igd and clear the slot if it is, but this will not have any
effect on the behavior in this case because the slot was never
reserved. Still, this would add some unnecessary processing in the
case of machines other than xenfv, which is undesirable.

So I can add a check for the machine type to a v7 of the patch
that will skip the new functions that clear the reserved slot if
slot 2 is not reserved and therefore does not need to be cleared.

Would that be OK?

Kind regards,

Chuck
Anthony PERARD Jan. 6, 2023, 2:03 p.m. UTC | #12
On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> as noted in docs/igd-assign.txt in the Qemu source code.
> 
> Currently, when the xl toolstack is used to configure a Xen HVM guest with
> Intel IGD passthrough to the guest with the Qemu upstream device model,
> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> a different slot. This problem often prevents the guest from booting.
> 
> The only available workaround is not good: Configure Xen HVM guests to use
> the old and no longer maintained Qemu traditional device model available
> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> 
> To implement this feature in the Qemu upstream device model for Xen HVM
> guests, introduce the following new functions, types, and macros:
> 
> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> * typedef XenPTQdevRealize function pointer
> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> * xen_igd_reserve_slot and xen_igd_clear_slot functions
> 
> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> the xl toolstack with the gfx_passthru option enabled, which sets the
> igd-passthru=on option to Qemu for the Xen HVM machine type.
> 
> The new xen_igd_reserve_slot function also needs to be implemented in
> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> in which case it does nothing.
> 
> The new xen_igd_clear_slot function overrides qdev->realize of the parent
> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> 
> Move the call to xen_host_pci_device_get, and the associated error
> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> initialize the device class and vendor values which enables the checks for
> the Intel IGD to succeed. The verification that the host device is an
> Intel IGD to be passed through is done by checking the domain, bus, slot,
> and function values as well as by checking that gfx_passthru is enabled,
> the device class is VGA, and the device vendor in Intel.
> 
> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>


This patch looks good enough. It only changes the "xenfv" machine so it
doesn't prevent a proper fix to be done in the toolstack libxl.

The change in xen_pci_passthrough_class_init() to try to run some code
before pci_qdev_realize() could potentially break in the future due to
been uncommon but hopefully that will be ok.

So if no work to fix libxl appear soon, I'm ok with this patch:

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,
Chuck Zmudzinski Jan. 6, 2023, 2:10 p.m. UTC | #13
On 1/6/23 9:03 AM, Anthony PERARD wrote:
> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
>> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
>> as noted in docs/igd-assign.txt in the Qemu source code.
>> 
>> Currently, when the xl toolstack is used to configure a Xen HVM guest with
>> Intel IGD passthrough to the guest with the Qemu upstream device model,
>> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
>> a different slot. This problem often prevents the guest from booting.
>> 
>> The only available workaround is not good: Configure Xen HVM guests to use
>> the old and no longer maintained Qemu traditional device model available
>> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
>> 
>> To implement this feature in the Qemu upstream device model for Xen HVM
>> guests, introduce the following new functions, types, and macros:
>> 
>> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
>> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
>> * typedef XenPTQdevRealize function pointer
>> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
>> * xen_igd_reserve_slot and xen_igd_clear_slot functions
>> 
>> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
>> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
>> the xl toolstack with the gfx_passthru option enabled, which sets the
>> igd-passthru=on option to Qemu for the Xen HVM machine type.
>> 
>> The new xen_igd_reserve_slot function also needs to be implemented in
>> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
>> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
>> in which case it does nothing.
>> 
>> The new xen_igd_clear_slot function overrides qdev->realize of the parent
>> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
>> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
>> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
>> 
>> Move the call to xen_host_pci_device_get, and the associated error
>> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
>> initialize the device class and vendor values which enables the checks for
>> the Intel IGD to succeed. The verification that the host device is an
>> Intel IGD to be passed through is done by checking the domain, bus, slot,
>> and function values as well as by checking that gfx_passthru is enabled,
>> the device class is VGA, and the device vendor in Intel.
>> 
>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> 
> 
> This patch looks good enough. It only changes the "xenfv" machine so it
> doesn't prevent a proper fix to be done in the toolstack libxl.
> 
> The change in xen_pci_passthrough_class_init() to try to run some code
> before pci_qdev_realize() could potentially break in the future due to
> been uncommon but hopefully that will be ok.
> 
> So if no work to fix libxl appear soon, I'm ok with this patch:
> 
> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
> 
> Thanks,
> 

Well, our messages almost collided! I just proposed a v7 that adds
a check to prevent the extra processing for cases when machine is
not xenfv and the slot does not need to be cleared because it was
never reserved. The proposed v7 would not change the behavior of the
patch at all but it would avoid some unnecessary processing. Do you
want me to submit that v7?

Chuck
Chuck Zmudzinski Jan. 6, 2023, 2:31 p.m. UTC | #14
On 1/6/23 9:10 AM, Chuck Zmudzinski wrote:
> On 1/6/23 9:03 AM, Anthony PERARD wrote:
>> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
>>> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
>>> as noted in docs/igd-assign.txt in the Qemu source code.
>>> 
>>> Currently, when the xl toolstack is used to configure a Xen HVM guest with
>>> Intel IGD passthrough to the guest with the Qemu upstream device model,
>>> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
>>> a different slot. This problem often prevents the guest from booting.
>>> 
>>> The only available workaround is not good: Configure Xen HVM guests to use
>>> the old and no longer maintained Qemu traditional device model available
>>> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
>>> 
>>> To implement this feature in the Qemu upstream device model for Xen HVM
>>> guests, introduce the following new functions, types, and macros:
>>> 
>>> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
>>> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
>>> * typedef XenPTQdevRealize function pointer
>>> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
>>> * xen_igd_reserve_slot and xen_igd_clear_slot functions
>>> 
>>> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
>>> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
>>> the xl toolstack with the gfx_passthru option enabled, which sets the
>>> igd-passthru=on option to Qemu for the Xen HVM machine type.
>>> 
>>> The new xen_igd_reserve_slot function also needs to be implemented in
>>> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
>>> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
>>> in which case it does nothing.
>>> 
>>> The new xen_igd_clear_slot function overrides qdev->realize of the parent
>>> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
>>> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
>>> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
>>> 
>>> Move the call to xen_host_pci_device_get, and the associated error
>>> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
>>> initialize the device class and vendor values which enables the checks for
>>> the Intel IGD to succeed. The verification that the host device is an
>>> Intel IGD to be passed through is done by checking the domain, bus, slot,
>>> and function values as well as by checking that gfx_passthru is enabled,
>>> the device class is VGA, and the device vendor in Intel.
>>> 
>>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>> 
>> 
>> This patch looks good enough. It only changes the "xenfv" machine so it
>> doesn't prevent a proper fix to be done in the toolstack libxl.
>> 
>> The change in xen_pci_passthrough_class_init() to try to run some code
>> before pci_qdev_realize() could potentially break in the future due to
>> been uncommon but hopefully that will be ok.
>> 
>> So if no work to fix libxl appear soon, I'm ok with this patch:

Well, I can tell you and others who use qemu are more comfortable
fixing this in libxl, so hold off for a week or so. I should have
a patch to fix this in libxl written and tested by then. If for
some reason that does not work out, then we can fix it in qemu.

Cheers,

Chuck

>> 
>> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
>> 
>> Thanks,
>>
Anthony PERARD Jan. 6, 2023, 2:42 p.m. UTC | #15
On Fri, Jan 06, 2023 at 09:10:55AM -0500, Chuck Zmudzinski wrote:
> Well, our messages almost collided! I just proposed a v7 that adds
> a check to prevent the extra processing for cases when machine is
> not xenfv and the slot does not need to be cleared because it was
> never reserved. The proposed v7 would not change the behavior of the
> patch at all but it would avoid some unnecessary processing. Do you
> want me to submit that v7?

Well, preventing an simple assignment and a message from getting logged
isn't going to get us far. On the other end, the message "using slot 2"
when we don't even if slot 2 is actually going to be used could be
confusing, so I guess preventing that message from been logged could be
useful indeed.

So your proposed v7 would be fine.

Cheers,
Chuck Zmudzinski Jan. 6, 2023, 3:02 p.m. UTC | #16
On 1/6/23 9:31 AM, Chuck Zmudzinski wrote:
> On 1/6/23 9:10 AM, Chuck Zmudzinski wrote:
>> On 1/6/23 9:03 AM, Anthony PERARD wrote:
>>> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
>>>> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
>>>> as noted in docs/igd-assign.txt in the Qemu source code.
>>>> 
>>>> Currently, when the xl toolstack is used to configure a Xen HVM guest with
>>>> Intel IGD passthrough to the guest with the Qemu upstream device model,
>>>> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
>>>> a different slot. This problem often prevents the guest from booting.
>>>> 
>>>> The only available workaround is not good: Configure Xen HVM guests to use
>>>> the old and no longer maintained Qemu traditional device model available
>>>> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
>>>> 
>>>> To implement this feature in the Qemu upstream device model for Xen HVM
>>>> guests, introduce the following new functions, types, and macros:
>>>> 
>>>> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
>>>> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
>>>> * typedef XenPTQdevRealize function pointer
>>>> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
>>>> * xen_igd_reserve_slot and xen_igd_clear_slot functions
>>>> 
>>>> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
>>>> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
>>>> the xl toolstack with the gfx_passthru option enabled, which sets the
>>>> igd-passthru=on option to Qemu for the Xen HVM machine type.
>>>> 
>>>> The new xen_igd_reserve_slot function also needs to be implemented in
>>>> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
>>>> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
>>>> in which case it does nothing.
>>>> 
>>>> The new xen_igd_clear_slot function overrides qdev->realize of the parent
>>>> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
>>>> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
>>>> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
>>>> 
>>>> Move the call to xen_host_pci_device_get, and the associated error
>>>> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
>>>> initialize the device class and vendor values which enables the checks for
>>>> the Intel IGD to succeed. The verification that the host device is an
>>>> Intel IGD to be passed through is done by checking the domain, bus, slot,
>>>> and function values as well as by checking that gfx_passthru is enabled,
>>>> the device class is VGA, and the device vendor in Intel.
>>>> 
>>>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>>> 
>>> 
>>> This patch looks good enough. It only changes the "xenfv" machine so it
>>> doesn't prevent a proper fix to be done in the toolstack libxl.
>>> 
>>> The change in xen_pci_passthrough_class_init() to try to run some code
>>> before pci_qdev_realize() could potentially break in the future due to
>>> been uncommon but hopefully that will be ok.
>>> 
>>> So if no work to fix libxl appear soon, I'm ok with this patch:
> 
> Well, I can tell you and others who use qemu are more comfortable
> fixing this in libxl, so hold off for a week or so. I should have
> a patch to fix this in libxl written and tested by then. If for
> some reason that does not work out, then we can fix it in qemu.

One last thought: the only donwnside to fixing this in libxl is that
other toolstacks that configure qemu to use the xenfv machine will not
benefit from the fix in qemu that would simplify configuring the
guest correctly for the igd. Other toolstacks would still need to
override the default behavior of adding the xen platform device at
slot 2. I think no matter what, we should at least patch qemu to have
the xen-platform device use slot 3 instead of being automatically assigned
to slot 2 when igd-passthru=on. The rest of the fix could then be
implemented in libxl so that other pci devices such as emulated network
devices, other passed through pci devices, etc., do not take slot 2 when
gfx_passthru in xl.cfg is set.

So, unless I hear any objection, my plan is to patch qemu to use slot
3 for the xen platform device when igd-passthru is on, and implement the
rest of the fix in libxl. I should have it ready within a week.

Thanks for your help,

Chuck
Chuck Zmudzinski Jan. 8, 2023, 4:01 p.m. UTC | #17
On 1/6/2023 10:02 AM, Chuck Zmudzinski wrote:
> On 1/6/23 9:31 AM, Chuck Zmudzinski wrote:
> > On 1/6/23 9:10 AM, Chuck Zmudzinski wrote:
> >> On 1/6/23 9:03 AM, Anthony PERARD wrote:
> >>> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> >>>> ...
> >>>> 
> >>>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> >>> 
> >>> 
> >>> This patch looks good enough. It only changes the "xenfv" machine so it
> >>> doesn't prevent a proper fix to be done in the toolstack libxl.
> >>> 
> >>> The change in xen_pci_passthrough_class_init() to try to run some code
> >>> before pci_qdev_realize() could potentially break in the future due to
> >>> been uncommon but hopefully that will be ok.
> >>> 
> >>> So if no work to fix libxl appear soon, I'm ok with this patch:

I have a patch that fixes it in libxl. It still needs a few tweaks before it is
ready for submission, but I plan to do that soon, perhaps later today or
tomorrow at the latest.

> > 
> > Well, I can tell you and others who use qemu are more comfortable
> > fixing this in libxl, so hold off for a week or so. I should have
> > a patch to fix this in libxl written and tested by then. If for
> > some reason that does not work out, then we can fix it in qemu.
>
> One last thought: the only donwnside to fixing this in libxl is that
> other toolstacks that configure qemu to use the xenfv machine will not
> benefit from the fix in qemu that would simplify configuring the
> guest correctly for the igd. Other toolstacks would still need to
> override the default behavior of adding the xen platform device at
> slot 2. I think no matter what, we should at least patch qemu to have
> the xen-platform device use slot 3 instead of being automatically assigned
> to slot 2 when igd-passthru=on. The rest of the fix could then be
> implemented in libxl so that other pci devices such as emulated network
> devices, other passed through pci devices, etc., do not take slot 2 when
> gfx_passthru in xl.cfg is set.

I decided to write the patch to libxl to fix this presuming no
changes to qemu. I think dealing with the "qemu behaves
differently starting from version 8 problem" is more trouble
that it's worth, so I am OK with implementing the fix completely
in libxl, which means libxl will now use the "pc" machine type
when igd-passthru=on and xen_platform_pci is true, but my patch
to libxl will still use the "xenfv" machine when xen_platform_pci
is true and igd-passthru is disabled.

Cheers,

Chuck
Chuck Zmudzinski Jan. 21, 2023, 6:06 p.m. UTC | #18
On 1/6/2023 9:03 AM, Anthony PERARD wrote:
> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
> > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> > as noted in docs/igd-assign.txt in the Qemu source code.
> > 
> > Currently, when the xl toolstack is used to configure a Xen HVM guest with
> > Intel IGD passthrough to the guest with the Qemu upstream device model,
> > a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> > a different slot. This problem often prevents the guest from booting.
> > 
> > The only available workaround is not good: Configure Xen HVM guests to use
> > the old and no longer maintained Qemu traditional device model available
> > from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> > 
> > To implement this feature in the Qemu upstream device model for Xen HVM
> > guests, introduce the following new functions, types, and macros:
> > 
> > * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> > * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> > * typedef XenPTQdevRealize function pointer
> > * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> > * xen_igd_reserve_slot and xen_igd_clear_slot functions
> > 
> > The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> > member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> > the xl toolstack with the gfx_passthru option enabled, which sets the
> > igd-passthru=on option to Qemu for the Xen HVM machine type.
> > 
> > The new xen_igd_reserve_slot function also needs to be implemented in
> > hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> > when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> > in which case it does nothing.
> > 
> > The new xen_igd_clear_slot function overrides qdev->realize of the parent
> > PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> > since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> > created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> > 
> > Move the call to xen_host_pci_device_get, and the associated error
> > handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> > initialize the device class and vendor values which enables the checks for
> > the Intel IGD to succeed. The verification that the host device is an
> > Intel IGD to be passed through is done by checking the domain, bus, slot,
> > and function values as well as by checking that gfx_passthru is enabled,
> > the device class is VGA, and the device vendor in Intel.
> > 
> > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>
>
> This patch looks good enough. It only changes the "xenfv" machine so it
> doesn't prevent a proper fix to be done in the toolstack libxl.
>
> The change in xen_pci_passthrough_class_init() to try to run some code
> before pci_qdev_realize() could potentially break in the future due to
> been uncommon but hopefully that will be ok.
>
> So if no work to fix libxl appear soon, I'm ok with this patch:
>
> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
>
> Thanks,
>

Hi Anthony,

If you have been following this patch it is now at v10. Since there is
another approach of patching libxl by using the "pc" machine instead of
patching Qemu to fix the "xenfv" machine and there have been other
changes, I did not include your Reviewed-by tag in the later versions.

I presume you are not interested in dealing with the technical debt
of patching libxl as proposed by this patch to libxl:

https://lore.kernel.org/xen-devel/20230110073201.mdUvSjy1vKtxPriqMQuWAxIjQzf1eAqIlZgal1u3GBI@z/

because it would be more difficult to maintain and result in reduced
startup performance with the Intel IGD than by patching Qemu and
fixing the "xenfv" machine type with the Intel IGD directly.

So are you OK with v10 of this patch? If so, you can add your Reviewed-by
again to v10. The v10 has several changes since v6 as requested by other
reviewers (Michael, Stefano, Igor).

The v10 of the patch is here:

https://lore.kernel.org/qemu-devel/d473914c4d2dc38ae87dca4b898d75b44751c9cb.1674297794.git.brchuckz@aol.com/

Thanks,

Chuck
Chuck Zmudzinski Jan. 22, 2023, 1:09 a.m. UTC | #19
On 1/21/23 1:06 PM, Chuck Zmudzinski wrote:
> On 1/6/2023 9:03 AM, Anthony PERARD wrote:
>> On Sun, Jan 01, 2023 at 06:52:03PM -0500, Chuck Zmudzinski wrote:
>> > Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
>> > as noted in docs/igd-assign.txt in the Qemu source code.
>> > 
>> > Currently, when the xl toolstack is used to configure a Xen HVM guest with
>> > Intel IGD passthrough to the guest with the Qemu upstream device model,
>> > a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
>> > a different slot. This problem often prevents the guest from booting.
>> > 
>> > The only available workaround is not good: Configure Xen HVM guests to use
>> > the old and no longer maintained Qemu traditional device model available
>> > from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
>> > 
>> > To implement this feature in the Qemu upstream device model for Xen HVM
>> > guests, introduce the following new functions, types, and macros:
>> > 
>> > * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
>> > * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
>> > * typedef XenPTQdevRealize function pointer
>> > * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
>> > * xen_igd_reserve_slot and xen_igd_clear_slot functions
>> > 
>> > The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
>> > member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
>> > the xl toolstack with the gfx_passthru option enabled, which sets the
>> > igd-passthru=on option to Qemu for the Xen HVM machine type.
>> > 
>> > The new xen_igd_reserve_slot function also needs to be implemented in
>> > hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
>> > when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
>> > in which case it does nothing.
>> > 
>> > The new xen_igd_clear_slot function overrides qdev->realize of the parent
>> > PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
>> > since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
>> > created in hw/i386/pc_piix.c for the case when igd-passthru=on.
>> > 
>> > Move the call to xen_host_pci_device_get, and the associated error
>> > handling, from xen_pt_realize to the new xen_igd_clear_slot function to
>> > initialize the device class and vendor values which enables the checks for
>> > the Intel IGD to succeed. The verification that the host device is an
>> > Intel IGD to be passed through is done by checking the domain, bus, slot,
>> > and function values as well as by checking that gfx_passthru is enabled,
>> > the device class is VGA, and the device vendor in Intel.
>> > 
>> > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>>
>>
>> This patch looks good enough. It only changes the "xenfv" machine so it
>> doesn't prevent a proper fix to be done in the toolstack libxl.
>>
>> The change in xen_pci_passthrough_class_init() to try to run some code
>> before pci_qdev_realize() could potentially break in the future due to
>> been uncommon but hopefully that will be ok.
>>
>> So if no work to fix libxl appear soon, I'm ok with this patch:
>>
>> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
>>
>> Thanks,
>>
> 
> Hi Anthony,
> 
> If you have been following this patch it is now at v10. Since there is
> another approach of patching libxl by using the "pc" machine instead of
> patching Qemu to fix the "xenfv" machine and there have been other
> changes, I did not include your Reviewed-by tag in the later versions.
> 
> I presume you are not interested in dealing with the technical debt
> of patching libxl as proposed by this patch to libxl:
> 
> https://lore.kernel.org/xen-devel/20230110073201.mdUvSjy1vKtxPriqMQuWAxIjQzf1eAqIlZgal1u3GBI@z/
> 
> because it would be more difficult to maintain and result in reduced
> startup performance with the Intel IGD than by patching Qemu and
> fixing the "xenfv" machine type with the Intel IGD directly.
> 
> So are you OK with v10 of this patch? If so, you can add your Reviewed-by
> again to v10. The v10 has several changes since v6 as requested by other
> reviewers (Michael, Stefano, Igor).
> 
> The v10 of the patch is here:
> 
> https://lore.kernel.org/qemu-devel/d473914c4d2dc38ae87dca4b898d75b44751c9cb.1674297794.git.brchuckz@aol.com/
> 
> Thanks,
> 
> Chuck

Sorry to bother you again, Anthony, but no one noticed a style
mistake that has been present in the past few versions. The
v11 fixes that without making any other changes since v10, so
if you want to add your Reviewed-by to the most recent version,
here it is (v11) (you should also have it in your Inbox):

https://lore.kernel.org/qemu-devel/b1b4a21fe9a600b1322742dda55a40e9961daa57.1674346505.git.brchuckz@aol.com/

Thanks,

Chuck
diff mbox series

Patch

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b48047f50c..bc5efa4f59 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -405,6 +405,9 @@  static void pc_xen_hvm_init(MachineState *machine)
     }
 
     pc_xen_hvm_init_pci(machine);
+    if (xen_igd_gfx_pt_enabled()) {
+        xen_igd_reserve_slot(pcms->bus);
+    }
     pci_create_simple(pcms->bus, -1, "xen-platform");
 }
 #endif
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 0ec7e52183..7fae1e7a6f 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -780,15 +780,6 @@  static void xen_pt_realize(PCIDevice *d, Error **errp)
                s->hostaddr.bus, s->hostaddr.slot, s->hostaddr.function,
                s->dev.devfn);
 
-    xen_host_pci_device_get(&s->real_device,
-                            s->hostaddr.domain, s->hostaddr.bus,
-                            s->hostaddr.slot, s->hostaddr.function,
-                            errp);
-    if (*errp) {
-        error_append_hint(errp, "Failed to \"open\" the real pci device");
-        return;
-    }
-
     s->is_virtfn = s->real_device.is_virtfn;
     if (s->is_virtfn) {
         XEN_PT_LOG(d, "%04x:%02x:%02x.%d is a SR-IOV Virtual Function\n",
@@ -950,11 +941,47 @@  static void xen_pci_passthrough_instance_init(Object *obj)
     PCI_DEVICE(obj)->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+void xen_igd_reserve_slot(PCIBus *pci_bus)
+{
+    XEN_PT_LOG(0, "Reserving PCI slot 2 for IGD\n");
+    pci_bus->slot_reserved_mask |= XEN_PCI_IGD_SLOT_MASK;
+}
+
+static void xen_igd_clear_slot(DeviceState *qdev, Error **errp)
+{
+    ERRP_GUARD();
+    PCIDevice *pci_dev = (PCIDevice *)qdev;
+    XenPCIPassthroughState *s = XEN_PT_DEVICE(pci_dev);
+    XenPTDeviceClass *xpdc = XEN_PT_DEVICE_GET_CLASS(s);
+    PCIBus *pci_bus = pci_get_bus(pci_dev);
+
+    xen_host_pci_device_get(&s->real_device,
+                            s->hostaddr.domain, s->hostaddr.bus,
+                            s->hostaddr.slot, s->hostaddr.function,
+                            errp);
+    if (*errp) {
+        error_append_hint(errp, "Failed to \"open\" the real pci device");
+        return;
+    }
+
+    if (is_igd_vga_passthrough(&s->real_device) &&
+        s->real_device.domain == 0 && s->real_device.bus == 0 &&
+        s->real_device.dev == 2 && s->real_device.func == 0 &&
+        s->real_device.vendor_id == PCI_VENDOR_ID_INTEL) {
+        pci_bus->slot_reserved_mask &= ~XEN_PCI_IGD_SLOT_MASK;
+        XEN_PT_LOG(pci_dev, "Intel IGD found, using slot 2\n");
+    }
+    xpdc->pci_qdev_realize(qdev, errp);
+}
+
 static void xen_pci_passthrough_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
+    XenPTDeviceClass *xpdc = XEN_PT_DEVICE_CLASS(klass);
+    xpdc->pci_qdev_realize = dc->realize;
+    dc->realize = xen_igd_clear_slot;
     k->realize = xen_pt_realize;
     k->exit = xen_pt_unregister_device;
     k->config_read = xen_pt_pci_read_config;
@@ -977,6 +1004,7 @@  static const TypeInfo xen_pci_passthrough_info = {
     .instance_size = sizeof(XenPCIPassthroughState),
     .instance_finalize = xen_pci_passthrough_finalize,
     .class_init = xen_pci_passthrough_class_init,
+    .class_size = sizeof(XenPTDeviceClass),
     .instance_init = xen_pci_passthrough_instance_init,
     .interfaces = (InterfaceInfo[]) {
         { INTERFACE_CONVENTIONAL_PCI_DEVICE },
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index e7c4316a7d..40b31b5263 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -3,6 +3,7 @@ 
 
 #include "hw/xen/xen_common.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bus.h"
 #include "xen-host-pci-device.h"
 #include "qom/object.h"
 
@@ -41,7 +42,20 @@  typedef struct XenPTReg XenPTReg;
 #define TYPE_XEN_PT_DEVICE "xen-pci-passthrough"
 OBJECT_DECLARE_SIMPLE_TYPE(XenPCIPassthroughState, XEN_PT_DEVICE)
 
+#define XEN_PT_DEVICE_CLASS(klass) \
+    OBJECT_CLASS_CHECK(XenPTDeviceClass, klass, TYPE_XEN_PT_DEVICE)
+#define XEN_PT_DEVICE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(XenPTDeviceClass, obj, TYPE_XEN_PT_DEVICE)
+
+typedef void (*XenPTQdevRealize)(DeviceState *qdev, Error **errp);
+
+typedef struct XenPTDeviceClass {
+    PCIDeviceClass parent_class;
+    XenPTQdevRealize pci_qdev_realize;
+} XenPTDeviceClass;
+
 uint32_t igd_read_opregion(XenPCIPassthroughState *s);
+void xen_igd_reserve_slot(PCIBus *pci_bus);
 void igd_write_opregion(XenPCIPassthroughState *s, uint32_t val);
 void xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
                                            XenHostPCIDevice *dev);
@@ -76,6 +90,8 @@  typedef int (*xen_pt_conf_byte_read)
 
 #define XEN_PCI_INTEL_OPREGION 0xfc
 
+#define XEN_PCI_IGD_SLOT_MASK 0x4UL /* Intel IGD slot_reserved_mask */
+
 typedef enum {
     XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
     XEN_PT_GRP_TYPE_EMU,            /* emul reg group */
diff --git a/hw/xen/xen_pt_stub.c b/hw/xen/xen_pt_stub.c
index 2d8cac8d54..5c108446a8 100644
--- a/hw/xen/xen_pt_stub.c
+++ b/hw/xen/xen_pt_stub.c
@@ -20,3 +20,7 @@  void xen_igd_gfx_pt_set(bool value, Error **errp)
         error_setg(errp, "Xen PCI passthrough support not built in");
     }
 }
+
+void xen_igd_reserve_slot(PCIBus *pci_bus)
+{
+}