
BUG: libxenlight fails to grant permission to access Intel IGD Opregion

Message ID 4703b7f1-8d3c-5128-213c-e39f487e4cde@netscape.net (mailing list archive)
State New, archived
Series BUG: libxenlight fails to grant permission to access Intel IGD Opregion

Commit Message

Chuck Zmudzinski March 11, 2022, 5:01 a.m. UTC
Hello,

Below is a proposed patch for a bug that I have noticed on
Debian versions of Xen going back to at least Xen 4.8, which
shipped with Debian 9 in 2017. In fact, the last Debian version
of Xen that did not have the bug was Xen 4.4 that shipped
with Debian 8 back in 2015. The bug causes passthrough of the Intel
IGD to a Linux HVM domain to fail.

On the Debian BTS this is:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988333

I have provided a patch that fixes the bug on Xen-4.16 at the end of
this message.

A long time ago, during the development of Xen 4.5 in 2014,
two patches changed the way permission is granted for an
unprivileged domain to access PCI/VGA-related I/O memory.
Prior to this, AFAICT, permission to access the memory the
domain requested was granted implicitly while a PCI device
being passed through to the domain was configured. After the
change, access to such memory is denied unless explicit
permission has been configured beforehand, and this is still
the current behavior.

The relevant patches are:

1. https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=abfb006f1ff4af1424e5b0d0589f0226377fda36

and

2. https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=0561e1f01e87b777bcc47971e4ae1f420314f4a0

I found these patches by brute force, building and testing
successive versions of Xen 4.5-unstable until passthrough of
the Intel IGD to Linux started to fail. I used the traditional
device model in my tests because back when Xen 4.5 was under
development, the upstream device model did not support Intel IGD
passthrough.

The first of these patches was intended to implement explicit
granting of permission to access the I/O memory needed to
support the gfx_passthru feature; it does so in libxl_pci.c,
in the libxl__grant_vga_iomem_permission function. The second
patch removes the implicit permission to access PCI/VGA-related
I/O memory, so requests by a domain to access such memory are
denied unless explicit permission was configured beforehand.
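
In practical terms, after the second patch the toolstack has to grant
such access explicitly before the device model's mapping request can
succeed; a minimal sketch of the call involved (the wrapper name is
mine, the call itself is the one the fix below uses):

#include <xenctrl.h>

/* Sketch: grant 'domid' permission to access 'nr' machine pages
 * starting at machine frame 'first_pfn'.  This must now happen before
 * the device model's mapping request, which is exactly what the fix at
 * the end of this message adds for the OpRegion pages. */
static int grant_iomem_sketch(xc_interface *xch, uint32_t domid,
                              unsigned long first_pfn, unsigned long nr)
{
    return xc_domain_iomem_permission(xch, domid, first_pfn, nr, 1);
}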

Specifically, the first patch adds 32 (0x20) pages of data starting at
memory address (0xa0000 >> XC_PAGE_SHIFT) (in pages) to the memory the
domain is permitted to access. XC_PAGE_SHIFT is 12, so this memory
range shows up in the logs when running Xen 4.5 as:

memory_map:add: dom1 gfn=a0 mfn=a0 nr=20

But my testing of these old patches with added custom logging
shows that another two pages are needed:

memory_map:access not permitted: dom1 gfn=fdffc mfn=cc490 nr=2
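
(For context: gfn=a0 with nr=20 is the 32-page legacy VGA aperture,
since 0xa0000 >> 12 = 0xa0 and 0x20 pages x 4 KiB = 128 KiB, i.e.
0xa0000-0xbffff. Likewise, mfn=cc490 with nr=2 means two 4 KiB pages,
8 KiB, starting at machine address 0xcc490 << 12 = 0xcc490000, the page
that holds the OpRegion base on this machine; the exact, possibly
unaligned base address within that page is not shown in the log.)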

The patch for custom logging was against the Xen-4.5 branch after the
second of the aforementioned patches was committed to Xen 4.5-unstable.

Here is the patch (against Xen 4.5) for custom logging; I would
recommend adding it to Xen so that a message is printed to the
hypervisor log (visible with xl dmesg) whenever a domain tries to
access memory it is not allowed to access:

  {
@@ -640,6 +641,45 @@
  }

  /*
+ * This function assumes prior verification
+ * that pci is an Intel IGD device.
+ */
+static uint32_t sysfs_dev_get_igd_opregion(libxl__gc *gc, libxl_device_pci *pci)
+{
+    char *pci_device_vendor_path =
+            GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/config",
+                      pci->domain, pci->bus, pci->dev, pci->func);
+    size_t read_item;
+    uint32_t igd_opregion;
+
+    FILE *f = fopen(pci_device_vendor_path, "r");
+    if (!f) {
+        LOGE(ERROR,
+             "pci device "PCI_BDF" does not have config attribute",
+             pci->domain, pci->bus, pci->dev, pci->func);
+        return 0xffffffff;
+    }
+    if (fseek(f, PCI_INTEL_OPREGION, SEEK_SET)) {
+        LOGE(ERROR,
+             "pci device "PCI_BDF": cannot find igd-opregion address",
+             pci->domain, pci->bus, pci->dev, pci->func);
+        fclose(f);
+        return 0xffffffff;
+    }
+    read_item = fread(&igd_opregion, 4, 1, f);
+    fclose(f);
+
+    if (read_item != 1) {
+        LOGE(ERROR,
+             "cannot read igd-opregion address of pci device "PCI_BDF,
+             pci->domain, pci->bus, pci->dev, pci->func);
+        return 0xffffffff;
+    }
+
+    return igd_opregion;
+}
+
+/*
   * A brief comment about slots.  I don't know what slots are for; however,
   * I have by experimentation determined:
   * - Before a device can be bound to pciback, its BDF must first be listed
@@ -2531,6 +2571,34 @@
                    domid, vga_iomem_start, (vga_iomem_start + 0x20 - 1));
              return ret;
          }
+
+        /* Allow access to Intel igd-opregion */
+        if (sysfs_dev_get_vendor(gc, pci) == 0x8086)
+        {
+            uint32_t igd_opregion = sysfs_dev_get_igd_opregion(gc, pci);
+            if (igd_opregion == 0xffffffff)
+                break;
+            vga_iomem_start = ((uint64_t)igd_opregion) >> XC_PAGE_SHIFT;
+            ret = xc_domain_iomem_permission(CTX->xch, stubdom_domid,
+                                             vga_iomem_start, 0x2, 1);
+            if (ret < 0) {
+                LOGED(ERROR, domid,
+                      "failed to give stubdom%d access to iomem range "
+                      "%"PRIx64"-%"PRIx64" for IGD passthru",
+                      stubdom_domid,
+                      vga_iomem_start, (vga_iomem_start + 0x2 - 1));
+                return ret;
+            }
+            ret = xc_domain_iomem_permission(CTX->xch, domid,
+                                             vga_iomem_start, 0x2, 1);
+            if (ret < 0) {
+                LOGED(ERROR, domid,
+                      "failed to give dom%d access to iomem range "
+                      "%"PRIx64"-%"PRIx64" for IGD passthru",
+                      domid, vga_iomem_start, (vga_iomem_start + 0x2 - 1));
+                return ret;
+            }
+        }
          break;
      }

snip ---------------------------------------------------- snip

Comments

Jan Beulich March 11, 2022, 8:09 a.m. UTC | #1
On 11.03.2022 06:01, Chuck Zmudzinski wrote:
> Further research showed that these two pages at 0xcc490 are for the
> Intel IGD opregion, and because this memory is not permitted to be
> accessed by the domain, the passthrough of an Intel IGD to a Linux
> HVM domain fails, causing a crash of the Linux i915.ko kernel module
> in the HVM domain. My testing, which was on a desktop with a Haswell
> Intel CPU/IGD, confirmed that these two extra pages need to be
> permitted in order for passthrough of the Intel IGD to a Linux
> domain to work properly.
> 
> I find that adding two pages is enough to fix the problem, but I
> have read in other places that the Opregion is actually three pages,
> and maybe newer revisions of the Intel IGD do need three pages instead
> of two. I am testing on a Haswell Intel chip, which is over 8 years old
> now. So the patch I propose adds two pages, but I am not sure if
> it should be three pages for newer Intel chips.
> 
> The failure to map this memory with gfx_passthru enabled
> is therefore a bug, a regression that was introduced with the two
> aforementioned patches way back in 2014 when Xen 4.5 was under
> development.

Thanks for this analysis. It looks quite plausible (but the question
of 2 vs 3 pages of course needs resolving).

> Once I developed a patch, I did more testing with the traditional
> Qemu device model and Debian's package of Xen-4.16 for Debian
> sid/unstable after I discovered where this bug first appeared in
> Xen 4.5-unstable back in 2014. In my testing, Windows HVM domains are
> not affected by this bug and they function properly, most likely
> because proprietary Intel graphics drivers for Windows are more
> forgiving than the Linux open source drivers for Intel graphics
> regarding the details of how Xen and Qemu configure the domain.
> 
> This bug still exists in current supported versions of Xen
> because in Xen 4.16, passthrough of my Haswell Intel IGD to a Linux
> domain still fails with a crash of the i915 Linux kernel module in
> the Linux unprivileged domain when the traditional Qemu device model
> is used in dom0. The patch at the end of this message fixes it.
> 
> I have not yet succeeded in reproducing this bug with the
> upstream device model because there is another bug in Qemu
> upstream that breaks passthrough of the Intel IGD to a Linux HVM
> domU, so for now, to reproduce it, please use the traditional device
> model.
> 
> Also, as a starting point to reproduce the bug, first get Intel IGD
> passthrough to a Windows HVM domain using the Qemu traditional
> device model working on Xen 4.16. Then replace the Windows HVM domain
> with a Linux HVM domain, keeping everything else the same including
> the Qemu traditional device model. I tested using a Debian 11.2
> (bullseye) HVM domain and Debian sid/unstable with Xen 4.16 and
> a build of the Qemu traditional device model from source as
> provided on xenbits.xen.org
> 
> I am using a desktop computer and the xl toolstack and Xen as
> packaged by Debian, except that I added the traditional device
> model that Debian does not provide.
> 
> If you need more info, please let me know. I am not subscribed to
> xen-devel so please cc me with your replies.
> 
> Regards,
> 
> Chuck
> 
> Here is the patch that fixes the bug on Debian sid/Xen 4.16:

As to an actual patch for us to take - please see
docs/process/sending-patches.pandoc for the formal requirements.
(Note this was recently introduced, so you won't find it in the
4.16 sources. But your patch wants to be against latest staging
anyway.)

Jan
Chuck Zmudzinski March 11, 2022, 1:47 p.m. UTC | #2
On 3/11/2022 3:09 AM, Jan Beulich wrote:
> On 11.03.2022 06:01, Chuck Zmudzinski wrote:
>> Further research showed that these two pages at 0xcc490 are for the
>> Intel IGD opregion, and because this memory is not permitted to be
>> accessed by the domain, the passthrough of an Intel IGD to a Linux
>> HVM domain fails, causing a crash of the Linux i915.ko kernel module
>> in the HVM domain. My testing, which was on a desktop with a Haswell
>> Intel CPU/IGD, confirmed that these two extra pages need to be
>> permitted in order for passthrough of the Intel IGD to a Linux
>> domain to work properly.
>>
>> I find that adding two pages is enough to fix the problem, but I
>> have read in other places that the Opregion is actually three pages,
>> and maybe newer revisions of the Intel IGD do need three pages instead
>> of two. I am testing on a Haswell Intel chip, which is over 8 years old
>> now. So the patch I propose adds two pages, but I am not sure if
>> it should be three pages for newer Intel chips.
>>
>> The failure to map this memory with gfx_passthru enabled
>> is therefore a bug, a regression that was introduced with the two
>> aforementioned patches way back in 2014 when Xen 4.5 was under
>> development.
> Thanks for this analysis. It looks quite plausible (but the question
> of 2 vs 3 pages of course needs resolving).
>
>> Once I developed a patch, I did more testing with the traditional
>> Qemu device model and Debian's package of Xen-4.16 for Debian
>> sid/unstable after I discovered where this bug first appeared in
>> Xen 4.5-unstable back in 2014. In my testing, Windows HVM domains are
>> not affected by this bug and they function properly, most likely
>> because proprietary Intel graphics drivers for Windows are more
>> forgiving than the Linux open source drivers for Intel graphics
>> regarding the details of how Xen and Qemu configure the domain.
>>
>> This bug still exists in current supported versions of Xen
>> because in Xen 4.16, passthrough of my Haswell Intel IGD to a Linux
>> domain still fails with a crash of the i915 Linux kernel module in
>> the Linux unprivileged domain when the traditional Qemu device model
>> is used in dom0. The patch at the end of this message fixes it.
>>
>> I have not yet succeeded in reproducing this bug with the
>> upstream device model because there is another bug in Qemu
>> upstream that breaks passthrough of the Intel IGD to a Linux HVM
>> domU, so for now, to reproduce it, please use the traditional device
>> model.
>>
>> Also, as a starting point to reproduce the bug, first get Intel IGD
>> passthrough to a Windows HVM domain using the Qemu traditional
>> device model working on Xen 4.16. Then replace the Windows HVM domain
>> with a Linux HVM domain, keeping everything else the same including
>> the Qemu traditional device model. I tested using a Debian 11.2
>> (bullseye) HVM domain and Debian sid/unstable with Xen 4.16 and
>> a build of the Qemu traditional device model from source as
>> provided on xenbits.xen.org
>>
>> I am using a desktop computer and the xl toolstack and Xen as
>> packaged by Debian, except that I added the traditional device
>> model that Debian does not provide.
>>
>> If you need more info, please let me know. I am not subscribed to
>> xen-devel so please cc me with your replies.
>>
>> Regards,
>>
>> Chuck
>>
>> Here is the patch that fixes the bug on Debian sid/Xen 4.16:
> As to an actual patch for us to take - please see
> docs/process/sending-patches.pandoc for the formal requirements.
> (Note this was recently introduced, so you won't find it in the
> 4.16 sources. But your patch wants to be against latest staging
> anyway.)
>
> Jan
>

After resolving the question of two vs. three pages, I will follow
the process for submitting a patch against the latest staging.

Qubes OS has a patch that uses three pages, and the igd.c
file in Qemu's hw/vfio directory also specifies three pages,
but if two pages are enough, mapping only two seems safer.
I haven't checked yet to see if there is an official specification
from Intel. I will start by looking in the Linux kernel i915
driver code which might give a clue.

Chuck
Chuck Zmudzinski March 11, 2022, 8:35 p.m. UTC | #3
On 3/11/22 8:47 AM, Chuck Zmudzinski wrote:
 > On 3/11/2022 3:09 AM, Jan Beulich wrote:
 >>
 >> Thanks for this analysis. It looks quite plausible (but the question
 >> of 2 vs 3 pages of course needs resolving).
 >>
 >>
 >>
 >
 > After resolving the question of two vs. three pages, I will follow
 > the process for submitting a patch against the latest staging.
 >
 > Qubes OS has a patch that uses three pages, and the igd.c
 > file in Qemu's hw/vfio directory also specifies three pages,
 > but if two pages are enough, mapping only two seems safer.
 > I haven't checked yet to see if there is an official specification
 > from Intel. I will start by looking in the Linux kernel i915
 > driver code which might give a clue.

It looks like both Xen and Qemu agree that 3 pages are needed.

In Xen, we have:

From: Keir Fraser <keir@xen.org>
Date: Thu, 10 Jan 2013 17:26:24 +0000 (+0000)
Subject: hvmloader: Allocate 3 pages for Intel GPU OpRegion passthrough.
X-Git-Tag: 4.3.0-rc1~546
X-Git-Url: https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff_plain;h=408a9e56343b006c9e58a334f0b97dd2deedf9ac

hvmloader: Allocate 3 pages for Intel GPU OpRegion passthrough.

The 8kB region may not be page aligned, hence requiring 3 pages to
be mapped through.

Signed-off-by: Keir Fraser <keir@xen.org>

Also Qemu upstream defines it as 3 pages:

https://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_graphics.c;h=a3bc7e39214b3d71c32c8386758fdef6ced5c0df;hb=a68d6d311c2d1fd9d2fa9a0768ea2353e8a79b42#l242

In Keir's commit, the OpRegion size was increased from 2 to
3 pages, so I think it is correct to use 3 pages in a proposed
patch.
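
The arithmetic behind Keir's change is worth spelling out: an 8 KiB
region fits in two 4 KiB pages only when it starts on a page boundary;
at any other offset it straddles three. A small stand-alone sketch of
the computation (the base addresses below are made up for illustration):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

/* Number of 4 KiB pages touched by a region of 'size' bytes at 'addr'. */
static unsigned int pages_spanned(uint64_t addr, uint64_t size)
{
    return (unsigned int)(((addr + size - 1) >> PAGE_SHIFT)
                          - (addr >> PAGE_SHIFT) + 1);
}

int main(void)
{
    printf("%u\n", pages_spanned(0xcc490000ULL, 0x2000)); /* 2: page aligned */
    printf("%u\n", pages_spanned(0xcc490018ULL, 0x2000)); /* 3: unaligned    */
    return 0;
}

So two pages happen to suffice only when the firmware placed the
OpRegion on a page boundary; three covers the general case.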

In tools/firmware/hvmloader, there are header files that have
macros setting the address and size of the Intel IGD opregion.

So my only question is:

Is it proper to include header files from tools/firmware/hvmloader in
tools/libs/light/libxl_pci.c ?

Chuck
Chuck Zmudzinski March 12, 2022, 8:28 p.m. UTC | #4
On 3/11/22 3:09 AM, Jan Beulich wrote:
> On 11.03.2022 06:01, Chuck Zmudzinski wrote:
>
>> Here is the patch that fixes the bug on Debian sid/Xen 4.16:
> As to an actual patch for us to take - please see
> docs/process/sending-patches.pandoc for the formal requirements.
> (Note this was recently introduced, so you won't find it in the
> 4.16 sources. But your patch wants to be against latest staging
> anyway.)
>
> Jan
>

OK, I took a look at the process and I also studied this
issue more closely, and my conclusion is that I would not
recommend fixing this old bug now until we have a better
idea about how good our current tests for the Intel IGD are.

AFAICT, if our tests for the Intel IGD produce a false positive,
hvmloader will still map three pages in the guest for the IGD
opregion, but the mapped memory would not be an IGD opregion at
all if the device is not actually an IGD or GPU with an opregion.
In that case we would be exposing three pages of unexpected memory
to the guest. So before proposing a patch that fixes this bug but
could, on a false positive, grant access to unexpected memory, I
will first spend some time deciding whether a more accurate and
reliable test is needed to determine that a PCI device with class
VGA and vendor Intel actually has an IGD opregion. Once I am
confident that the risk of a false positive is acceptably low, I
would consider submitting a patch that fixes this bug.
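
One inexpensive sanity check along these lines: the 32-bit value the
proposed patch reads from config offset 0xfc is the physical address of
the OpRegion, and a device that has no OpRegion typically leaves that
register as zero. A zero (or all-ones) value could therefore be treated
as "no OpRegion" instead of being mapped blindly. A minimal sketch; the
helper name and the exact policy are mine, not a proposed interface:

#include <stdbool.h>
#include <stdint.h>

/* Decide whether an OpRegion address read from PCI config space looks
 * usable: 0 typically means the firmware provided no OpRegion, and
 * 0xffffffff is the error value returned by the read helper earlier in
 * this thread.  Illustrative only. */
static bool igd_opregion_looks_valid(uint32_t opregion_addr)
{
    return opregion_addr != 0 && opregion_addr != 0xffffffffu;
}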

Our tests check if the PCI device has class VGA and that the
vendor is Intel, and we also check if the gfx_passthru option
is enabled. Those tests are applied both in hvmloader
and in libxenlight to decide about mapping the IGD opregion
to the guest and informing Qemu about the mapped address.
I don't think these tests for the Intel IGD account for recent
developments such as newer discrete Intel GPUs, nor for older
Intel IGDs/GPUs, either of which might not have an IGD opregion.
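
For illustration, the class/vendor half of that test amounts to reading
two sysfs attributes of the device. A stand-alone sketch, not the
actual libxl code (libxl uses its own sysfs helpers; the function name
here is made up):

#include <stdbool.h>
#include <stdio.h>

/* Return true if the PCI device at dddd:bb:dd.f reports vendor 0x8086
 * (Intel) and base class 0x03 (display controller).  Paths and error
 * handling are simplified for illustration. */
static bool is_intel_display_device(unsigned dom, unsigned bus,
                                    unsigned dev, unsigned func)
{
    char path[128];
    unsigned vendor = 0, class_code = 0;
    FILE *f = NULL;

    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/vendor",
             dom, bus, dev, func);
    if (!(f = fopen(path, "r")) || fscanf(f, "%x", &vendor) != 1)
        goto fail;
    fclose(f);

    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/class",
             dom, bus, dev, func);
    if (!(f = fopen(path, "r")) || fscanf(f, "%x", &class_code) != 1)
        goto fail;
    fclose(f);

    /* sysfs 'class' is 0xCCSSPP: class, subclass, programming interface. */
    return vendor == 0x8086 && (class_code >> 16) == 0x03;

 fail:
    if (f)
        fclose(f);
    return false;
}

As the thread notes, though, vendor and class alone cannot distinguish
a device that really carries an OpRegion from one that does not.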

I think some time is needed to look at the i915 Linux kernel driver
code to more precisely identify the devices that need access
to the IGD opregion. Other devices either are not compatible
with the feature of VGA passthrough or do not need to have
access to the IGD opregion.

With this information, a patch can be developed that will more
accurately determine when the guest needs access to the IGD
opregion. With such a patch committed in Xen, I would be more
comfortable submitting a fix for this bug.

Regards,

Chuck
Jan Beulich March 14, 2022, 7:26 a.m. UTC | #5
On 11.03.2022 21:35, Chuck Zmudzinski wrote:
> So my only question is:
> 
> Is it proper to include header files from tools/firmware/hvmloader in
> tools/libs/light/libxl_pci.c ?

No, it certainly is not.

Jan
Chuck Zmudzinski March 14, 2022, 1:05 p.m. UTC | #6
On 3/14/22 3:26 AM, Jan Beulich wrote:
> On 11.03.2022 21:35, Chuck Zmudzinski wrote:
>> So my only question is:
>>
>> Is it proper to include header files from tools/firmware/hvmloader in
>> tools/libs/light/libxl_pci.c ?
> No, it certainly is not.
>
> Jan
>

That's a relief, because if it were proper, I wouldn't know
how to do it properly. Instead, I wrote a patch that defines
the macros I need in tools/libs/light/libxl_pci.c with the same
values they have in tools/firmware/hvmloader.
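
For reference, a minimal form such local definitions could take. The
first value is the one used in the patch at the end of this page; the
second mirrors hvmloader's three-page mapping, but that macro name is
chosen here purely for illustration:

/* Offset in PCI config space of the register holding the physical
 * address of the Intel OpRegion; same value as defined in
 * tools/firmware/hvmloader/pci_regs.h. */
#define PCI_INTEL_OPREGION      0xfc

/* Number of 4 KiB pages covering the (possibly unaligned) 8 KiB
 * OpRegion; hvmloader maps three pages for the same reason. */
#define IGD_OPREGION_PAGES      3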

When you get a chance, can you take a look at it?
I cc'd the patch to you because of its reference to hvmloader.

Chuck

Patch

--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1033,7 +1033,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
          ret = -EPERM;
          if ( !iomem_access_permitted(current->domain, mfn, mfn_end) ||
               !iomem_access_permitted(d, mfn, mfn_end) )
+        {
+            printk(XENLOG_G_WARNING
+                   "memory_map:access not permitted: dom%d gfn=%lx mfn=%lx nr=%lx\n",
+                   d->domain_id, gfn, mfn, nr_mfns);
              break;
+        }

          ret = xsm_iomem_mapping(XSM_HOOK, d, mfn, mfn_end, add);
          if ( ret )
snip ---------------------------------------------- snip

Further research showed that these two pages at 0xcc490 are for the
Intel IGD opregion, and because this memory is not permitted to be
accessed by the domain, the passthrough of an Intel IGD to a Linux
HVM domain fails, causing a crash of the Linux i915.ko kernel module
in the HVM domain. My testing, which was on a desktop with a Haswell
Intel CPU/IGD, confirmed that these two extra pages need to be
permitted in order for passthrough of the Intel IGD to a Linux
domain to work properly.

I find that adding two pages is enough to fix the problem, but I
have read in other places that the Opregion is actually three pages,
and maybe newer revisions of the Intel IGD do need three pages instead
of two. I am testing on a Haswell Intel chip, which is over 8 years old
now. So the patch I propose adds two pages, but I am not sure if
it should be three pages for newer Intel chips.

The failure to map this memory with gfx_passthru enabled
is therefore a bug, a regression that was introduced with the two
aforementioned patches way back in 2014 when Xen 4.5 was under
development.

Once I developed a patch, I did more testing with the traditional
Qemu device model and Debian's package of Xen-4.16 for Debian
sid/unstable after I discovered where this bug first appeared in
Xen 4.5-unstable back in 2014. In my testing, Windows HVM domains are
not affected by this bug and they function properly, most likely
because proprietary Intel graphics drivers for Windows are more
forgiving than the Linux open source drivers for Intel graphics
regarding the details of how Xen and Qemu configure the domain.

This bug still exists in current supported versions of Xen
because in Xen 4.16, passthrough of my Haswell Intel IGD to a Linux
domain still fails with a crash of the i915 Linux kernel module in
the Linux unprivileged domain when the traditional Qemu device model
is used in dom0. The patch at the end of this message fixes it.

I have not yet succeeded in reproducing this bug with the
upstream device model because there is another bug in Qemu
upstream that breaks passthrough of the Intel IGD to a Linux HVM
domU, so for now, to reproduce it, please use the traditional device
model.

Also, as a starting point to reproduce the bug, first get Intel IGD
passthrough to a Windows HVM domain using the Qemu traditional
device model working on Xen 4.16. Then replace the Windows HVM domain
with a Linux HVM domain, keeping everything else the same including
the Qemu traditional device model. I tested using a Debian 11.2
(bullseye) HVM domain and Debian sid/unstable with Xen 4.16 and
a build of the Qemu traditional device model from source as
provided on xenbits.xen.org
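
For reference, the kind of guest configuration this implies is roughly
the following xl.cfg fragment (the values are examples only; the PCI
BDF and the device model must match the system under test):

    type = "hvm"
    gfx_passthru = 1
    device_model_version = "qemu-xen-traditional"
    pci = [ '00:02.0' ]    # the Intel IGD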

I am using a desktop computer and the xl toolstack and Xen as
packaged by Debian, except that I added the traditional device
model that Debian does not provide.

If you need more info, please let me know. I am not subscribed to
xen-devel so please cc me with your replies.

Regards,

Chuck

Here is the patch that fixes the bug on Debian sid/Xen 4.16:

--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -24,6 +24,7 @@ 
  #define PCI_OPTIONS            "msitranslate=%d,power_mgmt=%d"
  #define PCI_BDF_XSPATH         "%04x-%02x-%02x-%01x"
  #define PCI_PT_QDEV_ID         "pci-pt-%02x_%02x.%01x"
+#define PCI_INTEL_OPREGION     0xfc /* value defined in tools/firmware/hvmloader/pci_regs.h */

  static unsigned int pci_encode_bdf(libxl_device_pci *pci)