vfio/igd: Update IGD passthrough documentation

Message ID 20250312155002.286841-1-tomitamoeko@gmail.com (mailing list archive)
State New
Series vfio/igd: Update IGD passthrough documentation

Commit Message

Tomita Moeko March 12, 2025, 3:50 p.m. UTC
A previous change made the OpRegion and LPC quirks independent of the
existing legacy mode. Update the documentation accordingly. More related
topics, such as creating an EFI Option ROM of IGD for OVMF, how to resolve
the VFIO_DMA_MAP Invalid Argument warning, and details on IGD memory
internals, are also added.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
---
 docs/igd-assign.txt | 262 ++++++++++++++++++++++++++++++++------------
 1 file changed, 193 insertions(+), 69 deletions(-)

Comments

Alex Williamson March 12, 2025, 4:29 p.m. UTC | #1
On Wed, 12 Mar 2025 23:50:02 +0800
Tomita Moeko <tomitamoeko@gmail.com> wrote:

> A previous change made the OpRegion and LPC quirks independent of the
> existing legacy mode. Update the documentation accordingly. More related
> topics, such as creating an EFI Option ROM of IGD for OVMF, how to resolve
> the VFIO_DMA_MAP Invalid Argument warning, and details on IGD memory
> internals, are also added.
> 
> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
> ---
>  docs/igd-assign.txt | 262 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 193 insertions(+), 69 deletions(-)
> 
> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> index e17bb50789..c7c4565906 100644
> --- a/docs/igd-assign.txt
> +++ b/docs/igd-assign.txt
> @@ -1,44 +1,69 @@
>  Intel Graphics Device (IGD) assignment with vfio-pci
>  ====================================================
>  
> -IGD has two different modes for assignment using vfio-pci:
> -
> -1) Universal Pass-Through (UPT) mode:
> -
> -   In this mode the IGD device is added as a *secondary* (ie. non-primary)
> -   graphics device in combination with an emulated primary graphics device.
> -   This mode *requires* guest driver support to remove the external
> -   dependencies generally associated with IGD (see below).  Those guest
> -   drivers only support this mode for Broadwell and newer IGD, according to
> -   Intel.  Additionally, this mode by default, and as officially supported
> -   by Intel, does not support direct video output.  The intention is to use
> -   this mode either to provide hardware acceleration to the emulated graphics
> -   or to use this mode in combination with guest-based remote access software,
> -   for example VNC (see below for optional output support).  This mode
> -   theoretically has no device specific handling dependencies on vfio-pci or
> -   the VM firmware.
> -
> -2) "Legacy" mode:
> -
> -   In this mode the IGD device is intended to be the primary and exclusive
> -   graphics device in the VM[1], as such QEMU does not facilitate any sort
> -   of remote graphics to the VM in this mode.  A connected physical monitor
> -   is the intended output device for IGD.  This mode includes several
> -   requirements and restrictions:
> -
> -    * IGD must be given address 02.0 on the PCI root bus in the VM
> -    * The host kernel must support vfio extensions for IGD (v4.6)
> -    * vfio VGA support very likely needs to be enabled in the host kernel
> -    * The VM firmware must support specific fw_cfg enablers for IGD
> -    * The VM machine type must support a PCI host bridge at 00.0 (standard)
> -    * The VM machine type must provide or allow to be created a special
> -      ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> -      PCI address 1f.0.
> -    * The IGD device must have a VGA ROM, either provided via the romfile
> -      option or loaded automatically through vfio (standard).  rombar=0
> -      will disable legacy mode support.
> -    * Hotplug of the IGD device is not supported.
> -    * The IGD device must be a SandyBridge or newer model device.
> +Using vfio-pci, an Intel Graphics Device (IGD) can be passed through to a
> +guest, either as the primary and exclusive graphics adapter or in combination
> +with an emulated primary graphics device, depending on the configuration and
> +guest driver support. However, IGD devices are not "clean" PCI devices: they
> +use extra memory regions in addition to their BARs. Special handling is
> +required to make them work properly, including:
> +
> +* OpRegion for accessing the Video BIOS Table (VBT), which contains display
> +  output information.
> +* Data Stolen Memory (DSM) region used as VRAM at early stage (BIOS/UEFI).
> +
> +Certain guest software also depends on the following conditions to work:
> +(* = required by)
> +
> +| Condition                                   | Linux | Windows | VBIOS | EFI GOP |
> +|---------------------------------------------|-------|---------|-------|---------|
> +| #1 IGD has a valid OpRegion containing VBT  |  * ^1 |    *    |   *   |    *    |
> +| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         |   *   |    *    |
> +| #3 IGD is assigned to BDF 00:02.0           |       |         |   *   |    *    |
> +| #4 IGD has VGA controller device class      |       |         |   *   |    *    |
> +| #5 Host's VGA ranges are mapped to IGD      |       |         |   *   |         |
> +| #6 Guest has valid VBIOS or UEFI Option ROM |       |         |   *   |    *    |
> +
> +^1 Though the i915 driver is able to mock an OpRegion, using the VBT copied
> +   from the host OpRegion is still recommended to prevent misconfiguration.
> +
> +For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion
> +to the guest via fw_cfg, so guest firmware can set up the guest OpRegion.
> +
> +For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and
> +host bridge to the guest. Currently this is only supported on i440fx machines:
> +q35 machines already have an ICH9 LPC bridge, and overwriting its IDs may
> +lead to unexpected behavior.
> +
> +For #3, "addr=2.0" assigns IGD to 00:02.0.
> +
> +For #4, the primary display must be set to IGD in the host BIOS.
> +
> +For #5, "x-vga=on" enables guest access to standard VGA IO/MMIO ranges.
> +
> +For #6, a ROM provided via either the ROM BAR or the romfile= option is needed.
> +Intel document [1] shows how to dump the VBIOS to a file. For the UEFI Option
> +ROM, see the "Guest firmware" section.
> +
> +QEMU also provides a "Legacy" mode that implicitly enables full functionality
> +on IGD. It is automatically enabled when:
> +* Machine type is i440fx
> +* IGD is assigned to guest BDF 00:02.0
> +* ROM BAR or romfile is present
> +
> +In "Legacy" mode, QEMU automatically sets up the OpRegion, LPC bridge IDs and
> +VGA range access, which is equivalent to:
> +  x-igd-opregion=on,x-igd-lpc=on,x-vga=on
> +
> +By default, "Legacy" mode does not fail; it continues on error. Users can set
> +"x-igd-legacy-mode=on" to force legacy mode. This also checks whether the
> +conditions above for legacy mode are met, and QEMU fails immediately if any
> +error occurs. Users can also set "x-igd-legacy-mode=off" to disable legacy
> +mode.
> +
> +In legacy mode, as the guest VGA ranges are assigned to the IGD device, all
> +other graphics devices should be removed. This can be done using "-nographic",
> +"-vga none" or "-nodefaults", along with adding the device using vfio-pci.
>  
>  For either mode, depending on the host kernel, the i915 driver in the host
>  may generate faults and errors upon re-binding to an IGD device after it
> @@ -73,31 +98,39 @@ DVI, or DisplayPort) may be unsupported in some use cases.  In the author's
>  experience, even DP to VGA adapters can be troublesome while adapters between
>  digital formats work well.
>  
> -Usage
> -=====
> -The intention is for IGD assignment to be transparent for users and thus for
> -management tools like libvirt.  To make use of legacy mode, simply remove all
> -other graphics options and use "-nographic" and either "-vga none" or
> -"-nodefaults", along with adding the device using vfio-pci:
>  
> -    -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2
> +Options
> +=======
> +* x-igd-opregion=[on|*off*]
> +  Copy host IGD OpRegion and expose it to guest with fw_cfg
> +
> +* x-igd-lpc=[on|*off*]
> +  Creates a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only)
> +
> +* x-igd-legacy-mode=[on|off|*auto*]
> +  Enable/Disable legacy mode
> +
> +* x-igd-gms=[hex, default 0]
> +  Override the DSM region size in the GGC register; 0 means use the host value.
> +  Use this only when the DSM size cannot be changed through the
> +  'DVMT Pre-Allocated' option in host BIOS.
>  
> -For UPT mode, retain the default emulated graphics and simply add the vfio-pci
> -device making use of any other bus address other than 02.0.  libvirt will
> -default to assigning the device a UPT compatible address while legacy mode
> -users will need to manually edit the XML if using a tool like virt-manager
> -where the VM device address is not expressly specified.
>  
> -An experimental vfio-pci option also exists to enable OpRegion, and thus
> -external monitor support, for UPT mode.  This can be enabled by adding
> -"x-igd-opregion=on" to the vfio-pci device options for the IGD device.  As
> -with legacy mode, this requires the host to support features introduced in
> -the v4.6 kernel.  If Intel chooses to embrace this support, the option may
> -be made non-experimental in the future, opening it to libvirt support.
> +Examples
> +========
> +* Adding IGD with automatic legacy mode support
> +  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0
>  
> -Developer ABI
> -=============
> -Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
> +* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
> +  (For UEFI guests)
> +  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-opregion=on,x-igd-lpc=on,romfile=efi_oprom.rom
> +
> +
> +Guest firmware
> +==============
> +Guest firmware is responsible for setting up OpRegion and Base of Data Stolen
> +Memory (BDSM) in guest address space. IGD passthrough support imposes two
> +fw_cfg requirements on the VM firmware:
>  
>  1) "etc/igd-opregion"
>  
> @@ -117,17 +150,108 @@ Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
>     Firmware must allocate a reserved memory below 4GB with required 1MB
>     alignment equal to this size.  Additionally the base address of this
>     reserved region must be written to the dword BDSM register in PCI config
> -   space of the IGD device at offset 0x5C.  As this support is related to
> -   running the IGD ROM, which has other dependencies on the device appearing
> -   at guest address 00:02.0, it's expected that this fw_cfg file is only
> -   relevant to a single PCI class VGA device with Intel vendor ID, appearing
> -   at PCI bus address 00:02.0.
> +   space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
> +   64-bit BDSM).  As this support is related to running the IGD ROM, which
> +   has other dependencies on the device appearing at guest address 00:02.0,
> +   it's expected that this fw_cfg file is only relevant to a single PCI
> +   class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0.
> +
> +Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
> +However, the support was not accepted by upstream EDK2/OVMF. A recommended
> +solution is to create a virtual Option ROM with the following DXE drivers:
> +
> +* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (required)
> +* IntelGopDriver: Closed-source Intel GOP driver
> +* PlatformGopPolicy: Protocol required by IntelGopDriver
> +
> +IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on IGD.
> +
> +The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version
> +with PlatformGopPolicy for industrial computing is at [4]. There is also an
> +unofficially maintained version with newer Gen11+ device support at [5].
> +You need to build them with EDK2.
> +
> +For the IntelGopDriver, Intel never released it to public. You may contact
> +Intel support to get one as [4] said, if you are an Intel primer customer,

s/primer/premier/ ?

> +or you can try to extract it from your host firmware using "UEFI BIOS Updater" [6].
> +
> +Once you have all the required DXE drivers, an Option ROM can be generated
> +with the EfiRom utility in EDK2, using
> +  EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
> +  -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi
> +
> +
> +Known issues
> +============
> +When using OVMF as guest firmware, you may encounter the following warning:
> +warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)
> +Solution:
> +Set the host physical address bits to the IOMMU address width using
> +  -cpu host,host-phys-bits-limit=<IOMMU address width>
> +Or in libvirt XML with
> +  <cpu>
> +    <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
> +  </cpu>
> +The IOMMU address width can be determined with
> +echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))

That's handy!
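
For readers who want to see what that one-liner computes: as far as I
understand the VT-d capability register layout, bits 21:16 hold the maximum
guest address width minus one, which is what the 0x3F0000 mask and the shift
by 16 select. A minimal Python sketch of the same arithmetic follows. The
function name and the sample register value are illustrative assumptions and
do not come from the patch.

```python
def iommu_address_width(cap_hex: str) -> int:
    """Decode the address width from the hex string read from the "cap" file.

    Mirrors the shell one-liner: mask bits 21:16, shift down, add one.
    """
    cap = int(cap_hex.strip(), 16)  # the file content is parsed as hex
    return ((cap & 0x3F0000) >> 16) + 1


# Hypothetical register value, used only to exercise the arithmetic.
SAMPLE_CAP = "d2008c40660462"
print(iommu_address_width(SAMPLE_CAP))  # 39
```

The result is the number to pass as host-phys-bits-limit (or as the libvirt
maxphysaddr limit) in the snippet quoted above.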

> +Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more details.
> +
> +
> +Memory View
> +===========
> +IGD has its own address space. To use system RAM as VRAM, a single-level page
> +table named Graphics Translation Table (GTT) is used for address translation.
> +Each page table entry points to a 4KB page. The translation flow is:
> +
> +(PTE size 8)             +-------------+---+
> +                         |   Address   | V |  V: Valid Bit
> +                         +-------------+---+
> +                         | ...         |   |
> +IGD:0x01ae9010     0xd740| 0x70ffc000  | 1 |  Mem:0x42ba3e010^
> +-----------------> 0xd748| 0x42ba3e000 | 1 +------------------>
> +(addr << 12) * 8   0xd750| 0x42ba3f000 | 1 |
> +                         | ...         |   |
> +                         +-------------+---+

I think this was meant to be '(addr >> 12) * 8'.  A simpler
representation is just (addr >> 9), but maybe you're trying to
emphasize the PTE size here.
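
To make the arithmetic concrete, here is a small Python sketch that replays
the lookup from the diagram with the corrected shift. The dictionary standing
in for the GTT and the helper names are illustrative assumptions. Only the
addresses come from the diagram, and the valid bit is kept as a separate field
the way the diagram draws it, not in any real PTE bit layout.

```python
PAGE_SHIFT = 12  # each PTE maps a 4KB page
PTE_SIZE = 8     # bytes per PTE, per the diagram

# GTT byte offset -> (page address, valid bit), copied from the diagram.
GTT = {
    0xD740: (0x70FFC000, 1),
    0xD748: (0x42BA3E000, 1),
    0xD750: (0x42BA3F000, 1),
}


def pte_offset(igd_addr: int) -> int:
    """Byte offset of the PTE covering igd_addr: (addr >> 12) * 8."""
    return (igd_addr >> PAGE_SHIFT) * PTE_SIZE


def translate(igd_addr: int) -> int:
    """Translate an IGD address to a memory address through the toy GTT."""
    page, valid = GTT[pte_offset(igd_addr)]
    if not valid:
        raise ValueError(f"PTE for {igd_addr:#x} is not valid")
    # Keep the offset within the 4KB page.
    return page + (igd_addr & ((1 << PAGE_SHIFT) - 1))


print(hex(pte_offset(0x01AE9010)))  # 0xd748
print(hex(translate(0x01AE9010)))   # 0x42ba3e010
```

The addr >> 9 shorthand gives the same offset here because bits 9 to 11 of the
example address are zero. For an arbitrary address the shorthand would also
need its low three bits cleared, i.e. (addr >> 9) & ~7.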

> +^ The address may be remapped by IOMMU
> +
> +The memory region storing the GTT is called GTT Stolen Memory (GSM). It is
> +located right below the Data Stolen Memory (DSM). Accessing this region
> +directly is not allowed; any access will immediately freeze the whole system.
> +The only way to access it is through the second half of MMIO BAR0.
> +
> +The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
> +environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
> +reserving a contiguous region and programming its base address into the BDSM
> +register, then letting the VBIOS/GOP driver initialize this region. The
> +illustration below shows how DSM is mapped.
> +
> +       IGD Addr Space                 Host Addr Space         Guest Addr Space
> +       +-------------+                +-------------+         +-------------+
> +       |             |                |             |         |             |
> +       |             |                |             |         |             |
> +       |             |                +-------------+         +-------------+
> +       |             |                | Data Stolen |         | Data Stolen |
> +       |             |                |   (Guest)   |         |   (Guest)   |
> +       |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM
> +       |             |  | Passthrough |             | EPT     |             |   Emulated by QEMU
> +DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW
> +       |             |  |             |             |         |             |
> +       |             |  |             |             |         |             |
> +      0+-------------+--+             |             |         |             |
> +                        |             +-------------+         |             |
> +                        |             | Data Stolen |         +-------------+
> +                        |             |   (Host)    |
> +                        +------------>+-------------+<--Host BDSM
> +                          Non-        |             |   "real" one in HW
> +                          Passthrough |             |   Programmed by host FW
> +                                      +-------------+
>  
>  Footnotes
>  =========
> -[1] Nothing precludes adding additional emulated or assigned graphics devices
> -    as non-primary, other than the combination typically not working.  I only
> -    intend to set user expectations, others are welcome to find working
> -    combinations or fix whatever issues prevent this from working in the common
> -    case.
> +[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
>  [2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
> +[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
> +    Tianocore bugzilla has been down since Jan 2025 :(
> +[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
> +[5] https://github.com/tomitamoeko/VfioIgdPkg
> +[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357

This is great and a much needed update.  Thanks!

With above corrections:

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

Patch

diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
index e17bb50789..c7c4565906 100644
--- a/docs/igd-assign.txt
+++ b/docs/igd-assign.txt
@@ -1,44 +1,69 @@ 
 Intel Graphics Device (IGD) assignment with vfio-pci
 ====================================================
 
-IGD has two different modes for assignment using vfio-pci:
-
-1) Universal Pass-Through (UPT) mode:
-
-   In this mode the IGD device is added as a *secondary* (ie. non-primary)
-   graphics device in combination with an emulated primary graphics device.
-   This mode *requires* guest driver support to remove the external
-   dependencies generally associated with IGD (see below).  Those guest
-   drivers only support this mode for Broadwell and newer IGD, according to
-   Intel.  Additionally, this mode by default, and as officially supported
-   by Intel, does not support direct video output.  The intention is to use
-   this mode either to provide hardware acceleration to the emulated graphics
-   or to use this mode in combination with guest-based remote access software,
-   for example VNC (see below for optional output support).  This mode
-   theoretically has no device specific handling dependencies on vfio-pci or
-   the VM firmware.
-
-2) "Legacy" mode:
-
-   In this mode the IGD device is intended to be the primary and exclusive
-   graphics device in the VM[1], as such QEMU does not facilitate any sort
-   of remote graphics to the VM in this mode.  A connected physical monitor
-   is the intended output device for IGD.  This mode includes several
-   requirements and restrictions:
-
-    * IGD must be given address 02.0 on the PCI root bus in the VM
-    * The host kernel must support vfio extensions for IGD (v4.6)
-    * vfio VGA support very likely needs to be enabled in the host kernel
-    * The VM firmware must support specific fw_cfg enablers for IGD
-    * The VM machine type must support a PCI host bridge at 00.0 (standard)
-    * The VM machine type must provide or allow to be created a special
-      ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
-      PCI address 1f.0.
-    * The IGD device must have a VGA ROM, either provided via the romfile
-      option or loaded automatically through vfio (standard).  rombar=0
-      will disable legacy mode support.
-    * Hotplug of the IGD device is not supported.
-    * The IGD device must be a SandyBridge or newer model device.
+Using vfio-pci, an Intel Graphics Device (IGD) can be passed through to a
+guest, either as the primary and exclusive graphics adapter or in combination
+with an emulated primary graphics device, depending on the configuration and
+guest driver support. However, IGD devices are not "clean" PCI devices: they
+use extra memory regions in addition to their BARs. Special handling is
+required to make them work properly, including:
+
+* OpRegion for accessing the Video BIOS Table (VBT), which contains display
+  output information.
+* Data Stolen Memory (DSM) region used as VRAM at early stage (BIOS/UEFI).
+
+Certain guest software also depends on the following conditions to work:
+(* = required by)
+
+| Condition                                   | Linux | Windows | VBIOS | EFI GOP |
+|---------------------------------------------|-------|---------|-------|---------|
+| #1 IGD has a valid OpRegion containing VBT  |  * ^1 |    *    |   *   |    *    |
+| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         |   *   |    *    |
+| #3 IGD is assigned to BDF 00:02.0           |       |         |   *   |    *    |
+| #4 IGD has VGA controller device class      |       |         |   *   |    *    |
+| #5 Host's VGA ranges are mapped to IGD      |       |         |   *   |         |
+| #6 Guest has valid VBIOS or UEFI Option ROM |       |         |   *   |    *    |
+
+^1 Though the i915 driver is able to mock an OpRegion, using the VBT copied
+   from the host OpRegion is still recommended to prevent misconfiguration.
+
+For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion
+to the guest via fw_cfg, so guest firmware can set up the guest OpRegion.
+
+For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and
+host bridge to the guest. Currently this is only supported on i440fx machines:
+q35 machines already have an ICH9 LPC bridge, and overwriting its IDs may
+lead to unexpected behavior.
+
+For #3, "addr=2.0" assigns IGD to 00:02.0.
+
+For #4, the primary display must be set to IGD in the host BIOS.
+
+For #5, "x-vga=on" enables guest access to standard VGA IO/MMIO ranges.
+
+For #6, a ROM provided via either the ROM BAR or the romfile= option is needed.
+Intel document [1] shows how to dump the VBIOS to a file. For the UEFI Option
+ROM, see the "Guest firmware" section.
+
+QEMU also provides a "Legacy" mode that implicitly enables full functionality
+on IGD. It is automatically enabled when:
+* Machine type is i440fx
+* IGD is assigned to guest BDF 00:02.0
+* ROM BAR or romfile is present
+
+In "Legacy" mode, QEMU automatically sets up the OpRegion, LPC bridge IDs and
+VGA range access, which is equivalent to:
+  x-igd-opregion=on,x-igd-lpc=on,x-vga=on
+
+By default, "Legacy" mode does not fail; it continues on error. Users can set
+"x-igd-legacy-mode=on" to force legacy mode. This also checks whether the
+conditions above for legacy mode are met, and QEMU fails immediately if any
+error occurs. Users can also set "x-igd-legacy-mode=off" to disable legacy
+mode.
+
+In legacy mode, as the guest VGA ranges are assigned to the IGD device, all
+other graphics devices should be removed. This can be done using "-nographic",
+"-vga none" or "-nodefaults", along with adding the device using vfio-pci.
 
 For either mode, depending on the host kernel, the i915 driver in the host
 may generate faults and errors upon re-binding to an IGD device after it
@@ -73,31 +98,39 @@  DVI, or DisplayPort) may be unsupported in some use cases.  In the author's
 experience, even DP to VGA adapters can be troublesome while adapters between
 digital formats work well.
 
-Usage
-=====
-The intention is for IGD assignment to be transparent for users and thus for
-management tools like libvirt.  To make use of legacy mode, simply remove all
-other graphics options and use "-nographic" and either "-vga none" or
-"-nodefaults", along with adding the device using vfio-pci:
 
-    -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2
+Options
+=======
+* x-igd-opregion=[on|*off*]
+  Copy host IGD OpRegion and expose it to guest with fw_cfg
+
+* x-igd-lpc=[on|*off*]
+  Creates a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only)
+
+* x-igd-legacy-mode=[on|off|*auto*]
+  Enable/Disable legacy mode
+
+* x-igd-gms=[hex, default 0]
+  Override the DSM region size in the GGC register; 0 means use the host value.
+  Use this only when the DSM size cannot be changed through the
+  'DVMT Pre-Allocated' option in host BIOS.
 
-For UPT mode, retain the default emulated graphics and simply add the vfio-pci
-device making use of any other bus address other than 02.0.  libvirt will
-default to assigning the device a UPT compatible address while legacy mode
-users will need to manually edit the XML if using a tool like virt-manager
-where the VM device address is not expressly specified.
 
-An experimental vfio-pci option also exists to enable OpRegion, and thus
-external monitor support, for UPT mode.  This can be enabled by adding
-"x-igd-opregion=on" to the vfio-pci device options for the IGD device.  As
-with legacy mode, this requires the host to support features introduced in
-the v4.6 kernel.  If Intel chooses to embrace this support, the option may
-be made non-experimental in the future, opening it to libvirt support.
+Examples
+========
+* Adding IGD with automatic legacy mode support
+  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0
 
-Developer ABI
-=============
-Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
+* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
+  (For UEFI guests)
+  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-opregion=on,x-igd-lpc=on,romfile=efi_oprom.rom
+
+
+Guest firmware
+==============
+Guest firmware is responsible for setting up OpRegion and Base of Data Stolen
+Memory (BDSM) in guest address space. IGD passthrough support imposes two
+fw_cfg requirements on the VM firmware:
 
 1) "etc/igd-opregion"
 
@@ -117,17 +150,108 @@  Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
    Firmware must allocate a reserved memory below 4GB with required 1MB
    alignment equal to this size.  Additionally the base address of this
    reserved region must be written to the dword BDSM register in PCI config
-   space of the IGD device at offset 0x5C.  As this support is related to
-   running the IGD ROM, which has other dependencies on the device appearing
-   at guest address 00:02.0, it's expected that this fw_cfg file is only
-   relevant to a single PCI class VGA device with Intel vendor ID, appearing
-   at PCI bus address 00:02.0.
+   space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
+   64-bit BDSM).  As this support is related to running the IGD ROM, which
+   has other dependencies on the device appearing at guest address 00:02.0,
+   it's expected that this fw_cfg file is only relevant to a single PCI
+   class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0.
+
+Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
+However, the support was not accepted by upstream EDK2/OVMF. A recommended
+solution is to create a virtual Option ROM with the following DXE drivers:
+
+* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (required)
+* IntelGopDriver: Closed-source Intel GOP driver
+* PlatformGopPolicy: Protocol required by IntelGopDriver
+
+IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on IGD.
+
+The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version
+with PlatformGopPolicy for industrial computing is at [4]. There is also an
+unofficially maintained version with newer Gen11+ device support at [5].
+You need to build them with EDK2.
+
+Intel never released the IntelGopDriver to the public. You may contact Intel
+support to get one, as [4] says, if you are an Intel premier customer, or you
+can try to extract it from your host firmware using "UEFI BIOS Updater" [6].
+
+Once you have all the required DXE drivers, an Option ROM can be generated
+with the EfiRom utility in EDK2, using
+  EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
+  -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi
+
+
+Known issues
+============
+When using OVMF as guest firmware, you may encounter the following warning:
+warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)
+Solution:
+Set the host physical address bits to the IOMMU address width using
+  -cpu host,host-phys-bits-limit=<IOMMU address width>
+Or in libvirt XML with
+  <cpu>
+    <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
+  </cpu>
+The IOMMU address width can be determined with
+echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))
+Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more details.
+
+
+Memory View
+===========
+IGD has its own address space. To use system RAM as VRAM, a single-level page
+table named Graphics Translation Table (GTT) is used for address translation.
+Each page table entry points to a 4KB page. The translation flow is:
+
+(PTE size 8)             +-------------+---+
+                         |   Address   | V |  V: Valid Bit
+                         +-------------+---+
+                         | ...         |   |
+IGD:0x01ae9010     0xd740| 0x70ffc000  | 1 |  Mem:0x42ba3e010^
+-----------------> 0xd748| 0x42ba3e000 | 1 +------------------>
+(addr >> 12) * 8   0xd750| 0x42ba3f000 | 1 |
+                         | ...         |   |
+                         +-------------+---+
+^ The address may be remapped by IOMMU
+
+The memory region storing the GTT is called GTT Stolen Memory (GSM). It is
+located right below the Data Stolen Memory (DSM). Accessing this region
+directly is not allowed; any access will immediately freeze the whole system.
+The only way to access it is through the second half of MMIO BAR0.
+
+The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
+environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
+reserving a contiguous region and programming its base address into the BDSM
+register, then letting the VBIOS/GOP driver initialize this region. The
+illustration below shows how DSM is mapped.
+
+       IGD Addr Space                 Host Addr Space         Guest Addr Space
+       +-------------+                +-------------+         +-------------+
+       |             |                |             |         |             |
+       |             |                |             |         |             |
+       |             |                +-------------+         +-------------+
+       |             |                | Data Stolen |         | Data Stolen |
+       |             |                |   (Guest)   |         |   (Guest)   |
+       |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM
+       |             |  | Passthrough |             | EPT     |             |   Emulated by QEMU
+DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW
+       |             |  |             |             |         |             |
+       |             |  |             |             |         |             |
+      0+-------------+--+             |             |         |             |
+                        |             +-------------+         |             |
+                        |             | Data Stolen |         +-------------+
+                        |             |   (Host)    |
+                        +------------>+-------------+<--Host BDSM
+                          Non-        |             |   "real" one in HW
+                          Passthrough |             |   Programmed by host FW
+                                      +-------------+
 
 Footnotes
 =========
-[1] Nothing precludes adding additional emulated or assigned graphics devices
-    as non-primary, other than the combination typically not working.  I only
-    intend to set user expectations, others are welcome to find working
-    combinations or fix whatever issues prevent this from working in the common
-    case.
+[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
 [2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
+[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
+    Tianocore bugzilla has been down since Jan 2025 :(
+[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
+[5] https://github.com/tomitamoeko/VfioIgdPkg
+[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357