[v7,4/4] x86: Allow using Linux's PAT

Message ID 9fd0360dd914d93dab357d16b46b4290e6119d30.1673123823.git.demi@invisiblethingslab.com (mailing list archive)
State New, archived
Series Make PAT handling less brittle

Commit Message

Demi Marie Obenour Jan. 7, 2023, 10:07 p.m. UTC
Due to hardware errata, Intel integrated GPUs are incompatible with
Xen's PAT.  Using Linux's PAT is a workaround for this flaw.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 xen/arch/x86/Kconfig                 | 33 ++++++++++++++++++++++++++++
 xen/arch/x86/include/asm/page.h      | 12 ++++++++++
 xen/arch/x86/include/asm/processor.h | 15 +++++++++++++
 xen/arch/x86/mm.c                    |  2 ++
 4 files changed, 62 insertions(+)
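
To illustrate the erratum described in the Kconfig help text, here is a minimal standalone C sketch. It is not part of the patch: the helper names (pat_index, pat_lookup, pat_find, pat_lookup_erratum) and the PTE_* and MT_* constants are hypothetical, and the model of the erratum (the topmost index bit is simply dropped) is an assumption taken from the help text's wording. The two PAT values are the one checked by BUILD_BUG_ON() in mm.c and the one obtained by expanding the CONFIG_LINUX_PAT definition of XEN_MSR_PAT.

```c
#include <stdint.h>

/* Architectural memory-type encodings stored in each IA32_PAT entry. */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UCM = 7 };

/* Cache-attribute bits of a 4 KiB page table entry. */
#define PTE_PWT (1u << 3)
#define PTE_PCD (1u << 4)
#define PTE_PAT (1u << 7)

/* Xen's default PAT, the value checked by BUILD_BUG_ON() in mm.c. */
#define PAT_XEN   0x0000050100070406ULL
/* Linux's PAT, i.e. XEN_MSR_PAT as built with CONFIG_LINUX_PAT=y. */
#define PAT_LINUX 0x0407050600070106ULL

/* Combine the three PTE bits into a PAT index (PAT is the topmost bit). */
static unsigned int pat_index(unsigned int pte_flags)
{
    return (pte_flags & PTE_PAT ? 4u : 0u) |
           (pte_flags & PTE_PCD ? 2u : 0u) |
           (pte_flags & PTE_PWT ? 1u : 0u);
}

/* Memory type selected by PAT entry idx (each entry occupies one byte). */
static unsigned int pat_lookup(uint64_t pat, unsigned int idx)
{
    return (unsigned int)(pat >> (idx * 8)) & 7u;
}

/* Lowest PAT index that encodes memory type mt, or -1 if there is none. */
static int pat_find(uint64_t pat, unsigned int mt)
{
    for (unsigned int idx = 0; idx < 8; idx++)
        if (pat_lookup(pat, idx) == mt)
            return (int)idx;
    return -1;
}

/* Memory type seen by an access that ignores the topmost index bit. */
static unsigned int pat_lookup_erratum(uint64_t pat, unsigned int idx)
{
    return pat_lookup(pat, idx & 3u);
}
```

Under these assumptions, write-combining sits at index 4 in Xen's PAT, so an access that drops the topmost bit resolves to index 0 and is treated as write-back, which matches the "caches not being flushed" symptom. In Linux's PAT, write-combining sits at index 1 and is unaffected: pat_lookup_erratum(PAT_LINUX, pat_find(PAT_LINUX, MT_WC)) still yields MT_WC.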

Comments

Jan Beulich Jan. 9, 2023, 11:37 a.m. UTC | #1
On 07.01.2023 23:07, Demi Marie Obenour wrote:
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -227,6 +227,39 @@ config XEN_ALIGN_2M
>  
>  endchoice
>  
> +config LINUX_PAT
> +	bool "Use Linux's PAT instead of Xen's default"
> +	help
> +	  Use Linux's Page Attribute Table instead of the default Xen value.
> +
> +	  The Page Attribute Table (PAT) maps three bits in the page table entry
> +	  to the actual cacheability used by the processor.  Many Intel
> +	  integrated GPUs have errata (bugs) that cause CPU access to GPU memory
> +	  to ignore the topmost bit.  When using Xen's default PAT, this results
> +	  in caches not being flushed and incorrect images being displayed.  The
> +	  default PAT used by Linux does not cause this problem.
> +
> +	  If you say Y here, you will be able to use Intel integrated GPUs that
> +	  are attached to your Linux dom0 or other Linux PV guests.  However,
> +	  you will not be able to use non-Linux OSs in dom0, and attaching a PCI
> +	  device to a non-Linux PV guest will result in unpredictable guest
> +	  behavior.  If you say N here, you will be able to use a non-Linux
> +	  dom0, and will be able to attach PCI devices to non-Linux PV guests.
> +
> +	  Note that saving a PV guest with an assigned PCI device on a machine
> +	  with one PAT and restoring it on a machine with a different PAT won't
> +	  work: the resulting guest may boot and even appear to work, but caches
> +	  will not be flushed when needed, with unpredictable results.  HVM
> +	  (including PVH and PVHVM) guests and guests without assigned PCI
> +	  devices do not care what PAT Xen uses, and migration (even live)
> +	  between hypervisors with different PATs will work fine.  Guests using
> +	  PV Shim care about the PAT used by the PV Shim firmware, not the
> +	  host’s PAT.  Also, non-default PAT values are incompatible with the
> +	  (deprecated) qemu-traditional stubdomain.
> +
> +	  Say Y if you are building a hypervisor for a Linux distribution that
> +	  supports Intel iGPUs.  Say N otherwise.

I'm not convinced we want this; if other maintainers think differently,
then I don't mean to stand in the way though. If so, however,
- the above likely wants guarding by EXPERT and/or UNSUPPORTED
- the support status of using this setting wants to be made crystal
  clear, perhaps by an addition to ./SUPPORT.md.

Jan

Demi Marie Obenour Jan. 9, 2023, 4:14 p.m. UTC | #2
On Mon, Jan 09, 2023 at 12:37:34PM +0100, Jan Beulich wrote:
> On 07.01.2023 23:07, Demi Marie Obenour wrote:
> > --- a/xen/arch/x86/Kconfig
> > +++ b/xen/arch/x86/Kconfig
> > @@ -227,6 +227,39 @@ config XEN_ALIGN_2M
> >  
> >  endchoice
> >  
> > +config LINUX_PAT
> > +	bool "Use Linux's PAT instead of Xen's default"
> > +	help
> > +	  Use Linux's Page Attribute Table instead of the default Xen value.
> > +
> > +	  The Page Attribute Table (PAT) maps three bits in the page table entry
> > +	  to the actual cacheability used by the processor.  Many Intel
> > +	  integrated GPUs have errata (bugs) that cause CPU access to GPU memory
> > +	  to ignore the topmost bit.  When using Xen's default PAT, this results
> > +	  in caches not being flushed and incorrect images being displayed.  The
> > +	  default PAT used by Linux does not cause this problem.
> > +
> > +	  If you say Y here, you will be able to use Intel integrated GPUs that
> > +	  are attached to your Linux dom0 or other Linux PV guests.  However,
> > +	  you will not be able to use non-Linux OSs in dom0, and attaching a PCI
> > +	  device to a non-Linux PV guest will result in unpredictable guest
> > +	  behavior.  If you say N here, you will be able to use a non-Linux
> > +	  dom0, and will be able to attach PCI devices to non-Linux PV guests.
> > +
> > +	  Note that saving a PV guest with an assigned PCI device on a machine
> > +	  with one PAT and restoring it on a machine with a different PAT won't
> > +	  work: the resulting guest may boot and even appear to work, but caches
> > +	  will not be flushed when needed, with unpredictable results.  HVM
> > +	  (including PVH and PVHVM) guests and guests without assigned PCI
> > +	  devices do not care what PAT Xen uses, and migration (even live)
> > +	  between hypervisors with different PATs will work fine.  Guests using
> > +	  PV Shim care about the PAT used by the PV Shim firmware, not the
> > +	  host’s PAT.  Also, non-default PAT values are incompatible with the
> > +	  (deprecated) qemu-traditional stubdomain.
> > +
> > +	  Say Y if you are building a hypervisor for a Linux distribution that
> > +	  supports Intel iGPUs.  Say N otherwise.
> 
> I'm not convinced we want this; if other maintainers think differently,
> then I don't mean to stand in the way though. If so, however,
> - the above likely wants guarding by EXPERT and/or UNSUPPORTED

I considered this, but decided against it.  Recent Intel iGPUs are
simply incompatible with Xen’s default PAT, so anyone wanting to use Xen
in a desktop environment must say Y here.  Guarding this with EXPERT or
UNSUPPORTED will not prevent distribution maintainers from enabling it,
because the alternative is building a hypervisor that does not support
the hardware their users actually have.  Qubes OS is *already* shipping
a patch to use Linux’s PAT, so you don’t need to worry that this code
will go untested.  And if there were a vulnerability that required
CONFIG_LINUX_PAT=y, I’d rather it not be dropped on Qubes users as a
0day.

Patch

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 6a7825f4ba3c98e0496415123fde79ee62f771fa..18efccedfd08873cd169a54825b0ba4256a12942 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -227,6 +227,39 @@  config XEN_ALIGN_2M
 
 endchoice
 
+config LINUX_PAT
+	bool "Use Linux's PAT instead of Xen's default"
+	help
+	  Use Linux's Page Attribute Table instead of the default Xen value.
+
+	  The Page Attribute Table (PAT) maps three bits in the page table entry
+	  to the actual cacheability used by the processor.  Many Intel
+	  integrated GPUs have errata (bugs) that cause CPU access to GPU memory
+	  to ignore the topmost bit.  When using Xen's default PAT, this results
+	  in caches not being flushed and incorrect images being displayed.  The
+	  default PAT used by Linux does not cause this problem.
+
+	  If you say Y here, you will be able to use Intel integrated GPUs that
+	  are attached to your Linux dom0 or other Linux PV guests.  However,
+	  you will not be able to use non-Linux OSs in dom0, and attaching a PCI
+	  device to a non-Linux PV guest will result in unpredictable guest
+	  behavior.  If you say N here, you will be able to use a non-Linux
+	  dom0, and will be able to attach PCI devices to non-Linux PV guests.
+
+	  Note that saving a PV guest with an assigned PCI device on a machine
+	  with one PAT and restoring it on a machine with a different PAT won't
+	  work: the resulting guest may boot and even appear to work, but caches
+	  will not be flushed when needed, with unpredictable results.  HVM
+	  (including PVH and PVHVM) guests and guests without assigned PCI
+	  devices do not care what PAT Xen uses, and migration (even live)
+	  between hypervisors with different PATs will work fine.  Guests using
+	  PV Shim care about the PAT used by the PV Shim firmware, not the
+	  host’s PAT.  Also, non-default PAT values are incompatible with the
+	  (deprecated) qemu-traditional stubdomain.
+
+	  Say Y if you are building a hypervisor for a Linux distribution that
+	  supports Intel iGPUs.  Say N otherwise.
+
 config X2APIC_PHYSICAL
 	bool "x2APIC Physical Destination mode"
 	help
diff --git a/xen/arch/x86/include/asm/page.h b/xen/arch/x86/include/asm/page.h
index c7d77ab2901aa5bdb03a719af810c6f8d8ba9d4e..03839eb2b78517332663daad2089677d7000852c 100644
--- a/xen/arch/x86/include/asm/page.h
+++ b/xen/arch/x86/include/asm/page.h
@@ -331,6 +331,7 @@  void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
 
 #define PAGE_CACHE_ATTRS (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
 
+#ifndef CONFIG_LINUX_PAT
 /* Memory types, encoded under Xen's choice of MSR_PAT. */
 #define _PAGE_WB         (                                0)
 #define _PAGE_WT         (                        _PAGE_PWT)
@@ -340,6 +341,17 @@  void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
 #define _PAGE_WP         (_PAGE_PAT |             _PAGE_PWT)
 #define _PAGE_RSVD_1     (_PAGE_PAT | _PAGE_PCD            )
 #define _PAGE_RSVD_2     (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
+#else
+/* Memory types, encoded under Linux's choice of MSR_PAT. */
+#define _PAGE_WB         (                                0)
+#define _PAGE_WC         (                        _PAGE_PWT)
+#define _PAGE_UCM        (            _PAGE_PCD            )
+#define _PAGE_UC         (            _PAGE_PCD | _PAGE_PWT)
+#define _PAGE_RSVD_1     (_PAGE_PAT                        )
+#define _PAGE_WP         (_PAGE_PAT |             _PAGE_PWT)
+#define _PAGE_RSVD_2     (_PAGE_PAT | _PAGE_PCD            )
+#define _PAGE_WT         (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
+#endif
 
 /*
  * Debug option: Ensure that granted mappings are not implicitly unmapped.
diff --git a/xen/arch/x86/include/asm/processor.h b/xen/arch/x86/include/asm/processor.h
index 60b902060914584957db8afa5c7c1e6abdad4d13..413b59ab284990cca192fa1dc44b437f58bd282f 100644
--- a/xen/arch/x86/include/asm/processor.h
+++ b/xen/arch/x86/include/asm/processor.h
@@ -92,6 +92,20 @@ 
                           X86_EFLAGS_NT|X86_EFLAGS_DF|X86_EFLAGS_IF|    \
                           X86_EFLAGS_TF)
 
+#ifdef CONFIG_LINUX_PAT
+/*
+ * Host IA32_CR_PAT value to cover all memory types.  This is not the default
+ * MSR_PAT value, but is the same as the one used by Linux.
+ */
+#define XEN_MSR_PAT ((_AC(X86_MT_WB,  ULL) << 0x00) | \
+                     (_AC(X86_MT_WC,  ULL) << 0x08) | \
+                     (_AC(X86_MT_UCM, ULL) << 0x10) | \
+                     (_AC(X86_MT_UC,  ULL) << 0x18) | \
+                     (_AC(X86_MT_WB,  ULL) << 0x20) | \
+                     (_AC(X86_MT_WP,  ULL) << 0x28) | \
+                     (_AC(X86_MT_UCM, ULL) << 0x30) | \
+                     (_AC(X86_MT_WT,  ULL) << 0x38))
+#else
 /*
  * Host IA32_CR_PAT value to cover all memory types.  This is not the default
  * MSR_PAT value, and is an ABI with PV guests.
@@ -104,6 +118,7 @@ 
                      (_AC(X86_MT_WP,  ULL) << 0x28) | \
                      (_AC(X86_MT_UC,  ULL) << 0x30) | \
                      (_AC(X86_MT_UC,  ULL) << 0x38))
+#endif
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d69e9bea6c30bc782ab4c331f42502f6e61a028a..042c1875a02092a3f19c293003ef12209d88a450 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6407,6 +6407,7 @@  unsigned long get_upper_mfn_bound(void)
 
 static void __init __maybe_unused build_assertions(void)
 {
+#ifndef CONFIG_LINUX_PAT
     /*
      * If this trips, any guests that blindly rely on the public API in xen.h
      * (instead of reading the PAT from Xen, as Linux 3.19+ does) will be
@@ -6414,6 +6415,7 @@  static void __init __maybe_unused build_assertions(void)
      * using different PATs will not work.
      */
     BUILD_BUG_ON(XEN_MSR_PAT != 0x050100070406ULL);
+#endif
 
     /*
      * _PAGE_WB must be zero.  Linux PV guests assume that _PAGE_WB will be