diff mbox series

x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD

Message ID 20250221120417.20431-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD | expand

Commit Message

Roger Pau Monné Feb. 21, 2025, 12:04 p.m. UTC
The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
Currently Linux is unconditionally attempting to read the MSR without a
safe MSR accessor, and since Xen doesn't allow access to it Linux reports
the following error:

unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
Call Trace:
 <TASK>
 ? ex_handler_msr+0x11e/0x150
 ? fixup_exception+0x81/0x300
 ? exc_general_protection+0x138/0x410
 ? asm_exc_general_protection+0x22/0x30
 ? xen_do_read_msr+0x7f/0xa0
 xen_read_msr+0x1e/0x30
 amd_get_mmconfig_range+0x2b/0x80
 quirk_amd_mmconfig_area+0x28/0x100
 ? quirk_system_pci_resources+0x2b/0x150
 pnp_fixup_device+0x39/0x50
 __pnp_add_device+0xf/0x150
 pnp_add_device+0x3d/0x100
 ? __pfx_pnpacpi_allocated_resource+0x10/0x10
 ? __pfx_pnpacpi_allocated_resource+0x10/0x10
 ? acpi_walk_resources+0xbb/0xd0
 pnpacpi_add_device_handler+0x1f9/0x280
 acpi_ns_get_device_callback+0x104/0x1c0
 ? _raw_spin_unlock_irqrestore+0x18/0x20
 ? down_timeout+0x3a/0x60
 ? _raw_spin_lock_irqsave+0x14/0x40
 acpi_ns_walk_namespace+0x1d0/0x260
 ? _raw_spin_unlock_irqrestore+0x18/0x20
 ? __pfx_acpi_ns_get_device_callback+0x10/0x10
 acpi_get_devices+0x8a/0xb0
 ? __pfx_pnpacpi_add_device_handler+0x10/0x10
 ? __pfx_pnpacpi_init+0x10/0x10
 pnpacpi_init+0x50/0x80
 do_one_initcall+0x46/0x2e0
 kernel_init_freeable+0x1da/0x2f0
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x16/0x1b0
 ret_from_fork+0x30/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Fix by allowing access to the MSR on AMD systems, returning 0 for
unprivileged domains (MMIO configuration space disabled), and the native
value for the hardware domain.

The non hardware domain logic will need to be adjusted if in the future we
expose an MCFG region to such domains.

Write attempts to the MSR will still result in #GP for all domain types.

Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs')
Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/msr.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Andrew Cooper Feb. 21, 2025, 12:34 p.m. UTC | #1
On 21/02/2025 12:04 pm, Roger Pau Monne wrote:
> The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> Currently Linux is unconditionally attempting to read the MSR without a
> safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> the following error:
>
> unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
> Call Trace:
>  <TASK>
>  ? ex_handler_msr+0x11e/0x150
>  ? fixup_exception+0x81/0x300
>  ? exc_general_protection+0x138/0x410
>  ? asm_exc_general_protection+0x22/0x30
>  ? xen_do_read_msr+0x7f/0xa0
>  xen_read_msr+0x1e/0x30
>  amd_get_mmconfig_range+0x2b/0x80
>  quirk_amd_mmconfig_area+0x28/0x100
>  ? quirk_system_pci_resources+0x2b/0x150
>  pnp_fixup_device+0x39/0x50
>  __pnp_add_device+0xf/0x150
>  pnp_add_device+0x3d/0x100
>  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
>  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
>  ? acpi_walk_resources+0xbb/0xd0
>  pnpacpi_add_device_handler+0x1f9/0x280
>  acpi_ns_get_device_callback+0x104/0x1c0
>  ? _raw_spin_unlock_irqrestore+0x18/0x20
>  ? down_timeout+0x3a/0x60
>  ? _raw_spin_lock_irqsave+0x14/0x40
>  acpi_ns_walk_namespace+0x1d0/0x260
>  ? _raw_spin_unlock_irqrestore+0x18/0x20
>  ? __pfx_acpi_ns_get_device_callback+0x10/0x10
>  acpi_get_devices+0x8a/0xb0
>  ? __pfx_pnpacpi_add_device_handler+0x10/0x10
>  ? __pfx_pnpacpi_init+0x10/0x10
>  pnpacpi_init+0x50/0x80
>  do_one_initcall+0x46/0x2e0
>  kernel_init_freeable+0x1da/0x2f0
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x16/0x1b0
>  ret_from_fork+0x30/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
>
> Fix by allowing access to the MSR on AMD systems, returning 0 for
> unprivileged domains (MMIO configuration space disabled), and the native
> value for the hardware domain.
>
> The non hardware domain logic will need to be adjusted if in the future we
> expose an MCFG region to such domains.
>
> Write attempts to the MSR will still result in #GP for all domain types.
>
> Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs')
> Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/msr.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index 289cf10b783a..c588c9131337 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -245,6 +245,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
>          *val = 0;
>          break;
>  
> +    case MSR_FAM10H_MMIO_CONF_BASE:
> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +            goto gp_fault;
> +
> +        /*
> +         * Report MMIO configuration space is disabled unconditionally for
> +         * domUs, as the emulated chipset doesn't support ECAM.  For dom0
> +         * return the hardware value.
> +         */
> +        *val = 0;
> +        if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) )
> +            goto gp_fault;
> +
> +        break;
> +
>      case MSR_VIRT_SPEC_CTRL:
>          if ( !cp->extd.virt_ssbd )
>              goto gp_fault;

Looking at the linux code, can we not fix this just by turning it into a
rdmsr_safe(), noting that not all hypervisors virtualise this MSR?

Given the number of VMs which genuinely don't have PCI (emulated or
otherwise), it's a buggy assumption in Linux.

~Andrew
Jan Beulich Feb. 21, 2025, 12:44 p.m. UTC | #2
On 21.02.2025 13:04, Roger Pau Monne wrote:
> The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> Currently Linux is unconditionally attempting to read the MSR without a
> safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> the following error:
> 
> unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
> Call Trace:
>  <TASK>
>  ? ex_handler_msr+0x11e/0x150
>  ? fixup_exception+0x81/0x300
>  ? exc_general_protection+0x138/0x410
>  ? asm_exc_general_protection+0x22/0x30
>  ? xen_do_read_msr+0x7f/0xa0
>  xen_read_msr+0x1e/0x30
>  amd_get_mmconfig_range+0x2b/0x80
>  quirk_amd_mmconfig_area+0x28/0x100
>  ? quirk_system_pci_resources+0x2b/0x150
>  pnp_fixup_device+0x39/0x50
>  __pnp_add_device+0xf/0x150
>  pnp_add_device+0x3d/0x100
>  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
>  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
>  ? acpi_walk_resources+0xbb/0xd0
>  pnpacpi_add_device_handler+0x1f9/0x280
>  acpi_ns_get_device_callback+0x104/0x1c0
>  ? _raw_spin_unlock_irqrestore+0x18/0x20
>  ? down_timeout+0x3a/0x60
>  ? _raw_spin_lock_irqsave+0x14/0x40
>  acpi_ns_walk_namespace+0x1d0/0x260
>  ? _raw_spin_unlock_irqrestore+0x18/0x20
>  ? __pfx_acpi_ns_get_device_callback+0x10/0x10
>  acpi_get_devices+0x8a/0xb0
>  ? __pfx_pnpacpi_add_device_handler+0x10/0x10
>  ? __pfx_pnpacpi_init+0x10/0x10
>  pnpacpi_init+0x50/0x80
>  do_one_initcall+0x46/0x2e0
>  kernel_init_freeable+0x1da/0x2f0
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x16/0x1b0
>  ret_from_fork+0x30/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>

Across all the halfway recent Linux versions I've never seen this. The MSR
access therefore can't be entirely unconditional, I expect. Or is this new
in 6.14, which I haven't tried yet?

> Fix by allowing access to the MSR on AMD systems, returning 0 for
> unprivileged domains (MMIO configuration space disabled), and the native
> value for the hardware domain.
> 
> The non hardware domain logic will need to be adjusted if in the future we
> expose an MCFG region to such domains.
> 
> Write attempts to the MSR will still result in #GP for all domain types.
> 
> Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs')
> Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs')

Hmm, if we consider this a bug fix, then perhaps we'd need to go quite a bit
farther with what MSRs we permit at least read access for. More generally in
this event I'd wonder whether for any MSR that's in principle writable we
shouldn't silently ignore same-value writes.

Jan
Roger Pau Monné Feb. 21, 2025, 1:52 p.m. UTC | #3
On Fri, Feb 21, 2025 at 12:34:21PM +0000, Andrew Cooper wrote:
> On 21/02/2025 12:04 pm, Roger Pau Monne wrote:
> > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> > Currently Linux is unconditionally attempting to read the MSR without a
> > safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> > the following error:
> >
> > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
> > Call Trace:
> >  <TASK>
> >  ? ex_handler_msr+0x11e/0x150
> >  ? fixup_exception+0x81/0x300
> >  ? exc_general_protection+0x138/0x410
> >  ? asm_exc_general_protection+0x22/0x30
> >  ? xen_do_read_msr+0x7f/0xa0
> >  xen_read_msr+0x1e/0x30
> >  amd_get_mmconfig_range+0x2b/0x80
> >  quirk_amd_mmconfig_area+0x28/0x100
> >  ? quirk_system_pci_resources+0x2b/0x150
> >  pnp_fixup_device+0x39/0x50
> >  __pnp_add_device+0xf/0x150
> >  pnp_add_device+0x3d/0x100
> >  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
> >  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
> >  ? acpi_walk_resources+0xbb/0xd0
> >  pnpacpi_add_device_handler+0x1f9/0x280
> >  acpi_ns_get_device_callback+0x104/0x1c0
> >  ? _raw_spin_unlock_irqrestore+0x18/0x20
> >  ? down_timeout+0x3a/0x60
> >  ? _raw_spin_lock_irqsave+0x14/0x40
> >  acpi_ns_walk_namespace+0x1d0/0x260
> >  ? _raw_spin_unlock_irqrestore+0x18/0x20
> >  ? __pfx_acpi_ns_get_device_callback+0x10/0x10
> >  acpi_get_devices+0x8a/0xb0
> >  ? __pfx_pnpacpi_add_device_handler+0x10/0x10
> >  ? __pfx_pnpacpi_init+0x10/0x10
> >  pnpacpi_init+0x50/0x80
> >  do_one_initcall+0x46/0x2e0
> >  kernel_init_freeable+0x1da/0x2f0
> >  ? __pfx_kernel_init+0x10/0x10
> >  kernel_init+0x16/0x1b0
> >  ret_from_fork+0x30/0x50
> >  ? __pfx_kernel_init+0x10/0x10
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> >
> > Fix by allowing access to the MSR on AMD systems, returning 0 for
> > unprivileged domains (MMIO configuration space disabled), and the native
> > value for the hardware domain.
> >
> > The non hardware domain logic will need to be adjusted if in the future we
> > expose an MCFG region to such domains.
> >
> > Write attempts to the MSR will still result in #GP for all domain types.
> >
> > Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs')
> > Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/arch/x86/msr.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> > index 289cf10b783a..c588c9131337 100644
> > --- a/xen/arch/x86/msr.c
> > +++ b/xen/arch/x86/msr.c
> > @@ -245,6 +245,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
> >          *val = 0;
> >          break;
> >  
> > +    case MSR_FAM10H_MMIO_CONF_BASE:
> > +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> > +            goto gp_fault;
> > +
> > +        /*
> > +         * Report MMIO configuration space is disabled unconditionally for
> > +         * domUs, as the emulated chipset doesn't support ECAM.  For dom0
> > +         * return the hardware value.
> > +         */
> > +        *val = 0;
> > +        if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) )
> > +            goto gp_fault;
> > +
> > +        break;
> > +
> >      case MSR_VIRT_SPEC_CTRL:
> >          if ( !cp->extd.virt_ssbd )
> >              goto gp_fault;
> 
> Looking at the linux code, can we not fix this just by turning it into a
> rdmsr_safe(), noting that not all hypervisors virtualise this MSR?

Well, for dom0 it would be best if we expose it.  For domU it would be
fine to use rdmsr_safe() in Linux.

> Given the number of VMs which genuinely don't have PCI (emulated or
> otherwise), it's a buggy assumption in Linux.

I think we need to expose for the hardware domain, at which point
returning all 0s for domUs is kind of a very small nit.

I agree that Linux should use rdmsr_safe(), I can send a patch to that
effect, but I still think there's no harm in returning all 0s on domUs
instead of #GP.

Thanks, Roger.
Roger Pau Monné Feb. 21, 2025, 2:05 p.m. UTC | #4
On Fri, Feb 21, 2025 at 01:44:00PM +0100, Jan Beulich wrote:
> On 21.02.2025 13:04, Roger Pau Monne wrote:
> > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> > Currently Linux is unconditionally attempting to read the MSR without a
> > safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> > the following error:
> > 
> > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
> > Call Trace:
> >  <TASK>
> >  ? ex_handler_msr+0x11e/0x150
> >  ? fixup_exception+0x81/0x300
> >  ? exc_general_protection+0x138/0x410
> >  ? asm_exc_general_protection+0x22/0x30
> >  ? xen_do_read_msr+0x7f/0xa0
> >  xen_read_msr+0x1e/0x30
> >  amd_get_mmconfig_range+0x2b/0x80
> >  quirk_amd_mmconfig_area+0x28/0x100
> >  ? quirk_system_pci_resources+0x2b/0x150
> >  pnp_fixup_device+0x39/0x50
> >  __pnp_add_device+0xf/0x150
> >  pnp_add_device+0x3d/0x100
> >  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
> >  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
> >  ? acpi_walk_resources+0xbb/0xd0
> >  pnpacpi_add_device_handler+0x1f9/0x280
> >  acpi_ns_get_device_callback+0x104/0x1c0
> >  ? _raw_spin_unlock_irqrestore+0x18/0x20
> >  ? down_timeout+0x3a/0x60
> >  ? _raw_spin_lock_irqsave+0x14/0x40
> >  acpi_ns_walk_namespace+0x1d0/0x260
> >  ? _raw_spin_unlock_irqrestore+0x18/0x20
> >  ? __pfx_acpi_ns_get_device_callback+0x10/0x10
> >  acpi_get_devices+0x8a/0xb0
> >  ? __pfx_pnpacpi_add_device_handler+0x10/0x10
> >  ? __pfx_pnpacpi_init+0x10/0x10
> >  pnpacpi_init+0x50/0x80
> >  do_one_initcall+0x46/0x2e0
> >  kernel_init_freeable+0x1da/0x2f0
> >  ? __pfx_kernel_init+0x10/0x10
> >  kernel_init+0x16/0x1b0
> >  ret_from_fork+0x30/0x50
> >  ? __pfx_kernel_init+0x10/0x10
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> 
> Across all the halfway recent Linux versions I've never seen this. The MSR
> access therefore can't be entirely unconditional, I expect. Or is this new
> in 6.14, which I haven't tried yet?

Hm, no, the above report is from 6.6.  It is gated on the presence of
a device with PnP ID "PNP0c01", which triggers the execution of the
quirk_amd_mmconfig_area() function.  Such PnP ID seems to be "System
Board".  The system is currently busy, so I can't gather more
information about the device that triggers it.  The issue was
originally seem on a Dell system with AMD Naples (Zen) CPU (possibly
other CPUs also, that's the one I tested the fix on).

> > Fix by allowing access to the MSR on AMD systems, returning 0 for
> > unprivileged domains (MMIO configuration space disabled), and the native
> > value for the hardware domain.
> > 
> > The non hardware domain logic will need to be adjusted if in the future we
> > expose an MCFG region to such domains.
> > 
> > Write attempts to the MSR will still result in #GP for all domain types.
> > 
> > Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs')
> > Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs')
> 
> Hmm, if we consider this a bug fix, then perhaps we'd need to go quite a bit
> farther with what MSRs we permit at least read access for. More generally in
> this event I'd wonder whether for any MSR that's in principle writable we
> shouldn't silently ignore same-value writes.

Yeah, I also wondered whether to silently ignore writes of the same
value.  I would initially leave as-is initially.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 289cf10b783a..c588c9131337 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -245,6 +245,21 @@  int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
         *val = 0;
         break;
 
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+
+        /*
+         * Report MMIO configuration space is disabled unconditionally for
+         * domUs, as the emulated chipset doesn't support ECAM.  For dom0
+         * return the hardware value.
+         */
+        *val = 0;
+        if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) )
+            goto gp_fault;
+
+        break;
+
     case MSR_VIRT_SPEC_CTRL:
         if ( !cp->extd.virt_ssbd )
             goto gp_fault;