Message ID | 20250221120417.20431-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD | expand |
On 21/02/2025 12:04 pm, Roger Pau Monne wrote: > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems. > Currently Linux is unconditionally attempting to read the MSR without a > safe MSR accessor, and since Xen doesn't allow access to it Linux reports > the following error: > > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) > Call Trace: > <TASK> > ? ex_handler_msr+0x11e/0x150 > ? fixup_exception+0x81/0x300 > ? exc_general_protection+0x138/0x410 > ? asm_exc_general_protection+0x22/0x30 > ? xen_do_read_msr+0x7f/0xa0 > xen_read_msr+0x1e/0x30 > amd_get_mmconfig_range+0x2b/0x80 > quirk_amd_mmconfig_area+0x28/0x100 > ? quirk_system_pci_resources+0x2b/0x150 > pnp_fixup_device+0x39/0x50 > __pnp_add_device+0xf/0x150 > pnp_add_device+0x3d/0x100 > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > ? acpi_walk_resources+0xbb/0xd0 > pnpacpi_add_device_handler+0x1f9/0x280 > acpi_ns_get_device_callback+0x104/0x1c0 > ? _raw_spin_unlock_irqrestore+0x18/0x20 > ? down_timeout+0x3a/0x60 > ? _raw_spin_lock_irqsave+0x14/0x40 > acpi_ns_walk_namespace+0x1d0/0x260 > ? _raw_spin_unlock_irqrestore+0x18/0x20 > ? __pfx_acpi_ns_get_device_callback+0x10/0x10 > acpi_get_devices+0x8a/0xb0 > ? __pfx_pnpacpi_add_device_handler+0x10/0x10 > ? __pfx_pnpacpi_init+0x10/0x10 > pnpacpi_init+0x50/0x80 > do_one_initcall+0x46/0x2e0 > kernel_init_freeable+0x1da/0x2f0 > ? __pfx_kernel_init+0x10/0x10 > kernel_init+0x16/0x1b0 > ret_from_fork+0x30/0x50 > ? __pfx_kernel_init+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 > </TASK> > > Fix by allowing access to the MSR on AMD systems, returning 0 for > unprivileged domains (MMIO configuration space disabled), and the native > value for the hardware domain. > > The non hardware domain logic will need to be adjusted if in the future we > expose an MCFG region to such domains. > > Write attempts to the MSR will still result in #GP for all domain types. > > Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs') > Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs') > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > xen/arch/x86/msr.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > index 289cf10b783a..c588c9131337 100644 > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -245,6 +245,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > *val = 0; > break; > > + case MSR_FAM10H_MMIO_CONF_BASE: > + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > + goto gp_fault; > + > + /* > + * Report MMIO configuration space is disabled unconditionally for > + * domUs, as the emulated chipset doesn't support ECAM. For dom0 > + * return the hardware value. > + */ > + *val = 0; > + if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) ) > + goto gp_fault; > + > + break; > + > case MSR_VIRT_SPEC_CTRL: > if ( !cp->extd.virt_ssbd ) > goto gp_fault; Looking at the linux code, can we not fix this just by turning it into a rdmsr_safe(), noting that not all hypervisors virtualise this MSR? Given the number of VMs which genuinely don't have PCI (emulated or otherwise), it's a buggy assumption in Linux. ~Andrew
On 21.02.2025 13:04, Roger Pau Monne wrote: > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems. > Currently Linux is unconditionally attempting to read the MSR without a > safe MSR accessor, and since Xen doesn't allow access to it Linux reports > the following error: > > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) > Call Trace: > <TASK> > ? ex_handler_msr+0x11e/0x150 > ? fixup_exception+0x81/0x300 > ? exc_general_protection+0x138/0x410 > ? asm_exc_general_protection+0x22/0x30 > ? xen_do_read_msr+0x7f/0xa0 > xen_read_msr+0x1e/0x30 > amd_get_mmconfig_range+0x2b/0x80 > quirk_amd_mmconfig_area+0x28/0x100 > ? quirk_system_pci_resources+0x2b/0x150 > pnp_fixup_device+0x39/0x50 > __pnp_add_device+0xf/0x150 > pnp_add_device+0x3d/0x100 > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > ? acpi_walk_resources+0xbb/0xd0 > pnpacpi_add_device_handler+0x1f9/0x280 > acpi_ns_get_device_callback+0x104/0x1c0 > ? _raw_spin_unlock_irqrestore+0x18/0x20 > ? down_timeout+0x3a/0x60 > ? _raw_spin_lock_irqsave+0x14/0x40 > acpi_ns_walk_namespace+0x1d0/0x260 > ? _raw_spin_unlock_irqrestore+0x18/0x20 > ? __pfx_acpi_ns_get_device_callback+0x10/0x10 > acpi_get_devices+0x8a/0xb0 > ? __pfx_pnpacpi_add_device_handler+0x10/0x10 > ? __pfx_pnpacpi_init+0x10/0x10 > pnpacpi_init+0x50/0x80 > do_one_initcall+0x46/0x2e0 > kernel_init_freeable+0x1da/0x2f0 > ? __pfx_kernel_init+0x10/0x10 > kernel_init+0x16/0x1b0 > ret_from_fork+0x30/0x50 > ? __pfx_kernel_init+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 > </TASK> Across all the halfway recent Linux versions I've never seen this. The MSR access therefore can't be entirely unconditional, I expect. Or is this new in 6.14, which I haven't tried yet? > Fix by allowing access to the MSR on AMD systems, returning 0 for > unprivileged domains (MMIO configuration space disabled), and the native > value for the hardware domain. > > The non hardware domain logic will need to be adjusted if in the future we > expose an MCFG region to such domains. > > Write attempts to the MSR will still result in #GP for all domain types. > > Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs') > Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs') Hmm, if we consider this a bug fix, then perhaps we'd need to go quite a bit farther with what MSRs we permit at least read access for. More generally in this event I'd wonder whether for any MSR that's in principle writable we shouldn't silently ignore same-value writes. Jan
On Fri, Feb 21, 2025 at 12:34:21PM +0000, Andrew Cooper wrote: > On 21/02/2025 12:04 pm, Roger Pau Monne wrote: > > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems. > > Currently Linux is unconditionally attempting to read the MSR without a > > safe MSR accessor, and since Xen doesn't allow access to it Linux reports > > the following error: > > > > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) > > Call Trace: > > <TASK> > > ? ex_handler_msr+0x11e/0x150 > > ? fixup_exception+0x81/0x300 > > ? exc_general_protection+0x138/0x410 > > ? asm_exc_general_protection+0x22/0x30 > > ? xen_do_read_msr+0x7f/0xa0 > > xen_read_msr+0x1e/0x30 > > amd_get_mmconfig_range+0x2b/0x80 > > quirk_amd_mmconfig_area+0x28/0x100 > > ? quirk_system_pci_resources+0x2b/0x150 > > pnp_fixup_device+0x39/0x50 > > __pnp_add_device+0xf/0x150 > > pnp_add_device+0x3d/0x100 > > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > > ? acpi_walk_resources+0xbb/0xd0 > > pnpacpi_add_device_handler+0x1f9/0x280 > > acpi_ns_get_device_callback+0x104/0x1c0 > > ? _raw_spin_unlock_irqrestore+0x18/0x20 > > ? down_timeout+0x3a/0x60 > > ? _raw_spin_lock_irqsave+0x14/0x40 > > acpi_ns_walk_namespace+0x1d0/0x260 > > ? _raw_spin_unlock_irqrestore+0x18/0x20 > > ? __pfx_acpi_ns_get_device_callback+0x10/0x10 > > acpi_get_devices+0x8a/0xb0 > > ? __pfx_pnpacpi_add_device_handler+0x10/0x10 > > ? __pfx_pnpacpi_init+0x10/0x10 > > pnpacpi_init+0x50/0x80 > > do_one_initcall+0x46/0x2e0 > > kernel_init_freeable+0x1da/0x2f0 > > ? __pfx_kernel_init+0x10/0x10 > > kernel_init+0x16/0x1b0 > > ret_from_fork+0x30/0x50 > > ? __pfx_kernel_init+0x10/0x10 > > ret_from_fork_asm+0x1b/0x30 > > </TASK> > > > > Fix by allowing access to the MSR on AMD systems, returning 0 for > > unprivileged domains (MMIO configuration space disabled), and the native > > value for the hardware domain. > > > > The non hardware domain logic will need to be adjusted if in the future we > > expose an MCFG region to such domains. > > > > Write attempts to the MSR will still result in #GP for all domain types. > > > > Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs') > > Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs') > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > xen/arch/x86/msr.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) > > > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > > index 289cf10b783a..c588c9131337 100644 > > --- a/xen/arch/x86/msr.c > > +++ b/xen/arch/x86/msr.c > > @@ -245,6 +245,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > > *val = 0; > > break; > > > > + case MSR_FAM10H_MMIO_CONF_BASE: > > + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > > + goto gp_fault; > > + > > + /* > > + * Report MMIO configuration space is disabled unconditionally for > > + * domUs, as the emulated chipset doesn't support ECAM. For dom0 > > + * return the hardware value. > > + */ > > + *val = 0; > > + if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) ) > > + goto gp_fault; > > + > > + break; > > + > > case MSR_VIRT_SPEC_CTRL: > > if ( !cp->extd.virt_ssbd ) > > goto gp_fault; > > Looking at the linux code, can we not fix this just by turning it into a > rdmsr_safe(), noting that not all hypervisors virtualise this MSR? Well, for dom0 it would be best if we expose it. For domU it would be fine to use rdmsr_safe() in Linux. > Given the number of VMs which genuinely don't have PCI (emulated or > otherwise), it's a buggy assumption in Linux. I think we need to expose for the hardware domain, at which point returning all 0s for domUs is kind of a very small nit. I agree that Linux should use rdmsr_safe(), I can send a patch to that effect, but I still think there's no harm in returning all 0s on domUs instead of #GP. Thanks, Roger.
On Fri, Feb 21, 2025 at 01:44:00PM +0100, Jan Beulich wrote: > On 21.02.2025 13:04, Roger Pau Monne wrote: > > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems. > > Currently Linux is unconditionally attempting to read the MSR without a > > safe MSR accessor, and since Xen doesn't allow access to it Linux reports > > the following error: > > > > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) > > Call Trace: > > <TASK> > > ? ex_handler_msr+0x11e/0x150 > > ? fixup_exception+0x81/0x300 > > ? exc_general_protection+0x138/0x410 > > ? asm_exc_general_protection+0x22/0x30 > > ? xen_do_read_msr+0x7f/0xa0 > > xen_read_msr+0x1e/0x30 > > amd_get_mmconfig_range+0x2b/0x80 > > quirk_amd_mmconfig_area+0x28/0x100 > > ? quirk_system_pci_resources+0x2b/0x150 > > pnp_fixup_device+0x39/0x50 > > __pnp_add_device+0xf/0x150 > > pnp_add_device+0x3d/0x100 > > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > > ? __pfx_pnpacpi_allocated_resource+0x10/0x10 > > ? acpi_walk_resources+0xbb/0xd0 > > pnpacpi_add_device_handler+0x1f9/0x280 > > acpi_ns_get_device_callback+0x104/0x1c0 > > ? _raw_spin_unlock_irqrestore+0x18/0x20 > > ? down_timeout+0x3a/0x60 > > ? _raw_spin_lock_irqsave+0x14/0x40 > > acpi_ns_walk_namespace+0x1d0/0x260 > > ? _raw_spin_unlock_irqrestore+0x18/0x20 > > ? __pfx_acpi_ns_get_device_callback+0x10/0x10 > > acpi_get_devices+0x8a/0xb0 > > ? __pfx_pnpacpi_add_device_handler+0x10/0x10 > > ? __pfx_pnpacpi_init+0x10/0x10 > > pnpacpi_init+0x50/0x80 > > do_one_initcall+0x46/0x2e0 > > kernel_init_freeable+0x1da/0x2f0 > > ? __pfx_kernel_init+0x10/0x10 > > kernel_init+0x16/0x1b0 > > ret_from_fork+0x30/0x50 > > ? __pfx_kernel_init+0x10/0x10 > > ret_from_fork_asm+0x1b/0x30 > > </TASK> > > Across all the halfway recent Linux versions I've never seen this. The MSR > access therefore can't be entirely unconditional, I expect. Or is this new > in 6.14, which I haven't tried yet? Hm, no, the above report is from 6.6. It is gated on the presence of a device with PnP ID "PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area() function. Such PnP ID seems to be "System Board". The system is currently busy, so I can't gather more information about the device that triggers it. The issue was originally seem on a Dell system with AMD Naples (Zen) CPU (possibly other CPUs also, that's the one I tested the fix on). > > Fix by allowing access to the MSR on AMD systems, returning 0 for > > unprivileged domains (MMIO configuration space disabled), and the native > > value for the hardware domain. > > > > The non hardware domain logic will need to be adjusted if in the future we > > expose an MCFG region to such domains. > > > > Write attempts to the MSR will still result in #GP for all domain types. > > > > Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs') > > Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs') > > Hmm, if we consider this a bug fix, then perhaps we'd need to go quite a bit > farther with what MSRs we permit at least read access for. More generally in > this event I'd wonder whether for any MSR that's in principle writable we > shouldn't silently ignore same-value writes. Yeah, I also wondered whether to silently ignore writes of the same value. I would initially leave as-is initially. Thanks, Roger.
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 289cf10b783a..c588c9131337 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -245,6 +245,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) *val = 0; break; + case MSR_FAM10H_MMIO_CONF_BASE: + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) + goto gp_fault; + + /* + * Report MMIO configuration space is disabled unconditionally for + * domUs, as the emulated chipset doesn't support ECAM. For dom0 + * return the hardware value. + */ + *val = 0; + if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) ) + goto gp_fault; + + break; + case MSR_VIRT_SPEC_CTRL: if ( !cp->extd.virt_ssbd ) goto gp_fault;
The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems. Currently Linux is unconditionally attempting to read the MSR without a safe MSR accessor, and since Xen doesn't allow access to it Linux reports the following error: unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) Call Trace: <TASK> ? ex_handler_msr+0x11e/0x150 ? fixup_exception+0x81/0x300 ? exc_general_protection+0x138/0x410 ? asm_exc_general_protection+0x22/0x30 ? xen_do_read_msr+0x7f/0xa0 xen_read_msr+0x1e/0x30 amd_get_mmconfig_range+0x2b/0x80 quirk_amd_mmconfig_area+0x28/0x100 ? quirk_system_pci_resources+0x2b/0x150 pnp_fixup_device+0x39/0x50 __pnp_add_device+0xf/0x150 pnp_add_device+0x3d/0x100 ? __pfx_pnpacpi_allocated_resource+0x10/0x10 ? __pfx_pnpacpi_allocated_resource+0x10/0x10 ? acpi_walk_resources+0xbb/0xd0 pnpacpi_add_device_handler+0x1f9/0x280 acpi_ns_get_device_callback+0x104/0x1c0 ? _raw_spin_unlock_irqrestore+0x18/0x20 ? down_timeout+0x3a/0x60 ? _raw_spin_lock_irqsave+0x14/0x40 acpi_ns_walk_namespace+0x1d0/0x260 ? _raw_spin_unlock_irqrestore+0x18/0x20 ? __pfx_acpi_ns_get_device_callback+0x10/0x10 acpi_get_devices+0x8a/0xb0 ? __pfx_pnpacpi_add_device_handler+0x10/0x10 ? __pfx_pnpacpi_init+0x10/0x10 pnpacpi_init+0x50/0x80 do_one_initcall+0x46/0x2e0 kernel_init_freeable+0x1da/0x2f0 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x16/0x1b0 ret_from_fork+0x30/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Fix by allowing access to the MSR on AMD systems, returning 0 for unprivileged domains (MMIO configuration space disabled), and the native value for the hardware domain. The non hardware domain logic will need to be adjusted if in the future we expose an MCFG region to such domains. Write attempts to the MSR will still result in #GP for all domain types. Fixes: 84e848fd7a16 ('x86/hvm: disallow access to unknown MSRs') Fixes: 322ec7c89f66 ('x86/pv: disallow access to unknown MSRs') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/msr.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)