Message ID | 20250401044931.793203-1-jon@nutanix.com (mailing list archive)
---|---
State | New
Series | KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS
On Mon, Mar 31, 2025, Jon Kohler wrote:
> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
> to support live migration from older hardware (e.g., Cascade Lake, Ice
> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
> compatibility when user space has previously configured vCPUs to see
> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>
> Newer hardware sets the following bits but does not set FB_CLEAR, which
> can prevent user space from configuring a matching setup:

I looked at this again right after PUCK, and KVM does NOT actually prevent
userspace from matching the original, pre-SPR configuration. KVM effectively
treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
value. I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
support, and thus there is no need for KVM to lie to userspace.

So in effect, this is a userspace problem where it's being too aggressive in
its sanity checks.

FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
say this is userspace's problem to solve. E.g. by using MSR filtering to
intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.

> ARCH_CAP_MDS_NO
> ARCH_CAP_TAA_NO
> ARCH_CAP_PSDP_NO
> ARCH_CAP_FBSDP_NO
> ARCH_CAP_SBDR_SSDP_NO
>
> This change has minimal impact, as these bit combinations already mark
> the host as MMIO immune (via arch_cap_mmio_immune()) and set
> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
> additional overhead.
>
> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
>
> ---
>  arch/x86/kvm/x86.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c841817a914a..2a4337aa78cd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>  		data |= ARCH_CAP_GDS_NO;
>
> +	/*
> +	 * User space might set FB_CLEAR when starting a vCPU on a system
> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
> +	 * other various MDS related bugs. To allow live migration from
> +	 * hosts that do implement FB_CLEAR, leave it enabled.
> +	 */
> +	if ((data & ARCH_CAP_MDS_NO) &&
> +	    (data & ARCH_CAP_TAA_NO) &&
> +	    (data & ARCH_CAP_PSDP_NO) &&
> +	    (data & ARCH_CAP_FBSDP_NO) &&
> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
> +		data |= ARCH_CAP_FB_CLEAR;
> +	}
> +
>  	return data;
>  }
>
> --
> 2.43.0
>
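The condition the patch proposes can be modeled in a few lines of Python. This is an illustrative sketch, not kernel code; the bit positions below are taken from Linux's arch/x86/include/asm/msr-index.h definitions of the IA32_ARCH_CAPABILITIES MSR:

```python
# Illustrative model of the patch's proposed transform on the host's
# IA32_ARCH_CAPABILITIES value. Bit positions per Linux msr-index.h.
ARCH_CAP_MDS_NO       = 1 << 5
ARCH_CAP_TAA_NO       = 1 << 8
ARCH_CAP_SBDR_SSDP_NO = 1 << 13
ARCH_CAP_FBSDP_NO     = 1 << 14
ARCH_CAP_PSDP_NO      = 1 << 15
ARCH_CAP_FB_CLEAR     = 1 << 17

# The five *_NO bits that together mark the host MMIO/MDS-immune.
MMIO_IMMUNE = (ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO | ARCH_CAP_PSDP_NO |
               ARCH_CAP_FBSDP_NO | ARCH_CAP_SBDR_SSDP_NO)

def synthesize_arch_capabilities(host_caps: int) -> int:
    """Apply the patch's proposed FB_CLEAR synthesis to a host MSR value."""
    data = host_caps
    # If the host is invulnerable to all the MDS-class issues above, the
    # patch would also advertise FB_CLEAR for migration compatibility.
    if (data & MMIO_IMMUNE) == MMIO_IMMUNE:
        data |= ARCH_CAP_FB_CLEAR
    return data

# A Sapphire-Rapids-like host: all five *_NO bits set, FB_CLEAR absent.
spr_like = MMIO_IMMUNE
print(hex(synthesize_arch_capabilities(spr_like) & ARCH_CAP_FB_CLEAR))  # → 0x20000
```

As the model makes clear, the proposal only ever adds FB_CLEAR on hosts where the FB clearing behavior is already irrelevant, which is why the commit message can claim no additional overhead.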
> On Apr 2, 2025, at 9:36 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 31, 2025, Jon Kohler wrote:
>> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
>> to support live migration from older hardware (e.g., Cascade Lake, Ice
>> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
>> compatibility when user space has previously configured vCPUs to see
>> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>>
>> Newer hardware sets the following bits but does not set FB_CLEAR, which
>> can prevent user space from configuring a matching setup:
>
> I looked at this again right after PUCK, and KVM does NOT actually prevent
> userspace from matching the original, pre-SPR configuration. KVM effectively
> treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
> value. I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
> support, and thus there is no need for KVM to lie to userspace.
>
> So in effect, this is a userspace problem where it's being too aggressive in
> its sanity checks.
>
> FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
> say this is userspace's problem to solve. E.g. by using MSR filtering to
> intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.

Thanks, Sean, I appreciate it. I'll see what sort of trouble I can get in on
the user space side of the house with qemu to see if there is a clean way to
special case this.

Cheers,
Jon

>
>> ARCH_CAP_MDS_NO
>> ARCH_CAP_TAA_NO
>> ARCH_CAP_PSDP_NO
>> ARCH_CAP_FBSDP_NO
>> ARCH_CAP_SBDR_SSDP_NO
>>
>> This change has minimal impact, as these bit combinations already mark
>> the host as MMIO immune (via arch_cap_mmio_immune()) and set
>> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
>> additional overhead.
>>
>> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> Signed-off-by: Jon Kohler <jon@nutanix.com>
>>
>> ---
>>  arch/x86/kvm/x86.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c841817a914a..2a4337aa78cd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>  		data |= ARCH_CAP_GDS_NO;
>>
>> +	/*
>> +	 * User space might set FB_CLEAR when starting a vCPU on a system
>> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
>> +	 * other various MDS related bugs. To allow live migration from
>> +	 * hosts that do implement FB_CLEAR, leave it enabled.
>> +	 */
>> +	if ((data & ARCH_CAP_MDS_NO) &&
>> +	    (data & ARCH_CAP_TAA_NO) &&
>> +	    (data & ARCH_CAP_PSDP_NO) &&
>> +	    (data & ARCH_CAP_FBSDP_NO) &&
>> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
>> +		data |= ARCH_CAP_FB_CLEAR;
>> +	}
>> +
>>  	return data;
>>  }
>>
>> --
>> 2.43.0
>>
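The userspace-side approach Sean suggests can be sketched as a toy model: the VMM intercepts guest reads of IA32_ARCH_CAPABILITIES (MSR 0x10a) and answers with the value captured on the migration source, regardless of what the destination host enumerates. The Python below is a hypothetical mock of that flow, not real VMM code; in practice this would be done with the KVM_X86_SET_MSR_FILTER ioctl plus handling of KVM_EXIT_X86_RDMSR exits:

```python
# Hypothetical model of userspace MSR read emulation. A real VMM would
# register an MSR filter with KVM (KVM_X86_SET_MSR_FILTER) and service
# the resulting KVM_EXIT_X86_RDMSR exits; here we model only the policy.
MSR_IA32_ARCH_CAPABILITIES = 0x10A
ARCH_CAP_FB_CLEAR = 1 << 17  # bit 17, per the commit message

class MsrFilterModel:
    def __init__(self, migrated_value: int):
        # ARCH_CAPABILITIES value captured on the migration source host.
        self.migrated_value = migrated_value
        # MSR indices whose reads userspace emulates instead of KVM/hardware.
        self.filtered_reads = {MSR_IA32_ARCH_CAPABILITIES}

    def rdmsr(self, index: int, host_value: int) -> int:
        """Return what the guest observes for a RDMSR of `index`."""
        if index in self.filtered_reads:
            # Emulated in userspace: preserve the source host's view, e.g.
            # keep FB_CLEAR set even though the destination lacks it.
            return self.migrated_value
        # Unfiltered MSRs fall through to the host/KVM value.
        return host_value

# Source host (e.g. Cascade Lake with updated microcode) advertised
# FB_CLEAR; the Sapphire-Rapids-like destination enumerates host_value=0.
vmm = MsrFilterModel(migrated_value=ARCH_CAP_FB_CLEAR)
guest_view = vmm.rdmsr(MSR_IA32_ARCH_CAPABILITIES, host_value=0)
print(bool(guest_view & ARCH_CAP_FB_CLEAR))  # → True
```

The design point of the thread is that this policy decision belongs in userspace: KVM already lets userspace stuff any ARCH_CAPABILITIES value into the vCPU, so the VMM's sanity checks, not the kernel, are what need relaxing or special-casing.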