Message ID | 20250401044931.793203-1-jon@nutanix.com (mailing list archive)
---|---
State | New
Series | KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS
On Mon, Mar 31, 2025, Jon Kohler wrote:
> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
> to support live migration from older hardware (e.g., Cascade Lake, Ice
> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
> compatibility when user space has previously configured vCPUs to see
> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>
> Newer hardware sets the following bits but does not set FB_CLEAR, which
> can prevent user space from configuring a matching setup:

I looked at this again right after PUCK, and KVM does NOT actually prevent
userspace from matching the original, pre-SPR configuration. KVM effectively
treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
value. I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
support, and thus there is no need for KVM to lie to userspace.

So in effect, this is a userspace problem where it's being too aggressive in
its sanity checks.

FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
say this is userspace's problem to solve. E.g. by using MSR filtering to
intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.

> ARCH_CAP_MDS_NO
> ARCH_CAP_TAA_NO
> ARCH_CAP_PSDP_NO
> ARCH_CAP_FBSDP_NO
> ARCH_CAP_SBDR_SSDP_NO
>
> This change has minimal impact, as these bit combinations already mark
> the host as MMIO immune (via arch_cap_mmio_immune()) and set
> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
> additional overhead.
>
> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
>
> ---
>  arch/x86/kvm/x86.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c841817a914a..2a4337aa78cd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>  		data |= ARCH_CAP_GDS_NO;
>
> +	/*
> +	 * User space might set FB_CLEAR when starting a vCPU on a system
> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
> +	 * other various MDS related bugs. To allow live migration from
> +	 * hosts that do implement FB_CLEAR, leave it enabled.
> +	 */
> +	if ((data & ARCH_CAP_MDS_NO) &&
> +	    (data & ARCH_CAP_TAA_NO) &&
> +	    (data & ARCH_CAP_PSDP_NO) &&
> +	    (data & ARCH_CAP_FBSDP_NO) &&
> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
> +		data |= ARCH_CAP_FB_CLEAR;
> +	}
> +
>  	return data;
>  }
>
> --
> 2.43.0
>
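The condition the patch proposes can be modeled in a few lines of Python. This is an illustrative sketch, not kernel code; the bit positions below are taken from Linux's arch/x86/include/asm/msr-index.h definitions of the IA32_ARCH_CAPABILITIES MSR:

```python
# Illustrative model of the patch's proposed transform on the host's
# IA32_ARCH_CAPABILITIES value. Bit positions per Linux msr-index.h.
ARCH_CAP_MDS_NO       = 1 << 5
ARCH_CAP_TAA_NO       = 1 << 8
ARCH_CAP_SBDR_SSDP_NO = 1 << 13
ARCH_CAP_FBSDP_NO     = 1 << 14
ARCH_CAP_PSDP_NO      = 1 << 15
ARCH_CAP_FB_CLEAR     = 1 << 17

# The five *_NO bits that together mark the host MMIO/MDS-immune.
MMIO_IMMUNE = (ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO | ARCH_CAP_PSDP_NO |
               ARCH_CAP_FBSDP_NO | ARCH_CAP_SBDR_SSDP_NO)

def synthesize_arch_capabilities(host_caps: int) -> int:
    """Apply the patch's proposed FB_CLEAR synthesis to a host MSR value."""
    data = host_caps
    # If the host is invulnerable to all the MDS-class issues above, the
    # patch would also advertise FB_CLEAR for migration compatibility.
    if (data & MMIO_IMMUNE) == MMIO_IMMUNE:
        data |= ARCH_CAP_FB_CLEAR
    return data

# A Sapphire-Rapids-like host: all five *_NO bits set, FB_CLEAR absent.
spr_like = MMIO_IMMUNE
print(hex(synthesize_arch_capabilities(spr_like) & ARCH_CAP_FB_CLEAR))  # → 0x20000
```

As the model makes clear, the proposal only ever adds FB_CLEAR on hosts where the FB clearing behavior is already irrelevant, which is why the commit message can claim no additional overhead.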
> On Apr 2, 2025, at 9:36 AM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 31, 2025, Jon Kohler wrote:
>> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
>> to support live migration from older hardware (e.g., Cascade Lake, Ice
>> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
>> compatibility when user space has previously configured vCPUs to see
>> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>>
>> Newer hardware sets the following bits but does not set FB_CLEAR, which
>> can prevent user space from configuring a matching setup:
>
> I looked at this again right after PUCK, and KVM does NOT actually prevent
> userspace from matching the original, pre-SPR configuration. KVM effectively
> treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
> value. I.e. userspace can still migrate+stuff FB_CLEAR irrespective of hardware
> support, and thus there is no need for KVM to lie to userspace.
>
> So in effect, this is a userspace problem where it's being too aggressive in
> its sanity checks.
>
> FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
> say this is userspace's problem to solve. E.g. by using MSR filtering to
> intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.

Thanks, Sean, I appreciate it. I'll see what sort of trouble I can get in on
the user space side of the house with qemu to see if there is a clean way to
special case this.

Cheers,
Jon

>
>> ARCH_CAP_MDS_NO
>> ARCH_CAP_TAA_NO
>> ARCH_CAP_PSDP_NO
>> ARCH_CAP_FBSDP_NO
>> ARCH_CAP_SBDR_SSDP_NO
>>
>> This change has minimal impact, as these bit combinations already mark
>> the host as MMIO immune (via arch_cap_mmio_immune()) and set
>> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
>> additional overhead.
>>
>> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> Signed-off-by: Jon Kohler <jon@nutanix.com>
>>
>> ---
>>  arch/x86/kvm/x86.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c841817a914a..2a4337aa78cd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>  		data |= ARCH_CAP_GDS_NO;
>>
>> +	/*
>> +	 * User space might set FB_CLEAR when starting a vCPU on a system
>> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
>> +	 * other various MDS related bugs. To allow live migration from
>> +	 * hosts that do implement FB_CLEAR, leave it enabled.
>> +	 */
>> +	if ((data & ARCH_CAP_MDS_NO) &&
>> +	    (data & ARCH_CAP_TAA_NO) &&
>> +	    (data & ARCH_CAP_PSDP_NO) &&
>> +	    (data & ARCH_CAP_FBSDP_NO) &&
>> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
>> +		data |= ARCH_CAP_FB_CLEAR;
>> +	}
>> +
>>  	return data;
>>  }
>>
>> --
>> 2.43.0
>>
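The userspace-side approach Sean suggests can be sketched as a toy model: the VMM intercepts guest reads of IA32_ARCH_CAPABILITIES (MSR 0x10a) and answers with the value captured on the migration source, regardless of what the destination host enumerates. The Python below is a hypothetical mock of that flow, not real VMM code; in practice this would be done with the KVM_X86_SET_MSR_FILTER ioctl plus handling of KVM_EXIT_X86_RDMSR exits:

```python
# Hypothetical model of userspace MSR read emulation. A real VMM would
# register an MSR filter with KVM (KVM_X86_SET_MSR_FILTER) and service
# the resulting KVM_EXIT_X86_RDMSR exits; here we model only the policy.
MSR_IA32_ARCH_CAPABILITIES = 0x10A
ARCH_CAP_FB_CLEAR = 1 << 17  # bit 17, per the commit message

class MsrFilterModel:
    def __init__(self, migrated_value: int):
        # ARCH_CAPABILITIES value captured on the migration source host.
        self.migrated_value = migrated_value
        # MSR indices whose reads userspace emulates instead of KVM/hardware.
        self.filtered_reads = {MSR_IA32_ARCH_CAPABILITIES}

    def rdmsr(self, index: int, host_value: int) -> int:
        """Return what the guest observes for a RDMSR of `index`."""
        if index in self.filtered_reads:
            # Emulated in userspace: preserve the source host's view, e.g.
            # keep FB_CLEAR set even though the destination lacks it.
            return self.migrated_value
        # Unfiltered MSRs fall through to the host/KVM value.
        return host_value

# Source host (e.g. Cascade Lake with updated microcode) advertised
# FB_CLEAR; the Sapphire-Rapids-like destination enumerates host_value=0.
vmm = MsrFilterModel(migrated_value=ARCH_CAP_FB_CLEAR)
guest_view = vmm.rdmsr(MSR_IA32_ARCH_CAPABILITIES, host_value=0)
print(bool(guest_view & ARCH_CAP_FB_CLEAR))  # → True
```

The design point of the thread is that this policy decision belongs in userspace: KVM already lets userspace stuff any ARCH_CAPABILITIES value into the vCPU, so the VMM's sanity checks, not the kernel, are what need relaxing or special-casing.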