
KVM: x86: Set BHI_NO in guest when host is not affected by BHI

Message ID 20240411072445.522731-1-alexandre.chartre@oracle.com (mailing list archive)
State New
Series KVM: x86: Set BHI_NO in guest when host is not affected by BHI

Commit Message

Alexandre Chartre April 11, 2024, 7:24 a.m. UTC
When a system is not affected by the BHI bug then KVM should
configure guests with BHI_NO to ensure they won't enable any
BHI mitigation.

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Nikolay Borisov April 11, 2024, 7:34 a.m. UTC | #1
On 11.04.24 г. 10:24 ч., Alexandre Chartre wrote:
> When a system is not affected by the BHI bug then KVM should
> configure guests with BHI_NO to ensure they won't enable any
> BHI mitigation.
> 
> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
> ---
>   arch/x86/kvm/x86.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 984ea2089efc..f43d3c15a6b7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1678,6 +1678,9 @@ static u64 kvm_get_arch_capabilities(void)
>   	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>   		data |= ARCH_CAP_GDS_NO;
>   
> +	if (!boot_cpu_has_bug(X86_BUG_BHI))
> +		data |= ARCH_CAP_BHI_NO;
> +

But this is already handled, since ARCH_CAP_BHI_NO is included in
KVM_SUPPORTED_ARCH_CAP: when the host caps are read, that bit is going
to be set there, if it's set for the physical CPU of course.
>   	return data;
>   }
>
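For context, the masking Nikolay describes happens at the top of
kvm_get_arch_capabilities(); a simplified sketch (the exact form varies by
kernel version, so treat this as illustrative rather than authoritative):

	static u64 kvm_get_arch_capabilities(void)
	{
		/* Start from the host MSR value, masked to the bits KVM supports. */
		u64 data = host_arch_capabilities & KVM_SUPPORTED_ARCH_CAP;

		/* ... *_NO bits are then force-set for unaffected hosts ... */
		if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
			data |= ARCH_CAP_GDS_NO;

		return data;
	}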
Alexandre Chartre April 11, 2024, 7:49 a.m. UTC | #2
On 4/11/24 09:34, Nikolay Borisov wrote:
> 
> 
> On 11.04.24 г. 10:24 ч., Alexandre Chartre wrote:
>> When a system is not affected by the BHI bug then KVM should
>> configure guests with BHI_NO to ensure they won't enable any
>> BHI mitigation.
>>
>> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
>> ---
>>   arch/x86/kvm/x86.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 984ea2089efc..f43d3c15a6b7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1678,6 +1678,9 @@ static u64 kvm_get_arch_capabilities(void)
>>       if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>           data |= ARCH_CAP_GDS_NO;
>> +    if (!boot_cpu_has_bug(X86_BUG_BHI))
>> +        data |= ARCH_CAP_BHI_NO;
>> +
> 
> But this is already handled, since ARCH_CAP_BHI_NO is included in
> KVM_SUPPORTED_ARCH_CAP: when the host caps are read, that bit is
> going to be set there, if it's set for the physical CPU of course.

Correct: if the host has ARCH_CAP_BHI_NO then it will be propagated to the
guest. But the host might not have ARCH_CAP_BHI_NO set while still not being
affected by BHI.

That's the case, for example, for Skylake servers: they don't have
ARCH_CAP_BHI_NO, but they are not affected by BHI because they don't have
eIBRS. However, a guest will think it is affected because it can't tell
whether eIBRS is present on the system (virtualization can hide it).

I tested on Skylake:

Without the patch, with both host and guest running 6.9.0-rc3, the BHI mitigations are:

- Host:  BHI: Not affected
- Guest: BHI: SW loop, KVM: SW loop

=> the guest enables the BHI SW loop mitigation although the host doesn't need any.

With the patch on the host, and the guest still running 6.9.0-rc3, the BHI mitigations are:

- Host:  BHI: Not affected
- Guest: BHI: Not affected

=> now the guest doesn't enable the BHI mitigation, like the host.

Thanks,

alex.
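The guest-side behaviour Alexandre measured follows from the X86_BUG_BHI
setup in arch/x86/kernel/cpu/common.c (quoted in full later in this thread);
the X86_FEATURE_HYPERVISOR clause is what makes a guest assume it is
vulnerable even when it sees no eIBRS:

	/* When virtualized, eIBRS could be hidden, assume vulnerable */
	if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
	    !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
	    (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
	     boot_cpu_has(X86_FEATURE_HYPERVISOR)))
		setup_force_cpu_bug(X86_BUG_BHI);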
Greg KH April 11, 2024, 7:51 a.m. UTC | #3
On Thu, Apr 11, 2024 at 09:24:45AM +0200, Alexandre Chartre wrote:
> When a system is not affected by the BHI bug then KVM should
> configure guests with BHI_NO to ensure they won't enable any
> BHI mitigation.
> 
> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>

No Fixes: tag here?

thanks,

greg k-h
Nikolay Borisov April 11, 2024, 7:51 a.m. UTC | #4
On 11.04.24 г. 10:24 ч., Alexandre Chartre wrote:
> When a system is not affected by the BHI bug then KVM should
> configure guests with BHI_NO to ensure they won't enable any
> BHI mitigation.
> 
> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
> ---
>   arch/x86/kvm/x86.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 984ea2089efc..f43d3c15a6b7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1678,6 +1678,9 @@ static u64 kvm_get_arch_capabilities(void)
>   	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>   		data |= ARCH_CAP_GDS_NO;
>   
> +	if (!boot_cpu_has_bug(X86_BUG_BHI))
> +		data |= ARCH_CAP_BHI_NO;
> +
>   	return data;
>   }
>   

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Alexandre Chartre April 11, 2024, 8 a.m. UTC | #5
On 4/11/24 09:51, Greg KH wrote:
> On Thu, Apr 11, 2024 at 09:24:45AM +0200, Alexandre Chartre wrote:
>> When a system is not affected by the BHI bug then KVM should
>> configure guests with BHI_NO to ensure they won't enable any
>> BHI mitigation.
>>
>> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
> 
> No Fixes: tag here?
> 

This should have been done with commit ed2e8d49b54d ("KVM: x86: Add BHI_NO"), so:

Fixes: ed2e8d49b54d ("KVM: x86: Add BHI_NO")

Thanks,

alex.
Andrew Cooper April 11, 2024, 8:43 a.m. UTC | #6
On 11/04/2024 8:24 am, Alexandre Chartre wrote:
> When a system is not affected by the BHI bug then KVM should
> configure guests with BHI_NO to ensure they won't enable any
> BHI mitigation.
>
> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
> ---
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 984ea2089efc..f43d3c15a6b7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1678,6 +1678,9 @@ static u64 kvm_get_arch_capabilities(void)
>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>  		data |= ARCH_CAP_GDS_NO;
>  
> +	if (!boot_cpu_has_bug(X86_BUG_BHI))
> +		data |= ARCH_CAP_BHI_NO;

This isn't true or safe.

Linux only sets X86_BUG_BHI on a subset of affected parts.

Skylake for example *is* affected by BHI.  It's just that existing
mitigations are believed to suffice to mitigate BHI too.

"you happen to be safe if you're doing something else too" doesn't
remotely have the same meaning as "hardware doesn't have a history based
predictor".

~Andrew
Alexandre Chartre April 11, 2024, 9:33 a.m. UTC | #7
On 4/11/24 10:43, Andrew Cooper wrote:
> On 11/04/2024 8:24 am, Alexandre Chartre wrote:
>> When a system is not affected by the BHI bug then KVM should
>> configure guests with BHI_NO to ensure they won't enable any
>> BHI mitigation.
>>
>> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
>> ---
>>   arch/x86/kvm/x86.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 984ea2089efc..f43d3c15a6b7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1678,6 +1678,9 @@ static u64 kvm_get_arch_capabilities(void)
>>   	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>   		data |= ARCH_CAP_GDS_NO;
>>   
>> +	if (!boot_cpu_has_bug(X86_BUG_BHI))
>> +		data |= ARCH_CAP_BHI_NO;
> 
> This isn't true or safe.
> 
> Linux only sets X86_BUG_BHI on a subset of affected parts.
> 
> Skylake for example *is* affected by BHI.  It's just that existing
> mitigations are believed to suffice to mitigate BHI too.
> 
> "you happen to be safe if you're doing something else too" doesn't
> remotely have the same meaning as "hardware doesn't have a history based
> predictor".
> 

So you mean we can't set ARCH_CAP_BHI_NO for the guest because we don't know
if the guest will run the (other) existing mitigations which are believed to
suffice to mitigate BHI?

The problem is that we can end up with a guest running extra BHI mitigations
while this is not needed. Could we inform the guest that eIBRS is not available
on the system so a Linux guest doesn't run with extra BHI mitigations?

Thanks,

alex.
Andrew Cooper April 11, 2024, 9:38 a.m. UTC | #8
On 11/04/2024 10:33 am, Alexandre Chartre wrote:
>
>
> On 4/11/24 10:43, Andrew Cooper wrote:
>> On 11/04/2024 8:24 am, Alexandre Chartre wrote:
>>> When a system is not affected by the BHI bug then KVM should
>>> configure guests with BHI_NO to ensure they won't enable any
>>> BHI mitigation.
>>>
>>> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
>>> ---
>>>   arch/x86/kvm/x86.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 984ea2089efc..f43d3c15a6b7 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -1678,6 +1678,9 @@ static u64 kvm_get_arch_capabilities(void)
>>>       if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>>>           data |= ARCH_CAP_GDS_NO;
>>>   +    if (!boot_cpu_has_bug(X86_BUG_BHI))
>>> +        data |= ARCH_CAP_BHI_NO;
>>
>> This isn't true or safe.
>>
>> Linux only sets X86_BUG_BHI on a subset of affected parts.
>>
>> Skylake for example *is* affected by BHI.  It's just that existing
>> mitigations are believed to suffice to mitigate BHI too.
>>
>> "you happen to be safe if you're doing something else too" doesn't
>> remotely have the same meaning as "hardware doesn't have a history based
>> predictor".
>>
>
> So you mean we can't set ARCH_CAP_BHI_NO for the guest because we
> don't know
> if the guest will run the (other) existing mitigations which are
> believed to
> suffice to mitigate BHI?

Correct.

Also, when a VM really is migrating between different CPUs, things get
far more complicated.

>
> The problem is that we can end up with a guest running extra BHI
> mitigations
> while this is not needed. Could we inform the guest that eIBRS is not
> available
> on the system so a Linux guest doesn't run with extra BHI mitigations?

Well, that's why Intel specified some MSRs at 0x5000xxxx.

Except I don't know anyone currently interested in implementing them,
and I'm still not sure if they work correctly for some of the more
complicated migration cases.

~Andrew
Chao Gao April 11, 2024, 11:14 a.m. UTC | #9
>> The problem is that we can end up with a guest running extra BHI
>> mitigations
>> while this is not needed. Could we inform the guest that eIBRS is not
>> available
>> on the system so a Linux guest doesn't run with extra BHI mitigations?
>
>Well, that's why Intel specified some MSRs at 0x5000xxxx.

Yes. But note that there is a subtle difference. Those MSRs are used by the
guest to communicate in-use software mitigations to the host. Such information
is stable across migration. Here we need the host to communicate that eIBRS
isn't available to the guest; this isn't stable, as the guest may be migrated
from a host without eIBRS to one with it.

>
>Except I don't know anyone currently interested in implementing them,
>and I'm still not sure if they work correctly for some of the more
>complicated migration cases.

It looks like you have the same opinion on the Intel-defined virtual MSRs as
Sean. If we all agree that the issue here and the effectiveness problem of the
short BHB-clearing sequence need to be resolved, and don't think the
Intel-defined virtual MSRs can handle all cases correctly, we have to define a
better interface through community collaboration, as Sean suggested.
Alexandre Chartre April 11, 2024, 1:20 p.m. UTC | #10
On 4/11/24 13:14, Chao Gao wrote:
>>> The problem is that we can end up with a guest running extra BHI
>>> mitigations
>>> while this is not needed. Could we inform the guest that eIBRS is not
>>> available
>>> on the system so a Linux guest doesn't run with extra BHI mitigations?
>>
>> Well, that's why Intel specified some MSRs at 0x5000xxxx.
> 
> Yes. But note that there is a subtle difference. Those MSRs are used by the
> guest to communicate in-use software mitigations to the host. Such
> information is stable across migration. Here we need the host to communicate
> that eIBRS isn't available to the guest; this isn't stable, as the guest may
> be migrated from a host without eIBRS to one with it.
> 
>>
>> Except I don't know anyone currently interested in implementing them,
>> and I'm still not sure if they work correctly for some of the more
>> complicated migration cases.
> 
> It looks like you have the same opinion on the Intel-defined virtual MSRs
> as Sean. If we all agree that the issue here and the effectiveness problem
> of the short BHB-clearing sequence need to be resolved, and don't think the
> Intel-defined virtual MSRs can handle all cases correctly, we have to define
> a better interface through community collaboration, as Sean suggested.

Another solution could be to add CPUs to cpu_vuln_whitelist with NO_BHI
(e.g. explicitly add CPUs which don't have eIBRS). That way, the kernel will
figure out the right mitigation on the host and in the guest.

alex.
Paolo Bonzini April 11, 2024, 1:22 p.m. UTC | #11
On Thu, Apr 11, 2024 at 11:34 AM Alexandre Chartre
<alexandre.chartre@oracle.com> wrote:
>
> So you mean we can't set ARCH_CAP_BHI_NO for the guest because we don't know
> if the guest will run the (other) existing mitigations which are believed to
> suffice to mitigate BHI?
>
> The problem is that we can end up with a guest running extra BHI mitigations
> while this is not needed. Could we inform the guest that eIBRS is not available
> on the system so a Linux guest doesn't run with extra BHI mitigations?

The (Linux or otherwise) guest will make its own determinations as to
whether BHI mitigations are necessary. If the guest uses eIBRS, it
will run with mitigations. If you hide bit 1 of
MSR_IA32_ARCH_CAPABILITIES from the guest, it may decide to disable
it. But if the guest decides to use eIBRS, I think it should use
mitigations even if the host doesn't.

It's a different story if the host isn't susceptible altogether. The
ARCH_CAP_BHI_NO bit *can* be set if the processor doesn't have the bug
at all, which would be true if cpu_matches(cpu_vuln_whitelist,
NO_BHI). I would apply a patch to do that.

Paolo
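For reference, the two MSR_IA32_ARCH_CAPABILITIES bits discussed here, as
defined in arch/x86/include/asm/msr-index.h (bit positions as of the 6.9
timeframe; double-check against your tree):

	#define ARCH_CAP_IBRS_ALL	BIT(1)	/* Enhanced IBRS support */
	#define ARCH_CAP_BHI_NO		BIT(20)	/* Not affected by Branch History Injection */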
Alexandre Chartre April 11, 2024, 1:32 p.m. UTC | #12
On 4/11/24 15:22, Paolo Bonzini wrote:
> On Thu, Apr 11, 2024 at 11:34 AM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:
>>
>> So you mean we can't set ARCH_CAP_BHI_NO for the guest because we don't know
>> if the guest will run the (other) existing mitigations which are believed to
>> suffice to mitigate BHI?
>>
>> The problem is that we can end up with a guest running extra BHI mitigations
>> while this is not needed. Could we inform the guest that eIBRS is not available
>> on the system so a Linux guest doesn't run with extra BHI mitigations?
> 
> The (Linux or otherwise) guest will make its own determinations as to
> whether BHI mitigations are necessary. If the guest uses eIBRS, it
> will run with mitigations. If you hide bit 1 of
> MSR_IA32_ARCH_CAPABILITIES from the guest, it may decide to disable
> it. But if the guest decides to use eIBRS, I think it should use
> mitigations even if the host doesn't.

The problem is not on servers which have eIBRS, but on servers which don't.

If there is no eIBRS on the server, then the guest doesn't know if there is
effectively no eIBRS on the server or if eIBRS is hidden by the virtualization
so it applies the BHI mitigation even when that's not needed (i.e. when eIBRS
is effectively not present on the server).

> It's a different story if the host isn't susceptible altogether. The
> ARCH_CAP_BHI_NO bit *can* be set if the processor doesn't have the bug
> at all, which would be true if cpu_matches(cpu_vuln_whitelist,
> NO_BHI). I would apply a patch to do that.
> 

Right, I have just suggested enumerating CPUs which don't have eIBRS with
NO_BHI, but we would need that precise list of CPUs.

alex.
Andrew Cooper April 11, 2024, 2:13 p.m. UTC | #13
On 11/04/2024 2:32 pm, Alexandre Chartre wrote:
>
> On 4/11/24 15:22, Paolo Bonzini wrote:
>> On Thu, Apr 11, 2024 at 11:34 AM Alexandre Chartre
>> <alexandre.chartre@oracle.com> wrote:
>>>
>>> So you mean we can't set ARCH_CAP_BHI_NO for the guest because we
>>> don't know
>>> if the guest will run the (other) existing mitigations which are
>>> believed to
>>> suffice to mitigate BHI?
>>>
>>> The problem is that we can end up with a guest running extra BHI
>>> mitigations
>>> while this is not needed. Could we inform the guest that eIBRS is
>>> not available
>>> on the system so a Linux guest doesn't run with extra BHI mitigations?
>>
>> The (Linux or otherwise) guest will make its own determinations as to
>> whether BHI mitigations are necessary. If the guest uses eIBRS, it
>> will run with mitigations. If you hide bit 1 of
>> MSR_IA32_ARCH_CAPABILITIES from the guest, it may decide to disable
>> it. But if the guest decides to use eIBRS, I think it should use
>> mitigations even if the host doesn't.
>
> The problem is not on servers which have eIBRS, but on servers which
> don't.
>
> If there is no eIBRS on the server, then the guest doesn't know if
> there is
> effectively no eIBRS on the server or if eIBRS is hidden by the
> virtualization
> so it applies the BHI mitigation even when that's not needed (i.e.
> when eIBRS
> is effectively not present on the server).
>
>> It's a different story if the host isn't susceptible altogether. The
>> ARCH_CAP_BHI_NO bit *can* be set if the processor doesn't have the bug
>> at all, which would be true if cpu_matches(cpu_vuln_whitelist,
>> NO_BHI). I would apply a patch to do that.
>>
>
> Right. I have just suggested to enumerate cpus which have eIBRS with
> NO_BHI,
> but we need would that precise list of cpus.

Intel stated that there are no current CPUs for which NO_BHI would be true.

What I take this to mean is "no CPUs analysing backwards as far as Intel
cared to go".

~Andrew
Alexandre Chartre April 11, 2024, 2:33 p.m. UTC | #14
On 4/11/24 16:13, Andrew Cooper wrote:
> On 11/04/2024 2:32 pm, Alexandre Chartre wrote:
>>
>> On 4/11/24 15:22, Paolo Bonzini wrote:
>>> On Thu, Apr 11, 2024 at 11:34 AM Alexandre Chartre
>>> <alexandre.chartre@oracle.com> wrote:
>>>>
>>>> So you mean we can't set ARCH_CAP_BHI_NO for the guest because we
>>>> don't know
>>>> if the guest will run the (other) existing mitigations which are
>>>> believed to
>>>> suffice to mitigate BHI?
>>>>
>>>> The problem is that we can end up with a guest running extra BHI
>>>> mitigations
>>>> while this is not needed. Could we inform the guest that eIBRS is
>>>> not available
>>>> on the system so a Linux guest doesn't run with extra BHI mitigations?
>>>
>>> The (Linux or otherwise) guest will make its own determinations as to
>>> whether BHI mitigations are necessary. If the guest uses eIBRS, it
>>> will run with mitigations. If you hide bit 1 of
>>> MSR_IA32_ARCH_CAPABILITIES from the guest, it may decide to disable
>>> it. But if the guest decides to use eIBRS, I think it should use
>>> mitigations even if the host doesn't.
>>
>> The problem is not on servers which have eIBRS, but on servers which
>> don't.
>>
>> If there is no eIBRS on the server, then the guest doesn't know if
>> there is
>> effectively no eIBRS on the server or if eIBRS is hidden by the
>> virtualization
>> so it applies the BHI mitigation even when that's not needed (i.e.
>> when eIBRS
>> is effectively not present on the server).
>>
>>> It's a different story if the host isn't susceptible altogether. The
>>> ARCH_CAP_BHI_NO bit *can* be set if the processor doesn't have the bug
>>> at all, which would be true if cpu_matches(cpu_vuln_whitelist,
>>> NO_BHI). I would apply a patch to do that.
>>>
>>
>> Right. I have just suggested to enumerate cpus which have eIBRS with
>> NO_BHI,
>> but we need would that precise list of cpus.
> 
> Intel stated that there are no current CPUs for which NO_BHI would be true.
> 
> What I take this to mean is "no CPUs analysing backwards as far as Intel
> cared to go".
> 

Still, we could enumerate CPUs which don't have eIBRS independently of NO_BHI
(if we have a list of such CPUs) and set X86_BUG_BHI for cpus with eIBRS.

So in arch/x86/kernel/cpu/common.c, replace:

	/* When virtualized, eIBRS could be hidden, assume vulnerable */
	if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
	    !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
	    (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
	     boot_cpu_has(X86_FEATURE_HYPERVISOR)))
		setup_force_cpu_bug(X86_BUG_BHI);

with something like:

	if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
	    !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
	    (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
	    !cpu_matches(cpu_vuln_whitelist, NO_EIBRS)))
		setup_force_cpu_bug(X86_BUG_BHI);

alex.
Paolo Bonzini April 11, 2024, 2:46 p.m. UTC | #15
On Thu, Apr 11, 2024 at 4:34 PM Alexandre Chartre
<alexandre.chartre@oracle.com> wrote:
> Still, we could enumerate CPUs which don't have eIBRS independently of NO_BHI
> (if we have a list of such CPUs) and set X86_BUG_BHI for cpus with eIBRS.
>
> So in arch/x86/kernel/cpu/common.c, replace:
>
>         /* When virtualized, eIBRS could be hidden, assume vulnerable */
>         if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
>             !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
>             (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
>              boot_cpu_has(X86_FEATURE_HYPERVISOR)))
>                 setup_force_cpu_bug(X86_BUG_BHI);
>
> with something like:
>
>         if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
>             !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
>             (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
>             !cpu_matches(cpu_vuln_whitelist, NO_EIBRS)))
>                 setup_force_cpu_bug(X86_BUG_BHI);

No, that you cannot do because the hypervisor can and will fake the
family/model/stepping.

However, looking again at the original patch you submitted, I think
the review was confusing host and guest sides. If the host is "not
affected", i.e. the host *genuinely* does not have eIBRS, it can set
BHI_NO and that will bypass the "always mitigate in a guest" bit.

I think that's robust and boot_cpu_has_bug(X86_BUG_BHI) is the right
way to do it.

If a VM migration pool has both !eIBRS and eIBRS machines, it will
combine the two; on one hand it will not set the eIBRS bit (bit 1), on
the other hand it will not set BHI_NO either, and it will set the
hypervisor bit. The result is that the guest *will* use mitigations.

To double check, from the point of view of a nested hypervisor, you
could set BHI_NO in a nested guest:
* if the nested hypervisor has BHI_NO passed from the outer level
* or if its CPUID passes cpu_matches(cpu_vuln_whitelist, NO_BHI)
* but it won't matter whether the nested hypervisor lacks eIBRS,
because that bit is not reliable in a VM

The logic you'd use in KVM therefore is:

   (ia32_cap & ARCH_CAP_BHI_NO) ||
   cpu_matches(cpu_vuln_whitelist, NO_BHI) ||
   (!boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) &&
    !boot_cpu_has(X86_FEATURE_HYPERVISOR)))

but that is exactly !boot_cpu_has_bug(X86_BUG_BHI) and is therefore
what Alexandre's patch does.

So I'll wait for further comments but I think the patch is correct.

Paolo
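To spell out the equivalence Paolo is pointing at: negating the X86_BUG_BHI
condition from arch/x86/kernel/cpu/common.c (quoted earlier in the thread)
with De Morgan's laws gives exactly the expression above:

	/* X86_BUG_BHI is forced when: */
	if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
	    !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
	    (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
	     boot_cpu_has(X86_FEATURE_HYPERVISOR)))
		setup_force_cpu_bug(X86_BUG_BHI);

	/* ... so !boot_cpu_has_bug(X86_BUG_BHI) is equivalent to: */
	(ia32_cap & ARCH_CAP_BHI_NO) ||
	cpu_matches(cpu_vuln_whitelist, NO_BHI) ||
	(!boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) &&
	 !boot_cpu_has(X86_FEATURE_HYPERVISOR))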
Alexandre Chartre April 11, 2024, 3:12 p.m. UTC | #16
On 4/11/24 16:46, Paolo Bonzini wrote:
> On Thu, Apr 11, 2024 at 4:34 PM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:
>> Still, we could enumerate CPUs which don't have eIBRS independently of NO_BHI
>> (if we have a list of such CPUs) and set X86_BUG_BHI for cpus with eIBRS.
>>
>> So in arch/x86/kernel/cpu/common.c, replace:
>>
>>          /* When virtualized, eIBRS could be hidden, assume vulnerable */
>>          if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
>>              !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
>>              (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
>>               boot_cpu_has(X86_FEATURE_HYPERVISOR)))
>>                  setup_force_cpu_bug(X86_BUG_BHI);
>>
>> with something like:
>>
>>          if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
>>              !cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
>>              (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
>>              !cpu_matches(cpu_vuln_whitelist, NO_EIBRS)))
>>                  setup_force_cpu_bug(X86_BUG_BHI);
> 
> No, that you cannot do because the hypervisor can and will fake the
> family/model/stepping.
> 
> However, looking again at the original patch you submitted, I think
> the review was confusing host and guest sides. If the host is "not
> affected", i.e. the host *genuinely* does not have eIBRS, it can set
> BHI_NO and that will bypass the "always mitigate in a guest" bit.
> 
> I think that's robust and boot_cpu_has_bug(X86_BUG_BHI) is the right
> way to do it.
> 
> If a VM migration pool has both !eIBRS and eIBRS machines, it will
> combine the two; on one hand it will not set the eIBRS bit (bit 1), on
> the other hand it will not set BHI_NO either, and it will set the
> hypervisor bit. The result is that the guest *will* use mitigations.
> 
> To double check, from the point of view of a nested hypervisor, you
> could set BHI_NO in a nested guest:
> * if the nested hypervisor has BHI_NO passed from the outer level
> * or if its CPUID passes cpu_matches(cpu_vuln_whitelist, NO_BHI)
> * but it won't matter whether the nested hypervisor lacks eIBRS,
> because that bit is not reliable in a VM
> 
> The logic you'd use in KVM therefore is:
> 
>     (ia32_cap & ARCH_CAP_BHI_NO) ||
>     cpu_matches(cpu_vuln_whitelist, NO_BHI) ||
>     (!boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) &&
>      !boot_cpu_has(X86_FEATURE_HYPERVISOR)))
> 
> but that is exactly !boot_cpu_has_bug(X86_BUG_BHI) and is therefore
> what Alexandre's patch does.
> 
> So I'll wait for further comments but I think the patch is correct.
> 

I think that Andrew's concern is that if there is no eIBRS on the host then
we do not set X86_BUG_BHI on the host, because we know which kernel is
running, and this kernel has some mitigations (other than the explicit BHI
mitigations) which are enough to also prevent BHI. But the cpu is still
affected by BHI.

If you set ARCH_CAP_BHI_NO for the guest then you tell the guest that the
cpu is not affected by BHI although it is. The guest can be running a
different kernel or OS which doesn't necessarily have the (basic) mitigations
used in the host kernel that mitigate BHI.

alex.
Paolo Bonzini April 11, 2024, 3:20 p.m. UTC | #17
On Thu, Apr 11, 2024 at 5:13 PM Alexandre Chartre
<alexandre.chartre@oracle.com> wrote:
> I think that Andrew's concern is that if there is no eIBRS on the host then
> we do not set X86_BUG_BHI on the host, because we know which kernel is
> running, and this kernel has some mitigations (other than the explicit BHI
> mitigations) which are enough to also prevent BHI. But the cpu is still
> affected by BHI.

Hmm, then I'm confused. It's what I wrote before: "The (Linux or
otherwise) guest will make its own determinations as to whether BHI
mitigations are necessary. If the guest uses eIBRS, it will run with
mitigations" but you said machines without eIBRS are fine.

If instead they are only fine _with Linux_, then yeah we cannot set
BHI_NO in general. What we can do is define a new bit that is in the
KVM leaves. The new bit is effectively !eIBRS, except that it is
defined in such a way that, in a mixed migration pool, both eIBRS and
the new bit will be 0.

Paolo
Chao Gao April 11, 2024, 3:56 p.m. UTC | #18
On Thu, Apr 11, 2024 at 05:20:30PM +0200, Paolo Bonzini wrote:
>On Thu, Apr 11, 2024 at 5:13 PM Alexandre Chartre
><alexandre.chartre@oracle.com> wrote:
>> I think that Andrew's concern is that if there is no eIBRS on the host then
>> we do not set X86_BUG_BHI on the host, because we know which kernel is
>> running, and this kernel has some mitigations (other than the explicit BHI
>> mitigations) which are enough to also prevent BHI. But the cpu is still
>> affected by BHI.
>
>Hmm, then I'm confused. It's what I wrote before: "The (Linux or
>otherwise) guest will make its own determinations as to whether BHI
>mitigations are necessary. If the guest uses eIBRS, it will run with
>mitigations" but you said machines without eIBRS are fine.
>
>If instead they are only fine _with Linux_, then yeah we cannot set
>BHI_NO in general. What we can do is define a new bit that is in the
>KVM leaves. The new bit is effectively !eIBRS, except that it is
>defined in such a way that, in a mixed migration pool, both eIBRS and
>the new bit will be 0.

This looks a good solution.

We can also introduce a new bit indicating the effectiveness of the short
BHB-clearing sequence. KVM advertises this bit for all pre-SPR/ADL parts.
Only if the bit is 1, guests will use the short BHB-clearing sequence.
Otherwise guests should use the long sequence. In a mixed migration pool,
the VMM shouldn't expose the bit to guests.

>
>Paolo
>
>
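A purely illustrative sketch of how Chao's proposed bit might be carried via
KVM's paravirtual CPUID leaf; every name and bit number below is made up for
illustration, and this is not an existing interface:

	/* HYPOTHETICAL: invented feature bit, not part of any KVM ABI. */
	#define KVM_FEATURE_BHB_SHORT_SEQ_OK	18

	static void guest_select_bhb_clearing_sequence(void)
	{
		/*
		 * Use the cheap short BHB-clearing sequence only when the
		 * hypervisor vouches that it is effective on every host the
		 * VM can migrate to; otherwise fall back to the long one.
		 */
		if (kvm_para_has_feature(KVM_FEATURE_BHB_SHORT_SEQ_OK))
			use_short_bhb_clearing_sequence();	/* hypothetical helper */
		else
			use_long_bhb_clearing_sequence();	/* hypothetical helper */
	}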
Konrad Rzeszutek Wilk April 11, 2024, 8:50 p.m. UTC | #19
On Thu, Apr 11, 2024 at 11:56:39PM +0800, Chao Gao wrote:
> On Thu, Apr 11, 2024 at 05:20:30PM +0200, Paolo Bonzini wrote:
> >On Thu, Apr 11, 2024 at 5:13 PM Alexandre Chartre
> ><alexandre.chartre@oracle.com> wrote:
> >> I think that Andrew's concern is that if there is no eIBRS on the host then
> >> we do not set X86_BUG_BHI on the host, because we know which kernel is
> >> running, and this kernel has some mitigations (other than the explicit BHI
> >> mitigations) which are enough to also prevent BHI. But the cpu is still
> >> affected by BHI.
> >
> >Hmm, then I'm confused. It's what I wrote before: "The (Linux or
> >otherwise) guest will make its own determinations as to whether BHI
> >mitigations are necessary. If the guest uses eIBRS, it will run with
> >mitigations" but you said machines without eIBRS are fine.
> >
> >If instead they are only fine _with Linux_, then yeah we cannot set
> >BHI_NO in general. What we can do is define a new bit that is in the
> >KVM leaves. The new bit is effectively !eIBRS, except that it is
> >defined in such a way that, in a mixed migration pool, both eIBRS and
> >the new bit will be 0.
> 
> This looks a good solution.
> 
> We can also introduce a new bit indicating the effectiveness of the short
> BHB-clearing sequence. KVM advertises this bit for all pre-SPR/ADL parts.
> Only if the bit is 1, guests will use the short BHB-clearing sequence.
> Otherwise guests should use the long sequence. In a mixed migration pool,
> the VMM shouldn't expose the bit to guests.

Is there a link to this 'short BHB-clearing sequence'?

But per your email, should Skylake guests enable IBRS (or retpoline)
and use the short BHB-clearing sequence?

And should IceLake/Cascade Lake use eIBRS (or retpoline) and the short
BHB-clearing sequence?

If we already know all of this, why does the hypervisor need to advertise
this to the guest? They can look up the CPU data to make this determination, no?

I don't actually understand how one could do a mixed migration pool with
the various mitigations one has to engage (or not) based on the host one
is running under.
> 
> >
> >Paolo
> >
> >
Chao Gao April 12, 2024, 3:24 a.m. UTC | #20
On Thu, Apr 11, 2024 at 04:50:12PM -0400, Konrad Rzeszutek Wilk wrote:
>On Thu, Apr 11, 2024 at 11:56:39PM +0800, Chao Gao wrote:
>> On Thu, Apr 11, 2024 at 05:20:30PM +0200, Paolo Bonzini wrote:
>> >On Thu, Apr 11, 2024 at 5:13 PM Alexandre Chartre
>> ><alexandre.chartre@oracle.com> wrote:
>> >> I think that Andrew's concern is that if there is no eIBRS on the host then
>> >> we do not set X86_BUG_BHI on the host, because we know which kernel is
>> >> running, and this kernel has some mitigations (other than the explicit BHI
>> >> mitigations) which are enough to also prevent BHI. But the cpu is still
>> >> affected by BHI.
>> >
>> >Hmm, then I'm confused. It's what I wrote before: "The (Linux or
>> >otherwise) guest will make its own determinations as to whether BHI
>> >mitigations are necessary. If the guest uses eIBRS, it will run with
>> >mitigations" but you said machines without eIBRS are fine.
>> >
>> >If instead they are only fine _with Linux_, then yeah we cannot set
>> >BHI_NO in general. What we can do is define a new bit that is in the
>> >KVM leaves. The new bit is effectively !eIBRS, except that it is
>> >defined in such a way that, in a mixed migration pool, both eIBRS and
>> >the new bit will be 0.
>> 
>> This looks a good solution.
>> 
>> We can also introduce a new bit indicating the effectiveness of the short
>> BHB-clearing sequence. KVM advertises this bit for all pre-SPR/ADL parts.
>> Only if the bit is 1, guests will use the short BHB-clearing sequence.
>> Otherwise guests should use the long sequence. In a mixed migration pool,
>> the VMM shouldn't expose the bit to guests.
>
>Is there a link to this 'short BHB-clearing sequence'?

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#inpage-nav-4-4

>
>But per your email, should Skylake guests enable IBRS (or retpoline)
>and use the short BHB-clearing sequence?
>
>And should IceLake/Cascade Lake use eIBRS (or retpoline) and the short
>BHB-clearing sequence?
>
>If we already know all of this, why does the hypervisor need to advertise
>this to the guest? They can look up the CPU data to make this determination, no?
>
>I don't actually understand how one could do a mixed migration pool with
>the various mitigations one has to engage (or not) based on the host one
>is running under.

In my understanding, it is done at the cost of performance. The idea is to
report the "worst" case in a mixed migration pool to guests, i.e.,

  Hey, you are running on a host where eIBRS is available (and/or the short
  BHB-clearing sequence is ineffective). Now, select your mitigation for BHI.

Then no matter which system in the pool the guest is migrated to, the guest is
not vulnerable if it deployed a mitigation for the "worst" case (in general,
this means a mitigation with larger overhead).

The good thing is migration in a mixed pool won't compromise the security level
of guests and guests in a homogeneous pool won't experience any performance loss.

>> 
>> >
>> >Paolo
>> >
>> >
Konrad Rzeszutek Wilk April 12, 2024, 4:33 p.m. UTC | #21
On Fri, Apr 12, 2024 at 11:24:54AM +0800, Chao Gao wrote:
> On Thu, Apr 11, 2024 at 04:50:12PM -0400, Konrad Rzeszutek Wilk wrote:
> >On Thu, Apr 11, 2024 at 11:56:39PM +0800, Chao Gao wrote:
> >> On Thu, Apr 11, 2024 at 05:20:30PM +0200, Paolo Bonzini wrote:
> >> >On Thu, Apr 11, 2024 at 5:13 PM Alexandre Chartre
> >> ><alexandre.chartre@oracle.com> wrote:
> >> >> I think that Andrew's concern is that if there is no eIBRS on the host then
> >> >> we do not set X86_BUG_BHI on the host, because we know which kernel is
> >> >> running, and this kernel has some mitigations (other than the explicit BHI
> >> >> mitigations) which are enough to also prevent BHI. But the cpu is still
> >> >> affected by BHI.
> >> >
> >> >Hmm, then I'm confused. It's what I wrote before: "The (Linux or
> >> >otherwise) guest will make its own determinations as to whether BHI
> >> >mitigations are necessary. If the guest uses eIBRS, it will run with
> >> >mitigations" but you said machines without eIBRS are fine.
> >> >
> >> >If instead they are only fine _with Linux_, then yeah we cannot set
> >> >BHI_NO in general. What we can do is define a new bit that is in the
> >> >KVM leaves. The new bit is effectively !eIBRS, except that it is
> >> >defined in such a way that, in a mixed migration pool, both eIBRS and
> >> >the new bit will be 0.
> >> 
> >> This looks a good solution.
> >> 
> >> We can also introduce a new bit indicating the effectiveness of the short
> >> BHB-clearing sequence. KVM advertises this bit for all pre-SPR/ADL parts.
> >> Only if the bit is 1, guests will use the short BHB-clearing sequence.
> >> Otherwise guests should use the long sequence. In a mixed migration pool,
> >> the VMM shouldn't expose the bit to guests.
> >
> >Is there a link to this 'short BHB-clearing sequence'?
> 
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html#inpage-nav-4-4
> 
> >
> >But per your email, should Skylake guests enable IBRS (or retpoline)
> >and use the short BHB-clearing sequence?
> >
> >And should IceLake/Cascade Lake use eIBRS (or retpoline) and the short
> >BHB-clearing sequence?
> >
> >If we already know all of this, why does the hypervisor need to advertise
> >this to the guest? They can look up the CPU data to make this determination, no?
> >
> >I don't actually understand how one could do a mixed migration pool with
> >the various mitigations one has to engage (or not) based on the host one
> >is running under.
> 
> In my understanding, it is done at the cost of performance. The idea is to
> report the "worst" case in a mixed migration pool to guests, i.e.,
> 
>   Hey, you are running on a host where eIBRS is available (and/or the short
>   BHB-clearing sequence is ineffective). Now, select your mitigation for BHI.

And if the pool could be Skylake, IceLake, CascadeLake, and Sapphire Rapids, then the lowest common denominator is IBRS.
And you would expose the long BHB-clearing sequence for all of them too, methinks?

I would think that the use case for this mixed migration pool is not
workable anymore unless you expose the Family/Model/Stepping so that the
VM can engage the right mitigation. And then when it is live-migrated
you re-engage the correct one (so from Skylake to CascadeLake you
turn on eIBRS and BHI mitigations; when you move from CascadeLake to Skylake
you turn off BHI and enable retpoline). But nobody has done that work and
nobody will, so why are we debating this?

> 
> Then no matter which system in the pool the guest is migrated to, the guest is
> not vulnerable if it deployed a mitigation for the "worst" case (in general,
> this means a mitigation with larger overhead).
> 
> The good thing is migration in a mixed pool won't compromise the security level
> of guests and guests in a homogeneous pool won't experience any performance loss.

I understand what you are saying, but I can't actually see someone
wanting to do this: either you get horrible performance (engaging all
the mitigations makes the VM 50% slower), or you switch some at runtime
(but nobody has done the work and nobody will, as Thomas will not want
runtime knobs for mitigations).

So how about we just try to solve the problem for the 99% of homogeneous pools?

And not make Skylake guests slower than they already are?
> 
> >> 
> >> >
> >> >Paolo
> >> >
> >> >
Alexandre Chartre April 15, 2024, 3:14 p.m. UTC | #22
On 4/11/24 15:20, Alexandre Chartre wrote:
> 
> On 4/11/24 13:14, Chao Gao wrote:
>>>> The problem is that we can end up with a guest running extra BHI
>>>> mitigations
>>>> while this is not needed. Could we inform the guest that eIBRS is not
>>>> available
>>>> on the system so a Linux guest doesn't run with extra BHI mitigations?
>>>
>>> Well, that's why Intel specified some MSRs at 0x5000xxxx.
>>
>> Yes. But note that there is a subtle difference. Those MSRs are used by the
>> guest to communicate in-use software mitigations to the host. Such
>> information is stable across migration. Here we need the host to communicate
>> that eIBRS isn't available to the guest; this isn't stable, as the guest may
>> be migrated from a host without eIBRS to one with it.
>>
>>>
>>> Except I don't know anyone currently interested in implementing them,
>>> and I'm still not sure if they work correctly for some of the more
>>> complicated migration cases.
>>
>> It looks like you have the same opinion on the Intel-defined virtual MSRs
>> as Sean. If we all agree that the issue here and the effectiveness problem
>> of the short BHB-clearing sequence need to be resolved, and don't think the
>> Intel-defined virtual MSRs can handle all cases correctly, we have to define
>> a better interface through community collaboration, as Sean suggested.
> 
> Another solution could be to add CPUs to cpu_vuln_whitelist with NO_BHI
> (e.g. explicitly add CPUs which don't have eIBRS). That way, the kernel will
> figure out the right mitigation on the host and in the guest.
> 

More precisely, we could do something like this (this is just an example;
the list is clearly incomplete):

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 754d91857d63..80477170ccc0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1182,6 +1182,24 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
         VULNWL_INTEL(ATOM_TREMONT_L,            NO_EIBRS_PBRSB),
         VULNWL_INTEL(ATOM_TREMONT_D,            NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
  
+       /*
+        * The following Intel CPUs are affected by BHI, but they don't have
+        * the eIBRS feature. In that case, the default Spectre v2 mitigations
+        * are enough to also mitigate BHI. We mark these CPUs with NO_BHI so
+        * that X86_BUG_BHI doesn't get set and no extra BHI mitigation is
+        * enabled.
+        *
+        * This prevents guest VMs from enabling extra BHI mitigations when
+        * they are not needed. On the host, X86_BUG_BHI is never set for CPUs
+        * which don't have the eIBRS feature, but this doesn't hold in guest
+        * VMs because virtualization can hide the eIBRS feature.
+        */
+       VULNWL_INTEL(IVYBRIDGE_X,               NO_BHI),
+       VULNWL_INTEL(HASWELL_X,                 NO_BHI),
+       VULNWL_INTEL(BROADWELL_X,               NO_BHI),
+       VULNWL_INTEL(SKYLAKE_X,                 NO_BHI),
+
         /* AMD Family 0xf - 0x12 */
         VULNWL_AMD(0x0f,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
         VULNWL_AMD(0x10,        NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BH


alex.
Dave Hansen April 15, 2024, 5:17 p.m. UTC | #23
> +       /*
> +        * The following Intel CPUs are affected by BHI, but they don't have
> +        * the eIBRS feature. In that case, the default Spectre v2 mitigations
> +        * are enough to also mitigate BHI. We mark these CPUs with NO_BHI so
> +        * that X86_BUG_BHI doesn't get set and no extra BHI mitigation is
> +        * enabled.
> +        *
> +        * This prevents guest VMs from enabling extra BHI mitigations when
> +        * they are not needed. On the host, X86_BUG_BHI is never set for CPUs
> +        * which don't have the eIBRS feature, but this doesn't hold in guest
> +        * VMs because virtualization can hide the eIBRS feature.
> +        */
> +       VULNWL_INTEL(IVYBRIDGE_X,               NO_BHI),
> +       VULNWL_INTEL(HASWELL_X,                 NO_BHI),
> +       VULNWL_INTEL(BROADWELL_X,               NO_BHI),
> +       VULNWL_INTEL(SKYLAKE_X,                 NO_BHI),

Isn't this at odds with the existing comment?

        /* When virtualized, eIBRS could be hidden, assume vulnerable */

Because it seems now that we've got two relatively conflicting pieces of
vulnerability information when running under a hypervisor.
Alexandre Chartre April 16, 2024, 8:41 a.m. UTC | #24
On 4/15/24 19:17, Dave Hansen wrote:
>> +       /*
>> +        * The following Intel CPUs are affected by BHI, but they don't have
>> +        * the eIBRS feature. In that case, the default Spectre v2 mitigations
>> +        * are enough to also mitigate BHI. We mark these CPUs with NO_BHI so
>> +        * that X86_BUG_BHI doesn't get set and no extra BHI mitigation is
>> +        * enabled.
>> +        *
>> +        * This prevents guest VMs from enabling extra BHI mitigations when
>> +        * they are not needed. On the host, X86_BUG_BHI is never set for CPUs
>> +        * which don't have the eIBRS feature, but this doesn't hold in guest
>> +        * VMs because virtualization can hide the eIBRS feature.
>> +        */
>> +       VULNWL_INTEL(IVYBRIDGE_X,               NO_BHI),
>> +       VULNWL_INTEL(HASWELL_X,                 NO_BHI),
>> +       VULNWL_INTEL(BROADWELL_X,               NO_BHI),
>> +       VULNWL_INTEL(SKYLAKE_X,                 NO_BHI),
> 
> Isn't this at odds with the existing comment?
> 
>          /* When virtualized, eIBRS could be hidden, assume vulnerable */
> 
> Because it seems now that we've got two relatively conflicting pieces of
> vulnerability information when running under a hypervisor.

It's not at odds; it's additional information. The comment covers the general
case.

When running under a hypervisor, the kernel can't rely on CPU features to
tell whether the server has eIBRS, because virtualization can hide
eIBRS. And that's the problem: the kernel might enable the BHI mitigation
while it's not needed.

For example on Skylake: on the host, the kernel won't see eIBRS, so it won't set
X86_BUG_BHI. But in a guest on the same platform, the kernel will set X86_BUG_BHI
because it can't tell whether the server really lacks eIBRS or whether eIBRS
is hidden by virtualization.

With the patch, the kernel knows whether the CPU it is running on (e.g. Skylake)
needs extra BHI mitigation. It can then safely skip the BHI mitigation
whether it is running on the host or in a guest.

alex.
Konrad Rzeszutek Wilk April 25, 2024, 8:45 p.m. UTC | #25
On Tue, Apr 16, 2024 at 10:41:58AM +0200, Alexandre Chartre wrote:
> 
> On 4/15/24 19:17, Dave Hansen wrote:
> > > +       /*
> > > +        * The following Intel CPUs are affected by BHI, but they don't have
> > > +        * the eIBRS feature. In that case, the default Spectre v2 mitigations
> > > +        * are enough to also mitigate BHI. We mark these CPUs with NO_BHI so
> > > +        * that X86_BUG_BHI doesn't get set and no extra BHI mitigation is
> > > +        * enabled.
> > > +        *
> > > +        * This prevents guest VMs from enabling extra BHI mitigations when
> > > +        * they are not needed. On the host, X86_BUG_BHI is never set for CPUs
> > > +        * which don't have the eIBRS feature, but this doesn't hold in guest
> > > +        * VMs because virtualization can hide the eIBRS feature.
> > > +        */
> > > +       VULNWL_INTEL(IVYBRIDGE_X,               NO_BHI),
> > > +       VULNWL_INTEL(HASWELL_X,                 NO_BHI),
> > > +       VULNWL_INTEL(BROADWELL_X,               NO_BHI),
> > > +       VULNWL_INTEL(SKYLAKE_X,                 NO_BHI),
> > 
> > Isn't this at odds with the existing comment?
> > 
> >          /* When virtualized, eIBRS could be hidden, assume vulnerable */
> > 
> > Because it seems now that we've got two relatively conflicting pieces of
> > vulnerability information when running under a hypervisor.
> 
> It's not at odds; it's additional information. The comment covers the general
> case.
> 
> When running under a hypervisor, the kernel can't rely on CPU features to
> tell whether the server has eIBRS, because virtualization can hide
> eIBRS. And that's the problem: the kernel might enable the BHI mitigation
> while it's not needed.
> 
> For example on Skylake: on the host, the kernel won't see eIBRS, so it won't set
> X86_BUG_BHI. But in a guest on the same platform, the kernel will set X86_BUG_BHI
> because it can't tell whether the server really lacks eIBRS or whether eIBRS
> is hidden by virtualization.
> 
> With the patch, the kernel knows whether the CPU it is running on (e.g. Skylake)
> needs extra BHI mitigation. It can then safely skip the BHI mitigation
> whether it is running on the host or in a guest.

Where do we want to go with this one?

The problem (which I think is not understood) is that on Skylake there
is no Enhanced IBRS support. There is either IBRS or retpoline, and when IBRS
is enabled it does the job of mitigating BHI too.

And as of right now, on Skylake guests we enable the BHI mitigation _and_ IBRS,
for extra slowdown.

We can't disable IBRS as it mitigates other bugs too, so how do
you folks want to automatically disable the BHI mitigation on Skylake with the
least amount of code?

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 984ea2089efc..f43d3c15a6b7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1678,6 +1678,9 @@  static u64 kvm_get_arch_capabilities(void)
 	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
 		data |= ARCH_CAP_GDS_NO;
 
+	if (!boot_cpu_has_bug(X86_BUG_BHI))
+		data |= ARCH_CAP_BHI_NO;
+
 	return data;
 }