diff mbox series

x86/sev: Check whether SEV or SME is supported first

Message ID 20210526072424.22453-1-puwen@hygon.cn (mailing list archive)
State New, archived
Headers show
Series x86/sev: Check whether SEV or SME is supported first | expand

Commit Message

Pu Wen May 26, 2021, 7:24 a.m. UTC
The first two bits of the CPUID leaf 0x8000001F EAX indicate whether
SEV or SME is supported respectively. It's better to check whether
SEV or SME is supported before checking the SEV MSR(0xc0010131) to
see whether SEV or SME is enabled.

This also avoid the MSR reading failure on the first generation Hygon
Dhyana CPU which does not support SEV or SME.

Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Pu Wen <puwen@hygon.cn>
---
 arch/x86/mm/mem_encrypt_identity.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Sean Christopherson May 26, 2021, 5:27 p.m. UTC | #1
On Wed, May 26, 2021, Pu Wen wrote:
> The first two bits of the CPUID leaf 0x8000001F EAX indicate whether
> SEV or SME is supported respectively. It's better to check whether
> SEV or SME is supported before checking the SEV MSR(0xc0010131) to
> see whether SEV or SME is enabled.
> 
> This also avoid the MSR reading failure on the first generation Hygon
> Dhyana CPU which does not support SEV or SME.
> 
> Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
> Cc: <stable@vger.kernel.org> # v5.10+
> Signed-off-by: Pu Wen <puwen@hygon.cn>
> ---
>  arch/x86/mm/mem_encrypt_identity.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index a9639f663d25..470b20208430 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -504,10 +504,6 @@ void __init sme_enable(struct boot_params *bp)
>  #define AMD_SME_BIT	BIT(0)
>  #define AMD_SEV_BIT	BIT(1)
>  
> -	/* Check the SEV MSR whether SEV or SME is enabled */
> -	sev_status   = __rdmsr(MSR_AMD64_SEV);
> -	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> -
>  	/*
>  	 * Check for the SME/SEV feature:
>  	 *   CPUID Fn8000_001F[EAX]
> @@ -519,11 +515,16 @@ void __init sme_enable(struct boot_params *bp)
>  	eax = 0x8000001f;
>  	ecx = 0;
>  	native_cpuid(&eax, &ebx, &ecx, &edx);
> -	if (!(eax & feature_mask))
> +	/* Check whether SEV or SME is supported */
> +	if (!(eax & (AMD_SEV_BIT | AMD_SME_BIT)))

Hmm, checking CPUID at all before MSR_AMD64_SEV is flawed for SEV, e.g. the VMM
doesn't need to pass-through CPUID to attack the guest, it can lie directly.

SEV-ES is protected by virtue of CPUID interception being reflected as #VC, which
effectively tells the guest that it's (probably) an SEV-ES guest and also gives
the guest the opportunity to sanity check the emulated CPUID values provided by
the VMM.

In other words, this patch is flawed, but commit eab696d8e8b9 was also flawed by
conditioning the SEV path on CPUID.0x80000000.

Given that #VC can be handled cleanly, the kernel should be able to handle a #GP
at this point.  So I think the proper fix is to change __rdmsr() to
native_read_msr_safe(), or an open coded variant if necessary, and drop the CPUID
checks for SEV.

The other alternative is to admit that the VMM is still trusted for SEV guests
and take this patch as is (with a reworded changelog).  This probably has my
vote, I don't see much value in pretending that the VMM can't exfiltrate data
from an SEV guest.  In fact, a malicious VMM is probably more likely to get
access to interesting data by _not_ lying about SEV being enabled, because lying
about SEV itself will hose the guest sooner than later.

>  		return;
>  
>  	me_mask = 1UL << (ebx & 0x3f);
>  
> +	/* Check the SEV MSR whether SEV or SME is enabled */
> +	sev_status   = __rdmsr(MSR_AMD64_SEV);
> +	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> +
>  	/* Check if memory encryption is enabled */
>  	if (feature_mask == AMD_SME_BIT) {
>  		/*
> -- 
> 2.23.0
>
Pu Wen May 27, 2021, 3:08 p.m. UTC | #2
On 2021/5/27 1:27, Sean Christopherson wrote:
> On Wed, May 26, 2021, Pu Wen wrote:
>> The first two bits of the CPUID leaf 0x8000001F EAX indicate whether
>> SEV or SME is supported respectively. It's better to check whether
>> SEV or SME is supported before checking the SEV MSR(0xc0010131) to
>> see whether SEV or SME is enabled.
>>
>> This also avoid the MSR reading failure on the first generation Hygon
>> Dhyana CPU which does not support SEV or SME.
>>
>> Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
>> Cc: <stable@vger.kernel.org> # v5.10+
>> Signed-off-by: Pu Wen <puwen@hygon.cn>
>> ---
>>   arch/x86/mm/mem_encrypt_identity.c | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index a9639f663d25..470b20208430 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -504,10 +504,6 @@ void __init sme_enable(struct boot_params *bp)
>>   #define AMD_SME_BIT	BIT(0)
>>   #define AMD_SEV_BIT	BIT(1)
>>   
>> -	/* Check the SEV MSR whether SEV or SME is enabled */
>> -	sev_status   = __rdmsr(MSR_AMD64_SEV);
>> -	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
>> -
>>   	/*
>>   	 * Check for the SME/SEV feature:
>>   	 *   CPUID Fn8000_001F[EAX]
>> @@ -519,11 +515,16 @@ void __init sme_enable(struct boot_params *bp)
>>   	eax = 0x8000001f;
>>   	ecx = 0;
>>   	native_cpuid(&eax, &ebx, &ecx, &edx);
>> -	if (!(eax & feature_mask))
>> +	/* Check whether SEV or SME is supported */
>> +	if (!(eax & (AMD_SEV_BIT | AMD_SME_BIT)))
> 
> Hmm, checking CPUID at all before MSR_AMD64_SEV is flawed for SEV, e.g. the VMM
> doesn't need to pass-through CPUID to attack the guest, it can lie directly.
> 
> SEV-ES is protected by virtue of CPUID interception being reflected as #VC, which
> effectively tells the guest that it's (probably) an SEV-ES guest and also gives
> the guest the opportunity to sanity check the emulated CPUID values provided by
> the VMM.
> 
> In other words, this patch is flawed, but commit eab696d8e8b9 was also flawed by
> conditioning the SEV path on CPUID.0x80000000.

Yes, so I think we'd better admit that the VMM is still trusted for SEV guests
as you mentioned below.

> 
> Given that #VC can be handled cleanly, the kernel should be able to handle a #GP
> at this point.  So I think the proper fix is to change __rdmsr() to
> native_read_msr_safe(), or an open coded variant if necessary, and drop the CPUID

Reading MSR_AMD64_SEV which is not implemented on Hygon Dhyana CPU will cause
the kernel reboot, and native_read_msr_safe() has no help.

> checks for SEV.
> 
> The other alternative is to admit that the VMM is still trusted for SEV guests

Agree with that.
Joerg Roedel May 31, 2021, 9:37 a.m. UTC | #3
On Thu, May 27, 2021 at 11:08:32PM +0800, Pu Wen wrote:
> Reading MSR_AMD64_SEV which is not implemented on Hygon Dhyana CPU will cause
> the kernel reboot, and native_read_msr_safe() has no help.

The reason for the reboot is that there is no #GP or #DF handler set up
when this MSR is read, so its propagated to a shutdown event. But there
is an IDT already, so you can set up early and #GP handler to fix the
reboot.

Regards,

	Joerg
Pu Wen May 31, 2021, 2:56 p.m. UTC | #4
On 2021/5/31 17:37, Joerg Roedel wrote:
> On Thu, May 27, 2021 at 11:08:32PM +0800, Pu Wen wrote:
>> Reading MSR_AMD64_SEV which is not implemented on Hygon Dhyana CPU will cause
>> the kernel reboot, and native_read_msr_safe() has no help.
> 
> The reason for the reboot is that there is no #GP or #DF handler set up
> when this MSR is read, so its propagated to a shutdown event. But there
> is an IDT already, so you can set up early and #GP handler to fix the
> reboot.

Thanks for your suggestion, I'll try to set up early #GP handler to fix
the problem.
Borislav Petkov June 1, 2021, 2:39 p.m. UTC | #5
On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
> Thanks for your suggestion, I'll try to set up early #GP handler to fix
> the problem.

Why? AFAICT, you only need to return early in sme_enable() if CPUID is
not "AuthenticAMD". Just do that please.

Thx.
Sean Christopherson June 1, 2021, 4:14 p.m. UTC | #6
On Tue, Jun 01, 2021, Borislav Petkov wrote:
> On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
> > Thanks for your suggestion, I'll try to set up early #GP handler to fix
> > the problem.
> 
> Why? AFAICT, you only need to return early in sme_enable() if CPUID is
> not "AuthenticAMD". Just do that please.

I don't think that would suffice, presumably MSR_AMD64_SEV doesn't exist on older
AMD CPUs either.  E.g. there's no mention of MSR 0xC001_0131 in the dev's guide
from 2015[*].

I also don't see the point in checking the vendor string.  A malicious hypervisor
can lie about CPUID.0x0 just as easily as it can lie about CPUID.0x8000001f, so
for SEV the options are to either trust the hypervisor or eat #GPs on RDMSR for
non-SEV CPUs.  If we go with "trust the hypervisor", then the original patch of
hoisting the CPUID.0x8000001f check up is simpler than checking the vendor string.


[*] https://www.amd.com/system/files/TechDocs/48751_16h_bkdg.pdf
Tom Lendacky June 1, 2021, 4:36 p.m. UTC | #7
On 6/1/21 11:14 AM, Sean Christopherson wrote:
> On Tue, Jun 01, 2021, Borislav Petkov wrote:
>> On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
>>> Thanks for your suggestion, I'll try to set up early #GP handler to fix
>>> the problem.
>>
>> Why? AFAICT, you only need to return early in sme_enable() if CPUID is
>> not "AuthenticAMD". Just do that please.
> 
> I don't think that would suffice, presumably MSR_AMD64_SEV doesn't exist on older
> AMD CPUs either.  E.g. there's no mention of MSR 0xC001_0131 in the dev's guide
> from 2015[*].

That is the reason for checking the maximum supported leaf being at least
0x8000001f. If that leaf is supported, we expect the SEV status MSR to be
valid. The problem is that the Hygon ucode does not support the MSR in
question. I'm not sure what it would take for that to be added to their
ucode and just always return 0.

> 
> I also don't see the point in checking the vendor string.  A malicious hypervisor
> can lie about CPUID.0x0 just as easily as it can lie about CPUID.0x8000001f, so
> for SEV the options are to either trust the hypervisor or eat #GPs on RDMSR for
> non-SEV CPUs.  If we go with "trust the hypervisor", then the original patch of
> hoisting the CPUID.0x8000001f check up is simpler than checking the vendor string.

Because a hypervisor can put anything it wants in the CPUID 0x0 /
0x80000000 fields, I don't think we can just check for "AuthenticAMD".

If we want the read of CPUID 0x8000001f done before reading the SEV status
MSR, then the original patch is close, but slightly flawed, e.g. only SME
can be indicated but then MSR_AMD64_SEV can say SEV active.

If we want to introduce support for handling/detecting #GP, this might
become overly complicated because of the very early, identity mapped state
the code is in - especially for backport to stable.

Thanks,
Tom

> 
> 
> [*] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F48751_16h_bkdg.pdf&amp;data=04%7C01%7Cthomas.lendacky%40amd.com%7C6025fb08694e4e6b74bb08d92518549a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637581608717458896%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=CeJXPNxAOPiXQYYXxDKqSrcVBbiY%2FrkQ%2FmPzbRXbSHQ%3D&amp;reserved=0
>
Borislav Petkov June 1, 2021, 4:59 p.m. UTC | #8
On Tue, Jun 01, 2021 at 11:36:31AM -0500, Tom Lendacky wrote:
> That is the reason for checking the maximum supported leaf being at least
> 0x8000001f. If that leaf is supported, we expect the SEV status MSR to be
> valid. The problem is that the Hygon ucode does not support the MSR in
> question. I'm not sure what it would take for that to be added to their
> ucode and just always return 0.

Yap, that sounds good too.

> Because a hypervisor can put anything it wants in the CPUID 0x0 /
> 0x80000000 fields, I don't think we can just check for "AuthenticAMD".

By that logic you can forget even checking CPUID at all in that case.
The only reliable check you can do is MSR_AMD64_SEV which is guest-only.

> If we want the read of CPUID 0x8000001f done before reading the SEV status
> MSR, then the original patch is close, but slightly flawed, e.g. only SME
> can be indicated but then MSR_AMD64_SEV can say SEV active.
> 
> If we want to introduce support for handling/detecting #GP, this might
> become overly complicated because of the very early, identity mapped state
> the code is in - especially for backport to stable.

Yah, ain't gonna happen. I'm not taking some #GP handler to the early
code just because some hardware is operating out of spec.

If some hypervisor running on Hygon hardware is lying and says it is an
AMD which supports the 0x8000001f leaf, then that hypervisor gets to
keep both pieces.
Sean Christopherson June 1, 2021, 5:09 p.m. UTC | #9
On Tue, Jun 01, 2021, Tom Lendacky wrote:
> 
> On 6/1/21 11:14 AM, Sean Christopherson wrote:
> > On Tue, Jun 01, 2021, Borislav Petkov wrote:
> >> On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
> >>> Thanks for your suggestion, I'll try to set up early #GP handler to fix
> >>> the problem.
> >>
> >> Why? AFAICT, you only need to return early in sme_enable() if CPUID is
> >> not "AuthenticAMD". Just do that please.
> > 
> > I don't think that would suffice, presumably MSR_AMD64_SEV doesn't exist on older
> > AMD CPUs either.  E.g. there's no mention of MSR 0xC001_0131 in the dev's guide
> > from 2015[*].
> 
> That is the reason for checking the maximum supported leaf being at least
> 0x8000001f. If that leaf is supported, we expect the SEV status MSR to be
> valid. The problem is that the Hygon ucode does not support the MSR in
> question. I'm not sure what it would take for that to be added to their
> ucode and just always return 0.

Ah.  But it's also legal/possible for the max extended leaf to be greater than
0x8000001f, e.g. 0x80000020, without 0x8000001f itself being supported.  Even
if AMD can guarantee no such processor will exist, I don't think it would be
illegal for a hypervisor to emulate a feature (on an "AuthenticAMD" virtual CPU)
enumerated by a higher leaf on an older physical AMD CPU (or non-AMD CPU!) that
doesn't support MSR_AMD64_SEV.

> > I also don't see the point in checking the vendor string.  A malicious hypervisor
> > can lie about CPUID.0x0 just as easily as it can lie about CPUID.0x8000001f, so
> > for SEV the options are to either trust the hypervisor or eat #GPs on RDMSR for
> > non-SEV CPUs.  If we go with "trust the hypervisor", then the original patch of
> > hoisting the CPUID.0x8000001f check up is simpler than checking the vendor string.
> 
> Because a hypervisor can put anything it wants in the CPUID 0x0 /
> 0x80000000 fields, I don't think we can just check for "AuthenticAMD".
> 
> If we want the read of CPUID 0x8000001f done before reading the SEV status
> MSR, then the original patch is close, but slightly flawed, e.g. only SME
> can be indicated but then MSR_AMD64_SEV can say SEV active.

I didn't follow this.  Bare metal CPUs should never report only SME in CPUID and
then report SEV as being active in the MSR.

In the SEV case, relying on CPUID in any way, shape, or form requires trusting
the hypervisor.  The code could assert that CPUID and the MSR are consistent, but
I don't see any value in doing so as a much more effective attack would be to
report neither SME nor SEV as supported.
Sean Christopherson June 1, 2021, 5:16 p.m. UTC | #10
On Tue, Jun 01, 2021, Borislav Petkov wrote:
> Yah, ain't gonna happen. I'm not taking some #GP handler to the early
> code just because some hardware is operating out of spec.

The bug isn't limited to out-of-spec hardware.  At the point of #GP, sme_enable()
has only verified the max leaf is greater than 0x8000001f, it has not verified
that 0x8000001f is actually supported.  The APM itself declares several leafs
between 0x80000000 and 0x8000001f as reserved/unsupported, so we can't argue that
0x8000001f must be supported if the max leaf is greater than 0x8000001f.

The only way to verify that 0x8000001f is supported is to find a non-zero bit,
which is what Pu Wen's patch does.
Borislav Petkov June 1, 2021, 5:48 p.m. UTC | #11
On Tue, Jun 01, 2021 at 05:16:12PM +0000, Sean Christopherson wrote:
> The bug isn't limited to out-of-spec hardware.  At the point of #GP, sme_enable()
> has only verified the max leaf is greater than 0x8000001f, it has not verified
> that 0x8000001f is actually supported.  The APM itself declares several leafs
> between 0x80000000 and 0x8000001f as reserved/unsupported, so we can't argue that
> 0x8000001f must be supported if the max leaf is greater than 0x8000001f.

If a hypervisor says that 0x8000001f is supported but then we explode
when reading MSR_AMD64_SEV, then hypervisor gets to keep both pieces.

We're not going to workaround all possible insane hardware/hypervisor
configurations just because they dropped the ball.
Sean Christopherson June 1, 2021, 6:08 p.m. UTC | #12
On Tue, Jun 01, 2021, Borislav Petkov wrote:
> On Tue, Jun 01, 2021 at 05:16:12PM +0000, Sean Christopherson wrote:
> > The bug isn't limited to out-of-spec hardware.  At the point of #GP, sme_enable()
> > has only verified the max leaf is greater than 0x8000001f, it has not verified
> > that 0x8000001f is actually supported.  The APM itself declares several leafs
> > between 0x80000000 and 0x8000001f as reserved/unsupported, so we can't argue that
> > 0x8000001f must be supported if the max leaf is greater than 0x8000001f.
> 
> If a hypervisor says that 0x8000001f is supported but then we explode
> when reading MSR_AMD64_SEV, then hypervisor gets to keep both pieces.

But in my scenario, the hypervisor has not said that 0x8000001f is valid, it has
only said that at least one leaf > 0x8000001f is valid.

E.g. if a (virtual) CPU supports CPUID ranges:

  0x80000000 - 0x8000000A
  0x80000020 - 0x80000021

then the below check will pass as eax will be 0x80000021.

	/* Check for the SME/SEV support leaf */
	eax = 0x80000000;
	ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	if (eax < 0x8000001f)
		return;

But we have not yet verified that 0x8000001f is supported, only that the result
of CPUID.0x8000001f can be trusted (to handle Intel CPUs which return data from
the highest supported leaf if the provided leaf function is greater than the max
supported leaf).  Verifying that 0x8000001f is supported doesn't happen until
0x8000001f is actually read, which is currently done after the RDMSR that #GPs
and explodes.

> We're not going to workaround all possible insane hardware/hypervisor
> configurations just because they dropped the ball.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov June 1, 2021, 6:24 p.m. UTC | #13
On Tue, Jun 01, 2021 at 06:08:19PM +0000, Sean Christopherson wrote:
> But we have not yet verified that 0x8000001f is supported, only that the result
> of CPUID.0x8000001f can be trusted (to handle Intel CPUs which return data from
> the highest supported leaf if the provided leaf function is greater than the max
> supported leaf).  Verifying that 0x8000001f is supported doesn't happen until
> 0x8000001f is actually read, which is currently done after the RDMSR that #GPs
> and explodes.

Yeah yeah, Tom just convinced me on IRC that the patch is ok after
all... so let's do that. And again, we cannot stop hypervisors from
doing shady things here so I don't even wanna try to. People should run
SNP/TDX guests only anyway if they care about this stuff.
Tom Lendacky June 1, 2021, 6:30 p.m. UTC | #14
On 5/26/21 2:24 AM, Pu Wen wrote:
> The first two bits of the CPUID leaf 0x8000001F EAX indicate whether
> SEV or SME is supported respectively. It's better to check whether
> SEV or SME is supported before checking the SEV MSR(0xc0010131) to
> see whether SEV or SME is enabled.
> 
> This also avoid the MSR reading failure on the first generation Hygon
> Dhyana CPU which does not support SEV or SME.
> 
> Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
> Cc: <stable@vger.kernel.org> # v5.10+
> Signed-off-by: Pu Wen <puwen@hygon.cn>

I think the commit message needs to be expanded to clarify the situations
and provide more detail.

This is both a bare-metal issue and a guest/VM issue. Since Hygon doesn't
support the MSR_AMD64_SEV MSR, reading that MSR results in a #GP - either
directly from hardware in the bare-metal case or via the hypervisor
(because the RDMSR is actually intercepted) in the guest/VM case,
resulting in a failed boot. And since this is very early in the boot
phase, rdmsrl_safe()/native_read_msr_safe() can't be used.

So by checking the CPUID information before attempting the RDMSR, this
goes back to the behavior before the patch identified in the Fixes: tag.

With an improved commit message:

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>  arch/x86/mm/mem_encrypt_identity.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index a9639f663d25..470b20208430 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -504,10 +504,6 @@ void __init sme_enable(struct boot_params *bp)
>  #define AMD_SME_BIT	BIT(0)
>  #define AMD_SEV_BIT	BIT(1)
>  
> -	/* Check the SEV MSR whether SEV or SME is enabled */
> -	sev_status   = __rdmsr(MSR_AMD64_SEV);
> -	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> -
>  	/*
>  	 * Check for the SME/SEV feature:
>  	 *   CPUID Fn8000_001F[EAX]
> @@ -519,11 +515,16 @@ void __init sme_enable(struct boot_params *bp)
>  	eax = 0x8000001f;
>  	ecx = 0;
>  	native_cpuid(&eax, &ebx, &ecx, &edx);
> -	if (!(eax & feature_mask))
> +	/* Check whether SEV or SME is supported */
> +	if (!(eax & (AMD_SEV_BIT | AMD_SME_BIT)))
>  		return;
>  
>  	me_mask = 1UL << (ebx & 0x3f);
>  
> +	/* Check the SEV MSR whether SEV or SME is enabled */
> +	sev_status   = __rdmsr(MSR_AMD64_SEV);
> +	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> +
>  	/* Check if memory encryption is enabled */
>  	if (feature_mask == AMD_SME_BIT) {
>  		/*
>
Pu Wen June 2, 2021, 6:55 a.m. UTC | #15
On 2021/6/2 2:30, Tom Lendacky wrote:
> On 5/26/21 2:24 AM, Pu Wen wrote:
>> The first two bits of the CPUID leaf 0x8000001F EAX indicate whether
>> SEV or SME is supported respectively. It's better to check whether
>> SEV or SME is supported before checking the SEV MSR(0xc0010131) to
>> see whether SEV or SME is enabled.
>>
>> This also avoid the MSR reading failure on the first generation Hygon
>> Dhyana CPU which does not support SEV or SME.
>>
>> Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
>> Cc: <stable@vger.kernel.org> # v5.10+
>> Signed-off-by: Pu Wen <puwen@hygon.cn>
> 
> I think the commit message needs to be expanded to clarify the situations
> and provide more detail.

Okay.

> This is both a bare-metal issue and a guest/VM issue. Since Hygon doesn't
> support the MSR_AMD64_SEV MSR, reading that MSR results in a #GP - either
> directly from hardware in the bare-metal case or via the hypervisor
> (because the RDMSR is actually intercepted) in the guest/VM case,
> resulting in a failed boot. And since this is very early in the boot
> phase, rdmsrl_safe()/native_read_msr_safe() can't be used.

The description is good, will add this.

> So by checking the CPUID information before attempting the RDMSR, this
> goes back to the behavior before the patch identified in the Fixes: tag.
> 
> With an improved commit message:
> 
> Acked-by: Tom Lendacky <thomas.lendacky@amd.com>

Thanks, will send patch v2 with improved commit messages.
diff mbox series

Patch

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index a9639f663d25..470b20208430 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -504,10 +504,6 @@  void __init sme_enable(struct boot_params *bp)
 #define AMD_SME_BIT	BIT(0)
 #define AMD_SEV_BIT	BIT(1)
 
-	/* Check the SEV MSR whether SEV or SME is enabled */
-	sev_status   = __rdmsr(MSR_AMD64_SEV);
-	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
-
 	/*
 	 * Check for the SME/SEV feature:
 	 *   CPUID Fn8000_001F[EAX]
@@ -519,11 +515,16 @@  void __init sme_enable(struct boot_params *bp)
 	eax = 0x8000001f;
 	ecx = 0;
 	native_cpuid(&eax, &ebx, &ecx, &edx);
-	if (!(eax & feature_mask))
+	/* Check whether SEV or SME is supported */
+	if (!(eax & (AMD_SEV_BIT | AMD_SME_BIT)))
 		return;
 
 	me_mask = 1UL << (ebx & 0x3f);
 
+	/* Check the SEV MSR whether SEV or SME is enabled */
+	sev_status   = __rdmsr(MSR_AMD64_SEV);
+	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
+
 	/* Check if memory encryption is enabled */
 	if (feature_mask == AMD_SME_BIT) {
 		/*