diff mbox series

[v6,08/42] x86/sev-es: initialize sev_status/features within #VC handler

Message ID 20211008180453.462291-9-brijesh.singh@amd.com (mailing list archive)
State New
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand

Commit Message

Brijesh Singh Oct. 8, 2021, 6:04 p.m. UTC
From: Michael Roth <michael.roth@amd.com>

Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID
leaf indicates SEV support. With SEV-SNP, CPUID responses from the
hypervisor are not considered trustworthy, particularly for 0x8000001F.
SEV-SNP provides a firmware-validated CPUID table to use as an
alternative, but prior to checking MSR_AMD64_SEV there are no
guarantees that this is even an SEV-SNP guest.

Rather than relying on these CPUID values early on, allow SEV-ES and
SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and
have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe
to access MSR_AMD64_SEV if a #VC has triggered.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/sev-shared.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Borislav Petkov Oct. 18, 2021, 2:29 p.m. UTC | #1
On Fri, Oct 08, 2021 at 01:04:19PM -0500, Brijesh Singh wrote:
> From: Michael Roth <michael.roth@amd.com>
> 
> Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID
> leaf indicates SEV support. With SEV-SNP, CPUID responses from the
> hypervisor are not considered trustworthy, particularly for 0x8000001F.
> SEV-SNP provides a firmware-validated CPUID table to use as an
> alternative, but prior to checking MSR_AMD64_SEV there are no
> guarantees that this is even an SEV-SNP guest.
> 
> Rather than relying on these CPUID values early on, allow SEV-ES and
> SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and
> have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe
> to access MSR_AMD64_SEV if a #VC has triggered.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kernel/sev-shared.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 8ee27d07c1cd..2796c524d174 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
>  	if (exit_code != SVM_EXIT_CPUID)
>  		goto fail;
>  
> +	/*
> +	 * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV
> +	 * MSR is also available. Go ahead and initialize sev_status here to
> +	 * allow SEV features to be checked without relying solely on the SEV
> +	 * cpuid bit to indicate whether it is safe to do so.
> +	 */
> +	if (!sev_status) {
> +		unsigned long lo, hi;
> +
> +		asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
> +				     : "c" (MSR_AMD64_SEV));
> +		sev_status = (hi << 32) | lo;
> +	}
> +
>  	sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
>  	VMGEXIT();
>  	val = sev_es_rd_ghcb_msr();
> -- 

Ok, you guys are killing me. ;-\

How is bolting some pretty much unrelated code into the early #VC
handler not a hack? Do you not see it?

So sme_enable() is reading MSR_AMD64_SEV and setting up everything
there, including sev_status. If a SNP guest does not trust CPUID, why
can't you attempt to read that MSR there, even if CPUID has lied to the
guest?

And not just slap it somewhere just because it works?
Michael Roth Oct. 18, 2021, 6:40 p.m. UTC | #2
On Mon, Oct 18, 2021 at 04:29:07PM +0200, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 01:04:19PM -0500, Brijesh Singh wrote:
> > From: Michael Roth <michael.roth@amd.com>
> > 
> > Generally access to MSR_AMD64_SEV is only safe if the 0x8000001F CPUID
> > leaf indicates SEV support. With SEV-SNP, CPUID responses from the
> > hypervisor are not considered trustworthy, particularly for 0x8000001F.
> > SEV-SNP provides a firmware-validated CPUID table to use as an
> > alternative, but prior to checking MSR_AMD64_SEV there are no
> > guarantees that this is even an SEV-SNP guest.
> > 
> > Rather than relying on these CPUID values early on, allow SEV-ES and
> > SEV-SNP guests to instead use a cpuid instruction to trigger a #VC and
> > have it cache MSR_AMD64_SEV in sev_status, since it is known to be safe
> > to access MSR_AMD64_SEV if a #VC has triggered.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > ---
> >  arch/x86/kernel/sev-shared.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index 8ee27d07c1cd..2796c524d174 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -191,6 +191,20 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> >  	if (exit_code != SVM_EXIT_CPUID)
> >  		goto fail;
> >  
> > +	/*
> > +	 * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV
> > +	 * MSR is also available. Go ahead and initialize sev_status here to
> > +	 * allow SEV features to be checked without relying solely on the SEV
> > +	 * cpuid bit to indicate whether it is safe to do so.
> > +	 */
> > +	if (!sev_status) {
> > +		unsigned long lo, hi;
> > +
> > +		asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
> > +				     : "c" (MSR_AMD64_SEV));
> > +		sev_status = (hi << 32) | lo;
> > +	}
> > +
> >  	sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
> >  	VMGEXIT();
> >  	val = sev_es_rd_ghcb_msr();
> > -- 
> 
> Ok, you guys are killing me. ;-\
> 
> How is bolting some pretty much unrelated code into the early #VC
> handler not a hack? Do you not see it?

This was the result of my proposal in v5:

  > More specifically, the general protocol to determine SNP is enabled
  > seems
  > to be:
  > 
  >  1) check cpuid 0x8000001f to determine if SEV bit is enabled and SEV
  >     MSR is available
  >  2) check the SEV MSR to see if SEV-SNP bit is set
  > 
  > but the conundrum here is the CPUID page is only valid if SNP is
  > enabled, otherwise it can be garbage. So the code to set up the page
  > skips those checks initially, and relies on the expectation that UEFI,
  > or whatever the initial guest blob was, will only provide a CC_BLOB if
  > it already determined SNP is enabled.
  > 
  > It's still possible something goes awry and the kernel gets handed a
  > bogus CC_BLOB even though SNP isn't actually enabled. In this case the
  > cpuid values could be bogus as well, but the guest will fail
  > attestation then and no secrets should be exposed.
  > 
  > There is one thing that could tighten up the check a bit though. Some
  > bits of SEV-ES code will use the generation of a #VC as an indicator
  > of SEV-ES support, which implies SEV MSR is available without relying
  > on hypervisor-provided CPUID bits. I could add a one-time check in
  > the cpuid #VC to check SEV MSR for SNP bit, but it would likely
  > involve another static __ro_after_init variable store state. If that
  > seems worthwhile I can look into that more as well.
  
  Yes, the skipping of checks above sounds weird: why don't you simply
  keep the checks order: SEV, -ES, -SNP and then parse CPUID. It'll fail
  at attestation eventually, but you'll have the usual flow like with the
  rest of the SEV- feature picking apart.

https://lore.kernel.org/lkml/YS3+saDefHwkYwny@zn.tnic/

I'd thought you didn't like the previous approach of having snp_cpuid_init()
defer the CPUID/MSR checks until sme_enable() sets up sev_status later on,
then failing the boot retroactively if SNP bit isn't set but CPUID table
was advertised. So I added those checks in snp_cpuid_init(), along with the
additional #VC-based indicator of SEV-ES/SEV-SNP support as an additional
sanity check of what EFI firmware was providing, since I thought that was
the key concern here.

Now I'm realizing that perhaps your suggestion was to actually defer the
entire CPUID page setup until after sme_enable(). Is that correct?

> 
> So sme_enable() is reading MSR_AMD64_SEV and setting up everything
> there, including sev_status. If a SNP guest does not trust CPUID, why
> can't you attempt to read that MSR there, even if CPUID has lied to the
> guest?

If CPUID has lied, that would result in a #GP, rather than a controlled
termination in the various checkers/callers. The latter is easier to
debug.

Additionally, #VC is arguably a better indicator of SEV MSR availability
for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware
and doesn't rely directly on hypervisor/EFI-provided CPUID values. It
doesn't work for SEV guests, but I don't think it's a bad idea to allow
SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make
use of the added assurance.

Is it just the way it's currently implemented as something
cpuid-table-specific that's at issue, or are you opposed to doing so in
general?

Thanks,

Mike

> 
> And not just slap it somewhere just because it works?
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C462c7481ae414f7706a808d99243a615%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637701641625364120%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6MViA5KCFEgSA2fijEx3Dg05btIEAjw55bFYRKL0P6o%3D&amp;reserved=0
Borislav Petkov Oct. 18, 2021, 7:18 p.m. UTC | #3
On Mon, Oct 18, 2021 at 01:40:03PM -0500, Michael Roth wrote:
> If CPUID has lied, that would result in a #GP, rather than a controlled
> termination in the various checkers/callers. The latter is easier to
> debug.
> 
> Additionally, #VC is arguably a better indicator of SEV MSR availability
> for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware
> and doesn't rely directly on hypervisor/EFI-provided CPUID values. It
> doesn't work for SEV guests, but I don't think it's a bad idea to allow
> SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make
> use of the added assurance.

Ok, let's take a step back and analyze what we're trying to solve first.
So I'm looking at sme_enable():

1. Code checks SME/SEV support leaf. HV lies and says there's none. So
guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
won't be able to give confidentiality to its users => users go away or
do unencrypted like now.

Problem is solved by political and economical pressure.

2. Check SEV and SME bit. HV lies here. Oh well, same as the above.

3. HV lies about 1. and 2. but says that SME/SEV is supported.

Guest attempts to read the MSR Guest explodes due to the #GP. The same
political/economical pressure thing happens.

If the MSR is really there, we've landed at the place where we read the
SEV MSR. Moment of truth - SEV/SNP guests have a communication protocol
which is independent from the HV and all good.

Now, which case am I missing here which justifies the need to do those
acrobatics of causing #VCs just to detect the SEV MSR?

Thx.
Michael Roth Oct. 20, 2021, 4:10 p.m. UTC | #4
On Mon, Oct 18, 2021 at 09:18:13PM +0200, Borislav Petkov wrote:
> On Mon, Oct 18, 2021 at 01:40:03PM -0500, Michael Roth wrote:
> > If CPUID has lied, that would result in a #GP, rather than a controlled
> > termination in the various checkers/callers. The latter is easier to
> > debug.
> > 
> > Additionally, #VC is arguably a better indicator of SEV MSR availability
> > for SEV-ES/SEV-SNP guests, since it is only generated by ES/SNP hardware
> > and doesn't rely directly on hypervisor/EFI-provided CPUID values. It
> > doesn't work for SEV guests, but I don't think it's a bad idea to allow
> > SEV-ES/SEV-SNP guests to initialize sev_status in #VC handler to make
> > use of the added assurance.
> 

[Sorry for the wall of text, just trying to work through everything.]

> Ok, let's take a step back and analyze what we're trying to solve first.
> So I'm looking at sme_enable():

I'm not sure if this is pertaining to using the CPUID table prior to
sme_enable(), or just the #VC-based SEV MSR read. The following comments
assume the former. If that assumption is wrong you can basically ignore
the rest of this email :)

[The #VC-based SEV MSR read is not necessary for anything in sme_enable(),
it's simply a way to determine whether the guest is an SNP guest, without
any reliance on CPUID, which seemed useful in the context of doing some
additional sanity checks against the SNP CPUID table and determining that
it's appropriate to use it early on (rather than just trust that this is an
SNP guest by virtue of the CC blob being present, and then failing later
once sme_enable() checks for the SNP feature bits through the normal
mechanism, as was done in v5).]

> 
> 1. Code checks SME/SEV support leaf. HV lies and says there's none. So
> guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
> won't be able to give confidentiality to its users => users go away or
> do unencrypted like now.
> 
> Problem is solved by political and economical pressure.
> 
> 2. Check SEV and SME bit. HV lies here. Oh well, same as the above.

I'd be worried about the possibility that, through some additional exploits
or failures in the attestation flow, a guest owner was tricked into booting
unencrypted on a compromised host and exposing their secrets. Their
attestation process might even do some additional CPUID sanity checks, which
would at the point be via the SNP CPUID table and look legitimate, unaware
that the kernel didn't actually use the SNP CPUID table until after
0x8000001F was parsed (if we were to only initialize it after/as-part-of
sme_enable()).

Fortunately in this scenario I think the guest kernel actually would fail to
boot due to the SNP hardware unconditionally treating code/page tables as
encrypted pages. I tested some of these scenarios just to check, but not
all, and I still don't feel confident enough about it to say that there's
not some way to exploit this by someone who is more clever/persistant than
me.

> 
> 3. HV lies about 1. and 2. but says that SME/SEV is supported.
> 
> Guest attempts to read the MSR Guest explodes due to the #GP. The same
> political/economical pressure thing happens.

That's seems likely, but maybe some future hardware bug, or some other
exploit, makes it possible to intercept that MSR read? I don't know, but
if that particular branch of execution can be made less likely by utilizing
SNP CPUID validation I think it makes sense to make use of it.

> 
> If the MSR is really there, we've landed at the place where we read the
> SEV MSR. Moment of truth - SEV/SNP guests have a communication protocol
> which is independent from the HV and all good.

At which point we then switch to using the CPUID table? But at that
point all the previous CPUID checks, both SEV-related/non-SEV-related,
are now possibly not consistent with what's in the CPUID table. Do we
then revalidate? Even a non-malicious hypervisor might provide
inconsistent values between the two sources due to bugs, or SNP
validation suppressing certain feature bits that hypervisor otherwise
exposes, etc. Now all the code after sme_enable() can potentially take
unexpected execution paths, where post-sme_enable() code makes
assumptions about pre-sme_enable() checks that may no longer hold true.

Also, it would be useful from an attestation perspective that the CPUID
bits visible to userspace correspond to what the kernel used during boot,
which wouldn't necessarily be the case if hypervisor-provided values were
used during early boot and potentially put the kernel into some unexpected
state that could persist beyond the point of attestation.

Code-wise, thanks in large part to your suggestions, it really isn't all
that much more complicated to hook in the CPUID table lookup in the #VC
handlers (which are already needed anyway for SEV-ES) early on so all
these checks are against the same trusted (or more-trusted at least)
CPUID source.

> 
> Now, which case am I missing here which justifies the need to do those
> acrobatics of causing #VCs just to detect the SEV MSR?

There are a few more places where cpuid is utilized prior to
sme_enable():

# In boot/compressed
paging_prepare():
  ecx = cpuid(7, 0) # SNP-verified against host values
  # check ecx for LA57

# In boot/compressed and kernel proper
verify_cpu():
  eax, ebx, ecx, edx = cpuid(0, 0) # SNP-verified against host values
  # check eax for range > 0
  # check ebx, ecx, edx for "AuthenticAMD" or "GenuineIntel"

if_amd:
  edx = cpuid(1, 0) # SNP-verified against host values
  # check edx feature bits against REQUIRED_MASK0 (PAE|FPU|PSE|etc.)

  eax = cpuid(0x80000001, 0) # SNP-verified against host values
  # check eax against REQUIRED_MASK1 (LM|3DNOW)

  edx = cpuid(1, 0) # SNP-verified against host values
  # check eax against SSE_MASK
  # if not set, try to force it on via MSR_K7_HWCR if this is an AMD CPU
  # if forcing fails, report no_longmode available

if_intel:
  # completely different stuff

It's possible that various lies about the values checked for in
REQUIRED_MASK0/REQUIRED_MASK1, LA57 enablement, etc., can be audited in
similar fashion as you've done above to find nothing concerning, but
what about 5 years from now? And these are all checks/configuration that
can put the kernel in unexpected states that persist beyond the point of
attestation, where we really need to care about the possible effects. If
SNP CPUID validation isn't utilized until after-the-fact, we'd end up
not utilizing it for some of the more 'interesting' CPUID bits.

It's also worth noting that TDX guards against most of this through
CPUID virtualization, where hardware/microcode provides similar
validation for these sorts of CPUID bits in early boot. It's only because
the SEV-SNP CPUID 'virtualization' lives in the guest code that we have to
deal with the additional complexity of initializing the CPUID table early
on. But if both platforms are capable of providing similar assurances then
it seems worthwhile to pursue that.

> acrobatics of causing #VCs just to detect the SEV MSR?

The CPUID calls in snp_cpuid_init() weren't added specifically to induce
the #VC-based SEV MSR read, they were added only because I thought the
gist of your earlier suggestions were to do more validation against the
CPUID table advertised by EFI rather than the v5 approach of deferring
them till later after sev_status gets set by sme_enable(), and then
failing later if turned out that SNP CPUID feature bit wasn't set by
sme_enable(). I thought this was motivated by a desire to be more
paranoid about what EFI provides, so once I had the cpuid checks added
in snp_cpuid_init() it seemed like a logical step to further
sanity-check the SNP CPUID bit using a mechanism that was completely
independent of the CPUID table.

I'm not dead set on that at all however, it was only added based on me
[mis-]interpreting your comments as a desire to be less trusting of EFI
as to whether this is an SNP guest or not. But I also don't think it's
a good idea to not utilize the CPUID table until after sme_enable(),
based on the above reasons.

What if we simply add a check in sme_enable() that terminates the guest
if the cc_blob/cpuid table is provided, but the CPUID/MSR checks in
sme_enable() determine that this isn't an SNP guest? That would be similar
to the v5 approach, but in a less roundabout way, and then the cpuid/MSR
checks could be dropped from snp_cpuid_init().

If we did decide that it is useful to use #VC-based initialization of
sev_status there, it would all be self-contained there (but again, that's
a separate thing that I don't have a strong opinion on).

Thanks,

Mike

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cmichael.roth%40amd.com%7Cddb8c27d71794244176308d9926c094d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637701815061371715%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=LUArcQqb%2F0WFlworDbZOClXhgYqEGje364fpBycixUg%3D&amp;reserved=0
Borislav Petkov Oct. 20, 2021, 6:01 p.m. UTC | #5
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> [Sorry for the wall of text, just trying to work through everything.]

And I'm going to respond in a couple of mails just for my own sanity.

> I'm not sure if this is pertaining to using the CPUID table prior to
> sme_enable(), or just the #VC-based SEV MSR read. The following comments
> assume the former. If that assumption is wrong you can basically ignore
> the rest of this email :)

This is pertaining to me wanting to show you that the design of this SNP
support needs to be sane and maintainable and every function needs to
make sense not only now but in the future.

In this particular example, we should set sev_status *once*, *before*
anything accesses it so that it is prepared when something needs it. Not
do a #VC and go, "oh, btw, is sev_status set? No? Ok, lemme set it."
which basically means our design is seriously lacking.

And I had suggested a similar thing for TDX and tglx was 100% right in
shooting it down because we do properly designed things - not, get stuff
in so that vendor is happy and then, once the vendor programmers have
disappeared to do their next enablement task, the maintainers get to mop
up and maintain it forever.

Because this mopping up doesn't scale - trust me.

> [The #VC-based SEV MSR read is not necessary for anything in sme_enable(),
> it's simply a way to determine whether the guest is an SNP guest, without
> any reliance on CPUID, which seemed useful in the context of doing some
> additional sanity checks against the SNP CPUID table and determining that
> it's appropriate to use it early on (rather than just trust that this is an
> SNP guest by virtue of the CC blob being present, and then failing later
> once sme_enable() checks for the SNP feature bits through the normal
> mechanism, as was done in v5).]

So you need to make up your mind here design-wise, what you wanna do.

The proper thing to do would be, to detect *everything*, detect whether
this is an SNP guest, yadda yadda, everything your code is going to need
later on, and then be done with it.

Then you continue with the boot and now your other code queries
everything that has been detected up til now and uses it.

End of mail 1.
Borislav Petkov Oct. 20, 2021, 6:08 p.m. UTC | #6
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So
> > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
> > won't be able to give confidentiality to its users => users go away or
> > do unencrypted like now.
> > 
> > Problem is solved by political and economical pressure.
> > 
> > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above.
> 
> I'd be worried about the possibility that, through some additional exploits
> or failures in the attestation flow,

Well, that puts forward an important question: how do you verify
*reliably* that this is an SNP guest?

- attestation?

- CPUID?

- anything else?

I don't see this written down anywhere. Because this assumption will
guide the design in the kernel.

> a guest owner was tricked into booting unencrypted on a compromised
> host and exposing their secrets. Their attestation process might even
> do some additional CPUID sanity checks, which would at the point
> be via the SNP CPUID table and look legitimate, unaware that the
> kernel didn't actually use the SNP CPUID table until after 0x8000001F
> was parsed (if we were to only initialize it after/as-part-of
> sme_enable()).

So what happens with that guest owner later?

How is she to notice that she booted unencrypted?

> Fortunately in this scenario I think the guest kernel actually would fail to
> boot due to the SNP hardware unconditionally treating code/page tables as
> encrypted pages. I tested some of these scenarios just to check, but not
> all, and I still don't feel confident enough about it to say that there's
> not some way to exploit this by someone who is more clever/persistant than
> me.

All this design needs to be preceded with: "We protect against cases A,
B and C and not against D, E, etc."

So that it is clear to all parties involved what we're working with and
what we're protecting against and what we're *not* protecting against.

End of mail 2, more later.
Michael Roth Oct. 21, 2021, 12:35 a.m. UTC | #7
On Wed, Oct 20, 2021 at 08:01:07PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > [Sorry for the wall of text, just trying to work through everything.]
> 
> And I'm going to respond in a couple of mails just for my own sanity.
> 
> > I'm not sure if this is pertaining to using the CPUID table prior to
> > sme_enable(), or just the #VC-based SEV MSR read. The following comments
> > assume the former. If that assumption is wrong you can basically ignore
> > the rest of this email :)
> 
> This is pertaining to me wanting to show you that the design of this SNP
> support needs to be sane and maintainable and every function needs to
> make sense not only now but in the future.

Absolutely.

> 
> In this particular example, we should set sev_status *once*, *before*
> anything accesses it so that it is prepared when something needs it. Not
> do a #VC and go, "oh, btw, is sev_status set? No? Ok, lemme set it."
> which basically means our design is seriously lacking.

Yes, taking a step back there are some things that could probably be
improved upon there.

currently:

  - boot kernel initializes sev_status in set_sev_encryption_mask()
  - run-time kernel initializes sev_status in sme_enable()

with this series the following are introduced:

  - boot kernel initializes sev_status on-demand in sev_snp_enabled()
    - initially used by snp_cpuid_init_boot(), which happens before
      set_sev_encryption_mask()
  - run-time kernel initializes sev_status on-demand via #VC handler
    - initially used by snp_cpuid_init(), which happens before
      sme_enable()

Fortunately, all the code makes use of sev_status to get at the SEV MSR
bits, so breaking the appropriate bits out of sme_enable() into an earlier
sev_init() routine that's the exclusive writer of sev_status sounds like a
promising approach.

It makes sense to do it immediately after the first #VC handler is set
up, so CPUID is available, and since that's where SNP CPUID table
initialization would need to happen if it's to be made available in
#VC handler.

It may even be similar enough between boot/compressed and run-time kernel
that it could be a shared routine in sev-shared.c. But then again it also
sounds like the appropriate place to move the snp_cpuid_init*() calls,
and locating the cc_blob, and since there's differences there it might make
sense to keep the boot/compressed and kernel proper sev_init() routines
separate to avoid #ifdeffery).

Not to get ahead of myself though. Just seems like a good starting point
for how to consolidate the various users.

> 
> And I had suggested a similar thing for TDX and tglx was 100% right in
> shooting it down because we do properly designed things - not, get stuff
> in so that vendor is happy and then, once the vendor programmers have
> disappeared to do their next enablement task, the maintainers get to mop
> up and maintain it forever.
> 
> Because this mopping up doesn't scale - trust me.

Got it, and my apologies if I've given you that impression as it's
certainly not my intent. (though I'm sure you've heard that before.)

> 
> > [The #VC-based SEV MSR read is not necessary for anything in sme_enable(),
> > it's simply a way to determine whether the guest is an SNP guest, without
> > any reliance on CPUID, which seemed useful in the context of doing some
> > additional sanity checks against the SNP CPUID table and determining that
> > it's appropriate to use it early on (rather than just trust that this is an
> > SNP guest by virtue of the CC blob being present, and then failing later
> > once sme_enable() checks for the SNP feature bits through the normal
> > mechanism, as was done in v5).]
> 
> So you need to make up your mind here design-wise, what you wanna do.
> 
> The proper thing to do would be, to detect *everything*, detect whether
> this is an SNP guest, yadda yadda, everything your code is going to need
> later on, and then be done with it.
> 
> Then you continue with the boot and now your other code queries
> everything that has been detected up til now and uses it.

Agreed, if we need to check SEV MSR early for the purposes of SNP it makes
sense to move the overall SEV feature detection code earlier as well. I
should have looked into that aspect more closely before introducing the
changes.

> 
> End of mail 1.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C1f46689b40da4a700a6308d993f3993a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703496776826912%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Kdi9h%2FTzuzoLn64BRsRMLWHkew14BxZHR28QsORsSxs%3D&amp;reserved=0
Michael Roth Oct. 21, 2021, 2:05 a.m. UTC | #8
On Wed, Oct 20, 2021 at 08:08:39PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > > 1. Code checks SME/SEV support leaf. HV lies and says there's none. So
> > > guest doesn't boot encrypted. Oh well, not a big deal, the cloud vendor
> > > won't be able to give confidentiality to its users => users go away or
> > > do unencrypted like now.
> > > 
> > > Problem is solved by political and economical pressure.
> > > 
> > > 2. Check SEV and SME bit. HV lies here. Oh well, same as the above.
> > 
> > I'd be worried about the possibility that, through some additional exploits
> > or failures in the attestation flow,
> 
> Well, that puts forward an important question: how do you verify
> *reliably* that this is an SNP guest?
> 
> - attestation?
> 
> - CPUID?
> 
> - anything else?
> 
> I don't see this written down anywhere. Because this assumption will
> guide the design in the kernel.

According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the
SEV MSR is the appropriate source for guests to use. This is what is used
in the EFI code as well. So that seems to be the right way to make the
initial determination.

There's a dependency there on the SEV CPUID bit however, since setting the
bit to 0 would generally result in a guest skipping the SEV MSR read and
assuming 0. So for SNP it would be more reliable to make use of the CPUID
table at that point, since it's less-susceptible to manipulation, or do the
#VC-based SEV MSR read (or both).

> 
> > a guest owner was tricked into booting unencrypted on a compromised
> > host and exposing their secrets. Their attestation process might even
> > do some additional CPUID sanity checks, which would at the point
> > be via the SNP CPUID table and look legitimate, unaware that the
> > kernel didn't actually use the SNP CPUID table until after 0x8000001F
> > was parsed (if we were to only initialize it after/as-part-of
> > sme_enable()).
> 
> So what happens with that guest owner later?
> 
> How is she to notice that she booted unencrypted?

Fully-unencrypted should result in a crash due to the reasons below.

But there may exist some carefully crafted outside influences that could
goad the guest into, perhaps, not marking certain pages as private. The
best that can be done to prevent that is to audit/harden all the code in the
boot stack so that it is less susceptible to that kind of outside
manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID
table, SNP restricted injection, etc.)

Then of course that boot stack needs to be part of the attestation process
to provide any meaningful assurances about the resulting guest state.

Outside of the boot stack the guest owner might take some extra precautions.
Perhaps custom some kernel driver to verify encryption/validated status of
guest pages, some checks against the CPUID table to verify it contains sane
values, but not really worth speculating on that aspect as it will be
ultimately dependent on how the cloud vendor decides to handle things after
boot.

> 
> > Fortunately in this scenario I think the guest kernel actually would fail to
> > boot due to the SNP hardware unconditionally treating code/page tables as
> > encrypted pages. I tested some of these scenarios just to check, but not
> > all, and I still don't feel confident enough about it to say that there's
> > not some way to exploit this by someone who is more clever/persistant than
> > me.
> 
> All this design needs to be preceded with: "We protect against cases A,
> B and C and not against D, E, etc."
> 
> So that it is clear to all parties involved what we're working with and
> what we're protecting against and what we're *not* protecting against.

That would indeed be useful. Perhaps as a nice big comment in sme_enable()
and/or the proposed sev_init() so that those invariants can be maintained,
or updated in sync with future changes. I'll look into that for the next
spin and check with Brijesh on the details.

> 
> End of mail 2, more later.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C70ce657823a441516fc808d993f4a402%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703501243595370%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=nIqCXolZUNWTV6eBfLscXRfQDWJZk5fwBMghKVbIeaw%3D&amp;reserved=0
Borislav Petkov Oct. 21, 2021, 2:28 p.m. UTC | #9
On Wed, Oct 20, 2021 at 07:35:35PM -0500, Michael Roth wrote:
> Fortunately, all the code makes use of sev_status to get at the SEV MSR
> bits, so breaking the appropriate bits out of sme_enable() into an earlier
> sev_init() routine that's the exclusive writer of sev_status sounds like a
> promising approach.

Ack.

> It makes sense to do it immediately after the first #VC handler is set
> up, so CPUID is available, and since that's where SNP CPUID table
> initialization would need to happen if it's to be made available in
> #VC handler.

Right, and you can do all your init/CPUID prep there.

> It may even be similar enough between boot/compressed and run-time kernel
> that it could be a shared routine in sev-shared.c.

Uuh, bonus points! :-)

> But then again it also sounds like the appropriate place to move the
> snp_cpuid_init*() calls, and locating the cc_blob, and since there's
> differences there it might make sense to keep the boot/compressed and
> kernel proper sev_init() routines separate to avoid #ifdeffery).
>
> Not to get ahead of myself though. Just seems like a good starting point
> for how to consolidate the various users.

I like how you're thinking. :)

> Got it, and my apologies if I've given you that impression as it's
> certainly not my intent. (though I'm sure you've heard that before.)

Nothing to apologize - all good.

> Agreed, if we need to check SEV MSR early for the purposes of SNP it makes
> sense to move the overall SEV feature detection code earlier as well. I
> should have looked into that aspect more closely before introducing the
> changes.

Thx.
Borislav Petkov Oct. 21, 2021, 2:39 p.m. UTC | #10
On Wed, Oct 20, 2021 at 09:05:42PM -0500, Michael Roth wrote:
> According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the
> SEV MSR is the appropriate source for guests to use. This is what is used
> in the EFI code as well. So that seems to be the right way to make the
> initial determination.

Yap.

> There's a dependency there on the SEV CPUID bit however, since setting the
> bit to 0 would generally result in a guest skipping the SEV MSR read and
> assuming 0. So for SNP it would be more reliable to make use of the CPUID
> table at that point, since it's less-susceptible to manipulation, or do the
> #VC-based SEV MSR read (or both).

So the CPUID page is supplied by the firmware, right?

Then, you parse it and see that the CPUID bit is 1, then you start using
the SEV_STATUS MSR and all good.

If there *is* a CPUID page but that bit is 0, then you can safely assume
that something is playing tricks on ya so you simply refuse booting.

> Fully-unencrypted should result in a crash due to the reasons below.

Crash is a good thing in confidential computing. :)

> But there may exist some carefully crafted outside influences that could
> goad the guest into, perhaps, not marking certain pages as private. The
> best that can be done to prevent that is to audit/harden all the code in the
> boot stack so that it is less susceptible to that kind of outside
> manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID
> table, SNP restricted injection, etc.)

So to me I wonder why would one use anything *else* but an SNP guest. We
all know that those previous technologies were just the stepping stones
towards SNP.

> Then of course that boot stack needs to be part of the attestation process
> to provide any meaningful assurances about the resulting guest state.
>
> Outside of the boot stack the guest owner might take some extra precautions.
> Perhaps custom some kernel driver to verify encryption/validated status of
> guest pages, some checks against the CPUID table to verify it contains sane
> values, but not really worth speculating on that aspect as it will be
> ultimately dependent on how the cloud vendor decides to handle things after
> boot.

Well, I've always advocated having a best-practices writeup somewhere
goes a long way to explain this technology to people and how to get
their feet wet. And there you can give hints how such verification could
look like in detail...

> That would indeed be useful. Perhaps as a nice big comment in sme_enable()
> and/or the proposed sev_init() so that those invariants can be maintained,
> or updated in sync with future changes. I'll look into that for the next
> spin and check with Brijesh on the details.

There is Documentation/x86/amd-memory-encryption.rst, for example.
Borislav Petkov Oct. 21, 2021, 2:48 p.m. UTC | #11
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> At which point we then switch to using the CPUID table? But at that
> point all the previous CPUID checks, both SEV-related/non-SEV-related,
> are now possibly not consistent with what's in the CPUID table. Do we
> then revalidate?

Well, that's a tough question. That's basically the same question as,
does Linux support heterogeneous cores and can it handle hardware
features which get enabled after boot. The perfect example is, late
microcode loading which changes CPUID bits and adds new functionality.

And the answer to that is, well, hard. You need to decide this on a
case-by-case basis.

But isn't it that the SNP CPUID page will be parsed early enough anyway
so that kernel proper will see only SNP CPUID info and init properly
using that?

> Even a non-malicious hypervisor might provide inconsistent values
> between the two sources due to bugs, or SNP validation suppressing
> certain feature bits that hypervisor otherwise exposes, etc.

There's also migration, lemme point to a very recent example:

https://lore.kernel.org/r/20211021104744.24126-1-jane.malalane@citrix.com

which is exactly what you say - a non-malicious HV taking care of its
migration pool. So how do you handle that?

> Now all the code after sme_enable() can potentially take unexpected
> execution paths, where post-sme_enable() code makes assumptions about
> pre-sme_enable() checks that may no longer hold true.

So as I said above, if you parse SNP CPUID page early enough, you don't
have to worry about feature rediscovery. Early enough means, before
identify_boot_cpu().
Borislav Petkov Oct. 21, 2021, 2:51 p.m. UTC | #12
On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> the #VC-based SEV MSR read, they were added only because I thought the
> gist of your earlier suggestions were to do more validation against the
> CPUID table advertised by EFI

Well, if EFI is providing us with the CPUID table, who verified it? The
attestation process? Is it signed with the AMD platform key?

Because if we can verify the firmware is ok, then we can trust the CPUID
page, right?
Dr. David Alan Gilbert Oct. 21, 2021, 3:56 p.m. UTC | #13
* Borislav Petkov (bp@alien8.de) wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > At which point we then switch to using the CPUID table? But at that
> > point all the previous CPUID checks, both SEV-related/non-SEV-related,
> > are now possibly not consistent with what's in the CPUID table. Do we
> > then revalidate?
> 
> Well, that's a tough question. That's basically the same question as,
> does Linux support heterogeneous cores and can it handle hardware
> features which get enabled after boot. The perfect example is, late
> microcode loading which changes CPUID bits and adds new functionality.
> 
> And the answer to that is, well, hard. You need to decide this on a
> case-by-case basis.

I can imagine a malicious hypervisor trying to return different cpuid
answers to different threads or even the same thread at different times.

> But isn't it that the SNP CPUID page will be parsed early enough anyway
> so that kernel proper will see only SNP CPUID info and init properly
> using that?
> 
> > Even a non-malicious hypervisor might provide inconsistent values
> > between the two sources due to bugs, or SNP validation suppressing
> > certain feature bits that hypervisor otherwise exposes, etc.
> 
> There's also migration, lemme point to a very recent example:
> 
> https://lore.kernel.org/r/20211021104744.24126-1-jane.malalane@citrix.com

Ewww.

> which is exactly what you say - a non-malicious HV taking care of its
> migration pool. So how do you handle that?

Well, the spec (AMD 56860 SEV spec) says:

  'If firmware encounters a CPUID function that is in the standard or extended ranges, then the
firmware performs a check to ensure that the provided output would not lead to an insecure guest
state'

so I take that 'firmware' to be the PSP; that wording doesn't say that
it checks that the CPUID is identical, just that it 'would not lead to
an insecure guest' - so a hypervisor could hide any 'no longer affected
by' flag for all the CPUs in it's migration pool and the firmware
shouldn't complain; so it should be OK to pessimise.

Dave

> > Now all the code after sme_enable() can potentially take unexpected
> > execution paths, where post-sme_enable() code makes assumptions about
> > pre-sme_enable() checks that may no longer hold true.
> 
> So as I said above, if you parse SNP CPUID page early enough, you don't
> have to worry about feature rediscovery. Early enough means, before
> identify_boot_cpu().
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
>
Borislav Petkov Oct. 21, 2021, 4:55 p.m. UTC | #14
On Thu, Oct 21, 2021 at 04:56:09PM +0100, Dr. David Alan Gilbert wrote:
> I can imagine a malicious hypervisor trying to return different cpuid
> answers to different threads or even the same thread at different times.

Haha, I guess that will fail not because of SEV* but because of the
kernel not really being able to handle heterogeneous CPUIDs.

> Well, the spec (AMD 56860 SEV spec) says:
> 
>   'If firmware encounters a CPUID function that is in the standard or extended ranges, then the
> firmware performs a check to ensure that the provided output would not lead to an insecure guest
> state'
> 
> so I take that 'firmware' to be the PSP; that wording doesn't say that
> it checks that the CPUID is identical, just that it 'would not lead to
> an insecure guest' - so a hypervisor could hide any 'no longer affected
> by' flag for all the CPUs in it's migration pool and the firmware
> shouldn't complain; so it should be OK to pessimise.

AFAIU this, I think this would depend on "[t]he policy used by the
firmware to assess CPUID function output can be found in [PPR]."

So if the HV sets the "no longer affected by" flag but the firmware
deems this set flag as insecure, I'm assuming the firmare will clear
it when it returns the CPUID leafs. I guess I need to go find that
policy...
Dr. David Alan Gilbert Oct. 21, 2021, 5:12 p.m. UTC | #15
* Borislav Petkov (bp@alien8.de) wrote:
> On Thu, Oct 21, 2021 at 04:56:09PM +0100, Dr. David Alan Gilbert wrote:
> > I can imagine a malicious hypervisor trying to return different cpuid
> > answers to different threads or even the same thread at different times.
> 
> Haha, I guess that will fail not because of SEV* but because of the
> kernel not really being able to handle heterogeneous CPUIDs.

My worry is if it fails cleanly or fails in a way an evil hypervisor can
exploit.

> > Well, the spec (AMD 56860 SEV spec) says:
> > 
> >   'If firmware encounters a CPUID function that is in the standard or extended ranges, then the
> > firmware performs a check to ensure that the provided output would not lead to an insecure guest
> > state'
> > 
> > so I take that 'firmware' to be the PSP; that wording doesn't say that
> > it checks that the CPUID is identical, just that it 'would not lead to
> > an insecure guest' - so a hypervisor could hide any 'no longer affected
> > by' flag for all the CPUs in it's migration pool and the firmware
> > shouldn't complain; so it should be OK to pessimise.
> 
> AFAIU this, I think this would depend on "[t]he policy used by the
> firmware to assess CPUID function output can be found in [PPR]."
> 
> So if the HV sets the "no longer affected by" flag but the firmware
> deems this set flag as insecure, I'm assuming the firmare will clear
> it when it returns the CPUID leafs. I guess I need to go find that
> policy...

<digs - ppr_B1_pub_1 55898 rev 0.50 >

OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109

then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as
'bitmask'
'bits set in the GuestVal must also be set in HostVal.
This is often applied to feature fields where each bit indicates
support for a feature'

So that's right isn't it?

Dave

> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
>
Borislav Petkov Oct. 21, 2021, 5:37 p.m. UTC | #16
On Thu, Oct 21, 2021 at 06:12:53PM +0100, Dr. David Alan Gilbert wrote:
> OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109
> 
> then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as
> 'bitmask'
> 'bits set in the GuestVal must also be set in HostVal.
> This is often applied to feature fields where each bit indicates
> support for a feature'
> 
> So that's right isn't it?

Yap, AFAIRC, it would fail the check if:

(GuestVal & HostVal) != GuestVal

and GuestVal is "the CPUID result value created by the hypervisor that
it wants to give to the guest". Let's say it clears bit 6 there.

Then HostVal comes in which is "the actual CPUID result value specified
in this PPR" and there the guest catches the HV lying its *ss off.

:-)
Dr. David Alan Gilbert Oct. 21, 2021, 5:47 p.m. UTC | #17
* Borislav Petkov (bp@alien8.de) wrote:
> On Thu, Oct 21, 2021 at 06:12:53PM +0100, Dr. David Alan Gilbert wrote:
> > OK, so that bit is 8...21 Eax ext2eax bit 6 page 1-109
> > 
> > then 2.1.5.3 CPUID policy enforcement shows 8...21 EAX as
> > 'bitmask'
> > 'bits set in the GuestVal must also be set in HostVal.
> > This is often applied to feature fields where each bit indicates
> > support for a feature'
> > 
> > So that's right isn't it?
> 
> Yap, AFAIRC, it would fail the check if:
> 
> (GuestVal & HostVal) != GuestVal
> 
> and GuestVal is "the CPUID result value created by the hypervisor that
> it wants to give to the guest". Let's say it clears bit 6 there.
                                               ^^^^^^^

> Then HostVal comes in which is "the actual CPUID result value specified
> in this PPR" and there the guest catches the HV lying its *ss off.
> 
> :-)

Hang on, I think it's perfectly fine for it to clear that bit - it just
gets caught if it *sets* it (i.e. claims to be a chip unaffected by the
bug).

i.e. if guestval=0 then (GustVal & whatever) == GuestVal
  fine

?

Dave

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
>
Borislav Petkov Oct. 21, 2021, 6:46 p.m. UTC | #18
On Thu, Oct 21, 2021 at 06:47:50PM +0100, Dr. David Alan Gilbert wrote:
> Hang on, I think it's perfectly fine for it to clear that bit - it just
> gets caught if it *sets* it (i.e. claims to be a chip unaffected by the
> bug).
> 
> i.e. if guestval=0 then (GustVal & whatever) == GuestVal
>   fine
> 
> ?

Bah, ofc. The name of the bit is NullSelectorClearsBase - so when it is
clear, we will note we're affected, as that patch does:

+       /*
+        * CPUID bit above wasn't set. If this kernel is still running
+        * as a HV guest, then the HV has decided not to advertize
+        * that CPUID bit for whatever reason.  For example, one
+        * member of the migration pool might be vulnerable.  Which
+        * means, the bug is present: set the BUG flag and return.
+        */
+       if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
+               set_cpu_bug(c, X86_BUG_NULL_SEG);
+               return;
+       }

I have managed to flip the meaning in my mind.

Ok, that makes more sense.

Thx.
Michael Roth Oct. 21, 2021, 8:41 p.m. UTC | #19
On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> > the #VC-based SEV MSR read, they were added only because I thought the
> > gist of your earlier suggestions were to do more validation against the
> > CPUID table advertised by EFI
> 
> Well, if EFI is providing us with the CPUID table, who verified it? The
> attestation process? Is it signed with the AMD platform key?

For CPUID table pages, the only thing that's assured/attested to by firmware
is that:

 1) it is present at the expected guest physical address (that address
    is generally baked into the EFI firmware, which *is* attested to)
 2) its contents have been validated by the PSP against the current host
    CPUID capabilities as defined by the AMD PPR (Publication #55898),
    Section 2.1.5.3, "CPUID Policy Enforcement"
 3) it is encrypted with the guest key
 4) it is in a validated state at launch

The actual contents of the CPUID table are *not* attested to, so in theory
it can still be manipulated by a malicious hypervisor as part of the initial
SNP_LAUNCH_UPDATE firmware commands that provides the initial plain-text
encoding of the CPUID table that is provided to the PSP via
SNP_LAUNCH_UPDATE. It's also not signed in any way (apparently there were
some security reasons for that decision, though I don't know the full
details).

[A guest owner can still validate their CPUID values against known good
ones as part of their attestation flow, but that is not part of the
attestation report as reported by SNP firmware. (So long as there is some
care taken to ensure the source of the CPUID values visible to
userspace/guest attestion process are the same as what was used by the boot
stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same
initial address, or in cases where a copy is used, that copy is placed in
encrypted/private/validated guest memory so it can't be tampered with during
boot.]

So, while it's more difficult to do, and the scope of influence is reduced,
there are still some games that can be played to mess with boot via
manipulation of the initial CPUID table values, so long as they are within
the constraints set by the CPUID enforcement policy defined in the PPR.

Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F,
EAX, are not enforced by PSP. The only thing enforced there is that the
hypervisor cannot advertise bits that aren't supported by hardware. So
no matter how much the boot stack is trusted, the CPUID table does not
inherit that trust, and even values that we *know* should be true should be
verified rather than assumed.

But I think there are a couple approaches for verifying this is an SNP
guest that are robust against this sort of scenario. You've touched on
some of them in your other replies, so I'll respond there.

> 
> Because if we can verify the firmware is ok, then we can trust the CPUID
> page, right?
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7CMichael.Roth%40amd.com%7C155dd6f54f3e4de017a908d994a236a5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704246794699901%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=U%2BS%2B%2F8%2BX8zLvPQGWvsOb7o6sKBz1MOZqU%2BVLKHiwugY%3D&amp;reserved=0
Michael Roth Oct. 21, 2021, 9:34 p.m. UTC | #20
On Thu, Oct 21, 2021 at 04:48:16PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > At which point we then switch to using the CPUID table? But at that
> > point all the previous CPUID checks, both SEV-related/non-SEV-related,
> > are now possibly not consistent with what's in the CPUID table. Do we
> > then revalidate?
> 
> Well, that's a tough question. That's basically the same question as,
> does Linux support heterogeneous cores and can it handle hardware
> features which get enabled after boot. The perfect example is, late
> microcode loading which changes CPUID bits and adds new functionality.
> 
> And the answer to that is, well, hard. You need to decide this on a
> case-by-case basis.
> 
> But isn't it that the SNP CPUID page will be parsed early enough anyway
> so that kernel proper will see only SNP CPUID info and init properly
> using that?

At the time I wrote that I thought you were suggesting moving the SNP CPUID
table initialization to where sme_enable() is in current upstream, so it
seemed worth mentioning, but since the idea was actually to move all the
sev_status initialization in sme_enable() earlier in the code to where
SNP CPUID table init needs to happen (before first cpuid calls are made), I
this scenario is avoided.

> 
> > Even a non-malicious hypervisor might provide inconsistent values
> > between the two sources due to bugs, or SNP validation suppressing
> > certain feature bits that hypervisor otherwise exposes, etc.
> 
> There's also migration, lemme point to a very recent example:
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fr%2F20211021104744.24126-1-jane.malalane%40citrix.com&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C4aaa998fcd134c8d054608d994a1d1aa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704245057316093%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=OKO9o3YzKwRkyWPpam2%2Fxn4aRSMKtPEnZjn05g81SP8%3D&amp;reserved=0
> 
> which is exactly what you say - a non-malicious HV taking care of its
> migration pool. So how do you handle that?

I concur with David's assessment on that solution being compatible with
CPUID enforcement policy. But it's certainly something to consider more
generally.

Fortunately I think I misspoke earlier, I thought there was a case or 2
where bits were suppressed, rather than causing a validation failure,
but looking back through the PPR I doesn't seem like that's actually the
case. Which is good, since that would indeed be painful to deal with in
the context of migration.
Michael Roth Oct. 21, 2021, 11 p.m. UTC | #21
On Thu, Oct 21, 2021 at 04:39:31PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 09:05:42PM -0500, Michael Roth wrote:
> > According to the APM at least, (Rev 3.37, 15.34.10, "SEV_STATUS MSR"), the
> > SEV MSR is the appropriate source for guests to use. This is what is used
> > in the EFI code as well. So that seems to be the right way to make the
> > initial determination.
> 
> Yap.
> 
> > There's a dependency there on the SEV CPUID bit however, since setting the
> > bit to 0 would generally result in a guest skipping the SEV MSR read and
> > assuming 0. So for SNP it would be more reliable to make use of the CPUID
> > table at that point, since it's less-susceptible to manipulation, or do the
> > #VC-based SEV MSR read (or both).
> 
> So the CPUID page is supplied by the firmware, right?

Yes.

> 
> Then, you parse it and see that the CPUID bit is 1, then you start using
> the SEV_STATUS MSR and all good.
> 
> If there *is* a CPUID page but that bit is 0, then you can safely assume
> that something is playing tricks on ya so you simply refuse booting.

I think that's a good way to deal with this.

I was going to suggest we could assume the presence of SEV status MSR by
virtue of EFI/bootloader/etc having provided a cc_blob, and just read it
right away to confirm this is SNP. But with your approach we could basically
just set up the table early, based on the presence of the cc_blob, and do all
the checks in sme_enable() in the same order as with SEV/SEV-ES, then just
have additional sanity checks against the CPUID/MSR response values to
ensure the SNP bits are present for the cases where a cpuid table / cc_blob
are provided.

I'll work on implementing things in this way and see how it goes.

> 
> > Fully-unencrypted should result in a crash due to the reasons below.
> 
> Crash is a good thing in confidential computing. :)
> 
> > But there may exist some carefully crafted outside influences that could
> > goad the guest into, perhaps, not marking certain pages as private. The
> > best that can be done to prevent that is to audit/harden all the code in the
> > boot stack so that it is less susceptible to that kind of outside
> > manipulation (via mechanisms like SEV-ES, SNP page validation, SNP CPUID
> > table, SNP restricted injection, etc.)
> 
> So to me I wonder why would one use anything *else* but an SNP guest. We
> all know that those previous technologies were just the stepping stones
> towards SNP.

Yah, I think ultimately that's where things are headed.

> 
> > Then of course that boot stack needs to be part of the attestation process
> > to provide any meaningful assurances about the resulting guest state.
> >
> > Outside of the boot stack the guest owner might take some extra precautions.
> > Perhaps custom some kernel driver to verify encryption/validated status of
> > guest pages, some checks against the CPUID table to verify it contains sane
> > values, but not really worth speculating on that aspect as it will be
> > ultimately dependent on how the cloud vendor decides to handle things after
> > boot.
> 
> Well, I've always advocated having a best-practices writeup somewhere
> goes a long way to explain this technology to people and how to get
> their feet wet. And there you can give hints how such verification could
> look like in detail...

Our security team is working on some initial reference designs / tooling
for attestation. It'll eventually make it's way to here:

  https://github.com/AMDESE/sev-guest

but it's still mostly an internal effort so nothing there ATM. But hopefully
that will fill in some of these gaps. But I agree some an accompanying best
practices document to highlight some of these considerations is also
something that should be considered, I'll need to check to see if there's
anything like that in the works already.

> 
> > That would indeed be useful. Perhaps as a nice big comment in sme_enable()
> > and/or the proposed sev_init() so that those invariants can be maintained,
> > or updated in sync with future changes. I'll look into that for the next
> > spin and check with Brijesh on the details.
> 
> There is Documentation/x86/amd-memory-encryption.rst, for example.

Makes sense, will work with Brijesh on this.

Thanks!

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C966f1e67d4704901d29a08d994a099e1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704239855683221%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=730WoYycYnjabC4igLcViZsEokSrcMkJ9oXvZso4ULQ%3D&amp;reserved=0
Borislav Petkov Oct. 25, 2021, 11:04 a.m. UTC | #22
On Thu, Oct 21, 2021 at 03:41:49PM -0500, Michael Roth wrote:
> On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> > > the #VC-based SEV MSR read, they were added only because I thought the
> > > gist of your earlier suggestions were to do more validation against the
> > > CPUID table advertised by EFI
> > 
> > Well, if EFI is providing us with the CPUID table, who verified it? The
> > attestation process? Is it signed with the AMD platform key?
> 
> For CPUID table pages, the only thing that's assured/attested to by firmware
> is that:
> 
>  1) it is present at the expected guest physical address (that address
>     is generally baked into the EFI firmware, which *is* attested to)
>  2) its contents have been validated by the PSP against the current host
>     CPUID capabilities as defined by the AMD PPR (Publication #55898),
>     Section 2.1.5.3, "CPUID Policy Enforcement"
>  3) it is encrypted with the guest key
>  4) it is in a validated state at launch
> 
> The actual contents of the CPUID table are *not* attested to,

Why?

> so in theory it can still be manipulated by a malicious hypervisor as
> part of the initial SNP_LAUNCH_UPDATE firmware commands that provides
> the initial plain-text encoding of the CPUID table that is provided
> to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way
> (apparently there were some security reasons for that decision, though
> I don't know the full details).

So this sounds like an unnecessary complication. I'm sure there are
reasons to do it this way but my simple thinking would simply want the
CPUID page to be read-only and signed so that the guest can trust it
unconditionally.

> [A guest owner can still validate their CPUID values against known good
> ones as part of their attestation flow, but that is not part of the
> attestation report as reported by SNP firmware. (So long as there is some
> care taken to ensure the source of the CPUID values visible to
> userspace/guest attestion process are the same as what was used by the boot
> stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same
> initial address, or in cases where a copy is used, that copy is placed in
> encrypted/private/validated guest memory so it can't be tampered with during
> boot.]

This sounds like the good practices advice to guest owners would be,
"Hey, I just booted your SNP guest but for full trust, you should go and
verify the CPUID page's contents."

"And if I were you, I wouldn't want to run any verification of CPUID
pages' contents on the same guest because it itself hasn't been verified
yet."

It all sounds weird.

> So, while it's more difficult to do, and the scope of influence is reduced,
> there are still some games that can be played to mess with boot via
> manipulation of the initial CPUID table values, so long as they are within
> the constraints set by the CPUID enforcement policy defined in the PPR.
> 
> Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F,
> EAX, are not enforced by PSP. The only thing enforced there is that the
> hypervisor cannot advertise bits that aren't supported by hardware. So
> no matter how much the boot stack is trusted, the CPUID table does not
> inherit that trust, and even values that we *know* should be true should be
> verified rather than assumed.
> 
> But I think there are a couple approaches for verifying this is an SNP
> guest that are robust against this sort of scenario. You've touched on
> some of them in your other replies, so I'll respond there.

Yah, I guess the kernel can do good enough verification and then the
full thing needs to be done by the guest owner and in *some* userspace
- not necessarily on the currently booted, unverified guest - but
somewhere, where you have maximal flexibility.

IMHO.
Michael Roth Oct. 25, 2021, 4:35 p.m. UTC | #23
On Mon, Oct 25, 2021 at 01:04:10PM +0200, Borislav Petkov wrote:
> On Thu, Oct 21, 2021 at 03:41:49PM -0500, Michael Roth wrote:
> > On Thu, Oct 21, 2021 at 04:51:06PM +0200, Borislav Petkov wrote:
> > > On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > > > The CPUID calls in snp_cpuid_init() weren't added specifically to induce
> > > > the #VC-based SEV MSR read, they were added only because I thought the
> > > > gist of your earlier suggestions were to do more validation against the
> > > > CPUID table advertised by EFI
> > > 
> > > Well, if EFI is providing us with the CPUID table, who verified it? The
> > > attestation process? Is it signed with the AMD platform key?
> > 
> > For CPUID table pages, the only thing that's assured/attested to by firmware
> > is that:
> > 
> >  1) it is present at the expected guest physical address (that address
> >     is generally baked into the EFI firmware, which *is* attested to)
> >  2) its contents have been validated by the PSP against the current host
> >     CPUID capabilities as defined by the AMD PPR (Publication #55898),
> >     Section 2.1.5.3, "CPUID Policy Enforcement"
> >  3) it is encrypted with the guest key
> >  4) it is in a validated state at launch
> > 
> > The actual contents of the CPUID table are *not* attested to,
> 
> Why?

As counter-intuitive as it sounds, it actually doesn't buy us if the CPUID
table is part of the PSP attestation report, since:

 - the boot stack is attested to, and if the boot stack isn't careful to
   use the CPUID table at all times, then attesting CPUID table after
   boot doesn't provide any assurance that the boot wasn't manipulated
   by CPUID
 - given the boot stack must take these precautions, guest-specific
   attestation code is just as capable of attesting the CPUID table
   contents/values, since it has the same view of the CPUID values that
   were used during boot.

So leaving it to the guest owner to attest it provides some flexibility
to guest owners to implement it as they see fit, whereas making it part
of the attestation report means that the guest needs the exact contents
of the CPUID page for a particular guest configuration so it can be
incorporated into the measurement they are expecting, which would likely
require some tooling provided by the cloud vendor, since every different
guest configuration, or even changes like the ordering in which entries
are placed in the table, would affect measurement, so it's not something
that could be easily surmised separately with minimal involvement from a
cloud vendor.

And even if the cloud vendor provided a simple way to export the table
contents for measurement, can you really trust it? If you have to audit
individual entries to be sure there's nothing fishy, why not just
incorporate those checks into the guest owner's attestation flow and
leave the vendor out of it completely?

So not including it in the measurement meshes well with the overall
SEV-SNP approach of reducing the cloud vendor's involvement in the
overall attestation process.

> 
> > so in theory it can still be manipulated by a malicious hypervisor as
> > part of the initial SNP_LAUNCH_UPDATE firmware commands that provides
> > the initial plain-text encoding of the CPUID table that is provided
> > to the PSP via SNP_LAUNCH_UPDATE. It's also not signed in any way
> > (apparently there were some security reasons for that decision, though
> > I don't know the full details).
> 
> So this sounds like an unnecessary complication. I'm sure there are
> reasons to do it this way but my simple thinking would simply want the
> CPUID page to be read-only and signed so that the guest can trust it
> unconditionally.

The thing here is that it's not just a specific CPUID page that's valid
for all guests for a particular host. Booting a guest with additional
vCPUs changes the contents, different CPU models/flags changes the
contents, etc. So it needs to be generated for each specific guest
configuration, and can't just be a read-only page.

Some sort of signature that indicates the PSP's stamp of approval on a
particular CPUID page would be nice, but we do sort of have this in the
sense that CPUID page 'address' is part of measurement, and can only
contain values that were blessed by the PSP. The problem then becomes
ensuring that only that address it used for CPUID lookups, and that it's
contents weren't manipulated in a way where it's 'valid' as far as the
PSP is concerned, but still not the 'expected' values for a particular
guest (which is where the attestation mentioned above would come into
play).

> 
> > [A guest owner can still validate their CPUID values against known good
> > ones as part of their attestation flow, but that is not part of the
> > attestation report as reported by SNP firmware. (So long as there is some
> > care taken to ensure the source of the CPUID values visible to
> > userspace/guest attestion process are the same as what was used by the boot
> > stack: i.e. EFI/bootloader/kernel all use the CPUID page at that same
> > initial address, or in cases where a copy is used, that copy is placed in
> > encrypted/private/validated guest memory so it can't be tampered with during
> > boot.]
> 
> This sounds like the good practices advice to guest owners would be,
> "Hey, I just booted your SNP guest but for full trust, you should go and
> verify the CPUID page's contents."
> 
> "And if I were you, I wouldn't want to run any verification of CPUID
> pages' contents on the same guest because it itself hasn't been verified
> yet."
> 
> It all sounds weird.

Yes, understandably so. But the only way to avoid that sort of weirdness
in general is for *all* guest state to be measured, all pages, all
registers, etc. Baking that directly into the SEV-SNP attestation report
would be a non-starter for most since computing the measurement for all
that state independently would require lots of additional inputs from
cloud vendor (who we don't necessarily trust in the first place), and
constant updates of measurement values since they would change with
every guest configuration change, every different starting TSC offset,
different, maybe the order in which vCPUs were onlined, stuff that the
kernel prints to log buffers, etc.

But, if a guest owner wants to attempt clever ways to account for some/all
of that in their attestation flow, they are welcome to try. That's sort of
the idea behind SNP attestation vs. SEV. Things like page
validation/encryption, cpuid enforcement, etc., reduce some of the
variables/possibilities guest owners need to account for during attestation
to make the process more secure/tenable, but they don't rule out all
possibilities, just as Trusted Boot doesn't necessarily mean you can fully
trust your OS state immediately afer boot; there are still outside
influences at play, and the boot stack should guard against them
wherever possible.

> 
> > So, while it's more difficult to do, and the scope of influence is reduced,
> > there are still some games that can be played to mess with boot via
> > manipulation of the initial CPUID table values, so long as they are within
> > the constraints set by the CPUID enforcement policy defined in the PPR.
> > 
> > Unfortunately, the presence of the SEV/SEV-ES/SEV-SNP bits in 0x8000001F,
> > EAX, are not enforced by PSP. The only thing enforced there is that the
> > hypervisor cannot advertise bits that aren't supported by hardware. So
> > no matter how much the boot stack is trusted, the CPUID table does not
> > inherit that trust, and even values that we *know* should be true should be
> > verified rather than assumed.
> > 
> > But I think there are a couple approaches for verifying this is an SNP
> > guest that are robust against this sort of scenario. You've touched on
> > some of them in your other replies, so I'll respond there.
> 
> Yah, I guess the kernel can do good enough verification and then the
> full thing needs to be done by the guest owner and in *some* userspace
> - not necessarily on the currently booted, unverified guest - but
> somewhere, where you have maximal flexibility.

Exactly, moving attestation into the guest allows for more of the these
unexpected states to be accounted for at whatever level of paranoia a guest
owner sees fit, while still allowing firmware to provide some basic
assurances via attestation report and various features to reduce common
attack vectors during/after boot.

> 
> IMHO.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7CMichael.Roth%40amd.com%7Cd9f9c20a37ce4a4b0e0608d997a72f6a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637707566641580997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=OmfuUZfcf3bkC%2FmSsiInQ1vScK5Onu1lHWAUH3o%2FUmE%3D&amp;reserved=0
Borislav Petkov Oct. 27, 2021, 11:17 a.m. UTC | #24
On Mon, Oct 25, 2021 at 11:35:18AM -0500, Michael Roth wrote:
> As counter-intuitive as it sounds, it actually doesn't buy us if the CPUID
> table is part of the PSP attestation report, since:

Thanks for taking the time to explain in detail - I think I know now
what's going on, and David explained some additional stuff to me
yesterday.

So, to cut to the chase:

 - yeah, ok, I guess guest owner attestation is what should happen.

 - as to the boot detection, I think you should do in sme_enable(), in
pseudo:

	bool snp_guest_detected;

        if (CPUID page address) {
                read SEV_STATUS;

                snp_guest_detected = SEV_STATUS & MSR_AMD64_SEV_SNP_ENABLED;
        }

        /* old SME/SEV detection path */
        read 0x8000_001F_EAX and look at bits SME and SEV, yadda yadda.

        if (snp_guest_detected && (!SME || !SEV))
                /*
		 * HV is lying to me, do something there, dunno what. I guess we can
		 * continue booting unencrypted so that the guest owner knows that
		 * detection has failed and maybe the HV didn't want us to force SNP.
		 * This way, attestation will fail and the user will know why.
		 * Or something like that.
		 */


        /* normal feature detection continues. */

How does that sound?
Michael Roth Oct. 27, 2021, 3:13 p.m. UTC | #25
On Wed, Oct 27, 2021 at 01:17:11PM +0200, Borislav Petkov wrote:
> On Mon, Oct 25, 2021 at 11:35:18AM -0500, Michael Roth wrote:
> > As counter-intuitive as it sounds, it actually doesn't buy us if the CPUID
> > table is part of the PSP attestation report, since:
> 
> Thanks for taking the time to explain in detail - I think I know now
> what's going on, and David explained some additional stuff to me
> yesterday.
> 
> So, to cut to the chase:
> 
>  - yeah, ok, I guess guest owner attestation is what should happen.
> 
>  - as to the boot detection, I think you should do in sme_enable(), in
> pseudo:
> 
> 	bool snp_guest_detected;
> 
>         if (CPUID page address) {
>                 read SEV_STATUS;
> 
>                 snp_guest_detected = SEV_STATUS & MSR_AMD64_SEV_SNP_ENABLED;
>         }
> 
>         /* old SME/SEV detection path */
>         read 0x8000_001F_EAX and look at bits SME and SEV, yadda yadda.
> 
>         if (snp_guest_detected && (!SME || !SEV))
>                 /*
> 		 * HV is lying to me, do something there, dunno what. I guess we can
> 		 * continue booting unencrypted so that the guest owner knows that
> 		 * detection has failed and maybe the HV didn't want us to force SNP.
> 		 * This way, attestation will fail and the user will know why.
> 		 * Or something like that.
> 		 */
> 
> 
>         /* normal feature detection continues. */
> 
> How does that sound?

That seems promising. I've been testing a similar approach in conjunction with
moving sme_enable() to after the initial #VC handler is set up and things seem
to work out pretty nicely.

boot/compressed is a little less straightforward since the sme_enable()
equivalent is set_sev_encryption_mask() which sets sev_status and is written
in assembly, whereas the SNP-specific bits we're adding relies on C code
that handles stuff like scanning EFI config table are in C, so probably
worthwhile to see if everything can be redone in C. But then there's
get_sev_encryption_bit(), which needs to be in assembly since it needs
to be called from 32-bit entry path as well, but that doesn't actually
rely on anything set by set_sev_encryption_mask(), so it seems like it
should be okay to split set_sev_encryption_mask() out into a separate C
routine.

Will work on implementing/testing that approach, but if you or Joerg are
aware of any showstoppers there just let me know.

Thanks!

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C72940826a93b49882ffa08d9993b5390%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637709302358641670%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=%2BoUx7zP3RA57CwGG2q5IkUkrYQZiOL9ZoLxvIVTq%2BDY%3D&amp;reserved=0
diff mbox series

Patch

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 8ee27d07c1cd..2796c524d174 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -191,6 +191,20 @@  void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 	if (exit_code != SVM_EXIT_CPUID)
 		goto fail;
 
+	/*
+	 * A #VC implies that either SEV-ES or SEV-SNP are enabled, so the SEV
+	 * MSR is also available. Go ahead and initialize sev_status here to
+	 * allow SEV features to be checked without relying solely on the SEV
+	 * cpuid bit to indicate whether it is safe to do so.
+	 */
+	if (!sev_status) {
+		unsigned long lo, hi;
+
+		asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
+				     : "c" (MSR_AMD64_SEV));
+		sev_status = (hi << 32) | lo;
+	}
+
 	sev_es_wr_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
 	VMGEXIT();
 	val = sev_es_rd_ghcb_msr();