diff mbox series

[v9,31/43] x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers

Message ID 20220128171804.569796-32-brijesh.singh@amd.com (mailing list archive)
State New
Series Add AMD Secure Nested Paging (SEV-SNP) Guest Support

Commit Message

Brijesh Singh Jan. 28, 2022, 5:17 p.m. UTC
From: Michael Roth <michael.roth@amd.com>

CPUID instructions generate a #VC exception for SEV-ES/SEV-SNP guests,
which the early #VC handlers are currently set up to handle. In the case
of SEV-SNP, guests can use a configurable location in guest memory
that has been pre-populated with a firmware-validated CPUID table to
look up the relevant CPUID values rather than requesting them from
the hypervisor via a VMGEXIT. Add the various hooks in the #VC handlers to
allow CPUID instructions to be handled via the table. The code to
actually configure/enable the table will be added in a subsequent
commit.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/boot/compressed/sev.c    |   1 +
 arch/x86/include/asm/sev-common.h |   2 +
 arch/x86/kernel/sev-shared.c      | 336 ++++++++++++++++++++++++++++++
 arch/x86/kernel/sev.c             |   1 +
 4 files changed, 340 insertions(+)

Comments

Michael Roth Feb. 5, 2022, 3:42 p.m. UTC | #1
On Sat, Feb 05, 2022 at 11:54:01AM +0100, Borislav Petkov wrote:
> On Fri, Jan 28, 2022 at 11:17:52AM -0600, Brijesh Singh wrote:
> > +/*
> > + * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP
> > + * Firmware ABI, Revision 0.9, Section 7.1, Table 14.
> > + */
> > +struct snp_cpuid_fn {
> > +	u32 eax_in;
> > +	u32 ecx_in;
> > +	u64 xcr0_in;
> > +	u64 xss_in;
> 
> So what's the end result here:
> 
> -+	u64 __unused;
> -+	u64 __unused2;
> ++	u64 xcr0_in;
> ++	u64 xss_in;
> 
> those are not unused fields anymore but xcr0 and xss input values?
> 
> Looking at the FW abi doc, they're only mentioned in "Table 14.
> CPUID_FUNCTION Structure", which says they're XCR0 and XSS at the time
> of the CPUID execution.
> 
> But those values are input values to what exactly, guest or firmware?

These are inputs to firmware, either through an MSG_CPUID_REQ guest
message that gets passed along by the HV to firmware for validation,
or, in this case, through SNP_LAUNCH_UPDATE when the HV passes the
initial CPUID table contents to firmware for validation before it
gets mapped into guest memory.

> 
> There's a typo in the FW doc, btw:
> 
> "The guest constructs an MSG_CPUID_REQ message as defined in Table 13.
> This message contains an array of CPUID function structures as defined
> in Table 13."
> 
> That second "Table" is 14 not 13.
> 
> So, if an array CPUID_FUNCTION[] is passed as part of an MSG_CPUID_REQ
> command, then, the two _IN variables contain what the guest received
> from the HV for XCR0 and XSS values. Which means, this is the guest
> asking the FW whether those values the HV gave the guest are kosher.
> 
> Am I close?

Most of that documentation is written in the context of the
MSG_CPUID_REQ guest message interface. In that case, a guest has been
provided a certain set of CPUID values by KVM CPUID instruction
emulation (or potentially the CPUID table), and wants firmware to
validate whether all/some of them are valid for that particular
system/configuration.

In the case of 0xD subfunction 0x0 and 0x1, the CPUID leaf on real
hardware will change depending on what the guest sets in its actual
XCR0 control register (APM Volume 2 11.5.2) or IA32_XSS_MSR register,
whose combined bits are OR'd to form the XFEATURE_ENABLED_MASK.

CPUID leaf 0xD subfunctions 0x0 and 0x1 advertise the size of the
XSAVE area required for the features currently enabled in
XFEATURE_ENABLED_MASK via the EBX return value from the corresponding
CPUID instruction. So for firmware to validate whether that EBX value
is valid for a particular system/configuration, it needs to know
the guest's current XFEATURE_ENABLED_MASK, which means it needs to
know what the guest has set in its XCR0 and IA32_XSS registers.

That's where the XCR0_IN and XSS_IN inputs come into play: they are
inputs to the firmware so it can do the proper validation based on
the guest's current state / XFEATURE_ENABLED_MASK.
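
To make that concrete, here is a rough sketch of how the guest side
derives that mask (the helper name is made up for illustration); it
mirrors the 0xD handling in snp_cpuid_postprocess() in the patch below:

  /* Hypothetical helper, for illustration only. */
  static u64 guest_xfeature_enabled_mask(bool xsaves_enabled)
  {
      u64 xcr0 = 1, xss = 0;

      /* XCR0 can only be read via XGETBV once OSXSAVE is set in CR4. */
      if (native_read_cr4() & X86_CR4_OSXSAVE)
          xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);

      /* IA32_XSS only contributes supervisor state when XSAVES is in use. */
      if (xsaves_enabled) {
          unsigned long lo, hi;

          asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
                               : "c" (MSR_IA32_XSS));
          xss = (hi << 32) | lo;
      }

      /* The OR of the two is the XFEATURE_ENABLED_MASK described above. */
      return xcr0 | xss;
  }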

But that guest message interface will likely be deprecated at some
point in favor of the static CPUID table (SNP FW ABI 8.14.2.6),
since all the firmware validation has already been done in advance,
and any additional validation is redundant, so this series relies
purely on the CPUID table.

In the case of the CPUID table, we don't know what the guest's
XFEATURE_ENABLED_MASK is going to be at the time a CPUID instruction
is issued for 0xD:0x0/0xD:0x1, and those values will change as the
guest boots and begins enabling additional XSAVE features. The only
thing that's certain is that there needs to be at least one CPUID
table entry for 0xD:0x0 and 0xD:0x1 corresponding to the initial
XFEATURE_ENABLED_MASK, which will correspond to XCR0_IN=1/XSS_IN=0,
or XCR0_IN=3/XSS_IN=0 depending on the hypervisor/guest
implementation.

What to do about other combinations of XCR0_IN/XSS_IN gets a bit
weird, since encoding all those permutations would not scale well,
and alternative methods of trying to encode them in the table would
almost certainly result in lots of different incompatible guest
implementations floating around.

So our recommendation has been for hypervisors to encode only the
0xD:0x0/0xD:0x1 entries corresponding to these initial guest states
in the CPUID table, and for guests to ignore the XCR0_IN/XSS_IN
values completely, and instead have the guest calculate the XSAVE
save area size dynamically (EBX). This has multiple benefits:

1) Guests that implement this approach would be compatible even
   with hypervisors that try to encode additional permutations of
   XCR0_IN/XSS_IN in the table based on their read of the spec
2) It is just as secure, since it requires only that the guest
   trust itself and the APM specification, which is the ultimate
   authority on what firmware would be validating against
3) The other leaf registers, EAX/ECX/EDX are not dependent on
   XCR0_IN/XSS_IN/XFEATURE_ENABLED_MASK, and so any
   0xD:0x0/0xD:0x1 entry will do as the starting point for the
   calculation.
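
For illustration, a condensed sketch of the dynamic size calculation
recommended above; it is essentially what snp_cpuid_calc_xsave_size()
in the patch below does, minus the sanity checks for missing or
duplicate table entries (the helper name here is made up):

  static u32 xsave_size_from_table(const struct snp_cpuid_info *cpuid_info,
                                   u64 xfeatures_en, bool compacted)
  {
      u32 xsave_size = 0x240;    /* base/legacy XSAVE area size */
      int i;

      for (i = 0; i < cpuid_info->count; i++) {
          const struct snp_cpuid_fn *fn = &cpuid_info->fn[i];

          /* Only 0xD subfunctions 2..63 describe individual XSAVE features. */
          if (fn->eax_in != 0xD || fn->ecx_in < 2 || fn->ecx_in > 63)
              continue;
          /* Skip features the guest has not enabled in XCR0/XSS. */
          if (!(xfeatures_en & BIT_ULL(fn->ecx_in)))
              continue;

          if (compacted)
              xsave_size += fn->eax;    /* EAX: size of this feature's state */
          else
              xsave_size = max(xsave_size, fn->eax + fn->ebx);    /* EBX: offset */
      }

      return xsave_size;
  }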

That's why in v8 these fields were documented as unused1/unused2.
But when you suggested that we should enforce 0 in that case, it
became clear we should adjust our recommendation and go ahead and
check for the specific XCR0_IN={1,3}/XSS_IN=0 values when
searching for the base 0xD:0x0/0xD:0x1 entry to use: a hypervisor
that chooses to use XCR0_IN/XSS_IN is not out-of-spec and should
be supported, and there isn't really any read of the spec from the
hypervisor perspective where XCR0_IN=0/XSS_IN=0 would be valid.

So for v9 I went ahead and added the XCR0_IN/XSS_IN checks, and
updated our recommendations (will send an updated version to
SNP mailing list by Monday) to not ignore them in the guest.
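
Concretely, the added check when looking up the base 0xD:0x0/0xD:0x1
entry boils down to something like this (hypothetical helper name,
but the logic matches what snp_cpuid_get_validated_func() does in the
patch below):

  /* Hypothetical helper, for illustration only. */
  static bool snp_cpuid_fn_usable(const struct snp_cpuid_fn *fn)
  {
      /* Only 0xD subfunctions 0 and 1 are restricted by XCR0_IN/XSS_IN. */
      if (fn->eax_in != 0xD || (fn->ecx_in != 0 && fn->ecx_in != 1))
          return true;

      /* Accept only the base/legacy XSAVE state: XCR0_IN=1 or 3, XSS_IN=0. */
      return (fn->xcr0_in == 1 || fn->xcr0_in == 3) && !fn->xss_in;
  }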

I added some comments in snp_cpuid_get_validated_func() and
snp_cpuid_calc_xsave_size() to clarify how XCR0_IN/XSS_IN are
being used there, but let me know if those need to be beefed up
a bit.

-Mike
Michael Roth Feb. 5, 2022, 4:22 p.m. UTC | #2
On Sat, Feb 05, 2022 at 11:54:01AM +0100, Borislav Petkov wrote:
> On Fri, Jan 28, 2022 at 11:17:52AM -0600, Brijesh Singh wrote:
> 
> > +static const struct snp_cpuid_info *snp_cpuid_info_get_ptr(void)
> > +{
> > +	void *ptr;
> > +
> > +	asm ("lea cpuid_info_copy(%%rip), %0"
> > +	     : "=r" (ptr)
> 
> Same question as the last time:
> 
> Why not "=g" and let the compiler decide?

The documentation for lea (APM Volume 3 Chapter 3) seemed to require
that the destination operand be a general purpose register, so it
seemed like there was potential for breakage in allowing GCC to use
anything else. Maybe GCC is smart enough to figure that out, but
since we know the constraint in advance it seemed safer to stick
with the current approach of enforcing that constraint.

> 
> > +	     : "p" (&cpuid_info_copy));
> > +
> > +	return ptr;
> > +}
> 
> ...
> 
> > +static bool snp_cpuid_check_range(u32 func)
> > +{
> > +	if (func <= cpuid_std_range_max ||
> > +	    (func >= 0x40000000 && func <= cpuid_hyp_range_max) ||
> > +	    (func >= 0x80000000 && func <= cpuid_ext_range_max))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
> > +				 u32 *ecx, u32 *edx)
> 
> And again, same question as the last time:
> 
> I'm wondering if you could make everything a lot easier by doing
> 
> static int snp_cpuid_postprocess(struct cpuid_leaf *leaf)
> 
> and marshall around that struct cpuid_leaf which contains func, subfunc,
> e[abcd]x instead of dealing with 6 parameters.
> 
> Callers of snp_cpuid() can simply allocate it on their stack and hand it
> in and it is all in sev-shared.c so nicely self-contained...
> 
> Ok I'm ignoring this patch for now and I'll review it only after you've
> worked in all comments from the previous review.

I did look into it and honestly it just seemed to add more abstractions that
made it harder to parse the specific operations taking place here. For
instance, when post-processing the 0x8000001E entry, we have e{a,b,c,d}x from
the CPUID table, then to post-process:

  switch (func):
  case 0x8000001E:
    /* extended APIC ID */
    snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
                                |    |      |      |
                                |    |      |      edx from cpuid table is used as-is
                                |    |      |  
                                |    |      |
                                |    |      load HV value into tmp ecx2
                                |    |
                                |    load HV value into tmp ebx2
                                |
                                |
                                replace eax completely with the HV value

    # then do the remaining fixups for final ebx/ecx

    /* compute ID */
    *ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0));
    /* node ID */
    *ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0));

and it all reads in a clear/familiar way relative to all the other
cpuid()/native_cpuid() users throughout the kernel. From the
perspective of someone auditing this for security, who needs to
quickly check what registers come from the CPUID table, what
registers come from the HV, and what the final result is, it all
just seems very clear and familiar to me.

But if we start passing around this higher-level structure that does
not do anything other than abstract away e{a,b,c,d}x to save on function
arguments, things become muddier, and there's more pointer dereference
operations and abstractions to sift through. I saved the diff from when
I looked into it previously (was just a rough-sketch, not build-tested),
and included it below for reference, but it just didn't seem to help with
readability to me, which I think is important here since this is probably
one of the most security-sensitive pieces of the CPUID table handling,
since we're dealing with untrusted CPUID sources here and it needs to be
clear what exactly is ending up in the E{A,B,C,D} registers we're
returning for a particular CPUID instruction:

(There are some possible optimizations below, like adding a mask parameter
to control specifically which EAX/EBX/ECX/EDX fields should be modified,
possibly reworking snp_cpuid_info structure definitions to re-use the
cpuid_leaf struct internally, also modifying __sev_cpuid_hv() to take
the cpuid_leaf struct, etc., but none of that really seemed like
it would help much with the key issue of readability, so I ended up
setting it aside for v9)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b2defbf7e66b..53534a6b1dcc 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -49,6 +49,13 @@ struct snp_cpuid_info {
 	struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
 } __packed;
 
+struct cpuid_leaf {
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+};
+
 /*
  * Since feature negotiation related variables are set early in the boot
  * process they must reside in the .data section so as not to be zeroed
@@ -260,14 +267,14 @@ static int __sev_cpuid_hv(u32 func, int reg_idx, u32 *reg)
 	return 0;
 }
 
-static int sev_cpuid_hv(u32 func, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+static int sev_cpuid_hv(u32 func, struct cpuid_leaf *leaf)
 {
 	int ret;
 
-	ret = __sev_cpuid_hv(func, GHCB_CPUID_REQ_EAX, eax);
-	ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EBX, ebx);
-	ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_ECX, ecx);
-	ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EDX, edx);
+	ret = __sev_cpuid_hv(func, GHCB_CPUID_REQ_EAX, &leaf->eax);
+	ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EBX, &leaf->ebx);
+	ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_ECX, &leaf->ecx);
+	ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EDX, &leaf->edx);
 
 	return ret;
 }
@@ -328,8 +335,7 @@ static int snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
 	return xsave_size;
 }
 
-static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
-			 u32 *edx)
+static void snp_cpuid_hv(u32 func, u32 subfunc, struct cpuid_leaf *leaf)
 {
 	/*
 	 * MSR protocol does not support fetching indexed subfunction, but is
@@ -342,13 +348,12 @@ static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
 	if (cpuid_function_is_indexed(func) && subfunc)
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
 
-	if (sev_cpuid_hv(func, eax, ebx, ecx, edx))
+	if (sev_cpuid_hv(func, leaf))
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
 }
 
 static bool
-snp_cpuid_get_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
-			     u32 *ecx, u32 *edx)
+snp_cpuid_get_validated_func(u32 func, u32 subfunc, struct cpuid_leaf *leaf)
 {
 	const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr();
 	int i;
@@ -362,10 +367,10 @@ snp_cpuid_get_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
 		if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
 			continue;
 
-		*eax = fn->eax;
-		*ebx = fn->ebx;
-		*ecx = fn->ecx;
-		*edx = fn->edx;
+		leaf->eax = fn->eax;
+		leaf->ebx = fn->ebx;
+		leaf->ecx = fn->ecx;
+		leaf->edx = fn->edx;
 
 		return true;
 	}
@@ -383,33 +388,34 @@ static bool snp_cpuid_check_range(u32 func)
 	return false;
 }
 
-static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
-				 u32 *ecx, u32 *edx)
+static int snp_cpuid_postprocess(u32 func, u32 subfunc, struct cpuid_leaf *leaf)
 {
-	u32 ebx2, ecx2, edx2;
+	struct cpuid_leaf leaf_tmp;
 
 	switch (func) {
 	case 0x1:
-		snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2);
+		snp_cpuid_hv(func, subfunc, &leaf_tmp);
 
 		/* initial APIC ID */
-		*ebx = (ebx2 & GENMASK(31, 24)) | (*ebx & GENMASK(23, 0));
+		leaf->ebx = (leaf_tmp.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
 		/* APIC enabled bit */
-		*edx = (edx2 & BIT(9)) | (*edx & ~BIT(9));
+		leaf->edx = (leaf_tmp.edx & BIT(9)) | (leaf->edx & ~BIT(9));
 
 		/* OSXSAVE enabled bit */
 		if (native_read_cr4() & X86_CR4_OSXSAVE)
-			*ecx |= BIT(27);
+			leaf->ecx |= BIT(27);
 		break;
 	case 0x7:
 		/* OSPKE enabled bit */
-		*ecx &= ~BIT(4);
+		leaf->ecx &= ~BIT(4);
 		if (native_read_cr4() & X86_CR4_PKE)
-			*ecx |= BIT(4);
+			leaf->ecx |= BIT(4);
 		break;
 	case 0xB:
 		/* extended APIC ID */
-		snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx);
+		snp_cpuid_hv(func, 0, &leaf_tmp);
+		leaf->edx = leaf_tmp.edx;
+
 		break;
 	case 0xD: {
 		bool compacted = false;
@@ -440,7 +446,7 @@ static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
 			 * to avoid this becoming an issue it's safer to simply
 			 * treat this as unsupported for SEV-SNP guests.
 			 */
-			if (!(*eax & (BIT(1) | BIT(3))))
+			if (!(leaf->eax & (BIT(1) | BIT(3))))
 				return -EINVAL;
 
 			compacted = true;
@@ -450,16 +456,17 @@ static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
 		if (xsave_size < 0)
 			return -EINVAL;
 
-		*ebx = xsave_size;
+		leaf->ebx = xsave_size;
 		}
 		break;
 	case 0x8000001E:
 		/* extended APIC ID */
-		snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
+		snp_cpuid_hv(func, subfunc, &leaf_tmp);
+		leaf->eax = leaf_tmp.eax;
 		/* compute ID */
-		*ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0));
+		leaf->ebx = (leaf->ebx & GENMASK(31, 8)) | (leaf_tmp.ebx & GENMASK(7, 0));
 		/* node ID */
-		*ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0));
+		leaf->ecx = (leaf->ecx & GENMASK(31, 8)) | (leaf_tmp.ecx & GENMASK(7, 0));
 		break;
 	default:
 		/* No fix-ups needed, use values as-is. */
@@ -473,15 +480,14 @@ static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
  * Returns -EOPNOTSUPP if feature not enabled. Any other return value should be
  * treated as fatal by caller.
  */
-static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
-		     u32 *edx)
+static int snp_cpuid(u32 func, u32 subfunc, struct cpuid_leaf *leaf)
 {
 	const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr();
 
 	if (!cpuid_info->count)
 		return -EOPNOTSUPP;
 
-	if (!snp_cpuid_get_validated_func(func, subfunc, eax, ebx, ecx, edx)) {
+	if (!snp_cpuid_get_validated_func(func, subfunc, leaf)) {
 		/*
 		 * Some hypervisors will avoid keeping track of CPUID entries
 		 * where all values are zero, since they can be handled the
@@ -497,12 +503,12 @@ static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
 		 * not in the table, but is still in the valid range, proceed
 		 * with the post-processing. Otherwise, just return zeros.
 		 */
-		*eax = *ebx = *ecx = *edx = 0;
+		leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
 		if (!snp_cpuid_check_range(func))
 			return 0;
 	}
 
-	return snp_cpuid_postprocess(func, subfunc, eax, ebx, ecx, edx);
+	return snp_cpuid_postprocess(func, subfunc, leaf);
 }
 
 /*
@@ -514,28 +520,28 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 {
 	unsigned int subfn = lower_bits(regs->cx, 32);
 	unsigned int fn = lower_bits(regs->ax, 32);
-	u32 eax, ebx, ecx, edx;
+	struct cpuid_leaf leaf;
 	int ret;
 
 	/* Only CPUID is supported via MSR protocol */
 	if (exit_code != SVM_EXIT_CPUID)
 		goto fail;
 
-	ret = snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx);
+	ret = snp_cpuid(fn, subfn, &leaf);
 	if (!ret)
 		goto cpuid_done;
 
 	if (ret != -EOPNOTSUPP)
 		goto fail;
 
-	if (sev_cpuid_hv(fn, &eax, &ebx, &ecx, &edx))
+	if (sev_cpuid_hv(fn, &leaf))
 		goto fail;
 
 cpuid_done:
-	regs->ax = eax;
-	regs->bx = ebx;
-	regs->cx = ecx;
-	regs->dx = edx;
+	regs->ax = leaf.eax;
+	regs->bx = leaf.ebx;
+	regs->cx = leaf.ecx;
+	regs->dx = leaf.edx;
 
 	/*
 	 * This is a VC handler and the #VC is only raised when SEV-ES is


> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Michael Roth Feb. 7, 2022, 3:37 p.m. UTC | #3
On Sun, Feb 06, 2022 at 02:37:59PM +0100, Borislav Petkov wrote:
> First of all,
> 
> let me give you a very important piece of advice for the future:
> ignoring review feedback is a very unfriendly thing to do:
> 
> - if you agree with the feedback, you work it in in the next revision
> 
> - if you don't agree, you *say* *why* you don't
> 
> What you should avoid is what you've done - ignore it silently. Because
> reviewing submitters code is the most ungrateful work around the kernel,
> and, at the same time, it is the most important one.
> 
> So please make sure you don't do that in the future when submitting
> patches for upstream inclusion. I'm sure you can imagine, the ignoring
> can go both ways.

Absolutely, I know a thorough review is grueling work, and would never
want to give the impression that I don't appreciate it. Was just hoping
to revisit these in the context of v9 since there were some concerning
things in flight WRT the spec handling and I was sort of focused on
getting ahead of those in case they involved firmware/spec changes. But
I realize that's resulted in a waste of your time and I should have at
least provided some indication of where I was with these before your
review. Won't happen again.

> 
> On Sat, Feb 05, 2022 at 10:22:49AM -0600, Michael Roth wrote:
> > The documentation for lea (APM Volume 3 Chapter 3) seemed to require
> > that the destination register be a general purpose register, so it
> > seemed like there was potential for breakage in allowing GCC to use
> > anything otherwise.
> 
> There's no such potential: binutils encodes the instruction operands
> and what types are allowed. IOW, the assembler knows that there goes a
> register.
> 
> > Maybe GCC is smart enough to figure that out, but since we know the
> > constraint in advance it seemed safer to stick with the current
> > approach of enforcing that constraint.
> 
> I guess in this particular case it won't matter whether it is "=r" or
> "=g" but still...
> 
> > I did look into it and honestly it just seemed to add more abstractions that
> > made it harder to parse the specific operations taken place here. For
> > instance, post-processing of 0x8000001E entry, we have e{a,b,c,d}x from
> > the CPUID table, then to post process:
> > 
> >   switch (func):
> >   case 0x8000001E:
> >     /* extended APIC ID */
> >     snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
> 
> Well, my suggestion was to put it *all* in the subleaf struct - not just
> the regs:
> 
> struct cpuid_leaf {
> 	u32 func;
> 	u32 subfunc;
> 	u32 eax;
> 	u32 ebx;
> 	u32 ecx;
> 	u32 edx;
> };
> 
> so that the call signature is:
> 
> 	snp_cpuid_postprocess(struct cpuid_leaf *leaf)
> 
> 
> > and it all reads in a clear/familiar way to all the other
> > cpuid()/native_cpuid() users throughout the kernel,
> 
> maybe it should read differently *on* *purpose*. Exactly because this is
> *not* the usual CPUID handling code but CPUID verification code for SNP
> guests.

Well, there's also sev_cpuid_hv(), which really is sort of a variant of
cpuid()/native_cpuid() helpers that happens to use the GHCB protocol, and
snp_cpuid() gets called at roughly the same callsite in the #VC handler,
so it made sense to me to just stick to a similar signature across all of
them. But no problem implementing it as you suggested, that was just my
reasoning there.

> 
> And please explain to me what's so unclear about leaf->eax vs *eax?!

It's not that one is unclear, it's just that, in the context of reading
through a function which has a lot of similar/repetitive cases for
various leaves:

  snp_cpuid_hv(fn, subfn, &eax, &ebx2, NULL, NULL) 

  ebx = fixup(ebx2)

seems slightly easier to process to me than:

  snp_cpuid_hv(&leaf_hv)

  leaf->eax = leaf_hv->eax
  leaf->ebx = fixup(leaf_hv->ebx)

because I already have advance indication from the function call that eax
will be overwritten by the hv, ebx2 will get fixed up in subsequent lines,
and ecx/edx will be used as-is, and I don't need to parse multiple lines to
discern that.

But I know I may very well be biased from staring at the existing
implementation so much, and that maybe your approach would be clearer to
someone looking at the code who isn't as familiar with that pattern.

> 
> > I saved the diff from when I looked into it previously (was just a
> > rough-sketch, not build-tested), and included it below for reference,
> > but it just didn't seem to help with readability to me,
> 
> Well, looking at it, the only difference is that the IO is done
> over a struct instead of separate pointers. And the diff is pretty
> straight-forward.
> 
> So no, having a struct cpuid_leaf containing it all is actually better
> in this case because you know which is which if you name it properly and
> you have a single pointer argument which you pass around - something
> which is done all around the kernel.

Ok, will work this in for v10. My plan is to introduce this struct:

  struct cpuid_leaf {
      u32 fn;
      u32 subfn;
      u32 eax;
      u32 ebx;
      u32 ecx;
      u32 edx;
  }

as part of the patch which introduces sev_cpuid_hv():

  x86/sev: Move MSR-based VMGEXITs for CPUID to helper

and then utilize that for the function parameters there, here, and in any
other patches in the SNP code involved with fetching/manipulating cpuid
values before returning them to the #VC handler.

Thanks!

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
diff mbox series

Patch

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 1e5aa6b65025..1b80c1d0ea1f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -20,6 +20,7 @@ 
 #include <asm/fpu/xcr.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
+#include <asm/cpuid.h>
 
 #include "error.h"
 
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 6d90f7c688b1..cd769984e929 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -152,6 +152,8 @@  struct snp_psc_desc {
 #define GHCB_TERM_PSC			1	/* Page State Change failure */
 #define GHCB_TERM_PVALIDATE		2	/* Pvalidate failure */
 #define GHCB_TERM_NOT_VMPL0		3	/* SEV-SNP guest is not running at VMPL-0 */
+#define GHCB_TERM_CPUID			4	/* CPUID-validation failure */
+#define GHCB_TERM_CPUID_HV		5	/* CPUID failure during hypervisor fallback */
 
 #define GHCB_RESP_CODE(v)		((v) & GHCB_MSR_INFO_MASK)
 
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 633f1f93b6e1..dea9f7b28620 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -14,6 +14,36 @@ 
 #define has_cpuflag(f)	boot_cpu_has(f)
 #endif
 
+/*
+ * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP
+ * Firmware ABI, Revision 0.9, Section 7.1, Table 14.
+ */
+struct snp_cpuid_fn {
+	u32 eax_in;
+	u32 ecx_in;
+	u64 xcr0_in;
+	u64 xss_in;
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+	u64 __reserved;
+} __packed;
+
+/*
+ * SEV-SNP CPUID table header, as defined by the SEV-SNP Firmware ABI,
+ * Revision 0.9, Section 8.14.2.6. Also noted there is the SEV-SNP
+ * firmware-enforced limit of 64 entries per CPUID table.
+ */
+#define SNP_CPUID_COUNT_MAX 64
+
+struct snp_cpuid_info {
+	u32 count;
+	u32 __reserved1;
+	u64 __reserved2;
+	struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
+} __packed;
+
 /*
  * Since feature negotiation related variables are set early in the boot
  * process they must reside in the .data section so as not to be zeroed
@@ -23,6 +53,19 @@ 
  */
 static u16 ghcb_version __ro_after_init;
 
+/* Copy of the SNP firmware's CPUID page. */
+static struct snp_cpuid_info cpuid_info_copy __ro_after_init;
+
+/*
+ * These will be initialized based on CPUID table so that non-present
+ * all-zero leaves (for sparse tables) can be differentiated from
+ * invalid/out-of-range leaves. This is needed since all-zero leaves
+ * still need to be post-processed.
+ */
+static u32 cpuid_std_range_max __ro_after_init;
+static u32 cpuid_hyp_range_max __ro_after_init;
+static u32 cpuid_ext_range_max __ro_after_init;
+
 static bool __init sev_es_check_cpu_features(void)
 {
 	if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -224,6 +267,266 @@  static int sev_cpuid_hv(u32 func, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
 	return ret;
 }
 
+/*
+ * This may be called early while still running on the initial identity
+ * mapping. Use RIP-relative addressing to obtain the correct address
+ * while running with the initial identity mapping as well as the
+ * switch-over to kernel virtual addresses later.
+ */
+static const struct snp_cpuid_info *snp_cpuid_info_get_ptr(void)
+{
+	void *ptr;
+
+	asm ("lea cpuid_info_copy(%%rip), %0"
+	     : "=r" (ptr)
+	     : "p" (&cpuid_info_copy));
+
+	return ptr;
+}
+
+/*
+ * The SEV-SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
+ * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
+ * and 1 based on the corresponding features enabled by a particular
+ * combination of XCR0 and XSS registers so that a guest can look up the
+ * version corresponding to the features currently enabled in its XCR0/XSS
+ * registers. The only values that differ between these versions/table
+ * entries is the enabled XSAVE area size advertised via EBX.
+ *
+ * While hypervisors may choose to make use of this support, it is more
+ * robust/secure for a guest to simply find the entry corresponding to the
+ * base/legacy XSAVE area size (XCR0=1 or XCR0=3), and then calculate the
+ * XSAVE area size using subfunctions 2 through 64, as documented in APM
+ * Volume 3, Rev 3.31, Appendix E.3.8, which is what is done here.
+ *
+ * Since base/legacy XSAVE area size is documented as 0x240, use that value
+ * directly rather than relying on the base size in the CPUID table.
+ *
+ * Return: XSAVE area size on success, 0 otherwise.
+ */
+static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
+{
+	const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr();
+	u64 xfeatures_found = 0;
+	u32 xsave_size = 0x240;
+	int i;
+
+	for (i = 0; i < cpuid_info->count; i++) {
+		const struct snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+		if (!(fn->eax_in == 0xD && fn->ecx_in > 1 && fn->ecx_in < 64))
+			continue;
+		if (!(xfeatures_en & (BIT_ULL(fn->ecx_in))))
+			continue;
+		if (xfeatures_found & (BIT_ULL(fn->ecx_in)))
+			continue;
+
+		xfeatures_found |= (BIT_ULL(fn->ecx_in));
+
+		if (compacted)
+			xsave_size += fn->eax;
+		else
+			xsave_size = max(xsave_size, fn->eax + fn->ebx);
+	}
+
+	/*
+	 * Either the guest set unsupported XCR0/XSS bits, or the corresponding
+	 * entries in the CPUID table were not present. This is not a valid
+	 * state to be in.
+	 */
+	if (xfeatures_found != (xfeatures_en & GENMASK_ULL(63, 2)))
+		return 0;
+
+	return xsave_size;
+}
+
+static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
+			 u32 *edx)
+{
+	/*
+	 * MSR protocol does not support fetching indexed subfunction, but is
+	 * sufficient to handle current fallback cases. Should that change,
+	 * make sure to terminate rather than ignoring the index and grabbing
+	 * random values. If this issue arises in the future, handling can be
+	 * added here to use GHCB-page protocol for cases that occur late
+	 * enough in boot that GHCB page is available.
+	 */
+	if (cpuid_function_is_indexed(func) && subfunc)
+		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
+
+	if (sev_cpuid_hv(func, eax, ebx, ecx, edx))
+		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
+}
+
+static bool
+snp_cpuid_get_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+			     u32 *ecx, u32 *edx)
+{
+	const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr();
+	int i;
+
+	for (i = 0; i < cpuid_info->count; i++) {
+		const struct snp_cpuid_fn *fn = &cpuid_info->fn[i];
+
+		if (fn->eax_in != func)
+			continue;
+
+		if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc)
+			continue;
+
+		/*
+		 * For 0xD subfunctions 0 and 1, only use the entry corresponding
+		 * to the base/legacy XSAVE area size (XCR0=1 or XCR0=3, XSS=0).
+		 * See the comments above snp_cpuid_calc_xsave_size() for more
+		 * details.
+		 */
+		if (fn->eax_in == 0xD && (fn->ecx_in == 0 || fn->ecx_in == 1))
+			if (!(fn->xcr0_in == 1 || fn->xcr0_in == 3) || fn->xss_in)
+				continue;
+
+		*eax = fn->eax;
+		*ebx = fn->ebx;
+		*ecx = fn->ecx;
+		*edx = fn->edx;
+
+		return true;
+	}
+
+	return false;
+}
+
+static bool snp_cpuid_check_range(u32 func)
+{
+	if (func <= cpuid_std_range_max ||
+	    (func >= 0x40000000 && func <= cpuid_hyp_range_max) ||
+	    (func >= 0x80000000 && func <= cpuid_ext_range_max))
+		return true;
+
+	return false;
+}
+
+static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx,
+				 u32 *ecx, u32 *edx)
+{
+	u32 ebx2, ecx2, edx2;
+
+	switch (func) {
+	case 0x1:
+		snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2);
+
+		/* initial APIC ID */
+		*ebx = (ebx2 & GENMASK(31, 24)) | (*ebx & GENMASK(23, 0));
+		/* APIC enabled bit */
+		*edx = (edx2 & BIT(9)) | (*edx & ~BIT(9));
+
+		/* OSXSAVE enabled bit */
+		if (native_read_cr4() & X86_CR4_OSXSAVE)
+			*ecx |= BIT(27);
+		break;
+	case 0x7:
+		/* OSPKE enabled bit */
+		*ecx &= ~BIT(4);
+		if (native_read_cr4() & X86_CR4_PKE)
+			*ecx |= BIT(4);
+		break;
+	case 0xB:
+		/* extended APIC ID */
+		snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx);
+		break;
+	case 0xD: {
+		bool compacted = false;
+		u64 xcr0 = 1, xss = 0;
+		u32 xsave_size;
+
+		if (subfunc != 0 && subfunc != 1)
+			return 0;
+
+		if (native_read_cr4() & X86_CR4_OSXSAVE)
+			xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+		if (subfunc == 1) {
+			/* Get XSS value if XSAVES is enabled. */
+			if (*eax & BIT(3)) {
+				unsigned long lo, hi;
+
+				asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
+						     : "c" (MSR_IA32_XSS));
+				xss = (hi << 32) | lo;
+			}
+
+			/*
+			 * The PPR and APM aren't clear on what size should be
+			 * encoded in 0xD:0x1:EBX when compaction is not enabled
+			 * by either XSAVEC (feature bit 1) or XSAVES (feature
+			 * bit 3) since SNP-capable hardware has these feature
+			 * bits fixed as 1. KVM sets it to 0 in this case, but
+			 * to avoid this becoming an issue it's safer to simply
+			 * treat this as unsupported for SEV-SNP guests.
+			 */
+			if (!(*eax & (BIT(1) | BIT(3))))
+				return -EINVAL;
+
+			compacted = true;
+		}
+
+		xsave_size = snp_cpuid_calc_xsave_size(xcr0 | xss, compacted);
+		if (!xsave_size)
+			return -EINVAL;
+
+		*ebx = xsave_size;
+		}
+		break;
+	case 0x8000001E:
+		/* extended APIC ID */
+		snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL);
+		/* compute ID */
+		*ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0));
+		/* node ID */
+		*ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0));
+		break;
+	default:
+		/* No fix-ups needed, use values as-is. */
+		break;
+	}
+
+	return 0;
+}
+
+/*
+ * Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
+ * should be treated as fatal by caller.
+ */
+static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx,
+		     u32 *edx)
+{
+	const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr();
+
+	if (!cpuid_info->count)
+		return -EOPNOTSUPP;
+
+	if (!snp_cpuid_get_validated_func(func, subfunc, eax, ebx, ecx, edx)) {
+		/*
+		 * Some hypervisors will avoid keeping track of CPUID entries
+		 * where all values are zero, since they can be handled the
+		 * same as out-of-range values (all-zero). This is useful here
+		 * as well as it allows virtually all guest configurations to
+		 * work using a single SEV-SNP CPUID table.
+		 *
+		 * To allow for this, there is a need to distinguish between
+		 * out-of-range entries and in-range zero entries, since the
+		 * CPUID table entries are only a template that may need to be
+		 * augmented with additional values for things like
+		 * CPU-specific information during post-processing. So if it's
+		 * not in the table, but is still in the valid range, proceed
+		 * with the post-processing. Otherwise, just return zeros.
+		 */
+		*eax = *ebx = *ecx = *edx = 0;
+		if (!snp_cpuid_check_range(func))
+			return 0;
+	}
+
+	return snp_cpuid_postprocess(func, subfunc, eax, ebx, ecx, edx);
+}
+
 /*
  * Boot VC Handler - This is the first VC handler during boot, there is no GHCB
  * page yet, so it only supports the MSR based communication with the
@@ -231,16 +534,26 @@  static int sev_cpuid_hv(u32 func, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
  */
 void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 {
+	unsigned int subfn = lower_bits(regs->cx, 32);
 	unsigned int fn = lower_bits(regs->ax, 32);
 	u32 eax, ebx, ecx, edx;
+	int ret;
 
 	/* Only CPUID is supported via MSR protocol */
 	if (exit_code != SVM_EXIT_CPUID)
 		goto fail;
 
+	ret = snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx);
+	if (!ret)
+		goto cpuid_done;
+
+	if (ret != -EOPNOTSUPP)
+		goto fail;
+
 	if (sev_cpuid_hv(fn, &eax, &ebx, &ecx, &edx))
 		goto fail;
 
+cpuid_done:
 	regs->ax = eax;
 	regs->bx = ebx;
 	regs->cx = ecx;
@@ -535,12 +848,35 @@  static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ret;
 }
 
+static int vc_handle_cpuid_snp(struct pt_regs *regs)
+{
+	u32 eax, ebx, ecx, edx;
+	int ret;
+
+	ret = snp_cpuid(regs->ax, regs->cx, &eax, &ebx, &ecx, &edx);
+	if (!ret) {
+		regs->ax = eax;
+		regs->bx = ebx;
+		regs->cx = ecx;
+		regs->dx = edx;
+	}
+
+	return ret;
+}
+
 static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
 				      struct es_em_ctxt *ctxt)
 {
 	struct pt_regs *regs = ctxt->regs;
 	u32 cr4 = native_read_cr4();
 	enum es_result ret;
+	int snp_cpuid_ret;
+
+	snp_cpuid_ret = vc_handle_cpuid_snp(regs);
+	if (!snp_cpuid_ret)
+		return ES_OK;
+	if (snp_cpuid_ret != -EOPNOTSUPP)
+		return ES_VMM_ERROR;
 
 	ghcb_set_rax(ghcb, regs->ax);
 	ghcb_set_rcx(ghcb, regs->cx);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 19a8ffcbd83b..c2bc07eb97e9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -33,6 +33,7 @@ 
 #include <asm/smp.h>
 #include <asm/cpu.h>
 #include <asm/apic.h>
+#include <asm/cpuid.h>
 
 #define DR7_RESET_VALUE        0x400