Message ID | 20220128171804.569796-32-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand |
On Fri, Jan 28, 2022 at 11:17:52AM -0600, Brijesh Singh wrote: > +/* > + * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP > + * Firmware ABI, Revision 0.9, Section 7.1, Table 14. > + */ > +struct snp_cpuid_fn { > + u32 eax_in; > + u32 ecx_in; > + u64 xcr0_in; > + u64 xss_in; So what's the end result here: -+ u64 __unused; -+ u64 __unused2; ++ u64 xcr0_in; ++ u64 xss_in; those are not unused fields anymore but xcr0 and xss input values? Looking at the FW abi doc, they're only mentioned in "Table 14. CPUID_FUNCTION Structure" that they're XCR0 and XSS at the time of the CPUID execution. But those values are input values to what exactly, guest or firmware? There's a typo in the FW doc, btw: "The guest constructs an MSG_CPUID_REQ message as defined in Table 13. This message contains an array of CPUID function structures as defined in Table 13." That second "Table" is 14 not 13. So, if an array CPUID_FUNCTION[] is passed as part of an MSG_CPUID_REQ command, then, the two _IN variables contain what the guest received from the HV for XCR0 and XSS values. Which means, this is the guest asking the FW whether those values the HV gave the guest are kosher. Am I close? > +static const struct snp_cpuid_info *snp_cpuid_info_get_ptr(void) > +{ > + void *ptr; > + > + asm ("lea cpuid_info_copy(%%rip), %0" > + : "=r" (ptr) Same question as the last time: Why not "=g" and let the compiler decide? > + : "p" (&cpuid_info_copy)); > + > + return ptr; > +} ... > +static bool snp_cpuid_check_range(u32 func) > +{ > + if (func <= cpuid_std_range_max || > + (func >= 0x40000000 && func <= cpuid_hyp_range_max) || > + (func >= 0x80000000 && func <= cpuid_ext_range_max)) > + return true; > + > + return false; > +} > + > +static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, > + u32 *ecx, u32 *edx) And again, same question as the last time: I'm wondering if you could make everything a lot easier by doing static int snp_cpuid_postprocess(struct cpuid_leaf *leaf) and marshall around that struct cpuid_leaf which contains func, subfunc, e[abcd]x instead of dealing with 6 parameters. Callers of snp_cpuid() can simply allocate it on their stack and hand it in and it is all in sev-shared.c so nicely self-contained... Ok I'm ignoring this patch for now and I'll review it only after you've worked in all comments from the previous review.
On Sat, Feb 05, 2022 at 11:54:01AM +0100, Borislav Petkov wrote: > On Fri, Jan 28, 2022 at 11:17:52AM -0600, Brijesh Singh wrote: > > +/* > > + * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP > > + * Firmware ABI, Revision 0.9, Section 7.1, Table 14. > > + */ > > +struct snp_cpuid_fn { > > + u32 eax_in; > > + u32 ecx_in; > > + u64 xcr0_in; > > + u64 xss_in; > > So what's the end result here: > > -+ u64 __unused; > -+ u64 __unused2; > ++ u64 xcr0_in; > ++ u64 xss_in; > > those are not unused fields anymore but xcr0 and xss input values? > > Looking at the FW abi doc, they're only mentioned in "Table 14. > CPUID_FUNCTION Structure" that they're XCR0 and XSS at the time of the > CPUID execution. > > But those values are input values to what exactly, guest or firmware? These are inputs to firmware, either through a MSG_CPUID_REQ guest message that gets passed along by the HV to firmware for validation, or, in this case, through SNP_LAUNCH_UPDATE when the HV passes the initial CPUID table contents to firmware for validation before it gets mapped into guest memory. > > There's a typo in the FW doc, btw: > > "The guest constructs an MSG_CPUID_REQ message as defined in Table 13. > This message contains an array of CPUID function structures as defined > in Table 13." > > That second "Table" is 14 not 13. > > So, if an array CPUID_FUNCTION[] is passed as part of an MSG_CPUID_REQ > command, then, the two _IN variables contain what the guest received > from the HV for XCR0 and XSS values. Which means, this is the guest > asking the FW whether those values the HV gave the guest are kosher. > > Am I close? Most of that documentation is written in the context of the MSG_CPUID_REQ guest message interface. In that case, a guest has been provided a certain sets of CPUID values by KVM CPUID instruction emulation (or potentially the CPUID table), and wants firmware to validate whether all/some of them are valid for that particular system/configuration. In the case of 0xD subfunction 0x0 and 0x1, the CPUID leaf on real hardware will change depending on what the guest sets in its actual XCR0 control register (APM Volume 2 11.5.2) or IA32_XSS_MSR register, whose combined bits are OR'd to form the XFEATURE_ENABLED_MASK. CPUID leaf 0xD subfunctions 0x0 and 0x1 advertise the size of the XSAVE area required for the features currently enabled in XFEATURE_ENABLED_MASK via the EBX return value from the corresponding CPUID instruction, so for firmware to validate whether that EBX value is valid for a particular system/configuration, it needs to know what the guest's currently XFEATURE_ENABLED_MASK is, so it needs to know what the guest has set in its XCR0 and IA32_XSS_MSR registers. That's where the XCR0_IN and XSS_IN inputs come into play: they are inputs to the firmware so it can do the proper validation based on the guest's current state / XFEATURE_ENABLED_MASK. But that guest message interface will likely be deprecated at some point in favor of the static CPUID table (SNP FW ABI 8.14.2.6), since all the firmware validation has already been done in advance, and any additional validation is redundant, so this series relies purely on the CPUID table. In the case of the CPUID table, we don't know what the guest's XFEATURE_ENABLED_MASK is going to be at the time a CPUID instruction is issued for 0xD:0x0,0xD,0x1, and those values will change as the guest boots and begins enabling additional XSAVE features. The only thing that's certain is that there needs to be at least one CPUID table entry for 0xD:0x0 and 0xD:0x1 corresponding to the initial XFEATURE_ENABLED_MASK, which will correspond to XCR0_IN=1/XSS_IN=0, or XCR0_IN=3/XSS_IN=0 depending on the hypervisor/guest implementation. What to do about other combinations of XCR0_IN/XSS_IN gets a bit weird, since encoding all those permutations would not scale well, and alternative methods of trying to encode them in the table would almost certainly result in lots of different incompatibile guest implementations floating around. So our recommendation has been for hypervisors to encode only the 0xD:0x0/0xD:0x1 entries corresponding to these initial guest states in the CPUID table, and for guests to ignore the XCR0_IN/XSS_IN values completely, and instead have the guest calculate the XSAVE save area size dynamically (EBX). This has multiple benefits: 1) Guests that implement this approach would be compatible even with hypervisors that try to encode additional permutations of XCR0_IN/XSS_IN in the table based on their read of the spec 2) It is just as secure, since it requires only that the guest trust itself and the APM specification, which is the ultimate authority on what firmware would be validationa against 3) The other leaf registers, EAX/ECX/EDX are not dependent on XCR0_IN/XSS_IN/XFEATURE_ENABLED_MASK, and so any 0xD:0x0/0xD:0x1 entry will do as the starting point for the calculation. That's why in v8 these fields were documented as unused1/unused2. But when you suggested that we should enforce 0 in that case, it became clear we should adjust our recommendation to go ahead and check for the specific XCR0_IN={1,3}/XSS_IN=0 values when searching for the base 0xD:0x0/0xD:0x1 entry to use, because a hypervisor that chooses to use XCR0_IN/XSS_IN is not out-of-spec, and should be supported, and there isn't really any read of the spec from the hypervisor perspective where XCR0_IN=0/XSS_IN=0 would be valid. So for v9 I went ahead and added the XCR0_IN/XSS_IN checks, and updated our recommendations (will send an updated version to SNP mailing list by Monday) to not ignore them in the guest. I added some comments in snp_cpuid_get_validated_func() and snp_cpuid_calc_xsave_size() to clarify how XCR0_IN/XSS_IN are being used there, but let me know if those need to be beefed up a bit. -Mike
On Sat, Feb 05, 2022 at 11:54:01AM +0100, Borislav Petkov wrote: > On Fri, Jan 28, 2022 at 11:17:52AM -0600, Brijesh Singh wrote: > > > +static const struct snp_cpuid_info *snp_cpuid_info_get_ptr(void) > > +{ > > + void *ptr; > > + > > + asm ("lea cpuid_info_copy(%%rip), %0" > > + : "=r" (ptr) > > Same question as the last time: > > Why not "=g" and let the compiler decide? The documentation for lea (APM Volume 3 Chapter 3) seemed to require that the destination register be a general purpose register, so it seemed like there was potential for breakage in allowing GCC to use anything otherwise. Maybe GCC is smart enough to figure that out, but since we know the constraint in advance it seemed safer to stick with the current approach of enforcing that constraint. > > > + : "p" (&cpuid_info_copy)); > > + > > + return ptr; > > +} > > ... > > > +static bool snp_cpuid_check_range(u32 func) > > +{ > > + if (func <= cpuid_std_range_max || > > + (func >= 0x40000000 && func <= cpuid_hyp_range_max) || > > + (func >= 0x80000000 && func <= cpuid_ext_range_max)) > > + return true; > > + > > + return false; > > +} > > + > > +static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, > > + u32 *ecx, u32 *edx) > > And again, same question as the last time: > > I'm wondering if you could make everything a lot easier by doing > > static int snp_cpuid_postprocess(struct cpuid_leaf *leaf) > > and marshall around that struct cpuid_leaf which contains func, subfunc, > e[abcd]x instead of dealing with 6 parameters. > > Callers of snp_cpuid() can simply allocate it on their stack and hand it > in and it is all in sev-shared.c so nicely self-contained... > > Ok I'm ignoring this patch for now and I'll review it only after you've > worked in all comments from the previous review. I did look into it and honestly it just seemed to add more abstractions that made it harder to parse the specific operations taken place here. For instance, post-processing of 0x8000001E entry, we have e{a,b,c,d}x from the CPUID table, then to post process: switch (func): case 0x8000001E: /* extended APIC ID */ snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL); | | | | | | | edx from cpuid table is used as-is | | | | | | | | load HV value into tmp ecx2 | | | load HV value into tmp ebx2 | | replace eax completely with the HV value # then do the remaining fixups for final ebx/ecx /* compute ID */ *ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0)); /* node ID */ *ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0)); and it all reads in a clear/familiar way to all the other cpuid()/native_cpuid() users throughout the kernel, and from the persective of someone auditing this from a security perspective that needs to quickly check what registers come from the CPUID table, what registers come from HV, what the final result is, it all just seems very clear and familiar to me. But if we start passing around this higher-level structure that does not do anything other than abstract away e{a,b,c,x} to save on function arguments, things become muddier, and there's more pointer dereference operations and abstractions to sift through. I saved the diff from when I looked into it previously (was just a rough-sketch, not build-tested), and included it below for reference, but it just didn't seem to help with readability to me, which I think is important here since this is probably one of the most security-sensitive piece of the CPUID table handling, since we're dealing with untrusted CPUID sources here and it needs to be clear what exactly is ending up in the E{A,B,C,D} registers we're returning for a particular CPUID instruction: (There are some possible optimizations below, like added a mask parameter so control specifically what EAX/EBX/ECX/EDX field should be modified, possibly reworking snp_cpuid_info structure definitions to re-use the cpuid_leaf struct internally, also modifying __sev_cpuid_hv() to take the cpuid_leaf struct, etc., but none of that really seemed like it would help much with the key issue of readability, so I ended up setting it aside for v9) diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index b2defbf7e66b..53534a6b1dcc 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -49,6 +49,13 @@ struct snp_cpuid_info { struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX]; } __packed; +struct cpuid_leaf { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +}; + /* * Since feature negotiation related variables are set early in the boot * process they must reside in the .data section so as not to be zeroed @@ -260,14 +267,14 @@ static int __sev_cpuid_hv(u32 func, int reg_idx, u32 *reg) return 0; } -static int sev_cpuid_hv(u32 func, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) +static int sev_cpuid_hv(u32 func, struct cpuid_leaf *leaf) { int ret; - ret = __sev_cpuid_hv(func, GHCB_CPUID_REQ_EAX, eax); - ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EBX, ebx); - ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_ECX, ecx); - ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EDX, edx); + ret = __sev_cpuid_hv(func, GHCB_CPUID_REQ_EAX, &leaf->eax); + ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EBX, &leaf->ebx); + ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_ECX, &leaf->ecx); + ret = ret ? : __sev_cpuid_hv(func, GHCB_CPUID_REQ_EDX, &leaf->edx); return ret; } @@ -328,8 +335,7 @@ static int snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted) return xsave_size; } -static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx, - u32 *edx) +static void snp_cpuid_hv(u32 func, u32 subfunc, struct cpuid_leaf *leaf) { /* * MSR protocol does not support fetching indexed subfunction, but is @@ -342,13 +348,12 @@ static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx, if (cpuid_function_is_indexed(func) && subfunc) sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV); - if (sev_cpuid_hv(func, eax, ebx, ecx, edx)) + if (sev_cpuid_hv(func, leaf)) sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV); } static bool -snp_cpuid_get_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx, - u32 *ecx, u32 *edx) +snp_cpuid_get_validated_func(u32 func, u32 subfunc, struct cpuid_leaf *leaf) { const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr(); int i; @@ -362,10 +367,10 @@ snp_cpuid_get_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx, if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc) continue; - *eax = fn->eax; - *ebx = fn->ebx; - *ecx = fn->ecx; - *edx = fn->edx; + leaf->eax = fn->eax; + leaf->ebx = fn->ebx; + leaf->ecx = fn->ecx; + leaf->edx = fn->edx; return true; } @@ -383,33 +388,34 @@ static bool snp_cpuid_check_range(u32 func) return false; } -static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, - u32 *ecx, u32 *edx) +static int snp_cpuid_postprocess(u32 func, u32 subfunc, struct cpuid_leaf *leaf) { - u32 ebx2, ecx2, edx2; + struct cpuid_leaf leaf_tmp; switch (func) { case 0x1: - snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2); + snp_cpuid_hv(func, subfunc, &leaf_tmp); /* initial APIC ID */ - *ebx = (ebx2 & GENMASK(31, 24)) | (*ebx & GENMASK(23, 0)); + leaf->ebx = (leaf_tmp.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0)); /* APIC enabled bit */ - *edx = (edx2 & BIT(9)) | (*edx & ~BIT(9)); + leaf->edx = (leaf_tmp.edx & BIT(9)) | (leaf->edx & ~BIT(9)); /* OSXSAVE enabled bit */ if (native_read_cr4() & X86_CR4_OSXSAVE) - *ecx |= BIT(27); + leaf->ecx |= BIT(27); break; case 0x7: /* OSPKE enabled bit */ - *ecx &= ~BIT(4); + leaf->ecx &= ~BIT(4); if (native_read_cr4() & X86_CR4_PKE) - *ecx |= BIT(4); + leaf->ecx |= BIT(4); break; case 0xB: /* extended APIC ID */ - snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx); + snp_cpuid_hv(func, 0, &leaf_tmp); + leaf->edx = leaf_tmp.edx; + break; case 0xD: { bool compacted = false; @@ -440,7 +446,7 @@ static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, * to avoid this becoming an issue it's safer to simply * treat this as unsupported for SEV-SNP guests. */ - if (!(*eax & (BIT(1) | BIT(3)))) + if (!(leaf->eax & (BIT(1) | BIT(3)))) return -EINVAL; compacted = true; @@ -450,16 +456,17 @@ static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, if (xsave_size < 0) return -EINVAL; - *ebx = xsave_size; + leaf->ebx = xsave_size; } break; case 0x8000001E: /* extended APIC ID */ - snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL); + snp_cpuid_hv(func, subfunc, &leaf_tmp); + leaf->eax = leaf_tmp.eax; /* compute ID */ - *ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0)); + leaf->ebx = (leaf->ebx & GENMASK(31, 8)) | (leaf_tmp.ebx & GENMASK(7, 0)); /* node ID */ - *ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0)); + leaf->ecx = (leaf->ecx & GENMASK(31, 8)) | (leaf_tmp.ecx & GENMASK(7, 0)); break; default: /* No fix-ups needed, use values as-is. */ @@ -473,15 +480,14 @@ static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, * Returns -EOPNOTSUPP if feature not enabled. Any other return value should be * treated as fatal by caller. */ -static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx, - u32 *edx) +static int snp_cpuid(u32 func, u32 subfunc, struct cpuid_leaf *leaf) { const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr(); if (!cpuid_info->count) return -EOPNOTSUPP; - if (!snp_cpuid_get_validated_func(func, subfunc, eax, ebx, ecx, edx)) { + if (!snp_cpuid_get_validated_func(func, subfunc, leaf)) { /* * Some hypervisors will avoid keeping track of CPUID entries * where all values are zero, since they can be handled the @@ -497,12 +503,12 @@ static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx, * not in the table, but is still in the valid range, proceed * with the post-processing. Otherwise, just return zeros. */ - *eax = *ebx = *ecx = *edx = 0; + leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0; if (!snp_cpuid_check_range(func)) return 0; } - return snp_cpuid_postprocess(func, subfunc, eax, ebx, ecx, edx); + return snp_cpuid_postprocess(func, subfunc, leaf); } /* @@ -514,28 +520,28 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) { unsigned int subfn = lower_bits(regs->cx, 32); unsigned int fn = lower_bits(regs->ax, 32); - u32 eax, ebx, ecx, edx; + struct cpuid_leaf *leaf; int ret; /* Only CPUID is supported via MSR protocol */ if (exit_code != SVM_EXIT_CPUID) goto fail; - ret = snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx); + ret = snp_cpuid(fn, subfn, leaf); if (!ret) goto cpuid_done; if (ret != -EOPNOTSUPP) goto fail; - if (sev_cpuid_hv(fn, &eax, &ebx, &ecx, &edx)) + if (sev_cpuid_hv(fn, leaf)) goto fail; cpuid_done: - regs->ax = eax; - regs->bx = ebx; - regs->cx = ecx; - regs->dx = edx; + regs->ax = leaf->eax; + regs->bx = leaf->ebx; + regs->cx = leaf->ecx; + regs->dx = leaf->edx; /* * This is a VC handler and the #VC is only raised when SEV-ES is > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7CMichael.Roth%40amd.com%7C6bc14b8b5b854a38d7c008d9e895da5b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637796552649205409%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Lc5o1tYKrtqUy2h%2B8onmgdaydqUWTlnj7V9rfuBEU0s%3D&reserved=0
First of all, let me give you a very important piece of advice for the future: ignoring review feedback is a very unfriendly thing to do: - if you agree with the feedback, you work it in in the next revision - if you don't agree, you *say* *why* you don't What you should avoid is what you've done - ignore it silently. Because reviewing submitters code is the most ungrateful work around the kernel, and, at the same time, it is the most important one. So please make sure you don't do that in the future when submitting patches for upstream inclusion. I'm sure you can imagine, the ignoring can go both ways. On Sat, Feb 05, 2022 at 10:22:49AM -0600, Michael Roth wrote: > The documentation for lea (APM Volume 3 Chapter 3) seemed to require > that the destination register be a general purpose register, so it > seemed like there was potential for breakage in allowing GCC to use > anything otherwise. There's no such potential: binutils encodes the unstruction operands and what types are allowed. IOW, the assembler knows that there goes a register. > Maybe GCC is smart enough to figure that out, but since we know the > constraint in advance it seemed safer to stick with the current > approach of enforcing that constraint. I guess in this particular case it won't matter whether it is "=r" or "=g" but still... > I did look into it and honestly it just seemed to add more abstractions that > made it harder to parse the specific operations taken place here. For > instance, post-processing of 0x8000001E entry, we have e{a,b,c,d}x from > the CPUID table, then to post process: > > switch (func): > case 0x8000001E: > /* extended APIC ID */ > snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL); Well, my suggestion was to put it *all* in the subleaf struct - not just the regs: struct cpuid_leaf { u32 func; u32 subfunc; u32 eax; u32 ebx; u32 ecx; u32 edx; }; so that the call signature is: snp_cpuid_postprocess(struct cpuid_leaf *leaf) > and it all reads in a clear/familiar way to all the other > cpuid()/native_cpuid() users throughout the kernel, maybe it should read differently *on* *purpose*. Exactly because this is *not* the usual CPUID handling code but CPUID verification code for SNP guests. And please explain to me what's so unclear about leaf->eax vs *eax?! > and from the persective of someone auditing this from a security > perspective that needs to quickly check what registers come from the > CPUID table, what registers come from HV Having a struct which is properly named will actually be beneficial in this case: hv_leaf->eax or even hv_reported_leaf->eax vs *eax Now guess which is which. > But if we start passing around this higher-level structure that does > not do anything other than abstract away e{a,b,c,x} to save on function > arguments, things become muddier, and there's more pointer dereference > operations and abstractions to sift through. Huh?! I'm sorry but I cannot follow that logic here. > I saved the diff from when I looked into it previously (was just a > rough-sketch, not build-tested), and included it below for reference, > but it just didn't seem to help with readability to me, Well, looking at it, the only difference is that the IO is done over a struct instead of separate pointers. And the diff is pretty straight-forward. So no, having a struct cpuid_leaf containing it all is actually better in this case because you know which is which if you name it properly and you have a single pointer argument which you pass around - something which is done all around the kernel.
On Sun, Feb 06, 2022 at 02:37:59PM +0100, Borislav Petkov wrote: > First of all, > > let me give you a very important piece of advice for the future: > ignoring review feedback is a very unfriendly thing to do: > > - if you agree with the feedback, you work it in in the next revision > > - if you don't agree, you *say* *why* you don't > > What you should avoid is what you've done - ignore it silently. Because > reviewing submitters code is the most ungrateful work around the kernel, > and, at the same time, it is the most important one. > > So please make sure you don't do that in the future when submitting > patches for upstream inclusion. I'm sure you can imagine, the ignoring > can go both ways. Absolutely, I know a thorough review is grueling work, and would never want to give the impression that I don't appreciate it. Was just hoping to revisit these in the context of v9 since there were some concerning things in flight WRT the spec handling and I was sort of focused on getting ahead of those in case they involved firmware/spec changes. But I realize that's resulted in a waste of your time and I should have at least provided some indication of where I was with these before your review. Won't happen again. > > On Sat, Feb 05, 2022 at 10:22:49AM -0600, Michael Roth wrote: > > The documentation for lea (APM Volume 3 Chapter 3) seemed to require > > that the destination register be a general purpose register, so it > > seemed like there was potential for breakage in allowing GCC to use > > anything otherwise. > > There's no such potential: binutils encodes the unstruction operands > and what types are allowed. IOW, the assembler knows that there goes a > register. > > > Maybe GCC is smart enough to figure that out, but since we know the > > constraint in advance it seemed safer to stick with the current > > approach of enforcing that constraint. > > I guess in this particular case it won't matter whether it is "=r" or > "=g" but still... > > > I did look into it and honestly it just seemed to add more abstractions that > > made it harder to parse the specific operations taken place here. For > > instance, post-processing of 0x8000001E entry, we have e{a,b,c,d}x from > > the CPUID table, then to post process: > > > > switch (func): > > case 0x8000001E: > > /* extended APIC ID */ > > snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL); > > Well, my suggestion was to put it *all* in the subleaf struct - not just > the regs: > > struct cpuid_leaf { > u32 func; > u32 subfunc; > u32 eax; > u32 ebx; > u32 ecx; > u32 edx; > }; > > so that the call signature is: > > snp_cpuid_postprocess(struct cpuid_leaf *leaf) > > > > and it all reads in a clear/familiar way to all the other > > cpuid()/native_cpuid() users throughout the kernel, > > maybe it should read differently *on* *purpose*. Exactly because this is > *not* the usual CPUID handling code but CPUID verification code for SNP > guests. Well, there's also sev_cpuid_hv(), which really is sort of a variant of cpuid()/native_cpuid() helpers that happens to use the GHCB protocol, and snp_cpuid() gets called at roughly the same callsite in the #VC handler, so it made sense to me to just stick to a similar signature across all of them. But no problem implementing it as you suggested, that was just my reasoning there. > > And please explain to me what's so unclear about leaf->eax vs *eax?! It's not that one is unclear, it's just that, in the context of reading through a function which has a lot of similar/repetitive cases for various leaves: snp_cpuid_hv(fn, subfn, &eax, &ebx2, NULL, NULL) ebx = fixup(ebx2) seems slightly easier to process to me than: snp_cpuid_hv(&leaf_hv) leaf->eax = leaf_hv->eax leaf->ebx = fixup(leaf_hv->ebx) because I already have advance indication from the function call that eax will be overwritten by hv, ebx2 will getting fixed up in subsequent lines, and ecx/edx will be used as is, and don't need to parse multiple lines to discern that. But I know I may very well be biased from staring at the existing implementation so much, and that maybe your approach would be clearer to someone looking at the code who isn't as familiar with that pattern. > > > I saved the diff from when I looked into it previously (was just a > > rough-sketch, not build-tested), and included it below for reference, > > but it just didn't seem to help with readability to me, > > Well, looking at it, the only difference is that the IO is done > over a struct instead of separate pointers. And the diff is pretty > straight-forward. > > So no, having a struct cpuid_leaf containing it all is actually better > in this case because you know which is which if you name it properly and > you have a single pointer argument which you pass around - something > which is done all around the kernel. Ok, will work this in for v10. My plan is to introduce this struct: struct cpuid_leaf { u32 fn; u32 subfn; u32 eax; u32 ebx; u32 ecx; u32 edx; } as part of the patch which introduces sev_cpuid_hv(): x86/sev: Move MSR-based VMGEXITs for CPUID to helper and then utilize that for the function parameters there, here, and any other patches in the SNP code involved with fetching/manipulating cpuid values before returning them to the #VC handler. Thanks! -Mike > > -- > Regards/Gruss, > Boris. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7Cmichael.roth%40amd.com%7Cd409d10fc3cc41c4cbfb08d9e975ebfc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637797515011518153%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=%2FDjnNrQ7a8%2F6eI%2FRVJJ2vmLEpeDaAeDVOJcW9CBnggU%3D&reserved=0
On Mon, Feb 07, 2022 at 09:37:39AM -0600, Michael Roth wrote: > Absolutely, I know a thorough review is grueling work, and would never > want to give the impression that I don't appreciate it. Was just hoping > to revisit these in the context of v9 since there were some concerning > things in flight WRT the spec handling and I was sort of focused on > getting ahead of those in case they involved firmware/spec changes. But > I realize that's resulted in a waste of your time and I should have at > least provided some indication of where I was with these before your > review. Won't happen again. Thanks, that's appreciated. And in case you're wondering, the kernel is the most flexible thing from all parties involved so even if you have to change the spec/fw, fixing the kernel is a lot easier than any of the other things. So make sure you do a good job with the spec/fw - the kernel will be fine. :-) > Ok, will work this in for v10. My plan is to introduce this struct: > > struct cpuid_leaf { > u32 fn; > u32 subfn; > u32 eax; > u32 ebx; > u32 ecx; > u32 edx; > } Ok. > as part of the patch which introduces sev_cpuid_hv(): > > x86/sev: Move MSR-based VMGEXITs for CPUID to helper > > and then utilize that for the function parameters there, here, and any > other patches in the SNP code involved with fetching/manipulating cpuid > values before returning them to the #VC handler. Sounds good. Thx.
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index 1e5aa6b65025..1b80c1d0ea1f 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -20,6 +20,7 @@ #include <asm/fpu/xcr.h> #include <asm/ptrace.h> #include <asm/svm.h> +#include <asm/cpuid.h> #include "error.h" diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h index 6d90f7c688b1..cd769984e929 100644 --- a/arch/x86/include/asm/sev-common.h +++ b/arch/x86/include/asm/sev-common.h @@ -152,6 +152,8 @@ struct snp_psc_desc { #define GHCB_TERM_PSC 1 /* Page State Change failure */ #define GHCB_TERM_PVALIDATE 2 /* Pvalidate failure */ #define GHCB_TERM_NOT_VMPL0 3 /* SEV-SNP guest is not running at VMPL-0 */ +#define GHCB_TERM_CPUID 4 /* CPUID-validation failure */ +#define GHCB_TERM_CPUID_HV 5 /* CPUID failure during hypervisor fallback */ #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK) diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c index 633f1f93b6e1..dea9f7b28620 100644 --- a/arch/x86/kernel/sev-shared.c +++ b/arch/x86/kernel/sev-shared.c @@ -14,6 +14,36 @@ #define has_cpuflag(f) boot_cpu_has(f) #endif +/* + * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP + * Firmware ABI, Revision 0.9, Section 7.1, Table 14. + */ +struct snp_cpuid_fn { + u32 eax_in; + u32 ecx_in; + u64 xcr0_in; + u64 xss_in; + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; + u64 __reserved; +} __packed; + +/* + * SEV-SNP CPUID table header, as defined by the SEV-SNP Firmware ABI, + * Revision 0.9, Section 8.14.2.6. Also noted there is the SEV-SNP + * firmware-enforced limit of 64 entries per CPUID table. + */ +#define SNP_CPUID_COUNT_MAX 64 + +struct snp_cpuid_info { + u32 count; + u32 __reserved1; + u64 __reserved2; + struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX]; +} __packed; + /* * Since feature negotiation related variables are set early in the boot * process they must reside in the .data section so as not to be zeroed @@ -23,6 +53,19 @@ */ static u16 ghcb_version __ro_after_init; +/* Copy of the SNP firmware's CPUID page. */ +static struct snp_cpuid_info cpuid_info_copy __ro_after_init; + +/* + * These will be initialized based on CPUID table so that non-present + * all-zero leaves (for sparse tables) can be differentiated from + * invalid/out-of-range leaves. This is needed since all-zero leaves + * still need to be post-processed. + */ +static u32 cpuid_std_range_max __ro_after_init; +static u32 cpuid_hyp_range_max __ro_after_init; +static u32 cpuid_ext_range_max __ro_after_init; + static bool __init sev_es_check_cpu_features(void) { if (!has_cpuflag(X86_FEATURE_RDRAND)) { @@ -224,6 +267,266 @@ static int sev_cpuid_hv(u32 func, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) return ret; } +/* + * This may be called early while still running on the initial identity + * mapping. Use RIP-relative addressing to obtain the correct address + * while running with the initial identity mapping as well as the + * switch-over to kernel virtual addresses later. + */ +static const struct snp_cpuid_info *snp_cpuid_info_get_ptr(void) +{ + void *ptr; + + asm ("lea cpuid_info_copy(%%rip), %0" + : "=r" (ptr) + : "p" (&cpuid_info_copy)); + + return ptr; +} + +/* + * The SEV-SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of + * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0 + * and 1 based on the corresponding features enabled by a particular + * combination of XCR0 and XSS registers so that a guest can look up the + * version corresponding to the features currently enabled in its XCR0/XSS + * registers. The only values that differ between these versions/table + * entries is the enabled XSAVE area size advertised via EBX. + * + * While hypervisors may choose to make use of this support, it is more + * robust/secure for a guest to simply find the entry corresponding to the + * base/legacy XSAVE area size (XCR0=1 or XCR0=3), and then calculate the + * XSAVE area size using subfunctions 2 through 64, as documented in APM + * Volume 3, Rev 3.31, Appendix E.3.8, which is what is done here. + * + * Since base/legacy XSAVE area size is documented as 0x240, use that value + * directly rather than relying on the base size in the CPUID table. + * + * Return: XSAVE area size on success, 0 otherwise. + */ +static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted) +{ + const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr(); + u64 xfeatures_found = 0; + u32 xsave_size = 0x240; + int i; + + for (i = 0; i < cpuid_info->count; i++) { + const struct snp_cpuid_fn *fn = &cpuid_info->fn[i]; + + if (!(fn->eax_in == 0xD && fn->ecx_in > 1 && fn->ecx_in < 64)) + continue; + if (!(xfeatures_en & (BIT_ULL(fn->ecx_in)))) + continue; + if (xfeatures_found & (BIT_ULL(fn->ecx_in))) + continue; + + xfeatures_found |= (BIT_ULL(fn->ecx_in)); + + if (compacted) + xsave_size += fn->eax; + else + xsave_size = max(xsave_size, fn->eax + fn->ebx); + } + + /* + * Either the guest set unsupported XCR0/XSS bits, or the corresponding + * entries in the CPUID table were not present. This is not a valid + * state to be in. + */ + if (xfeatures_found != (xfeatures_en & GENMASK_ULL(63, 2))) + return 0; + + return xsave_size; +} + +static void snp_cpuid_hv(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx, + u32 *edx) +{ + /* + * MSR protocol does not support fetching indexed subfunction, but is + * sufficient to handle current fallback cases. Should that change, + * make sure to terminate rather than ignoring the index and grabbing + * random values. If this issue arises in the future, handling can be + * added here to use GHCB-page protocol for cases that occur late + * enough in boot that GHCB page is available. + */ + if (cpuid_function_is_indexed(func) && subfunc) + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV); + + if (sev_cpuid_hv(func, eax, ebx, ecx, edx)) + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV); +} + +static bool +snp_cpuid_get_validated_func(u32 func, u32 subfunc, u32 *eax, u32 *ebx, + u32 *ecx, u32 *edx) +{ + const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr(); + int i; + + for (i = 0; i < cpuid_info->count; i++) { + const struct snp_cpuid_fn *fn = &cpuid_info->fn[i]; + + if (fn->eax_in != func) + continue; + + if (cpuid_function_is_indexed(func) && fn->ecx_in != subfunc) + continue; + + /* + * For 0xD subfunctions 0 and 1, only use the entry corresponding + * to the base/legacy XSAVE area size (XCR0=1 or XCR0=3, XSS=0). + * See the comments above snp_cpuid_calc_xsave_size() for more + * details. + */ + if (fn->eax_in == 0xD && (fn->ecx_in == 0 || fn->ecx_in == 1)) + if (!(fn->xcr0_in == 1 || fn->xcr0_in == 3) || fn->xss_in) + continue; + + *eax = fn->eax; + *ebx = fn->ebx; + *ecx = fn->ecx; + *edx = fn->edx; + + return true; + } + + return false; +} + +static bool snp_cpuid_check_range(u32 func) +{ + if (func <= cpuid_std_range_max || + (func >= 0x40000000 && func <= cpuid_hyp_range_max) || + (func >= 0x80000000 && func <= cpuid_ext_range_max)) + return true; + + return false; +} + +static int snp_cpuid_postprocess(u32 func, u32 subfunc, u32 *eax, u32 *ebx, + u32 *ecx, u32 *edx) +{ + u32 ebx2, ecx2, edx2; + + switch (func) { + case 0x1: + snp_cpuid_hv(func, subfunc, NULL, &ebx2, NULL, &edx2); + + /* initial APIC ID */ + *ebx = (ebx2 & GENMASK(31, 24)) | (*ebx & GENMASK(23, 0)); + /* APIC enabled bit */ + *edx = (edx2 & BIT(9)) | (*edx & ~BIT(9)); + + /* OSXSAVE enabled bit */ + if (native_read_cr4() & X86_CR4_OSXSAVE) + *ecx |= BIT(27); + break; + case 0x7: + /* OSPKE enabled bit */ + *ecx &= ~BIT(4); + if (native_read_cr4() & X86_CR4_PKE) + *ecx |= BIT(4); + break; + case 0xB: + /* extended APIC ID */ + snp_cpuid_hv(func, 0, NULL, NULL, NULL, edx); + break; + case 0xD: { + bool compacted = false; + u64 xcr0 = 1, xss = 0; + u32 xsave_size; + + if (subfunc != 0 && subfunc != 1) + return 0; + + if (native_read_cr4() & X86_CR4_OSXSAVE) + xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); + if (subfunc == 1) { + /* Get XSS value if XSAVES is enabled. */ + if (*eax & BIT(3)) { + unsigned long lo, hi; + + asm volatile("rdmsr" : "=a" (lo), "=d" (hi) + : "c" (MSR_IA32_XSS)); + xss = (hi << 32) | lo; + } + + /* + * The PPR and APM aren't clear on what size should be + * encoded in 0xD:0x1:EBX when compaction is not enabled + * by either XSAVEC (feature bit 1) or XSAVES (feature + * bit 3) since SNP-capable hardware has these feature + * bits fixed as 1. KVM sets it to 0 in this case, but + * to avoid this becoming an issue it's safer to simply + * treat this as unsupported for SEV-SNP guests. + */ + if (!(*eax & (BIT(1) | BIT(3)))) + return -EINVAL; + + compacted = true; + } + + xsave_size = snp_cpuid_calc_xsave_size(xcr0 | xss, compacted); + if (!xsave_size) + return -EINVAL; + + *ebx = xsave_size; + } + break; + case 0x8000001E: + /* extended APIC ID */ + snp_cpuid_hv(func, subfunc, eax, &ebx2, &ecx2, NULL); + /* compute ID */ + *ebx = (*ebx & GENMASK(31, 8)) | (ebx2 & GENMASK(7, 0)); + /* node ID */ + *ecx = (*ecx & GENMASK(31, 8)) | (ecx2 & GENMASK(7, 0)); + break; + default: + /* No fix-ups needed, use values as-is. */ + break; + } + + return 0; +} + +/* + * Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value + * should be treated as fatal by caller. + */ +static int snp_cpuid(u32 func, u32 subfunc, u32 *eax, u32 *ebx, u32 *ecx, + u32 *edx) +{ + const struct snp_cpuid_info *cpuid_info = snp_cpuid_info_get_ptr(); + + if (!cpuid_info->count) + return -EOPNOTSUPP; + + if (!snp_cpuid_get_validated_func(func, subfunc, eax, ebx, ecx, edx)) { + /* + * Some hypervisors will avoid keeping track of CPUID entries + * where all values are zero, since they can be handled the + * same as out-of-range values (all-zero). This is useful here + * as well as it allows virtually all guest configurations to + * work using a single SEV-SNP CPUID table. + * + * To allow for this, there is a need to distinguish between + * out-of-range entries and in-range zero entries, since the + * CPUID table entries are only a template that may need to be + * augmented with additional values for things like + * CPU-specific information during post-processing. So if it's + * not in the table, but is still in the valid range, proceed + * with the post-processing. Otherwise, just return zeros. + */ + *eax = *ebx = *ecx = *edx = 0; + if (!snp_cpuid_check_range(func)) + return 0; + } + + return snp_cpuid_postprocess(func, subfunc, eax, ebx, ecx, edx); +} + /* * Boot VC Handler - This is the first VC handler during boot, there is no GHCB * page yet, so it only supports the MSR based communication with the @@ -231,16 +534,26 @@ static int sev_cpuid_hv(u32 func, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) */ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code) { + unsigned int subfn = lower_bits(regs->cx, 32); unsigned int fn = lower_bits(regs->ax, 32); u32 eax, ebx, ecx, edx; + int ret; /* Only CPUID is supported via MSR protocol */ if (exit_code != SVM_EXIT_CPUID) goto fail; + ret = snp_cpuid(fn, subfn, &eax, &ebx, &ecx, &edx); + if (!ret) + goto cpuid_done; + + if (ret != -EOPNOTSUPP) + goto fail; + if (sev_cpuid_hv(fn, &eax, &ebx, &ecx, &edx)) goto fail; +cpuid_done: regs->ax = eax; regs->bx = ebx; regs->cx = ecx; @@ -535,12 +848,35 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt) return ret; } +static int vc_handle_cpuid_snp(struct pt_regs *regs) +{ + u32 eax, ebx, ecx, edx; + int ret; + + ret = snp_cpuid(regs->ax, regs->cx, &eax, &ebx, &ecx, &edx); + if (!ret) { + regs->ax = eax; + regs->bx = ebx; + regs->cx = ecx; + regs->dx = edx; + } + + return ret; +} + static enum es_result vc_handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt) { struct pt_regs *regs = ctxt->regs; u32 cr4 = native_read_cr4(); enum es_result ret; + int snp_cpuid_ret; + + snp_cpuid_ret = vc_handle_cpuid_snp(regs); + if (!snp_cpuid_ret) + return ES_OK; + if (snp_cpuid_ret != -EOPNOTSUPP) + return ES_VMM_ERROR; ghcb_set_rax(ghcb, regs->ax); ghcb_set_rcx(ghcb, regs->cx); diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 19a8ffcbd83b..c2bc07eb97e9 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -33,6 +33,7 @@ #include <asm/smp.h> #include <asm/cpu.h> #include <asm/apic.h> +#include <asm/cpuid.h> #define DR7_RESET_VALUE 0x400