Message ID | 20210602140416.23573-15-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Guest Support | expand |
On Wed, Jun 02, 2021 at 09:04:08AM -0500, Brijesh Singh wrote: > +/* SNP Page State Change NAE event */ > +#define VMGEXIT_PSC_MAX_ENTRY 253 > + > +struct __packed snp_page_state_header { psc_hdr > + u16 cur_entry; > + u16 end_entry; > + u32 reserved; > +}; > + > +struct __packed snp_page_state_entry { psc_entry > + u64 cur_page : 12, > + gfn : 40, > + operation : 4, > + pagesize : 1, > + reserved : 7; > +}; > + > +struct __packed snp_page_state_change { snp_psc_desc or so. > + struct snp_page_state_header header; > + struct snp_page_state_entry entry[VMGEXIT_PSC_MAX_ENTRY]; > +}; Which would make this struct a lot more readable: struct __packed snp_psc_desc { struct psc_hdr hdr; struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY]; > diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c > index 6e9b45bb38ab..4847ac81cca3 100644 > --- a/arch/x86/kernel/sev.c > +++ b/arch/x86/kernel/sev.c > @@ -637,6 +637,113 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) > WARN(1, "invalid memory op %d\n", op); > } > > +static int page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data) vmgexit_psc > +{ > + struct snp_page_state_header *hdr; > + int ret = 0; > + > + hdr = &data->header; Make sure to verify that snp_page_state_header.reserved field is always 0 before working more on the header so that people don't put stuff in there which you cannot change later because it becomes ABI or whatnot. Ditto for the other reserved fields. > + > + /* > + * As per the GHCB specification, the hypervisor can resume the guest before > + * processing all the entries. The loop checks whether all the entries are s/The loop checks/Check/ > + * processed. If not, then keep retrying. What guarantees that that loop will terminate eventually? > + */ > + while (hdr->cur_entry <= hdr->end_entry) { I see that "[t]he hypervisor should ensure that cur_entry and end_entry represent values within the limits of the GHCB Shared Buffer." but let's sanity-check that HV here too. We don't trust it, remember? :) > + > + ghcb_set_sw_scratch(ghcb, (u64)__pa(data)); > + > + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0); > + > + /* Page State Change VMGEXIT can pass error code through exit_info_2. */ > + if (WARN(ret || ghcb->save.sw_exit_info_2, > + "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n", > + ret, ghcb->save.sw_exit_info_2)) > + return 1; > + } > + > + return 0; > +} > + > +static void set_page_state(unsigned long vaddr, unsigned int npages, int op) > +{ > + struct snp_page_state_change *data; > + struct snp_page_state_header *hdr; > + struct snp_page_state_entry *e; > + unsigned long vaddr_end; > + struct ghcb_state state; > + struct ghcb *ghcb; > + int idx; > + > + vaddr = vaddr & PAGE_MASK; > + vaddr_end = vaddr + (npages << PAGE_SHIFT); Move those... > + > + ghcb = sev_es_get_ghcb(&state); > + if (unlikely(!ghcb)) > + panic("SEV-SNP: Failed to get GHCB\n"); <--- ... here. > + > + data = (struct snp_page_state_change *)ghcb->shared_buffer; > + hdr = &data->header; > + > + while (vaddr < vaddr_end) { > + e = data->entry; > + memset(data, 0, sizeof(*data)); > + > + for (idx = 0; idx < VMGEXIT_PSC_MAX_ENTRY; idx++, e++) { > + unsigned long pfn; > + > + if (is_vmalloc_addr((void *)vaddr)) > + pfn = vmalloc_to_pfn((void *)vaddr); > + else > + pfn = __pa(vaddr) >> PAGE_SHIFT; > + > + e->gfn = pfn; > + e->operation = op; > + hdr->end_entry = idx; > + > + /* > + * The GHCB specification provides the flexibility to > + * use either 4K or 2MB page size in the RMP table. > + * The current SNP support does not keep track of the > + * page size used in the RMP table. To avoid the > + * overlap request, use the 4K page size in the RMP > + * table. > + */ > + e->pagesize = RMP_PG_SIZE_4K; > + vaddr = vaddr + PAGE_SIZE; Please put that e++; here. It took me a while to find it hidden at the end of the loop and was scratching my head as to why are we overwriting e-> everytime. > + > + if (vaddr >= vaddr_end) > + break; Instead of this silly check here, you can compute the range starting at vaddr, VMGEXIT_PSC_MAX_ENTRY pages worth, carve out that second for-loop in a helper called __set_page_state() which does the data preparation and does the vmgexit at the end. Then the outer loop does only the computation and calls that helper. Thx.
On 6/11/21 4:44 AM, Borislav Petkov wrote: > On Wed, Jun 02, 2021 at 09:04:08AM -0500, Brijesh Singh wrote: >> +/* SNP Page State Change NAE event */ >> +#define VMGEXIT_PSC_MAX_ENTRY 253 >> + >> +struct __packed snp_page_state_header { > psc_hdr Noted. >> + u16 cur_entry; >> + u16 end_entry; >> + u32 reserved; >> +}; >> + >> +struct __packed snp_page_state_entry { > psc_entry Noted. > >> + u64 cur_page : 12, >> + gfn : 40, >> + operation : 4, >> + pagesize : 1, >> + reserved : 7; >> +}; >> + >> +struct __packed snp_page_state_change { > snp_psc_desc > > or so. Noted. > >> + struct snp_page_state_header header; >> + struct snp_page_state_entry entry[VMGEXIT_PSC_MAX_ENTRY]; >> +}; > Which would make this struct a lot more readable: > > struct __packed snp_psc_desc { > struct psc_hdr hdr; > struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY]; > Agreed. >> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c >> index 6e9b45bb38ab..4847ac81cca3 100644 >> --- a/arch/x86/kernel/sev.c >> +++ b/arch/x86/kernel/sev.c >> @@ -637,6 +637,113 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) >> WARN(1, "invalid memory op %d\n", op); >> } >> >> +static int page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data) > vmgexit_psc Noted. >> +{ >> + struct snp_page_state_header *hdr; >> + int ret = 0; >> + >> + hdr = &data->header; > Make sure to verify that snp_page_state_header.reserved field is always > 0 before working more on the header so that people don't put stuff in > there which you cannot change later because it becomes ABI or whatnot. > Ditto for the other reserved fields. > Good point, let me go through both the hypervisor and guest to make sure that reserved fields are all zero (as defined by the GHCB spec). >> + >> + /* >> + * As per the GHCB specification, the hypervisor can resume the guest before >> + * processing all the entries. The loop checks whether all the entries are > s/The loop checks/Check/ Noted. > >> + * processed. If not, then keep retrying. > What guarantees that that loop will terminate eventually? Guest OS depend on the hypervisor to assist in this operation. The loop will terminate only after the hypervisor completes the requested operation. Guest is not protecting itself from DoS type of attack. A guest should not proceed until hypervisor performs the request page state change in the RMP table. >> + */ >> + while (hdr->cur_entry <= hdr->end_entry) { > I see that "[t]he hypervisor should ensure that cur_entry and end_entry > represent values within the limits of the GHCB Shared Buffer." but let's > sanity-check that HV here too. We don't trust it, remember? :) Let me understand, are you saying that hypervisor could trick us into believing that page state change completed without actually changing it ? >> + >> + ghcb_set_sw_scratch(ghcb, (u64)__pa(data)); >> + >> + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0); >> + >> + /* Page State Change VMGEXIT can pass error code through exit_info_2. */ >> + if (WARN(ret || ghcb->save.sw_exit_info_2, >> + "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n", >> + ret, ghcb->save.sw_exit_info_2)) >> + return 1; >> + } >> + >> + return 0; >> +} >> + >> +static void set_page_state(unsigned long vaddr, unsigned int npages, int op) >> +{ >> + struct snp_page_state_change *data; >> + struct snp_page_state_header *hdr; >> + struct snp_page_state_entry *e; >> + unsigned long vaddr_end; >> + struct ghcb_state state; >> + struct ghcb *ghcb; >> + int idx; >> + >> + vaddr = vaddr & PAGE_MASK; >> + vaddr_end = vaddr + (npages << PAGE_SHIFT); > Move those... > >> + >> + ghcb = sev_es_get_ghcb(&state); >> + if (unlikely(!ghcb)) >> + panic("SEV-SNP: Failed to get GHCB\n"); > <--- ... here. Noted. > >> + >> + data = (struct snp_page_state_change *)ghcb->shared_buffer; >> + hdr = &data->header; >> + >> + while (vaddr < vaddr_end) { >> + e = data->entry; >> + memset(data, 0, sizeof(*data)); >> + >> + for (idx = 0; idx < VMGEXIT_PSC_MAX_ENTRY; idx++, e++) { >> + unsigned long pfn; >> + >> + if (is_vmalloc_addr((void *)vaddr)) >> + pfn = vmalloc_to_pfn((void *)vaddr); >> + else >> + pfn = __pa(vaddr) >> PAGE_SHIFT; >> + >> + e->gfn = pfn; >> + e->operation = op; >> + hdr->end_entry = idx; >> + >> + /* >> + * The GHCB specification provides the flexibility to >> + * use either 4K or 2MB page size in the RMP table. >> + * The current SNP support does not keep track of the >> + * page size used in the RMP table. To avoid the >> + * overlap request, use the 4K page size in the RMP >> + * table. >> + */ >> + e->pagesize = RMP_PG_SIZE_4K; >> + vaddr = vaddr + PAGE_SIZE; > Please put that > e++; > > here. > > It took me a while to find it hidden at the end of the loop and was > scratching my head as to why are we overwriting e-> everytime. Ah, sure I will do it. >> + >> + if (vaddr >= vaddr_end) >> + break; > Instead of this silly check here, you can compute the range starting at > vaddr, VMGEXIT_PSC_MAX_ENTRY pages worth, carve out that second for-loop > in a helper called > > __set_page_state() > > which does the data preparation and does the vmgexit at the end. > > Then the outer loop does only the computation and calls that helper. Okay, I will look into rearranging the code a bit more to address your feedback. -Brijesh
On Mon, Jun 14, 2021 at 08:05:51AM -0500, Brijesh Singh wrote: > Guest OS depend on the hypervisor to assist in this operation. The loop > will terminate only after the hypervisor completes the requested > operation. Guest is not protecting itself from DoS type of attack. A > guest should not proceed until hypervisor performs the request page > state change in the RMP table. Some of that could be in a comment over that loop, so that it is clear what the guest strategy is. > Let me understand, are you saying that hypervisor could trick us into > believing that page state change completed without actually changing it ? Nah, I'm just saying that you should verify those ->cur_entry and ->end_entry values. Of course the guest doesn't protect itself against DoS types of attacks but this function page_state_vmgexit() here could save ->cur_entry and ->end_entry on function entry and then compare it each time the hypercall returns to make sure HV is not doing some shenanigans with the entries range or even has a bug or so. I.e., it has not changed ->end_entry or ->cur_entry is not going backwards into the buffer. I know, if uncaught here, it probably will explode later but a cheap sanity check like that doesn't hurt to have just in case. Thx.
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h index ae99a8a756fe..86bb185b5ec1 100644 --- a/arch/x86/include/asm/sev-common.h +++ b/arch/x86/include/asm/sev-common.h @@ -62,6 +62,8 @@ #define GHCB_MSR_PSC_REQ 0x014 #define SNP_PAGE_STATE_PRIVATE 1 #define SNP_PAGE_STATE_SHARED 2 +#define SNP_PAGE_STATE_PSMASH 3 +#define SNP_PAGE_STATE_UNSMASH 4 #define GHCB_MSR_PSC_GFN_POS 12 #define GHCB_MSR_PSC_GFN_MASK GENMASK_ULL(39, 0) #define GHCB_MSR_PSC_OP_POS 52 @@ -86,6 +88,28 @@ #define GHCB_MSR_GPA_REG_RESP 0x013 #define GHCB_MSR_GPA_REG_RESP_VAL(v) ((v) >> GHCB_MSR_GPA_REG_VALUE_POS) +/* SNP Page State Change NAE event */ +#define VMGEXIT_PSC_MAX_ENTRY 253 + +struct __packed snp_page_state_header { + u16 cur_entry; + u16 end_entry; + u32 reserved; +}; + +struct __packed snp_page_state_entry { + u64 cur_page : 12, + gfn : 40, + operation : 4, + pagesize : 1, + reserved : 7; +}; + +struct __packed snp_page_state_change { + struct snp_page_state_header header; + struct snp_page_state_entry entry[VMGEXIT_PSC_MAX_ENTRY]; +}; + #define GHCB_MSR_TERM_REQ 0x100 #define GHCB_MSR_TERM_REASON_SET_POS 12 #define GHCB_MSR_TERM_REASON_SET_MASK 0xf diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h index 7c2cb5300e43..e2141fc28058 100644 --- a/arch/x86/include/asm/sev.h +++ b/arch/x86/include/asm/sev.h @@ -114,6 +114,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages); void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op); +void snp_set_memory_shared(unsigned long vaddr, unsigned int npages); +void snp_set_memory_private(unsigned long vaddr, unsigned int npages); #else static inline void sev_es_ist_enter(struct pt_regs *regs) { } static inline void sev_es_ist_exit(void) { } @@ -130,6 +132,8 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i { } static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) { } +static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { } +static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { } #endif #endif diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h index 554f75fe013c..41573cf44470 100644 --- a/arch/x86/include/uapi/asm/svm.h +++ b/arch/x86/include/uapi/asm/svm.h @@ -108,6 +108,7 @@ #define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1 +#define SVM_VMGEXIT_PSC 0x80000010 #define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff #define SVM_EXIT_ERR -1 @@ -215,6 +216,7 @@ { SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \ { SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \ { SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \ + { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \ { SVM_EXIT_ERR, "invalid_guest_state" } diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 6e9b45bb38ab..4847ac81cca3 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -637,6 +637,113 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, int op) WARN(1, "invalid memory op %d\n", op); } +static int page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data) +{ + struct snp_page_state_header *hdr; + int ret = 0; + + hdr = &data->header; + + /* + * As per the GHCB specification, the hypervisor can resume the guest before + * processing all the entries. The loop checks whether all the entries are + * processed. If not, then keep retrying. + */ + while (hdr->cur_entry <= hdr->end_entry) { + + ghcb_set_sw_scratch(ghcb, (u64)__pa(data)); + + ret = sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PSC, 0, 0); + + /* Page State Change VMGEXIT can pass error code through exit_info_2. */ + if (WARN(ret || ghcb->save.sw_exit_info_2, + "SEV-SNP: page state change failed ret=%d exit_info_2=%llx\n", + ret, ghcb->save.sw_exit_info_2)) + return 1; + } + + return 0; +} + +static void set_page_state(unsigned long vaddr, unsigned int npages, int op) +{ + struct snp_page_state_change *data; + struct snp_page_state_header *hdr; + struct snp_page_state_entry *e; + unsigned long vaddr_end; + struct ghcb_state state; + struct ghcb *ghcb; + int idx; + + vaddr = vaddr & PAGE_MASK; + vaddr_end = vaddr + (npages << PAGE_SHIFT); + + ghcb = sev_es_get_ghcb(&state); + if (unlikely(!ghcb)) + panic("SEV-SNP: Failed to get GHCB\n"); + + data = (struct snp_page_state_change *)ghcb->shared_buffer; + hdr = &data->header; + + while (vaddr < vaddr_end) { + e = data->entry; + memset(data, 0, sizeof(*data)); + + for (idx = 0; idx < VMGEXIT_PSC_MAX_ENTRY; idx++, e++) { + unsigned long pfn; + + if (is_vmalloc_addr((void *)vaddr)) + pfn = vmalloc_to_pfn((void *)vaddr); + else + pfn = __pa(vaddr) >> PAGE_SHIFT; + + e->gfn = pfn; + e->operation = op; + hdr->end_entry = idx; + + /* + * The GHCB specification provides the flexibility to + * use either 4K or 2MB page size in the RMP table. + * The current SNP support does not keep track of the + * page size used in the RMP table. To avoid the + * overlap request, use the 4K page size in the RMP + * table. + */ + e->pagesize = RMP_PG_SIZE_4K; + vaddr = vaddr + PAGE_SIZE; + + if (vaddr >= vaddr_end) + break; + } + + /* Terminate the guest on page state change failure. */ + if (page_state_vmgexit(ghcb, data)) + sev_es_terminate(1, GHCB_TERM_PSC); + } + + sev_es_put_ghcb(&state); +} + +void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) +{ + if (!sev_feature_enabled(SEV_SNP)) + return; + + pvalidate_pages(vaddr, npages, 0); + + set_page_state(vaddr, npages, SNP_PAGE_STATE_SHARED); +} + +void snp_set_memory_private(unsigned long vaddr, unsigned int npages) +{ + if (!sev_feature_enabled(SEV_SNP)) + return; + + set_page_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE); + + pvalidate_pages(vaddr, npages, 1); +} + int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { u16 startup_cs, startup_ip; diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 156cd235659f..20cd5ebc972f 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -29,6 +29,7 @@ #include <asm/proto.h> #include <asm/memtype.h> #include <asm/set_memory.h> +#include <asm/sev.h> #include "../mm_internal.h" @@ -2009,8 +2010,21 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc) */ cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT)); + /* + * To maintain the security guarantees of SEV-SNP guest, invalidate + * the memory before clearing the encryption attribute. + */ + if (!enc) + snp_set_memory_shared(addr, numpages); + ret = __change_page_attr_set_clr(&cpa, 1); + /* + * Now that memory is marked encrypted in the page table, validate it. + */ + if (!ret && enc) + snp_set_memory_private(addr, numpages); + /* * After changing the encryption attribute, we need to flush TLBs again * in case any speculative TLB caching occurred (but no need to flush
The set_memory_{encrypt,decrypt}() are used for changing the pages from decrypted (shared) to encrypted (private) and vice versa. When SEV-SNP is active, the page state transition needs to go through additional steps. If the page is transitioned from shared to private, then perform the following after the encryption attribute is set in the page table: 1. Issue the page state change VMGEXIT to add the memory region in the RMP table. 2. Validate the memory region after the RMP entry is added. To maintain the security guarantees, if the page is transitioned from private to shared, then perform the following before encryption attribute is removed from the page table: 1. Invalidate the page. 2. Issue the page state change VMGEXIT to remove the page from RMP table. To change the page state in the RMP table, use the Page State Change VMGEXIT defined in the GHCB specification. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> --- arch/x86/include/asm/sev-common.h | 24 +++++++ arch/x86/include/asm/sev.h | 4 ++ arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kernel/sev.c | 107 ++++++++++++++++++++++++++++++ arch/x86/mm/pat/set_memory.c | 14 ++++ 5 files changed, 151 insertions(+)