Message ID | 20210707183616.5620-7-brijesh.singh@amd.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
Series | Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand |
> +int psmash(struct page *page) > +{ > + unsigned long spa = page_to_pfn(page) << PAGE_SHIFT; > + int ret; > + > + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) > + return -ENXIO; > + > + /* Retry if another processor is modifying the RMP entry. */ > + do { > + /* Binutils version 2.36 supports the PSMASH mnemonic. */ > + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" > + : "=a"(ret) > + : "a"(spa) > + : "memory", "cc"); > + } while (ret == FAIL_INUSE); Should there be some retry limit here for safety? Or do we know that we'll never be stuck in this loop? Ditto for the loop in rmpupdate. > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(psmash); >
On 7/12/21 11:44 AM, Peter Gonda wrote: >> +int psmash(struct page *page) >> +{ >> + unsigned long spa = page_to_pfn(page) << PAGE_SHIFT; >> + int ret; >> + >> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) >> + return -ENXIO; >> + >> + /* Retry if another processor is modifying the RMP entry. */ >> + do { >> + /* Binutils version 2.36 supports the PSMASH mnemonic. */ >> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" >> + : "=a"(ret) >> + : "a"(spa) >> + : "memory", "cc"); >> + } while (ret == FAIL_INUSE); > Should there be some retry limit here for safety? Or do we know that > we'll never be stuck in this loop? Ditto for the loop in rmpupdate. It's probably fine to just leave this. While you could *theoretically* lose this race forever, it's unlikely to happen in practice. If it does, you'll get an easy-to-understand softlockup backtrace which should point here pretty quickly. I think TDX has a few of these as well. Most of the "SEAMCALL"s from host to the firmware doing the security enforcement have something like an -EBUSY as well. I believe they just retry forever too.
On Mon, Jul 12, 2021, Dave Hansen wrote: > On 7/12/21 11:44 AM, Peter Gonda wrote: > >> +int psmash(struct page *page) > >> +{ > >> + unsigned long spa = page_to_pfn(page) << PAGE_SHIFT; > >> + int ret; > >> + > >> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) > >> + return -ENXIO; > >> + > >> + /* Retry if another processor is modifying the RMP entry. */ > >> + do { > >> + /* Binutils version 2.36 supports the PSMASH mnemonic. */ > >> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" > >> + : "=a"(ret) > >> + : "a"(spa) > >> + : "memory", "cc"); > >> + } while (ret == FAIL_INUSE); > > Should there be some retry limit here for safety? Or do we know that > > we'll never be stuck in this loop? Ditto for the loop in rmpupdate. > > It's probably fine to just leave this. While you could *theoretically* > lose this race forever, it's unlikely to happen in practice. If it > does, you'll get an easy-to-understand softlockup backtrace which should > point here pretty quickly. But should failure here even be tolerated? The TDX cases spin on flows that are _not_ due to (direct) contenion, e.g. a pending interrupt while flushing the cache or lack of randomness when generating a key. In this case, there are two CPUs racing to modify the RMP entry, which implies that the final state of the RMP entry is not deterministic. > I think TDX has a few of these as well. Most of the "SEAMCALL"s from > host to the firmware doing the security enforcement have something like > an -EBUSY as well. I believe they just retry forever too.
On 7/15/21 11:56 AM, Sean Christopherson wrote: >>>> + /* Retry if another processor is modifying the RMP entry. */ >>>> + do { >>>> + /* Binutils version 2.36 supports the PSMASH mnemonic. */ >>>> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" >>>> + : "=a"(ret) >>>> + : "a"(spa) >>>> + : "memory", "cc"); >>>> + } while (ret == FAIL_INUSE); >>> Should there be some retry limit here for safety? Or do we know that >>> we'll never be stuck in this loop? Ditto for the loop in rmpupdate. >> It's probably fine to just leave this. While you could *theoretically* >> lose this race forever, it's unlikely to happen in practice. If it >> does, you'll get an easy-to-understand softlockup backtrace which should >> point here pretty quickly. > But should failure here even be tolerated? The TDX cases spin on flows that are > _not_ due to (direct) contenion, e.g. a pending interrupt while flushing the > cache or lack of randomness when generating a key. In this case, there are two > CPUs racing to modify the RMP entry, which implies that the final state of the > RMP entry is not deterministic. I was envisioning that two different CPUs could try to smash two *different* 4k physical pages, but collide since they share a 2M page. But, in patch 33, this is called via: > + write_lock(&kvm->mmu_lock); > + > + switch (op) { > + case SNP_PAGE_STATE_SHARED: > + rc = snp_make_page_shared(vcpu, gpa, pfn, level); ... Which should make collisions impossible. Did I miss another call-site?
On Thu, Jul 15, 2021, Dave Hansen wrote: > On 7/15/21 11:56 AM, Sean Christopherson wrote: > >>>> + /* Retry if another processor is modifying the RMP entry. */ > >>>> + do { > >>>> + /* Binutils version 2.36 supports the PSMASH mnemonic. */ > >>>> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" > >>>> + : "=a"(ret) > >>>> + : "a"(spa) > >>>> + : "memory", "cc"); > >>>> + } while (ret == FAIL_INUSE); > >>> Should there be some retry limit here for safety? Or do we know that > >>> we'll never be stuck in this loop? Ditto for the loop in rmpupdate. > >> It's probably fine to just leave this. While you could *theoretically* > >> lose this race forever, it's unlikely to happen in practice. If it > >> does, you'll get an easy-to-understand softlockup backtrace which should > >> point here pretty quickly. > > But should failure here even be tolerated? The TDX cases spin on flows that are > > _not_ due to (direct) contenion, e.g. a pending interrupt while flushing the > > cache or lack of randomness when generating a key. In this case, there are two > > CPUs racing to modify the RMP entry, which implies that the final state of the > > RMP entry is not deterministic. > > I was envisioning that two different CPUs could try to smash two > *different* 4k physical pages, but collide since they share > a 2M page. > > But, in patch 33, this is called via: > > > + write_lock(&kvm->mmu_lock); > > + > > + switch (op) { > > + case SNP_PAGE_STATE_SHARED: > > + rc = snp_make_page_shared(vcpu, gpa, pfn, level); > ... > > Which should make collisions impossible. Did I miss another call-site? Ya, there's more, e.g. sev_snp_write_page_begin() and snp_handle_rmp_page_fault(), both of which run without holding mmu_lock. The PSMASH operation isn't too concerning, but the associated RMPUDATE is most definitely a concern, e.g. if two vCPUs are trying to access different variants of a page. It's ok if KVM's "response" in such a situation does weird things to the guest, but one of the two operations should "win", which I don't think is guaranteed if multiple RMP violations are racing. I'll circle back to this patch after I've gone through the KVM MMU changes.
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 1aed3d53f59f..949efe530319 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -2345,3 +2345,45 @@ struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level) return entry; } EXPORT_SYMBOL_GPL(snp_lookup_page_in_rmptable); + +int psmash(struct page *page) +{ + unsigned long spa = page_to_pfn(page) << PAGE_SHIFT; + int ret; + + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) + return -ENXIO; + + /* Retry if another processor is modifying the RMP entry. */ + do { + /* Binutils version 2.36 supports the PSMASH mnemonic. */ + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" + : "=a"(ret) + : "a"(spa) + : "memory", "cc"); + } while (ret == FAIL_INUSE); + + return ret; +} +EXPORT_SYMBOL_GPL(psmash); + +int rmpupdate(struct page *page, struct rmpupdate *val) +{ + unsigned long spa = page_to_pfn(page) << PAGE_SHIFT; + int ret; + + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) + return -ENXIO; + + /* Retry if another processor is modifying the RMP entry. */ + do { + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */ + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE" + : "=a"(ret) + : "a"(spa), "c"((unsigned long)val) + : "memory", "cc"); + } while (ret == FAIL_INUSE); + + return ret; +} +EXPORT_SYMBOL_GPL(rmpupdate); diff --git a/include/linux/sev.h b/include/linux/sev.h index 83c89e999999..bcd4d75d87c8 100644 --- a/include/linux/sev.h +++ b/include/linux/sev.h @@ -39,13 +39,33 @@ struct __packed rmpentry { #define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M) +struct rmpupdate { + u64 gpa; + u8 assigned; + u8 pagesize; + u8 immutable; + u8 rsvd; + u32 asid; +} __packed; + + +/* + * The psmash() and rmpupdate() returns FAIL_INUSE when another processor is + * modifying the RMP entry. + */ +#define FAIL_INUSE 3 + #ifdef CONFIG_AMD_MEM_ENCRYPT struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level); +int psmash(struct page *page); +int rmpupdate(struct page *page, struct rmpupdate *e); #else static inline struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level) { return NULL; } +static inline int psmash(struct page *page) { return -ENXIO; } +static inline int rmpupdate(struct page *page, struct rmpupdate *e) { return -ENXIO; } #endif /* CONFIG_AMD_MEM_ENCRYPT */ #endif /* __LINUX_SEV_H */
The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The hypervisor will use the instruction to add pages to the RMP table. See APM3 for details on the instruction operations. The PSMASH instruction expands a 2MB RMP entry into a corresponding set of contiguous 4KB-Page RMP entries. The hypervisor will use this instruction to adjust the RMP entry without invalidating the previous RMP entry. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> --- arch/x86/kernel/sev.c | 42 ++++++++++++++++++++++++++++++++++++++++++ include/linux/sev.h | 20 ++++++++++++++++++++ 2 files changed, 62 insertions(+)