diff mbox series

[Part2,RFC,v4,06/40] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction

Message ID 20210707183616.5620-7-brijesh.singh@amd.com (mailing list archive)
State New
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand

Commit Message

Brijesh Singh July 7, 2021, 6:35 p.m. UTC
The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
to adjust the RMP entry without invalidating the previous RMP entry.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/sev.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/sev.h   | 20 ++++++++++++++++++++
 2 files changed, 62 insertions(+)

Comments

Peter Gonda July 12, 2021, 6:44 p.m. UTC | #1
> +int psmash(struct page *page)
> +{
> +       unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
> +       int ret;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +               return -ENXIO;
> +
> +       /* Retry if another processor is modifying the RMP entry. */
> +       do {
> +               /* Binutils version 2.36 supports the PSMASH mnemonic. */
> +               asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> +                             : "=a"(ret)
> +                             : "a"(spa)
> +                             : "memory", "cc");
> +       } while (ret == FAIL_INUSE);

Should there be some retry limit here for safety? Or do we know that
we'll never be stuck in this loop? Ditto for the loop in rmpupdate.

> +
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(psmash);
>
Dave Hansen July 12, 2021, 7 p.m. UTC | #2
On 7/12/21 11:44 AM, Peter Gonda wrote:
>> +int psmash(struct page *page)
>> +{
>> +       unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
>> +       int ret;
>> +
>> +       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> +               return -ENXIO;
>> +
>> +       /* Retry if another processor is modifying the RMP entry. */
>> +       do {
>> +               /* Binutils version 2.36 supports the PSMASH mnemonic. */
>> +               asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
>> +                             : "=a"(ret)
>> +                             : "a"(spa)
>> +                             : "memory", "cc");
>> +       } while (ret == FAIL_INUSE);
> Should there be some retry limit here for safety? Or do we know that
> we'll never be stuck in this loop? Ditto for the loop in rmpupdate.

It's probably fine to just leave this.  While you could *theoretically*
lose this race forever, it's unlikely to happen in practice.  If it
does, you'll get an easy-to-understand softlockup backtrace which should
point here pretty quickly.

I think TDX has a few of these as well.  Most of the "SEAMCALL"s from
host to the firmware doing the security enforcement have something like
an -EBUSY as well.  I believe they just retry forever too.
Sean Christopherson July 15, 2021, 6:56 p.m. UTC | #3
On Mon, Jul 12, 2021, Dave Hansen wrote:
> On 7/12/21 11:44 AM, Peter Gonda wrote:
> >> +int psmash(struct page *page)
> >> +{
> >> +       unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
> >> +       int ret;
> >> +
> >> +       if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> >> +               return -ENXIO;
> >> +
> >> +       /* Retry if another processor is modifying the RMP entry. */
> >> +       do {
> >> +               /* Binutils version 2.36 supports the PSMASH mnemonic. */
> >> +               asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> >> +                             : "=a"(ret)
> >> +                             : "a"(spa)
> >> +                             : "memory", "cc");
> >> +       } while (ret == FAIL_INUSE);
> > Should there be some retry limit here for safety? Or do we know that
> > we'll never be stuck in this loop? Ditto for the loop in rmpupdate.
> 
> It's probably fine to just leave this.  While you could *theoretically*
> lose this race forever, it's unlikely to happen in practice.  If it
> does, you'll get an easy-to-understand softlockup backtrace which should
> point here pretty quickly.

But should failure here even be tolerated?  The TDX cases spin on flows that are
_not_ due to (direct) contenion, e.g. a pending interrupt while flushing the
cache or lack of randomness when generating a key.  In this case, there are two
CPUs racing to modify the RMP entry, which implies that the final state of the
RMP entry is not deterministic.

> I think TDX has a few of these as well.  Most of the "SEAMCALL"s from
> host to the firmware doing the security enforcement have something like
> an -EBUSY as well.  I believe they just retry forever too.
Dave Hansen July 15, 2021, 7:08 p.m. UTC | #4
On 7/15/21 11:56 AM, Sean Christopherson wrote:
>>>> +       /* Retry if another processor is modifying the RMP entry. */
>>>> +       do {
>>>> +               /* Binutils version 2.36 supports the PSMASH mnemonic. */
>>>> +               asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
>>>> +                             : "=a"(ret)
>>>> +                             : "a"(spa)
>>>> +                             : "memory", "cc");
>>>> +       } while (ret == FAIL_INUSE);
>>> Should there be some retry limit here for safety? Or do we know that
>>> we'll never be stuck in this loop? Ditto for the loop in rmpupdate.
>> It's probably fine to just leave this.  While you could *theoretically*
>> lose this race forever, it's unlikely to happen in practice.  If it
>> does, you'll get an easy-to-understand softlockup backtrace which should
>> point here pretty quickly.
> But should failure here even be tolerated?  The TDX cases spin on flows that are
> _not_ due to (direct) contenion, e.g. a pending interrupt while flushing the
> cache or lack of randomness when generating a key.  In this case, there are two
> CPUs racing to modify the RMP entry, which implies that the final state of the
> RMP entry is not deterministic.

I was envisioning that two different CPUs could try to smash two
*different* 4k physical pages, but collide since they share
a 2M page.

But, in patch 33, this is called via:

> +		write_lock(&kvm->mmu_lock);
> +
> +		switch (op) {
> +		case SNP_PAGE_STATE_SHARED:
> +			rc = snp_make_page_shared(vcpu, gpa, pfn, level);
...

Which should make collisions impossible.  Did I miss another call-site?
Sean Christopherson July 15, 2021, 7:18 p.m. UTC | #5
On Thu, Jul 15, 2021, Dave Hansen wrote:
> On 7/15/21 11:56 AM, Sean Christopherson wrote:
> >>>> +       /* Retry if another processor is modifying the RMP entry. */
> >>>> +       do {
> >>>> +               /* Binutils version 2.36 supports the PSMASH mnemonic. */
> >>>> +               asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> >>>> +                             : "=a"(ret)
> >>>> +                             : "a"(spa)
> >>>> +                             : "memory", "cc");
> >>>> +       } while (ret == FAIL_INUSE);
> >>> Should there be some retry limit here for safety? Or do we know that
> >>> we'll never be stuck in this loop? Ditto for the loop in rmpupdate.
> >> It's probably fine to just leave this.  While you could *theoretically*
> >> lose this race forever, it's unlikely to happen in practice.  If it
> >> does, you'll get an easy-to-understand softlockup backtrace which should
> >> point here pretty quickly.
> > But should failure here even be tolerated?  The TDX cases spin on flows that are
> > _not_ due to (direct) contenion, e.g. a pending interrupt while flushing the
> > cache or lack of randomness when generating a key.  In this case, there are two
> > CPUs racing to modify the RMP entry, which implies that the final state of the
> > RMP entry is not deterministic.
> 
> I was envisioning that two different CPUs could try to smash two
> *different* 4k physical pages, but collide since they share
> a 2M page.
> 
> But, in patch 33, this is called via:
> 
> > +		write_lock(&kvm->mmu_lock);
> > +
> > +		switch (op) {
> > +		case SNP_PAGE_STATE_SHARED:
> > +			rc = snp_make_page_shared(vcpu, gpa, pfn, level);
> ...
> 
> Which should make collisions impossible.  Did I miss another call-site?

Ya, there's more, e.g. sev_snp_write_page_begin() and snp_handle_rmp_page_fault(),
both of which run without holding mmu_lock.  The PSMASH operation isn't too
concerning, but the associated RMPUDATE is most definitely a concern, e.g. if two
vCPUs are trying to access different variants of a page.  It's ok if KVM's
"response" in such a situation does weird things to the guest, but one of the
two operations should "win", which I don't think is guaranteed if multiple RMP
violations are racing.

I'll circle back to this patch after I've gone through the KVM MMU changes.
diff mbox series

Patch

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1aed3d53f59f..949efe530319 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2345,3 +2345,45 @@  struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level)
 	return entry;
 }
 EXPORT_SYMBOL_GPL(snp_lookup_page_in_rmptable);
+
+int psmash(struct page *page)
+{
+	unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Retry if another processor is modifying the RMP entry. */
+	do {
+		/* Binutils version 2.36 supports the PSMASH mnemonic. */
+		asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+			      : "=a"(ret)
+			      : "a"(spa)
+			      : "memory", "cc");
+	} while (ret == FAIL_INUSE);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+int rmpupdate(struct page *page, struct rmpupdate *val)
+{
+	unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Retry if another processor is modifying the RMP entry. */
+	do {
+		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+			     : "=a"(ret)
+			     : "a"(spa), "c"((unsigned long)val)
+			     : "memory", "cc");
+	} while (ret == FAIL_INUSE);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rmpupdate);
diff --git a/include/linux/sev.h b/include/linux/sev.h
index 83c89e999999..bcd4d75d87c8 100644
--- a/include/linux/sev.h
+++ b/include/linux/sev.h
@@ -39,13 +39,33 @@  struct __packed rmpentry {
 
 #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
 
+struct rmpupdate {
+	u64 gpa;
+	u8 assigned;
+	u8 pagesize;
+	u8 immutable;
+	u8 rsvd;
+	u32 asid;
+} __packed;
+
+
+/*
+ * The psmash() and rmpupdate() returns FAIL_INUSE when another processor is
+ * modifying the RMP entry.
+ */
+#define FAIL_INUSE              3
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level);
+int psmash(struct page *page);
+int rmpupdate(struct page *page, struct rmpupdate *e);
 #else
 static inline struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level)
 {
 	return NULL;
 }
+static inline int psmash(struct page *page) { return -ENXIO; }
+static inline int rmpupdate(struct page *page, struct rmpupdate *e) { return -ENXIO; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 #endif /* __LINUX_SEV_H */