[Part2,RFC,v2,08/37] x86/sev: Split the physmap when adding the page in RMP table

Message ID 20210430123822.13825-9-brijesh.singh@amd.com (mailing list archive)
State New, archived
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Commit Message

Brijesh Singh April 30, 2021, 12:37 p.m. UTC
The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used in conjunction with standard x86 and IOMMU page
tables to enforce memory restrictions and page access rights. The
RMP is indexed by system physical address, and is checked at the end
of CPU and IOMMU table walks. The RMP check is enforced as soon as
SEV-SNP is enabled globally in the system. Not every memory access
requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality
is already protected via memory encryption. When hardware encounters
an RMP check failure, it raises a page-fault exception. The RMP bit in
the fault error code can be used to determine whether the fault was
due to an RMP check failure.
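
As an illustration of that last point (not code from this patch): per
the APM, bit 31 of the page-fault error code reports an RMP check
failure, so a fault handler can test it directly. The X86_PF_RMP name
below follows the kernel's X86_PF_* convention but is an assumption
here.

  /* Assumed name; the APM defines bit 31 of the #PF error code as RMP. */
  #define X86_PF_RMP	(1UL << 31)

  static bool fault_is_rmp_violation(unsigned long error_code)
  {
  	/* Set by hardware when the fault came from an RMP check failure. */
  	return !!(error_code & X86_PF_RMP);
  }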

A write from the hypervisor goes through the RMP checks. When the
hypervisor writes to a page, hardware checks that the assigned bit in
the RMP is zero (i.e., the page is shared). If the page table entry
that gives the sPA indicates that the target page size is a large
page, then all RMP entries for the 4KB pages constituting the target
must have the assigned bit clear. If any of those entries has the
assigned bit set, hardware raises an RMP violation. To resolve it,
split the page table entry leading to the target page into 4K entries.
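
A conceptual model of the hardware check just described (a sketch, not
code from this patch; rmp_entry_assigned() is a hypothetical helper
that reads the assigned bit of the RMP entry covering an sPA):

  /* A host write through a 2MB mapping is allowed only when every
   * 4KB RMP entry under that mapping has the assigned bit clear. */
  static bool host_write_ok_through_2m(unsigned long spa_2m)
  {
  	unsigned long spa;

  	for (spa = spa_2m; spa < spa_2m + PMD_SIZE; spa += PAGE_SIZE) {
  		if (rmp_entry_assigned(spa))
  			return false;	/* hardware raises an RMP #PF */
  	}

  	return true;
  }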

This poses a challenge in the Linux memory model. The Linux kernel
creates a direct mapping of all the physical memory -- referred to as
the physmap. The physmap may contain a valid mapping of guest-owned
pages. During the page table walk, a host access may run into the
situation where one of the pages within the large page is owned by the
guest (i.e., its assigned bit is set in the RMP). A write to a
non-guest page within that large page will then raise an RMP violation.
To work around this, call set_memory_4k() to split the physmap before
adding the page to the RMP table. This ensures that pages added to the
RMP table are mapped as 4K in the physmap.
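
The patch below performs the split inside rmpupdate(); a minimal
standalone sketch of the same step (set_memory_4k() takes a kernel
virtual address and a page count, so the struct page is translated
through the direct map first):

  #include <asm/set_memory.h>

  /* Sketch: split the physmap mapping of @page down to 4K before the
   * page is marked assigned in the RMP. */
  static int split_physmap_for_rmp(struct page *page)
  {
  	unsigned long vaddr = (unsigned long)page_to_virt(page);

  	return set_memory_4k(vaddr, 1);	/* 1 == one 4K page */
  }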

Splitting the physmap is a temporary solution until the kernel page
fault handler is improved to split the kernel address range on demand.
One disadvantage of splitting is that, eventually, it will end up
breaking down the entire physmap unless entries are coalesced back
into large pages. I am open to suggestions on various approaches we
could take to address this problem.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/sev.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Peter Zijlstra May 3, 2021, 3:07 p.m. UTC | #1
On Fri, Apr 30, 2021 at 07:37:53AM -0500, Brijesh Singh wrote:

> This poses a challenge in the Linux memory model. The Linux kernel
> creates a direct mapping of all the physical memory -- referred to as
> the physmap. The physmap may contain a valid mapping of guest-owned
> pages. During the page table walk, a host access may run into the
> situation where one of the pages within the large page is owned by the
> guest (i.e., its assigned bit is set in the RMP). A write to a
> non-guest page within that large page will then raise an RMP violation.
> To work around this, call set_memory_4k() to split the physmap before
> adding the page to the RMP table. This ensures that pages added to the
> RMP table are mapped as 4K in the physmap.

What's an RMP violation and why are they a problem?

> Splitting the physmap is a temporary solution until the kernel page
> fault handler is improved to split the kernel address range on demand.

How is that an improvement? Fracturing the physmap sucks whichever way
around.

> One disadvantage of splitting is that, eventually, it will end up
> breaking down the entire physmap unless entries are coalesced back
> into large pages. I am open to suggestions on various approaches we
> could take to address this problem.

Have the hardware fracture the TLB entry internally?
Andy Lutomirski May 3, 2021, 3:15 p.m. UTC | #2
On Fri, Apr 30, 2021 at 5:39 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The integrity guarantee of SEV-SNP is enforced through the RMP table.
> The RMP is used in conjunction with standard x86 and IOMMU page
> tables to enforce memory restrictions and page access rights. The
> RMP is indexed by system physical address, and is checked at the end
> of CPU and IOMMU table walks. The RMP check is enforced as soon as
> SEV-SNP is enabled globally in the system. Not every memory access
> requires an RMP check. In particular, read accesses from the
> hypervisor do not require RMP checks because data confidentiality
> is already protected via memory encryption. When hardware encounters
> an RMP check failure, it raises a page-fault exception. The RMP bit in
> the fault error code can be used to determine whether the fault was
> due to an RMP check failure.
>
> A write from the hypervisor goes through the RMP checks. When the
> hypervisor writes to a page, hardware checks that the assigned bit in
> the RMP is zero (i.e., the page is shared). If the page table entry
> that gives the sPA indicates that the target page size is a large
> page, then all RMP entries for the 4KB pages constituting the target
> must have the assigned bit clear. If any of those entries has the
> assigned bit set, hardware raises an RMP violation. To resolve it,
> split the page table entry leading to the target page into 4K entries.
>
> This poses a challenge in the Linux memory model. The Linux kernel
> creates a direct mapping of all the physical memory -- referred to as
> the physmap. The physmap may contain a valid mapping of guest-owned
> pages. During the page table walk, a host access may run into the
> situation where one of the pages within the large page is owned by the
> guest (i.e., its assigned bit is set in the RMP). A write to a
> non-guest page within that large page will then raise an RMP violation.
> To work around this, call set_memory_4k() to split the physmap before
> adding the page to the RMP table. This ensures that pages added to the
> RMP table are mapped as 4K in the physmap.
>
> Splitting the physmap is a temporary solution until the kernel page
> fault handler is improved to split the kernel address range on demand.

Not happening.  The pages to be split might be critical to fault
handling, e.g. stack, GDT, IDT, etc.

How much performance do we get back if we add a requirement that only
2M pages (hugetlbfs, etc) may be used for private guest memory?
Dave Hansen May 3, 2021, 3:41 p.m. UTC | #3
On 5/3/21 8:15 AM, Andy Lutomirski wrote:
> How much performance do we get back if we add a requirement that only
> 2M pages (hugetlbfs, etc) may be used for private guest memory?

Are you generally asking about the performance overhead of using 4k
pages instead of 2M for the direct map?  We looked at that recently and
pulled together some data:

> https://lore.kernel.org/lkml/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
Vlastimil Babka May 7, 2021, 11:28 a.m. UTC | #4
On 5/3/21 5:41 PM, Dave Hansen wrote:
> On 5/3/21 8:15 AM, Andy Lutomirski wrote:
>> How much performance do we get back if we add a requirement that only
>> 2M pages (hugetlbfs, etc) may be used for private guest memory?
> 
> Are you generally asking about the performance overhead of using 4k
> pages instead of 2M for the direct map?  We looked at that recently and
> pulled together some data:

IIUC using 2M pages for private guest memory wouldn't itself be sufficient, as the
guest would also have to share pages with the host at 2MB granularity, and that
might be too restrictive?

>> https://lore.kernel.org/lkml/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
>

Patch

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a8a0c6cd22ca..60d62c66778b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1931,6 +1931,12 @@ int rmpupdate(struct page *page, struct rmpupdate *val)
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENXIO;
 
+	ret = set_memory_4k((unsigned long)page_to_virt(page), 1);
+	if (ret) {
+		pr_err("Failed to split physical address 0x%lx (%d)\n", spa, ret);
+		return ret;
+	}
+
 	/* Retry if another processor is modifying the RMP entry. */
 	do {
 		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */