
[03/25] x86/sgx: Support VMA permissions exceeding enclave permissions

Message ID 7e622156315c9c22c3ef84a7c0aeb01b5c001ff9.1638381245.git.reinette.chatre@intel.com (mailing list archive)
State New, archived
Series x86/sgx and selftests/sgx: Support SGX2

Commit Message

Reinette Chatre Dec. 1, 2021, 7:23 p.m. UTC
=== Summary ===

An SGX VMA can only be created if its permissions are the same or
weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
creation this rule continues to be enforced by the page fault handler.

With SGX2 the EPCM permissions of a page can change after VMA
creation resulting in the VMA exceeding the EPCM permissions and the
page fault handler incorrectly blocking access.

Enable the VMA's pages to remain accessible while ensuring that
the page table entries are installed to match the EPCM permissions
without exceeding the VMA permissions.

=== Full Changelog ===

An SGX enclave is an area of memory where parts of an application
can reside. First an enclave is created and loaded (from
non-enclave memory) with the code and data of an application,
then user space can map (mmap()) the enclave memory to
be able to enter the enclave at its defined entry points for
execution within it.
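
As a rough user space sketch of that mapping step (the helper name and its
parameters are illustrative, not part of the UAPI; error handling is omitted):

#include <sys/mman.h>

/*
 * Map one segment of an already-built enclave. encl_fd refers to
 * /dev/sgx_enclave; the requested prot bits must not exceed the
 * permissions vetted when the pages were added to the enclave.
 */
static void *map_enclave_segment(int encl_fd, void *segment_addr,
				 size_t segment_size, int prot)
{
	return mmap(segment_addr, segment_size, prot,
		    MAP_SHARED | MAP_FIXED, encl_fd, 0);
}

Whether such a mapping may be created at all is decided by the vetting
described below.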

The hardware maintains a secure structure, the Enclave Page Cache Map
(EPCM), that tracks the contents of the enclave. Of interest here is
its tracking of the enclave page permissions. When a page is loaded
into the enclave its permissions are specified and recorded in the
EPCM. In parallel the OS maintains the page table permissions and
the rule is that page table permissions are never allowed to exceed
EPCM permissions.

A new mapping (mmap()) of enclave memory can only succeed if the
mapping has the same or weaker permissions than the permissions that
were vetted during enclave creation. This is enforced by
sgx_encl_may_map() that is called on the mmap() as well as mprotect()
paths. This permission verification remains.
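
For reference, a simplified sketch of that vetting (the real
sgx_encl_may_map() also takes the enclave lock, validates the address range
and batches the xarray walk; the _sketch suffix marks this as illustrative):

static int sgx_encl_may_map_sketch(struct sgx_encl *encl, unsigned long start,
				   unsigned long end, unsigned long vm_flags)
{
	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
	struct sgx_encl_page *page;
	unsigned long idx;

	for (idx = PFN_DOWN(start); idx < PFN_DOWN(end); idx++) {
		page = xa_load(&encl->page_array, idx);

		/* Reject a mapping that would exceed the vetted permissions. */
		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
			return -EACCES;
	}

	return 0;
}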

One feature of SGX2 is to support the modification of enclave page
permissions after enclave creation. Enclave pages may thus already be
mapped at the time their enclave permissions are changed resulting
in the VMA's permissions potentially exceeding the enclave page
permissions.

Enable permissions of existing VMAs to exceed enclave page permissions
in preparation for dynamic enclave page permission changes.
New VMAs that attempt to exceed enclave page permissions continue to be
unsupported.

Reasons why permissions of existing VMAs are allowed to exceed enclave
page permissions instead of dynamically changing VMA permissions when
enclave page permissions change are:
1) Changing VMA permissions involves splitting VMAs, which is an operation
   that can fail. Additionally, the actual changing of page permissions
   of a range of pages could also fail on any of the pages involved.
   Handling these error cases causes problems. For example, if an
   enclave page permission change fails and the VMA has already been
   split then it is not possible to undo the VMA split nor possible to
   undo the enclave page permission changes that did succeed before the
   failure.
2) The OS has little insight into the user space from which EPCM permissions
   are controlled. For example, a RW page may be made RO just before it is
   made RX, and splitting the VMAs for a state that may change again soon
   is unnecessary.

Remove the extra permission check, called on a page fault
(vm_operations_struct->fault) or during debugging
(vm_operations_struct->access) when loading the enclave page from swap,
that ensures that the VMA permissions do not exceed the enclave
permissions. Since a VMA can only exist if it passed the original
permission checks during mmap(), and a VMA may now indeed exceed the page
permissions, this extra permission check is no longer appropriate.

With the permission check removed, ensure that page table entries do
not blindly inherit the VMA permissions but instead reflect the
permissions on which the VMA and enclave agree. PTEs for pages that are
writable from both the VMA and enclave perspective are installed with
the writable bit set, reducing the need for this additional flow to the
permission mismatch cases handled next.
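
Condensed, the resulting fault-path logic (shown in context in the patch
below) computes the PTE protection as the intersection of the VMA and
EPCM permissions:

	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
	/* VM_SHARED lets a writable PTE be installed right away (no COW in SGX). */
	page_prot_bits |= (vma->vm_flags & VM_SHARED);
	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
				  vm_get_page_prot(page_prot_bits));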

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

Comments

Jarkko Sakkinen Dec. 4, 2021, 10:25 p.m. UTC | #1
On Wed, Dec 01, 2021 at 11:23:01AM -0800, Reinette Chatre wrote:
> === Summary ===
> 
> An SGX VMA can only be created if its permissions are the same or
> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> creation this rule continues to be enforced by the page fault handler.
> 
> With SGX2 the EPCM permissions of a page can change after VMA
> creation resulting in the VMA exceeding the EPCM permissions and the
> page fault handler incorrectly blocking access.
> 
> Enable the VMA's pages to remain accessible while ensuring that
> the page table entries are installed to match the EPCM permissions
> without exceeding the VMA permissions.

I don't understand what the short summary means in English, and the
commit message is way too bloated to make any conclusions. It really
needs a rewrite.

These were the questions I could not find answer for:

1. Why it would be by any means safe to remove a permission check?
2. Why not re-issuing mmap()'s is unfeasible? I.e. close existing
   VMA's and mmap() new ones.

/Jarkko
Jarkko Sakkinen Dec. 4, 2021, 10:27 p.m. UTC | #2
On Sun, Dec 05, 2021 at 12:25:59AM +0200, Jarkko Sakkinen wrote:
> On Wed, Dec 01, 2021 at 11:23:01AM -0800, Reinette Chatre wrote:
> > === Summary ===
> > 
> > An SGX VMA can only be created if its permissions are the same or
> > weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> > creation this rule continues to be enforced by the page fault handler.
> > 
> > With SGX2 the EPCM permissions of a page can change after VMA
> > creation resulting in the VMA exceeding the EPCM permissions and the
> > page fault handler incorrectly blocking access.
> > 
> > Enable the VMA's pages to remain accessible while ensuring that
> > the page table entries are installed to match the EPCM permissions
> > without exceeding the VMA permissions.
> 
> I don't understand what the short summary means in English, and the
> commit message is way too bloated to make any conclusions. It really
> needs a rewrite.
> 
> These were the questions I could not find answer for:
> 
> 1. Why it would be by any means safe to remove a permission check?
> 2. Why not re-issuing mmap()'s is unfeasible? I.e. close existing
>    VMA's and mmap() new ones.

3. Isn't this an API/ABI break?

/Jarkko
Reinette Chatre Dec. 6, 2021, 9:16 p.m. UTC | #3
Hi Jarkko,

On 12/4/2021 2:27 PM, Jarkko Sakkinen wrote:
> On Sun, Dec 05, 2021 at 12:25:59AM +0200, Jarkko Sakkinen wrote:
>> On Wed, Dec 01, 2021 at 11:23:01AM -0800, Reinette Chatre wrote:
>>> === Summary ===
>>>
>>> An SGX VMA can only be created if its permissions are the same or
>>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
>>> creation this rule continues to be enforced by the page fault handler.
>>>
>>> With SGX2 the EPCM permissions of a page can change after VMA
>>> creation resulting in the VMA exceeding the EPCM permissions and the
>>> page fault handler incorrectly blocking access.
>>>
>>> Enable the VMA's pages to remain accessible while ensuring that
>>> the page table entries are installed to match the EPCM permissions
>>> without exceeding the VMA permissions.
>>
>> I don't understand what the short summary means in English, and the
>> commit message is way too bloated to make any conclusions. It really
>> needs a rewrite.
>>
>> These were the questions I could not find answer for:
>>
>> 1. Why it would be by any means safe to remove a permission check?

The permission check is redundant for SGX1 and incorrect for SGX2.

In the current SGX1 implementation the permission check in 
sgx_encl_load_page() is redundant because an SGX VMA can only be created 
if its permissions are the same or weaker than the EPCM permissions.

In SGX2 a user is able to change EPCM permissions during runtime (while 
VMA has the memory mapped). A RW VMA may thus originally have mapped an 
enclave page with RW EPCM permissions but since then the enclave page 
may have its permissions changed to read-only. The VMA should still be 
able to read those enclave pages but the check in sgx_encl_load_page() 
will prevent that.
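
To make it concrete, this is the check being removed, annotated with that
scenario (a sketch that assumes the reduced EPCM permissions are already
reflected in the page's vm_max_prot_bits):

	/* VMA was created RW: vm_prot_bits == VM_READ | VM_WRITE */
	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);

	/*
	 * The enclave page has since been made read-only, so the
	 * intersection no longer covers vm_prot_bits and even a plain
	 * read through the existing RW VMA is refused.
	 */
	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
		return ERR_PTR(-EFAULT);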

>> 2. Why not re-issuing mmap()'s is unfeasible? I.e. close existing
>>     VMA's and mmap() new ones.

User is not prevented from closing existing VMAs and creating new ones.

> 3. Isn't this an API/ABI break?

Could you please elaborate where you see the API/ABI break? The rule 
that new VMAs cannot exceed EPCM permissions is untouched.

Reinette
Jarkko Sakkinen Dec. 11, 2021, 5:39 a.m. UTC | #4
On Mon, 2021-12-06 at 13:16 -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 12/4/2021 2:27 PM, Jarkko Sakkinen wrote:
> > On Sun, Dec 05, 2021 at 12:25:59AM +0200, Jarkko Sakkinen wrote:
> > > On Wed, Dec 01, 2021 at 11:23:01AM -0800, Reinette Chatre wrote:
> > > > === Summary ===
> > > > 
> > > > An SGX VMA can only be created if its permissions are the same or
> > > > weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> > > > creation this rule continues to be enforced by the page fault handler.
> > > > 
> > > > With SGX2 the EPCM permissions of a page can change after VMA
> > > > creation resulting in the VMA exceeding the EPCM permissions and the
> > > > page fault handler incorrectly blocking access.
> > > > 
> > > > Enable the VMA's pages to remain accessible while ensuring that
> > > > the page table entries are installed to match the EPCM permissions
> > > > without exceeding the VMA permissions.
> > > 
> > > I don't understand what the short summary means in English, and the
> > > commit message is way too bloated to make any conclusions. It really
> > > needs a rewrite.
> > > 
> > > These were the questions I could not find answer for:
> > > 
> > > 1. Why it would be by any means safe to remove a permission check?
> 
> The permission check is redundant for SGX1 and incorrect for SGX2.
> 
> In the current SGX1 implementation the permission check in 
> sgx_encl_load_page() is redundant because an SGX VMA can only be created 
> if its permissions are the same or weaker than the EPCM permissions.
> 
> In SGX2 a user is able to change EPCM permissions during runtime (while 
> VMA has the memory mapped). A RW VMA may thus originally have mapped an 
> enclave page with RW EPCM permissions but since then the enclave page 
> may have its permissions changed to read-only. The VMA should still be 
> able to read those enclave pages but the check in sgx_encl_load_page() 
> will prevent that.
> 
> > > 2. Why not re-issuing mmap()'s is unfeasible? I.e. close existing
> > >     VMA's and mmap() new ones.
> 
> User is not prevented from closing existing VMAs and creating new ones.
> 
> > 3. Isn't this an API/ABI break?
> 
> Could you please elaborate where you see the API/ABI break? The rule 
> that new VMAs cannot exceed EPCM permissions is untouched.
> 
> Reinette

I just don't understand the description. There's a whole bunch of text,
but it does not discuss in low-level detail what the patch does, e.g.
the use of vmf_insert_pfn_prot(). I honestly do not get the story here...

/Jarkko
Reinette Chatre Dec. 13, 2021, 10:08 p.m. UTC | #5
Hi Jarkko,

On 12/10/2021 9:39 PM, Jarkko Sakkinen wrote:
> On Mon, 2021-12-06 at 13:16 -0800, Reinette Chatre wrote:
>> On 12/4/2021 2:27 PM, Jarkko Sakkinen wrote:
>>> On Sun, Dec 05, 2021 at 12:25:59AM +0200, Jarkko Sakkinen wrote:
>>>> On Wed, Dec 01, 2021 at 11:23:01AM -0800, Reinette Chatre wrote:
>>>>> === Summary ===
>>>>>
>>>>> An SGX VMA can only be created if its permissions are the same or
>>>>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
>>>>> creation this rule continues to be enforced by the page fault handler.
>>>>>
>>>>> With SGX2 the EPCM permissions of a page can change after VMA
>>>>> creation resulting in the VMA exceeding the EPCM permissions and the
>>>>> page fault handler incorrectly blocking access.
>>>>>
>>>>> Enable the VMA's pages to remain accessible while ensuring that
>>>>> the page table entries are installed to match the EPCM permissions
>>>>> without exceeding the VMA permissions.
>>>>
>>>> I don't understand what the short summary means in English, and the
>>>> commit message is way too bloated to make any conclusions. It really
>>>> needs a rewrite.
>>>>
>>>> These were the questions I could not find answer for:
>>>>
>>>> 1. Why it would be by any means safe to remove a permission check?
>>
>> The permission check is redundant for SGX1 and incorrect for SGX2.
>>
>> In the current SGX1 implementation the permission check in
>> sgx_encl_load_page() is redundant because an SGX VMA can only be created
>> if its permissions are the same or weaker than the EPCM permissions.
>>
>> In SGX2 a user is able to change EPCM permissions during runtime (while
>> VMA has the memory mapped). A RW VMA may thus originally have mapped an
>> enclave page with RW EPCM permissions but since then the enclave page
>> may have its permissions changed to read-only. The VMA should still be
>> able to read those enclave pages but the check in sgx_encl_load_page()
>> will prevent that.
>>
>>>> 2. Why not re-issuing mmap()'s is unfeasible? I.e. close existing
>>>>      VMA's and mmap() new ones.
>>
>> User is not prevented from closing existing VMAs and creating new ones.
>>
>>> 3. Isn't this an API/ABI break?
>>
>> Could you please elaborate where you see the API/ABI break? The rule
>> that new VMAs cannot exceed EPCM permissions is untouched.
>>
>> Reinette
> 
> I just don't understand the description. There's a whole bunch of text,
> but it does not discuss in low-level detail what the patch does, e.g.
> the use of vmf_insert_pfn_prot(). I honestly do not get the story here...

vmf_insert_pfn_prot() replaces the existing call to vmf_insert_pfn().

Notice how:

vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
                           unsigned long pfn)
{
         return vmf_insert_pfn_prot(vma, addr, pfn, vma->vm_page_prot);
}

vmf_insert_pfn() is replaced with the function it would call anyway. It 
is done because the PTE being installed should no longer blindly inherit 
the VMA permission as is done in the current code, but it should also 
take the EPCM permissions into account. This is because the EPCM 
permissions can change after the VMA is created.

For example, consider a RW VMA created to map pages with RW EPCM permissions.
Since SGX1 does not allow EPCM permission changes it is ok to always 
install RW PTEs to access those pages, and thus vmf_insert_pfn() is 
sufficient. In SGX2 the EPCM pages may become read-only and the PTEs 
should no longer be RW. This is made possible with the call to 
vmf_insert_pfn_prot() where the protection bits for the PTE can be 
provided (so that the PTE permissions do not exceed the EPCM permissions).
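
As a concrete sketch of that example, with a RW VMA whose enclave page has
since been made read-only, the values used by the patch work out to:

	vm_prot_bits            == VM_READ | VM_WRITE   /* from vma->vm_flags    */
	entry->vm_max_prot_bits == VM_READ               /* EPCM now read-only    */
	page_prot_bits          == VM_READ | VM_SHARED   /* intersection + shared */

so vmf_insert_pfn_prot() installs a read-only PTE, whereas the old
vmf_insert_pfn() call would have installed a RW PTE straight from
vma->vm_page_prot.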

Reinette

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..20e97d3abdce 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -91,10 +91,8 @@  static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 }
 
 static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
-						unsigned long addr,
-						unsigned long vm_flags)
+						unsigned long addr)
 {
-	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl_page *entry;
 
@@ -102,14 +100,6 @@  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	if (!entry)
 		return ERR_PTR(-EFAULT);
 
-	/*
-	 * Verify that the faulted page has equal or higher build time
-	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
-	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
-	 */
-	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
-		return ERR_PTR(-EFAULT);
-
 	/* Entry successfully located. */
 	if (entry->epc_page) {
 		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
@@ -138,7 +128,9 @@  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 {
 	unsigned long addr = (unsigned long)vmf->address;
 	struct vm_area_struct *vma = vmf->vma;
+	unsigned long page_prot_bits;
 	struct sgx_encl_page *entry;
+	unsigned long vm_prot_bits;
 	unsigned long phys_addr;
 	struct sgx_encl *encl;
 	vm_fault_t ret;
@@ -155,7 +147,7 @@  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 
 	mutex_lock(&encl->lock);
 
-	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
+	entry = sgx_encl_load_page(encl, addr);
 	if (IS_ERR(entry)) {
 		mutex_unlock(&encl->lock);
 
@@ -167,7 +159,19 @@  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 
 	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
 
-	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
+	/*
+	 * Insert PTE to match the EPCM page permissions ensured to not
+	 * exceed the VMA permissions.
+	 */
+	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
+	/*
+	 * Add VM_SHARED so that PTE is made writable right away if VMA
+	 * and EPCM are writable (no COW in SGX).
+	 */
+	page_prot_bits |= (vma->vm_flags & VM_SHARED);
+	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
+				  vm_get_page_prot(page_prot_bits));
 	if (ret != VM_FAULT_NOPAGE) {
 		mutex_unlock(&encl->lock);
 
@@ -295,15 +299,14 @@  static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
  * Load an enclave page to EPC if required, and take encl->lock.
  */
 static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
-						   unsigned long addr,
-						   unsigned long vm_flags)
+						   unsigned long addr)
 {
 	struct sgx_encl_page *entry;
 
 	for ( ; ; ) {
 		mutex_lock(&encl->lock);
 
-		entry = sgx_encl_load_page(encl, addr, vm_flags);
+		entry = sgx_encl_load_page(encl, addr);
 		if (PTR_ERR(entry) != -EBUSY)
 			break;
 
@@ -339,8 +342,7 @@  static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
 		return -EFAULT;
 
 	for (i = 0; i < len; i += cnt) {
-		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
-					      vma->vm_flags);
+		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
 		if (IS_ERR(entry)) {
 			ret = PTR_ERR(entry);
 			break;