diff mbox series

[v5] x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race

Message ID 20230727010212.26406-1-haitao.huang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [v5] x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race | expand

Commit Message

Haitao Huang July 27, 2023, 1:02 a.m. UTC
Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC
page for an enclave and set encl->secs.epc_page to NULL. But the SECS
EPC page is used for EAUG in the SGX page fault handler without checking
for NULL and reloading.

Fix this by checking if SECS is loaded before EAUG and loading it if it
was reclaimed.

The SECS page holds global enclave metadata. It can only be reclaimed
when there are no other enclave pages remaining. At that point,
virtually nothing can be done with the enclave until the SECS page is
paged back in.

An enclave can not run nor generate page faults without a resident SECS
page. But it is still possible for a #PF for a non-SECS page to race
with paging out the SECS page: when the last resident non-SECS page A
triggers a #PF in a non-resident page B, and then page A and the SECS
both are paged out before the #PF on B is handled.

Hitting this bug requires that race triggered with a #PF for EAUG.
Following is a trace when it happens.

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
Call Trace:
 ? __kmem_cache_alloc_node+0x16a/0x440
 ? xa_load+0x6e/0xa0
 sgx_vma_fault+0x119/0x230
 __do_fault+0x36/0x140
 do_fault+0x12f/0x400
 __handle_mm_fault+0x728/0x1110
 handle_mm_fault+0x105/0x310
 do_user_addr_fault+0x1ee/0x750
 ? __this_cpu_preempt_check+0x13/0x20
 exc_page_fault+0x76/0x180
 asm_exc_page_fault+0x27/0x30

Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
---
v5:
- Trimmed trace and added Acked-by (Reinette)

v4:
- Refined the title (Kai, Dave)
- Added a trace to commit meesage (Kai)
- Added a few details for the race.

v3:
- Added comments on sgx_encl_load_secs(). (Dave)
- Added theory of the race condition to hit the bug. (Dave)
- Added Reviewed-by, and applicable stable release. (Jarkko)

v2:
- Fixes for style, commit message (Jarkko, Kai)
- Removed unneeded WARN_ON (Kai)
---
 arch/x86/kernel/cpu/sgx/encl.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)


base-commit: 6eaae198076080886b9e7d57f4ae06fa782f90ef

Comments

Huang, Kai July 27, 2023, 2:50 a.m. UTC | #1
On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote:
> Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC

If I read correctly, Dave suggested to not use "high" (heavy in this sentence)
or "low" pressure:

https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@hhuan26-mobl.amr.corp.intel.com/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc

And I agree.  For instance, consider this happens to one extremely "small"
enclave, while there's a new "big" enclave starts to run.  I don't think we
should say this is "under heavy load".  Just stick to the fact that the
reclaimer may reclaim the SECS page.

> page for an enclave and set encl->secs.epc_page to NULL. But the SECS
> EPC page is used for EAUG in the SGX page fault handler without checking
> for NULL and reloading.
> 
> Fix this by checking if SECS is loaded before EAUG and loading it if it
> was reclaimed.
> 
> The SECS page holds global enclave metadata. It can only be reclaimed
> when there are no other enclave pages remaining. At that point,
> virtually nothing can be done with the enclave until the SECS page is
> paged back in.
> 
> An enclave can not run nor generate page faults without a resident SECS
> page. 
> 

I am not sure whether "nor generate page faults without a resident SECS page" is
accurate?  When SECS is swapped out, I suppose the first EENTER should trigger a
#PF on the TSC page and in the #PF handler the SECS will be swapped in first.

I guess you can just remove this sentence?

> But it is still possible for a #PF for a non-SECS page to race
> with paging out the SECS page: when the last resident non-SECS page A
> triggers a #PF in a non-resident page B, and then page A and the SECS
> both are paged out before the #PF on B is handled.
> 
> Hitting this bug requires that race triggered with a #PF for EAUG.

The above race can happen for the normal ELDU path too, thus I suppose it will
be better to mention why the normal ELDU path doesn't have this issue: it
already does what this fix does.

> Following is a trace when it happens.
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
> Call Trace:
>  ? __kmem_cache_alloc_node+0x16a/0x440
>  ? xa_load+0x6e/0xa0
>  sgx_vma_fault+0x119/0x230
>  __do_fault+0x36/0x140
>  do_fault+0x12f/0x400
>  __handle_mm_fault+0x728/0x1110
>  handle_mm_fault+0x105/0x310
>  do_user_addr_fault+0x1ee/0x750
>  ? __this_cpu_preempt_check+0x13/0x20
>  exc_page_fault+0x76/0x180
>  asm_exc_page_fault+0x27/0x30
> 
> Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
> Cc: stable@vger.kernel.org # v6.0+
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Acked-by: Reinette Chatre <reinette.chatre@intel.com>

With above addressed, feel free to add:

Reviewed-by: Kai Huang <kai.huang@intel.com>
Huang, Kai July 27, 2023, 3:11 a.m. UTC | #2
On Thu, 2023-07-27 at 02:50 +0000, Huang, Kai wrote:
> > An enclave can not run nor generate page faults without a resident SECS
> > page. 
> > 
> 
> I am not sure whether "nor generate page faults without a resident SECS page" is
> accurate?  When SECS is swapped out, I suppose the first EENTER should trigger a
> #PF on the TSC page and in the #PF handler the SECS will be swapped in first.
> 
> I guess you can just remove this sentence?

Hmm.. Probably I should interpret this sentence as the enclave "code" itself
cannot generate page faults without a resident SECS.  This is true.  So feel
free to ignore this comment.
Haitao Huang July 27, 2023, 2:16 p.m. UTC | #3
On Wed, 26 Jul 2023 21:50:02 -0500, Huang, Kai <kai.huang@intel.com> wrote:

> On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote:
>> Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC
>
> If I read correctly, Dave suggested to not use "high" (heavy in this  
> sentence)
> or "low" pressure:
>
> https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@hhuan26-mobl.amr.corp.intel.com/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc
>
> And I agree.  For instance, consider this happens to one extremely  
> "small"
> enclave, while there's a new "big" enclave starts to run.  I don't think  
> we
> should say this is "under heavy load".  Just stick to the fact that the
> reclaimer may reclaim the SECS page.
>
Mybe I have some confusion here but I did not think Dave had issues with  
'heavy load'. When this happens, the last page causing #PF (page A below)  
should be the the "youngest" in PTE and it got paged out together with the  
SECS before the #PF is even handled. Based on that the ksgxd moves 'young'  
pages to the back of the queue for reclaiming, for that to happen, almost  
all EPC pages must be paged out for all enclaves at that time, so it means  
heavy load to me.  And that's also consistent with my tests.

>> page for an enclave and set encl->secs.epc_page to NULL. But the SECS
>> EPC page is used for EAUG in the SGX page fault handler without checking
>> for NULL and reloading.
>>
>> Fix this by checking if SECS is loaded before EAUG and loading it if it
>> was reclaimed.
>>
>> The SECS page holds global enclave metadata. It can only be reclaimed
>> when there are no other enclave pages remaining. At that point,
>> virtually nothing can be done with the enclave until the SECS page is
>> paged back in.
...
>> But it is still possible for a #PF for a non-SECS page to race
>> with paging out the SECS page: when the last resident non-SECS page A
>> triggers a #PF in a non-resident page B, and then page A and the SECS
>> both are paged out before the #PF on B is handled.
>>
>> Hitting this bug requires that race triggered with a #PF for EAUG.
>
> The above race can happen for the normal ELDU path too, thus I suppose  
> it will
> be better to mention why the normal ELDU path doesn't have this issue: it
> already does what this fix does.
>
Should we focus on the bug and fix itself instead of explaining a non-bug  
case?
And the simple changes in this patch clearly show that too if people look  
for that.

Thanks
Haitao
Huang, Kai July 27, 2023, 11:21 p.m. UTC | #4
On Thu, 2023-07-27 at 09:16 -0500, Haitao Huang wrote:
> On Wed, 26 Jul 2023 21:50:02 -0500, Huang, Kai <kai.huang@intel.com> wrote:
> 
> > On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote:
> > > Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC
> > 
> > If I read correctly, Dave suggested to not use "high" (heavy in this  
> > sentence)
> > or "low" pressure:
> > 
> > https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@hhuan26-mobl.amr.corp.intel.com/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc
> > 
> > And I agree.  For instance, consider this happens to one extremely  
> > "small"
> > enclave, while there's a new "big" enclave starts to run.  I don't think  
> > we
> > should say this is "under heavy load".  Just stick to the fact that the
> > reclaimer may reclaim the SECS page.
> > 
> Mybe I have some confusion here but I did not think Dave had issues with  
> 'heavy load'. When this happens, the last page causing #PF (page A below)  
> should be the the "youngest" in PTE and it got paged out together with the  
> SECS before the #PF is even handled. Based on that the ksgxd moves 'young'  
> pages to the back of the queue for reclaiming, for that to happen, almost  
> all EPC pages must be paged out for all enclaves at that time, so it means  
> heavy load to me.  And that's also consistent with my tests.

I already provided an example: swapping out an "extreme small" enclave.

Anyway, no big deal to me.

> 
> > > page for an enclave and set encl->secs.epc_page to NULL. But the SECS
> > > EPC page is used for EAUG in the SGX page fault handler without checking
> > > for NULL and reloading.
> > > 
> > > Fix this by checking if SECS is loaded before EAUG and loading it if it
> > > was reclaimed.
> > > 
> > > The SECS page holds global enclave metadata. It can only be reclaimed
> > > when there are no other enclave pages remaining. At that point,
> > > virtually nothing can be done with the enclave until the SECS page is
> > > paged back in.
> ...
> > > But it is still possible for a #PF for a non-SECS page to race
> > > with paging out the SECS page: when the last resident non-SECS page A
> > > triggers a #PF in a non-resident page B, and then page A and the SECS
> > > both are paged out before the #PF on B is handled.
> > > 
> > > Hitting this bug requires that race triggered with a #PF for EAUG.
> > 
> > The above race can happen for the normal ELDU path too, thus I suppose  
> > it will
> > be better to mention why the normal ELDU path doesn't have this issue: it
> > already does what this fix does.
> > 
> Should we focus on the bug and fix itself instead of explaining a non-bug  
> case?
> And the simple changes in this patch clearly show that too if people look  
> for that.

So you spent a lot of text explaining the race condition, but such race
condition applies to both ELDU and EAUG.  I personally went to see the code
whether ELDU has such issue too, and it turned out only EAUG has issue.  If you
mention this in the changelog perhaps I wouldn't need to go to read the code.

Anyway, just my 2cents.

And I don't want to let those block this patch, so feel free to add my tag:

Reviewed-by: Kai Huang <kai.huang@intel.com>
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 91fa70e51004..279148e72459 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -235,6 +235,21 @@  static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	return epc_page;
 }
 
+/*
+ * Ensure the SECS page is not swapped out.  Must be called with encl->lock
+ * to protect the enclave states including SECS and ensure the SECS page is
+ * not swapped out again while being used.
+ */
+static struct sgx_epc_page *sgx_encl_load_secs(struct sgx_encl *encl)
+{
+	struct sgx_epc_page *epc_page = encl->secs.epc_page;
+
+	if (!epc_page)
+		epc_page = sgx_encl_eldu(&encl->secs, NULL);
+
+	return epc_page;
+}
+
 static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 						  struct sgx_encl_page *entry)
 {
@@ -248,11 +263,9 @@  static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		return entry;
 	}
 
-	if (!(encl->secs.epc_page)) {
-		epc_page = sgx_encl_eldu(&encl->secs, NULL);
-		if (IS_ERR(epc_page))
-			return ERR_CAST(epc_page);
-	}
+	epc_page = sgx_encl_load_secs(encl);
+	if (IS_ERR(epc_page))
+		return ERR_CAST(epc_page);
 
 	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
 	if (IS_ERR(epc_page))
@@ -339,6 +352,13 @@  static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 
 	mutex_lock(&encl->lock);
 
+	epc_page = sgx_encl_load_secs(encl);
+	if (IS_ERR(epc_page)) {
+		if (PTR_ERR(epc_page) == -EBUSY)
+			vmret = VM_FAULT_NOPAGE;
+		goto err_out_unlock;
+	}
+
 	epc_page = sgx_alloc_epc_page(encl_page, false);
 	if (IS_ERR(epc_page)) {
 		if (PTR_ERR(epc_page) == -EBUSY)