Message ID | 20211011185924.374213-5-tony.luck@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Basic recovery for machine checks inside SGX | expand |
On Mon, Oct 11, 2021, Tony Luck wrote: > + section = &sgx_epc_sections[page->section]; > + node = section->node; > + > + spin_lock(&node->lock); > + > + /* Already poisoned? Nothing more to do */ > + if (page->poison) > + goto out; > + > + page->poison = 1; > + > + /* > + * If flags is zero, then the page is on a free list. > + * Move it to the poison page list. > + */ > + if (!page->flags) { If the flag is inverted, this becomes if (page->flags & SGX_EPC_PAGE_FREE) { > + list_del(&page->list); > + list_add(&page->list, &sgx_poison_page_list); list_move(), and needs the same protection for sgx_poison_page_list. > + goto out; > + } > + > + /* > + * TBD: Add additional plumbing to enable pre-emptive > + * action for asynchronous poison notification. Until > + * then just hope that the poison: > + * a) is not accessed - sgx_free_epc_page() will deal with it > + * when the user gives it back > + * b) results in a recoverable machine check rather than > + * a fatal one > + */ > +out: > + spin_unlock(&node->lock); > + return 0; > +} > + > /** > * A section metric is concatenated in a way that @low bits 12-31 define the > * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the > > -- > 2.31.1 >
On Fri, Oct 15, 2021 at 11:10:32PM +0000, Sean Christopherson wrote: > On Mon, Oct 11, 2021, Tony Luck wrote: > > + section = &sgx_epc_sections[page->section]; > > + node = section->node; > > + > > + spin_lock(&node->lock); > > + > > + /* Already poisoned? Nothing more to do */ > > + if (page->poison) > > + goto out; > > + > > + page->poison = 1; > > + > > + /* > > + * If flags is zero, then the page is on a free list. > > + * Move it to the poison page list. > > + */ > > + if (!page->flags) { > > If the flag is inverted, this becomes > > if (page->flags & SGX_EPC_PAGE_FREE) { I like the inversion. I'll switch to SGX_EPC_PAGE_FREE > > > + list_del(&page->list); > > + list_add(&page->list, &sgx_poison_page_list); > > list_move(), and needs the same protection for sgx_poison_page_list. Didn't know list_move() existed. Will change all the lis_del+list_add into list_move. Also change the sgx_poison_page_list from global to per-node. Then the adds will be safe (accessed while holding the node->lock). Thanks for the review. -Tony
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 653bace26100..398c9749e4d1 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -682,6 +682,83 @@ bool arch_is_platform_page(u64 paddr) } EXPORT_SYMBOL_GPL(arch_is_platform_page); +static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr) +{ + struct sgx_epc_section *section; + + section = xa_load(&sgx_epc_address_space, paddr); + if (!section) + return NULL; + + return §ion->pages[PFN_DOWN(paddr - section->phys_addr)]; +} + +/* + * Called in process context to handle a hardware reported + * error in an SGX EPC page. + * If the MF_ACTION_REQUIRED bit is set in flags, then the + * context is the task that consumed the poison data. Otherwise + * this is called from a kernel thread unrelated to the page. + */ +int arch_memory_failure(unsigned long pfn, int flags) +{ + struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT); + struct sgx_epc_section *section; + struct sgx_numa_node *node; + + /* + * mm/memory-failure.c calls this routine for all errors + * where there isn't a "struct page" for the address. But that + * includes other address ranges besides SGX. + */ + if (!page) + return -ENXIO; + + /* + * If poison was consumed synchronously. Send a SIGBUS to + * the task. Hardware has already exited the SGX enclave and + * will not allow re-entry to an enclave that has a memory + * error. The signal may help the task understand why the + * enclave is broken. + */ + if (flags & MF_ACTION_REQUIRED) + force_sig(SIGBUS); + + section = &sgx_epc_sections[page->section]; + node = section->node; + + spin_lock(&node->lock); + + /* Already poisoned? Nothing more to do */ + if (page->poison) + goto out; + + page->poison = 1; + + /* + * If flags is zero, then the page is on a free list. + * Move it to the poison page list. + */ + if (!page->flags) { + list_del(&page->list); + list_add(&page->list, &sgx_poison_page_list); + goto out; + } + + /* + * TBD: Add additional plumbing to enable pre-emptive + * action for asynchronous poison notification. Until + * then just hope that the poison: + * a) is not accessed - sgx_free_epc_page() will deal with it + * when the user gives it back + * b) results in a recoverable machine check rather than + * a fatal one + */ +out: + spin_unlock(&node->lock); + return 0; +} + /** * A section metric is concatenated in a way that @low bits 12-31 define the * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the