diff mbox series

[v5,2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization

Message ID 20220622093705.2891642-3-zhiquan1.li@intel.com (mailing list archive)
State New, archived
Headers show
Series x86/sgx: fine grained SGX MCA behavior | expand

Commit Message

Zhiquan Li June 22, 2022, 9:37 a.m. UTC
When VM guest access a SGX EPC page with memory failure, current
behavior will kill the guest, expected only kill the SGX application
inside it.

To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra
information for hypervisor to inject #MC information to guest, which is
helpful in SGX case.

The rest of things are guest side. Currently the hypervisor like Qemu
already has mature facility to convert HVA to GPA and inject #MC to
the guest OS.

Unlike host enclaves, virtual EPC instance cannot be shared by multiple
VMs.  It is because how enclaves are created is totally up to the guest.
Sharing virtual EPC instance will be very likely to unexpectedly break
enclaves in all VMs.

SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
being shared by multiple VMs via fork().  However KVM doesn't support
running a VM across multiple mm structures, and the de facto userspace
hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
this should not happen.

Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/linux-sgx/443cb425-009c-2784-56f4-5e707122de76@intel.com/T/#m1d1f4098f4fad78034e8706a60e4d79c119db407
---
Changes since V4:
- Switch the order of the two variables so all of variables are in
  reverse Christmas style.
- Do not initialize "ret" because it will be overridden by the return
  value of force_sig_mceerr() unconditionally.

Changes since V2:
- Retrieve virtual address from "owner" field of struct sgx_epc_page,
  instead of struct sgx_vepc_page.
- Replace EPC page flag SGX_EPC_PAGE_IS_VEPC with
  SGX_EPC_PAGE_KVM_GUEST as they are duplicated.

Changes since V1:
- Add Acked-by from Kai Huang.
- Add Kai’s excellent explanation regarding to why we no need to
  consider that one virtual EPC be shared by two guests.
---
 arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

Comments

Dave Hansen July 21, 2022, 4:54 p.m. UTC | #1
On 6/22/22 02:37, Zhiquan Li wrote:
> When VM guest access a SGX EPC page with memory failure, current
> behavior will kill the guest, expected only kill the SGX application
> inside it.

Can we please clean this up?  This is generally readable, but _hard_ to
read.  Perhaps:

	Today, if a guest accesses an SGX EPC page with memory failure,
	the kernel will behavior will kill the entire guest.  This blast
	radius is too large.  It would be idea to kill only the SGX
	application inside the guest.

> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra

	    ^ No "we's".

> information for hypervisor to inject #MC information to guest, which is
> helpful in SGX case.

To fix this, send a SIGBUS to host userspace (like QEMU) which can
follow up by injecting a #MC to the guest.

> The rest of things are guest side. Currently the hypervisor like Qemu
> already has mature facility to convert HVA to GPA and inject #MC to
> the guest OS.
> 
> Unlike host enclaves, virtual EPC instance cannot be shared by multiple
> VMs.  It is because how enclaves are created is totally up to the guest.
> Sharing virtual EPC instance will be very likely to unexpectedly break
> enclaves in all VMs.

I'm not sure why this is here or why it is important to this patch.

> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
> being shared by multiple VMs via fork().  However KVM doesn't support
> running a VM across multiple mm structures, and the de facto userspace
> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
> this should not happen.


> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index ab4ec54bbdd9..4507c2302348 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
>  	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
>  	struct sgx_epc_section *section;
>  	struct sgx_numa_node *node;
> +	unsigned long vaddr;
> +	int ret;
>  
>  	/*
>  	 * mm/memory-failure.c calls this routine for all errors
> @@ -731,8 +733,26 @@ int arch_memory_failure(unsigned long pfn, int flags)
>  	 * error. The signal may help the task understand why the
>  	 * enclave is broken.
>  	 */
> -	if (flags & MF_ACTION_REQUIRED)
> -		force_sig(SIGBUS);
> +	if (flags & MF_ACTION_REQUIRED) {
> +		/*
> +		 * Provide extra info to the task so that it can make further
> +		 * decision but not simply kill it. This is quite useful for
> +		 * virtualization case.
> +		 */
> +		if (page->flags & SGX_EPC_PAGE_KVM_GUEST) {
> +			/*
> +			 * The "owner" field is repurposed as the virtual address
> +			 * of virtual EPC page.
> +			 */
> +			vaddr = (unsigned long)page->owner & PAGE_MASK;

I really don't like repurposing page->owner like this.  It requires
casting on *both* sides of a type that we have full control over.

	struct sgx_epc_page {
	        unsigned int section;
	        u16 flags;
	        u16 poison;
		union {
		        struct sgx_encl_page *encl_owner;
			// Use when SGX_EPC_PAGE_KVM_GUEST
			// set in ->flags:
		        void __user *vepc_vaddr;
		};
	        struct list_head list;
	};

There is zero reason to play casting games instead of doing that ^

> +			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)vaddr,
> +					PAGE_SHIFT);
> +			if (ret < 0)
> +				pr_err("Memory failure: Error sending signal to %s:%d: %d\n",
> +					current->comm, current->pid, ret);
> +		} else
> +			force_sig(SIGBUS);
> +	}
Zhiquan Li July 22, 2022, 4:21 p.m. UTC | #2
On 2022/7/22 00:54, Dave Hansen wrote:
> On 6/22/22 02:37, Zhiquan Li wrote:
>> When VM guest access a SGX EPC page with memory failure, current
>> behavior will kill the guest, expected only kill the SGX application
>> inside it.
> Can we please clean this up?  This is generally readable, but _hard_ to
> read.  Perhaps:
> 
> 	Today, if a guest accesses an SGX EPC page with memory failure,
> 	the kernel will behavior will kill the entire guest.  This blast
> 	radius is too large.  It would be idea to kill only the SGX
> 	application inside the guest.
> 
>> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra
> 	    ^ No "we's".
> 
>> information for hypervisor to inject #MC information to guest, which is
>> helpful in SGX case.
> To fix this, send a SIGBUS to host userspace (like QEMU) which can
> follow up by injecting a #MC to the guest.
> 
>> The rest of things are guest side. Currently the hypervisor like Qemu
>> already has mature facility to convert HVA to GPA and inject #MC to
>> the guest OS.
>>
>> Unlike host enclaves, virtual EPC instance cannot be shared by multiple
>> VMs.  It is because how enclaves are created is totally up to the guest.
>> Sharing virtual EPC instance will be very likely to unexpectedly break
>> enclaves in all VMs.
> I'm not sure why this is here or why it is important to this patch.
> 
>> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
>> being shared by multiple VMs via fork().  However KVM doesn't support
>> running a VM across multiple mm structures, and the de facto userspace
>> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
>> this should not happen.
> 
>> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
>> index ab4ec54bbdd9..4507c2302348 100644
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
>>  	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
>>  	struct sgx_epc_section *section;
>>  	struct sgx_numa_node *node;
>> +	unsigned long vaddr;
>> +	int ret;
>>  
>>  	/*
>>  	 * mm/memory-failure.c calls this routine for all errors
>> @@ -731,8 +733,26 @@ int arch_memory_failure(unsigned long pfn, int flags)
>>  	 * error. The signal may help the task understand why the
>>  	 * enclave is broken.
>>  	 */
>> -	if (flags & MF_ACTION_REQUIRED)
>> -		force_sig(SIGBUS);
>> +	if (flags & MF_ACTION_REQUIRED) {
>> +		/*
>> +		 * Provide extra info to the task so that it can make further
>> +		 * decision but not simply kill it. This is quite useful for
>> +		 * virtualization case.
>> +		 */
>> +		if (page->flags & SGX_EPC_PAGE_KVM_GUEST) {
>> +			/*
>> +			 * The "owner" field is repurposed as the virtual address
>> +			 * of virtual EPC page.
>> +			 */
>> +			vaddr = (unsigned long)page->owner & PAGE_MASK;
> I really don't like repurposing page->owner like this.  It requires
> casting on *both* sides of a type that we have full control over.
> 
> 	struct sgx_epc_page {
> 	        unsigned int section;
> 	        u16 flags;
> 	        u16 poison;
> 		union {
> 		        struct sgx_encl_page *encl_owner;
> 			// Use when SGX_EPC_PAGE_KVM_GUEST
> 			// set in ->flags:
> 		        void __user *vepc_vaddr;
> 		};
> 	        struct list_head list;
> 	};
> 
> There is zero reason to play casting games instead of doing that ^
> 

Many thanks for your review, Dave.
I will send V6 patch set as per your suggestion.

Best Regards,
Zhiquan
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ab4ec54bbdd9..4507c2302348 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -715,6 +715,8 @@  int arch_memory_failure(unsigned long pfn, int flags)
 	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
 	struct sgx_epc_section *section;
 	struct sgx_numa_node *node;
+	unsigned long vaddr;
+	int ret;
 
 	/*
 	 * mm/memory-failure.c calls this routine for all errors
@@ -731,8 +733,26 @@  int arch_memory_failure(unsigned long pfn, int flags)
 	 * error. The signal may help the task understand why the
 	 * enclave is broken.
 	 */
-	if (flags & MF_ACTION_REQUIRED)
-		force_sig(SIGBUS);
+	if (flags & MF_ACTION_REQUIRED) {
+		/*
+		 * Provide extra info to the task so that it can make further
+		 * decision but not simply kill it. This is quite useful for
+		 * virtualization case.
+		 */
+		if (page->flags & SGX_EPC_PAGE_KVM_GUEST) {
+			/*
+			 * The "owner" field is repurposed as the virtual address
+			 * of virtual EPC page.
+			 */
+			vaddr = (unsigned long)page->owner & PAGE_MASK;
+			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)vaddr,
+					PAGE_SHIFT);
+			if (ret < 0)
+				pr_err("Memory failure: Error sending signal to %s:%d: %d\n",
+					current->comm, current->pid, ret);
+		} else
+			force_sig(SIGBUS);
+	}
 
 	section = &sgx_epc_sections[page->section];
 	node = section->node;