
[3/4] x86/sgx: Fine grained SGX MCA behavior for virtualization

Message ID 20220510031748.3181459-1-zhiquan1.li@intel.com (mailing list archive)
State New, archived
Series x86/sgx: fine grained SGX MCA behavior

Commit Message

Zhiquan Li May 10, 2022, 3:17 a.m. UTC
When a VM guest accesses an SGX EPC page that has a memory failure, the
current behavior kills the whole guest, whereas only the SGX application
inside it should be killed.

To fix this, send SIGBUS with code BUS_MCEERR_AR and extra information
that the hypervisor can use to inject #MC information into the guest,
which is helpful in the SGX case.

Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Huang, Kai May 11, 2022, 11:33 p.m. UTC | #1
On Tue, 2022-05-10 at 11:17 +0800, Zhiquan Li wrote:
> When a VM guest accesses an SGX EPC page that has a memory failure, the
> current behavior kills the whole guest, whereas only the SGX application
> inside it should be killed.
> 
> To fix this, send SIGBUS with code BUS_MCEERR_AR and extra information
> that the hypervisor can use to inject #MC information into the guest,
> which is helpful in the SGX case.
> 
> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 8e4bc6453d26..81801ab0009e 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -710,6 +710,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
>  	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
>  	struct sgx_epc_section *section;
>  	struct sgx_numa_node *node;
> +	struct sgx_vepc_page *owner;
> +	int ret = 0;
>  
>  	/*
>  	 * mm/memory-failure.c calls this routine for all errors
> @@ -726,8 +728,22 @@ int arch_memory_failure(unsigned long pfn, int flags)
>  	 * error. The signal may help the task understand why the
>  	 * enclave is broken.
>  	 */
> -	if (flags & MF_ACTION_REQUIRED)
> -		force_sig(SIGBUS);
> +	if (flags & MF_ACTION_REQUIRED) {
> +		/*
> +		 * In case the error memory is accessed by VM guest, provide
> +		 * extra info for hypervisor to make further decision but not
> +		 * simply kill it.
> +		 */
> +		if (page->flags & SGX_EPC_PAGE_IS_VEPC) {
> +			owner = (struct sgx_vepc_page *)page->owner;
> +			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr,
> +					PAGE_SHIFT);

If I understand correctly, this sends the signal to the userspace hypervisor, i.e.
Qemu?  Can you elaborate on how KVM is supposed to inject the #MC into the guest,
instead of the Qemu process (the VM) being killed?
Zhiquan Li May 12, 2022, 2:19 a.m. UTC | #2
On 2022/5/12 07:33, Kai Huang wrote:
>> +			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr,
>> +					PAGE_SHIFT);
> If I understand correctly, this sends the signal to the userspace hypervisor, i.e.
> Qemu?  Can you elaborate on how KVM is supposed to inject the #MC into the guest,
> instead of the Qemu process (the VM) being killed?
> 

Yes, here we use a facility that Qemu already has.
The basic call path is as follows:

sigbus_handler
  kvm_on_sigbus
    kvm_arch_on_sigbus_vcpu
      qemu_ram_addr_from_host
      kvm_physical_memory_addr_from_host
      kvm_hwpoison_page_add
      kvm_mce_inject
        cpu_x86_inject_mce

1. When Qemu initializes, it registers sigbus_handler() as its SIGBUS handler
   in qemu_init_sigbus().
2. sigbus_handler() filters out any signal whose siginfo->si_code is neither
   BUS_MCEERR_AO nor BUS_MCEERR_AR, and re-raises SIGBUS for it.
   If si_code is BUS_MCEERR_AO or BUS_MCEERR_AR, it invokes
   kvm_on_sigbus(siginfo->si_code, siginfo->si_addr) to continue handling the signal.
3. kvm_on_sigbus() is a thin generic wrapper which calls the arch-specific
   kvm_arch_on_sigbus_vcpu().
4. kvm_arch_on_sigbus_vcpu() is the x86-specific implementation; it does the following:
   1) Convert the HVA (addr) to an HPA (ram_addr):
      ram_addr = qemu_ram_addr_from_host(addr)
   2) Retrieve the GPA (paddr) for the HVA (addr):
      kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)
   3) Add the page containing the HWPoison HPA to its hwpoison_page_list:
      kvm_hwpoison_page_add(ram_addr)
   4) Inject the #MC into the guest with the GPA and si_code:
      kvm_mce_inject(cpu, paddr, code)
        cpu_x86_inject_mce()

The rest is handled by the guest kernel, which uses the existing MCA machinery
to handle the #MC: based on the physical address (in fact, a GPA) it finds the
affected applications and sends them a kill signal if the #MC is SRAR.

So the intention of this patchset is to enrich the information sent with
SIGBUS, so that the VMM can use this already-existing facility to take the
proper action.

Best Regards,
Zhiquan

Huang, Kai May 12, 2022, 3:27 a.m. UTC | #3
On Thu, 2022-05-12 at 10:19 +0800, Zhiquan Li wrote:
> On 2022/5/12 07:33, Kai Huang wrote:
> > > +			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr,
> > > +					PAGE_SHIFT);
> > If I understand correctly, this sends the signal to the userspace hypervisor, i.e.
> > Qemu?  Can you elaborate on how KVM is supposed to inject the #MC into the guest,
> > instead of the Qemu process (the VM) being killed?
> > 
> [...Qemu call path explanation snipped...]

Yes this makes sense.  Thanks for explaining!

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..81801ab0009e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -710,6 +710,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
 	struct sgx_epc_section *section;
 	struct sgx_numa_node *node;
+	struct sgx_vepc_page *owner;
+	int ret = 0;
 
 	/*
 	 * mm/memory-failure.c calls this routine for all errors
@@ -726,8 +728,22 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	 * error. The signal may help the task understand why the
 	 * enclave is broken.
 	 */
-	if (flags & MF_ACTION_REQUIRED)
-		force_sig(SIGBUS);
+	if (flags & MF_ACTION_REQUIRED) {
+		/*
+		 * In case the error memory is accessed by VM guest, provide
+		 * extra info for hypervisor to make further decision but not
+		 * simply kill it.
+		 */
+		if (page->flags & SGX_EPC_PAGE_IS_VEPC) {
+			owner = (struct sgx_vepc_page *)page->owner;
+			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr,
+					PAGE_SHIFT);
+			if (ret < 0)
+				pr_err("Memory failure: Error sending signal to %s:%d: %d\n",
+					current->comm, current->pid, ret);
+		} else
+			force_sig(SIGBUS);
+	}
 
 	section = &sgx_epc_sections[page->section];
 	node = section->node;