Message ID | 20220510031748.3181459-1-zhiquan1.li@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/sgx: fine grained SGX MCA behavior | expand |
On Tue, 2022-05-10 at 11:17 +0800, Zhiquan Li wrote: > When VM guest access a SGX EPC page with memory failure, current > behavior will kill the guest, expected only kill the SGX application > inside it. > > To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra > information for hypervisor to inject #MC information to guest, which is > helpful in SGX case. > > Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> > --- > arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++++++++++-- > 1 file changed, 18 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 8e4bc6453d26..81801ab0009e 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -710,6 +710,8 @@ int arch_memory_failure(unsigned long pfn, int flags) > struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT); > struct sgx_epc_section *section; > struct sgx_numa_node *node; > + struct sgx_vepc_page *owner; > + int ret = 0; > > /* > * mm/memory-failure.c calls this routine for all errors > @@ -726,8 +728,22 @@ int arch_memory_failure(unsigned long pfn, int flags) > * error. The signal may help the task understand why the > * enclave is broken. > */ > - if (flags & MF_ACTION_REQUIRED) > - force_sig(SIGBUS); > + if (flags & MF_ACTION_REQUIRED) { > + /* > + * In case the error memory is accessed by VM guest, provide > + * extra info for hypervisor to make further decision but not > + * simply kill it. > + */ > + if (page->flags & SGX_EPC_PAGE_IS_VEPC) { > + owner = (struct sgx_vepc_page *)page->owner; > + ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr, > + PAGE_SHIFT); If I understand correctly, this sends signal to the userspace hypervisor, i.e. Qemu? Can you elaborate how is KVM supposed to inject the #MC to guest, instead of Qemu process (the VM) being killed?
On 2022/5/12 07:33, Kai Huang wrote: >> + ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr, >> + PAGE_SHIFT); > If I understand correctly, this sends signal to the userspace hypervisor, i.e. > Qemu? Can you elaborate how is KVM supposed to inject the #MC to guest, instead > of Qemu process (the VM) being killed? > Yes, here we use the facility that Qemu already has. The basic call path as below: sigbus_handler kvm_on_sigbus kvm_arch_on_sigbus_vcpu qemu_ram_addr_from_host kvm_physical_memory_addr_from_host kvm_hwpoison_page_add kvm_mce_inject cpu_x86_inject_mce 1. At first when Qemu init, it will register sigbus_handler() as its SIGBUS handler at qemu_init_sigbus(). 2. At sigbus_handler() it will filter-out the signal which siginfo->si_code is not BUS_MCEERR_AO or BUS_MCEERR_AR, and then re-raise SIGBUS signal. If the si_code is BUS_MCEERR_AO or BUS_MCEERR_AR, it will invoke kvm_on_sigbus(siginfo->si_code, siginfo->si_addr), go on handling the signal. 3. kvm_on_sigbus() is a simple generic wrap which will call arch specific kvm_arch_on_sigbus_vcpu() 4. kvm_arch_on_sigbus_vcpu() is x86 specific implementation, it will do following 1) Convert HVA (addr) to HPA (ram_addr) ram_addr = qemu_ram_addr_from_host(addr) 2) Retrieve the GPA (paddr) as per HVA (addr) kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr) 3) Add the page including HWPoison HPA to its hwpoison_page_list kvm_hwpoison_page_add(ram_addr) 4) Inject the #MC to guest with GPA and si_code kvm_mce_inject(cpu, paddr, code) cpu_x86_inject_mce() The rest is guest kernel uses the existed MCA to handle the #MC. According to the physical address (in fact, GPA) find the applications and send kill signal if the #MC is SRAR. So the intention of this patchset is to enrich the information when sending SIGBUS, so that VMM can use already existed facility to take a proper behavior. Best Regards, Zhiquan > > -- Thanks, -Kai
On Thu, 2022-05-12 at 10:19 +0800, Zhiquan Li wrote: > On 2022/5/12 07:33, Kai Huang wrote: > > > + ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr, > > > + PAGE_SHIFT); > > If I understand correctly, this sends signal to the userspace hypervisor, i.e. > > Qemu? Can you elaborate how is KVM supposed to inject the #MC to guest, instead > > of Qemu process (the VM) being killed? > > > > Yes, here we use the facility that Qemu already has. > The basic call path as below: > > sigbus_handler > kvm_on_sigbus > kvm_arch_on_sigbus_vcpu > qemu_ram_addr_from_host > kvm_physical_memory_addr_from_host > kvm_hwpoison_page_add > kvm_mce_inject > cpu_x86_inject_mce > > 1. At first when Qemu init, it will register sigbus_handler() as its SIGBUS handler > at qemu_init_sigbus(). > 2. At sigbus_handler() it will filter-out the signal which siginfo->si_code is not > BUS_MCEERR_AO or BUS_MCEERR_AR, and then re-raise SIGBUS signal. > If the si_code is BUS_MCEERR_AO or BUS_MCEERR_AR, it will invoke > kvm_on_sigbus(siginfo->si_code, siginfo->si_addr), go on handling the signal. > 3. kvm_on_sigbus() is a simple generic wrap which will call arch specific > kvm_arch_on_sigbus_vcpu() > 4. kvm_arch_on_sigbus_vcpu() is x86 specific implementation, it will do following > 1) Convert HVA (addr) to HPA (ram_addr) > ram_addr = qemu_ram_addr_from_host(addr) > 2) Retrieve the GPA (paddr) as per HVA (addr) > kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr) > 3) Add the page including HWPoison HPA to its hwpoison_page_list > kvm_hwpoison_page_add(ram_addr) > 4) Inject the #MC to guest with GPA and si_code > kvm_mce_inject(cpu, paddr, code) > cpu_x86_inject_mce() > > The rest is guest kernel uses the existed MCA to handle the #MC. According to the > physical address (in fact, GPA) find the applications and send kill signal if the > #MC is SRAR. > > So the intention of this patchset is to enrich the information when sending > SIGBUS, so that VMM can use already existed facility to take a proper behavior. > > Yes this makes sense. Thanks for explaining!
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 8e4bc6453d26..81801ab0009e 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -710,6 +710,8 @@ int arch_memory_failure(unsigned long pfn, int flags) struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT); struct sgx_epc_section *section; struct sgx_numa_node *node; + struct sgx_vepc_page *owner; + int ret = 0; /* * mm/memory-failure.c calls this routine for all errors @@ -726,8 +728,22 @@ int arch_memory_failure(unsigned long pfn, int flags) * error. The signal may help the task understand why the * enclave is broken. */ - if (flags & MF_ACTION_REQUIRED) - force_sig(SIGBUS); + if (flags & MF_ACTION_REQUIRED) { + /* + * In case the error memory is accessed by VM guest, provide + * extra info for hypervisor to make further decision but not + * simply kill it. + */ + if (page->flags & SGX_EPC_PAGE_IS_VEPC) { + owner = (struct sgx_vepc_page *)page->owner; + ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)owner->vaddr, + PAGE_SHIFT); + if (ret < 0) + pr_err("Memory failure: Error sending signal to %s:%d: %d\n", + current->comm, current->pid, ret); + } else + force_sig(SIGBUS); + } section = &sgx_epc_sections[page->section]; node = section->node;
When VM guest access a SGX EPC page with memory failure, current behavior will kill the guest, expected only kill the SGX application inside it. To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra information for hypervisor to inject #MC information to guest, which is helpful in SGX case. Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> --- arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)