From patchwork Tue Mar 31 19:40:08 2020
X-Patchwork-Submitter: Vivek Goyal
X-Patchwork-Id: 11468315
From: Vivek Goyal
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
 dgilbert@redhat.com, vgoyal@redhat.com, aarcange@redhat.com,
 dhildenb@redhat.com
Subject: [PATCH 1/4] kvm: Add capability to be able to report async pf error to guest
Date: Tue, 31 Mar 2020 15:40:08 -0400
Message-Id: <20200331194011.24834-2-vgoyal@redhat.com>
In-Reply-To: <20200331194011.24834-1-vgoyal@redhat.com>
References: <20200331194011.24834-1-vgoyal@redhat.com>

As of now the asynchronous page fault mechanism assumes that the host will
always succeed in resolving a page fault. So there are only two states:
page is not present and page is ready.

If a page is backed by a file and that file has been truncated (as can be
the case with virtio-fs), then the page fault handler on the host returns
-EFAULT.

As of now the async page fault logic does not look at the error code
(-EFAULT) returned by get_user_pages_remote() and reports PAGE_READY to
the guest. The guest tries to access the page and the page fault happens
again. This gets kvm into an infinite loop. (Killing the host process
gets kvm out of this loop, though.)

This patch adds another state to the async page fault logic which allows
the host to return an error to the guest. Once the guest knows that the
async page fault can't be resolved, it can send SIGBUS to the faulting
guest process (if user space was accessing the page in question).
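
To make the three states concrete, here is a minimal user-space sketch of
the protocol described above. It is not kernel code. The constants are the
ones this patch adds or already uses; the helper names
apf_msr_value_valid(), apf_dispatch() and apf_waiter_gets_sigbus() are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in arch/x86/include/uapi/asm/kvm_para.h by this patch. */
#define KVM_ASYNC_PF_ENABLED            (1u << 0)
#define KVM_ASYNC_PF_SEND_ERROR         (1u << 3)
#define KVM_PV_REASON_PAGE_NOT_PRESENT  1
#define KVM_PV_REASON_PAGE_READY        2
#define KVM_PV_REASON_PAGE_FAULT_ERROR  3

/*
 * Hypothetical helper mirroring the reserved-bit check in
 * kvm_pv_enable_async_pf(): with bit 3 now in use, only bits 5:4 of the
 * MSR value remain reserved and must be zero.
 */
static bool apf_msr_value_valid(uint64_t data)
{
	return (data & 0x30) == 0;
}

/* What the guest's async PF handler does for each reason value. */
enum apf_action {
	APF_REGULAR_FAULT,	/* reason 0 or unknown: ordinary #PF path */
	APF_WAIT,		/* page not present: put the task to sleep */
	APF_WAKE,		/* page ready: wake the sleeper */
	APF_WAKE_WITH_ERROR,	/* host failed: wake the sleeper, flag error */
};

/* Hypothetical helper: the dispatch done in do_async_page_fault(). */
static enum apf_action apf_dispatch(uint32_t reason)
{
	switch (reason) {
	case KVM_PV_REASON_PAGE_NOT_PRESENT:
		return APF_WAIT;
	case KVM_PV_REASON_PAGE_READY:
		return APF_WAKE;
	case KVM_PV_REASON_PAGE_FAULT_ERROR:
		return APF_WAKE_WITH_ERROR;
	default:
		return APF_REGULAR_FAULT;
	}
}

/*
 * Hypothetical helper: the woken waiter gets SIGBUS only if it was flagged
 * with an error and the fault did not interrupt the kernel, which is how
 * handle_async_pf_error(!interrupt_kernel) is called in this patch.
 */
static bool apf_waiter_gets_sigbus(bool is_err, bool interrupt_kernel)
{
	return is_err && !interrupt_kernel;
}
```

Note that the error is not acted upon in the wake path itself: the wake
only records is_err, and the decision to signal is taken by the waiter
once it resumes.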

Signed-off-by: Vivek Goyal
---
 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 Documentation/virt/kvm/msr.rst       | 11 ++++++++---
 arch/x86/include/asm/kvm_host.h      |  3 +++
 arch/x86/include/asm/kvm_para.h      |  4 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  3 +++
 arch/x86/kernel/kvm.c                | 29 +++++++++++++++++++++++++---
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/mmu/mmu.c               |  2 +-
 arch/x86/kvm/x86.c                   | 13 +++++++++----
 include/linux/kvm_host.h             |  1 +
 virt/kvm/async_pf.c                  |  6 ++++--
 11 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..a00bc5e964e0 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD         13          guest checks this feature bit
                                                before using paravirtualized
                                                sched yield.
 
+KVM_FEATURE_ASYNC_PF_ERROR         14          paravirtualized async PF error
+                                               can be enabled by setting bit 3
+                                               when writing to msr 0x4b564d02
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT  24          host will warn if no guest-side
                                                per-cpu warps are expeced in
                                                kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..93f5e555dcdf 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -192,18 +192,23 @@ MSR_KVM_ASYNC_PF_EN:
 data:
 	Bits 63-6 hold 64-byte aligned physical address of a
 	64 byte memory area which must be in guest RAM and must be
-	zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
+	zeroed. Bits 5-4 are reserved and should be zero. Bit 0 is 1
 	when asynchronous page faults are enabled on the vcpu 0 when
 	disabled. Bit 1 is 1 if asynchronous page faults can be injected
 	when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
 	are delivered to L1 as #PF vmexits.  Bit 2 can be set only if
-	KVM_FEATURE_ASYNC_PF_VMEXIT is present in CPUID.
+	KVM_FEATURE_ASYNC_PF_VMEXIT is present in CPUID. Bit 3 is 1 if
+	asynchronous page fault can return error if hypervisor encounters
+	errors trying to fault in the page. Bit 3 can be set only if
+	KVM_FEATURE_ASYNC_PF_ERROR is present in CPUID.
 
 	First 4 byte of 64 byte memory location will be written to by
 	the hypervisor at the time of asynchronous page fault (APF)
 	injection to indicate type of asynchronous page fault. Value
 	of 1 means that the page referred to by the page fault is not
-	present. Value 2 means that the page is now available. Disabling
+	present. Value 2 means that the page is now available. Value 3
+	means that hypervisor met with error while trying to fault in
+	page and task should probably be sent SIGBUS. Disabling
 	interrupt inhibits APFs. Guest must not enable interrupt
 	before the reason is read, or it may be overwritten by another
 	APF. Since APF uses the same exception vector as regular page
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 98959e8cd448..011a5aab9df6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -765,6 +765,7 @@ struct kvm_vcpu_arch {
 		u32 host_apf_reason;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
+		bool send_pf_error;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
@@ -1642,6 +1643,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
+void kvm_arch_async_page_fault_error(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
 bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..3d6339c6cd47 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,7 +89,7 @@ bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
-void kvm_async_pf_task_wake(u32 token);
+void kvm_async_pf_task_wake(u32 token, bool is_err);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
@@ -104,7 +104,7 @@ static inline void kvm_spinlock_init(void)
 
 #else /* CONFIG_KVM_GUEST */
 #define kvm_async_pf_task_wait(T, I) do {} while(0)
-#define kvm_async_pf_task_wake(T) do {} while(0)
+#define kvm_async_pf_task_wake(T, I) do {} while(0)
 
 static inline bool kvm_para_available(void)
 {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..09743b45af79 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_FEATURE_PV_SEND_IPI	11
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
+#define KVM_FEATURE_ASYNC_PF_ERROR	14
 
 #define KVM_HINTS_REALTIME      0
 
@@ -81,6 +82,7 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
+#define KVM_ASYNC_PF_SEND_ERROR			(1 << 3)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
@@ -110,6 +112,7 @@ struct kvm_mmu_op_release_pt {
 
 #define KVM_PV_REASON_PAGE_NOT_PRESENT 1
 #define KVM_PV_REASON_PAGE_READY 2
+#define KVM_PV_REASON_PAGE_FAULT_ERROR 3
 
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..b5e9e3fa82df 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -74,6 +74,7 @@ struct kvm_task_sleep_node {
 	u32 token;
 	int cpu;
 	bool halted;
+	bool is_err;
 };
 
 static struct kvm_task_sleep_head {
@@ -96,6 +97,12 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
+static void handle_async_pf_error(int user_mode)
+{
+	if (user_mode)
+		send_sig_info(SIGBUS, SEND_SIG_PRIV, current);
+}
+
 /*
  * @interrupt_kernel: Is this called from a routine which interrupts the kernel
  *		      (other than user space)?
@@ -113,6 +120,8 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	e = _find_apf_task(b, token);
 	if (e) {
 		/* dummy entry exist -> wake up was delivered ahead of PF */
+		if (e->is_err)
+			handle_async_pf_error(!interrupt_kernel);
 		hlist_del(&e->link);
 		kfree(e);
 		raw_spin_unlock(&b->lock);
@@ -156,6 +165,9 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	if (!n.halted)
 		finish_swait(&n.wq, &wait);
 
+	if (n.is_err)
+		handle_async_pf_error(!interrupt_kernel);
+
 	rcu_irq_exit();
 	return;
 }
@@ -188,7 +200,7 @@ static void apf_task_wake_all(void)
 	}
 }
 
-void kvm_async_pf_task_wake(u32 token)
+void kvm_async_pf_task_wake(u32 token, bool is_err)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -219,10 +231,13 @@ void kvm_async_pf_task_wake(u32 token)
 		}
 		n->token = token;
 		n->cpu = smp_processor_id();
+		n->is_err = is_err;
 		init_swait_queue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
-	} else
+	} else {
+		n->is_err = is_err;
 		apf_task_wake_one(n);
+	}
 	raw_spin_unlock(&b->lock);
 	return;
 }
@@ -255,7 +270,12 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address);
+		kvm_async_pf_task_wake((u32)address, false);
+		rcu_irq_exit();
+		break;
+	case KVM_PV_REASON_PAGE_FAULT_ERROR:
+		rcu_irq_enter();
+		kvm_async_pf_task_wake((u32)address, true);
 		rcu_irq_exit();
 		break;
 	}
@@ -316,6 +336,9 @@ static void kvm_guest_cpu_init(void)
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
 			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
+		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_ERROR))
+			pa |= KVM_ASYNC_PF_SEND_ERROR;
+
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b1c469446b07..1ce1d998cbc2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
-			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
+			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+			     (1 << KVM_FEATURE_ASYNC_PF_ERROR);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 87e9ba27ada1..7c6e081bade1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4211,7 +4211,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	case KVM_PV_REASON_PAGE_READY:
 		vcpu->arch.apf.host_apf_reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address);
+		kvm_async_pf_task_wake(fault_address, 0);
 		local_irq_enable();
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3156e25b0774..9cd388f1891a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2614,8 +2614,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 3:5 are reserved, Should be zero */
-	if (data & 0x38)
+	/* Bits 4:5 are reserved, Should be zero */
+	if (data & 0x30)
 		return 1;
 
 	vcpu->arch.apf.msr_val = data;
@@ -2632,6 +2632,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
+	vcpu->arch.apf.send_pf_error = data & KVM_ASYNC_PF_SEND_ERROR;
 	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
@@ -10338,12 +10339,16 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val;
+	u32 val, async_pf_event = KVM_PV_REASON_PAGE_READY;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
 	else
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+
+	if (work->error_code && vcpu->arch.apf.send_pf_error)
+		async_pf_event = KVM_PV_REASON_PAGE_FAULT_ERROR;
+
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
@@ -10359,7 +10364,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.error_code = 0;
 			vcpu->arch.exception.has_payload = false;
 			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+		} else if (!apf_put_user(vcpu, async_pf_event)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bcb9b2ac0791..363fda33f803 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -206,6 +206,7 @@ struct kvm_async_pf {
 	unsigned long addr;
 	struct kvm_arch_async_pf arch;
 	bool wakeup_all;
+	unsigned int error_code;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..d5268d34fc8e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -51,6 +51,7 @@ static void async_pf_execute(struct work_struct *work)
 	unsigned long addr = apf->addr;
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
+	long ret;
 
 	might_sleep();
 
@@ -60,11 +61,12 @@ static void async_pf_execute(struct work_struct *work)
 	 * access remotely.
 	 */
 	down_read(&mm->mmap_sem);
-	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
-			&locked);
+	ret = get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+				    &locked);
 	if (locked)
 		up_read(&mm->mmap_sem);
 
+	apf->error_code = ret;
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
 

From patchwork Tue Mar 31 19:40:09 2020
X-Patchwork-Submitter: Vivek Goyal
X-Patchwork-Id: 11468319
From: Vivek Goyal
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
 dgilbert@redhat.com, vgoyal@redhat.com, aarcange@redhat.com,
 dhildenb@redhat.com
Subject: [PATCH 2/4] kvm: async_pf: Send faulting gva address in case of error
Date: Tue, 31 Mar 2020 15:40:09 -0400
Message-Id: <20200331194011.24834-3-vgoyal@redhat.com>
In-Reply-To: <20200331194011.24834-1-vgoyal@redhat.com>
References: <20200331194011.24834-1-vgoyal@redhat.com>

If the async page fault path injects an error into the guest, also send
the guest virtual address that was being accessed at the time of the page
fault. This is needed if the guest decides to send SIGBUS to a task in the
guest: the guest process needs the faulting address if it is to take some
action.

TODO: Nested kvm needs to be modified to use this. Also, this patch only
takes care of Intel VMX.
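
The shared-area layout this patch relies on can be sketched in user space
as follows. This is an illustration, not kernel code: the struct and
function names (apf_data, apf_shared, apf_reason, apf_host_write,
apf_read_and_reset) are hypothetical stand-ins for kvm_vcpu_pv_apf_data,
kvm_arch_async_pf_shared, kvm_apf_reason, apf_put_user() and
kvm_read_and_reset_pf_reason(). The field order and sizes are taken from
the diff below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVM_PV_REASON_PAGE_FAULT_ERROR 3

/* Guest per-vCPU area, fields laid out as in kvm_vcpu_pv_apf_data. */
struct apf_data {
	uint32_t reason;
	uint8_t  pad1[4];
	uint64_t faulting_gva;
	uint8_t  pad2[48];
	uint32_t enabled;	/* guest-private, follows the 64-byte area */
};

/* The 16 bytes the host writes, as in kvm_arch_async_pf_shared. */
struct apf_shared {
	uint32_t reason;
	uint32_t pad1;
	uint64_t faulting_gva;
};

/* What the guest handler gets back, as in kvm_apf_reason. */
struct apf_reason {
	uint32_t reason;
	uint64_t faulting_gva;
};

/* Host and guest views must agree on where the GVA lives. */
_Static_assert(offsetof(struct apf_data, faulting_gva) ==
	       offsetof(struct apf_shared, faulting_gva), "gva offset");
_Static_assert(offsetof(struct apf_data, enabled) == 64, "64-byte area");
_Static_assert(sizeof(struct apf_shared) == 16, "host writes 16 bytes");

/* Host side: publish reason and faulting GVA in one write. */
static void apf_host_write(struct apf_data *area, uint32_t reason,
			   uint64_t gva)
{
	struct apf_shared s = { .reason = reason, .pad1 = 0,
				.faulting_gva = gva };

	memcpy(area, &s, sizeof(s));
}

/*
 * Guest side: read both fields, then clear them for the next event.
 * Unlike the patch, this sketch also zeroes faulting_gva when async PF
 * is disabled, so the caller never sees an uninitialized value.
 */
static void apf_read_and_reset(struct apf_data *area, struct apf_reason *out)
{
	out->reason = 0;
	out->faulting_gva = 0;
	if (!area->enabled)
		return;

	out->reason = area->reason;
	out->faulting_gva = area->faulting_gva;
	area->reason = 0;
	area->faulting_gva = 0;
}
```

Since the host now writes 16 bytes instead of 4, the cache set up by
kvm_gfn_to_hva_cache_init() has to cover the larger structure, which is
why its size argument changes in this patch.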

Signed-off-by: Vivek Goyal
---
 arch/x86/include/asm/kvm_host.h      | 14 ++++++++++-
 arch/x86/include/asm/kvm_para.h      |  8 +++----
 arch/x86/include/asm/vmx.h           |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h |  9 ++++++-
 arch/x86/kernel/kvm.c                | 36 +++++++++++++++++-----------
 arch/x86/kvm/mmu/mmu.c               | 10 ++++----
 arch/x86/kvm/vmx/nested.c            |  2 +-
 arch/x86/kvm/vmx/vmx.c               | 11 +++++++--
 arch/x86/kvm/x86.c                   | 34 +++++++++++++++++++-------
 9 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 011a5aab9df6..0f83faeb5863 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -762,7 +762,7 @@ struct kvm_vcpu_arch {
 		u64 msr_val;
 		u32 id;
 		bool send_user_only;
-		u32 host_apf_reason;
+		struct kvm_apf_reason host_apf_reason;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 		bool send_pf_error;
@@ -813,6 +813,10 @@ struct kvm_vcpu_arch {
 	bool gpa_available;
 	gpa_t gpa_val;
 
+	/* GVA, if available at the time of VM exit */
+	bool gva_available;
+	gva_t gva_val;
+
 	/* be preempted when it's in kernel-mode(cpl=0) */
 	bool preempted_in_kernel;
 
@@ -1275,8 +1279,16 @@ struct kvm_arch_async_pf {
 	gfn_t gfn;
 	unsigned long cr3;
 	bool direct_map;
+	bool gva_available;
+	gva_t gva_val;
 };
 
+struct kvm_arch_async_pf_shared {
+	u32 reason;
+	u32 pad1;
+	u64 faulting_gva;
+} __packed;
+
 extern struct kvm_x86_ops *kvm_x86_ops;
 extern struct kmem_cache *x86_fpu_cache;
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 3d6339c6cd47..2d464e470325 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,8 +89,8 @@ bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
-void kvm_async_pf_task_wake(u32 token, bool is_err);
-u32 kvm_read_and_reset_pf_reason(void);
+void kvm_async_pf_task_wake(u32 token, bool is_err, unsigned long addr);
+void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
 
@@ -104,7 +104,7 @@ static inline void kvm_spinlock_init(void)
 
 #else /* CONFIG_KVM_GUEST */
 #define kvm_async_pf_task_wait(T, I) do {} while(0)
-#define kvm_async_pf_task_wake(T, I) do {} while(0)
+#define kvm_async_pf_task_wake(T, I, A) do {} while(0)
 
 static inline bool kvm_para_available(void)
 {
@@ -121,7 +121,7 @@ static inline unsigned int kvm_arch_para_hints(void)
 	return 0;
 }
 
-static inline u32 kvm_read_and_reset_pf_reason(void)
+static inline void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason)
 {
 	return 0;
 }
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 8521af3fef27..014cccb2d25d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -529,6 +529,7 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_READABLE_BIT	3
 #define EPT_VIOLATION_WRITABLE_BIT	4
 #define EPT_VIOLATION_EXECUTABLE_BIT	5
+#define EPT_VIOLATION_GLA_VALID_BIT	7
 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
 #define EPT_VIOLATION_ACC_READ		(1 << EPT_VIOLATION_ACC_READ_BIT)
 #define EPT_VIOLATION_ACC_WRITE		(1 << EPT_VIOLATION_ACC_WRITE_BIT)
@@ -536,6 +537,7 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_READABLE		(1 << EPT_VIOLATION_READABLE_BIT)
 #define EPT_VIOLATION_WRITABLE		(1 << EPT_VIOLATION_WRITABLE_BIT)
 #define EPT_VIOLATION_EXECUTABLE	(1 << EPT_VIOLATION_EXECUTABLE_BIT)
+#define EPT_VIOLATION_GLA_VALID		(1 << EPT_VIOLATION_GLA_VALID_BIT)
 #define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
 /*
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 09743b45af79..95dcb6dd3c8a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -116,10 +116,17 @@ struct kvm_mmu_op_release_pt {
 
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
-	__u8 pad[60];
+	__u8 pad1[4];
+	__u64 faulting_gva;
+	__u8 pad2[48];
 	__u32 enabled;
 };
 
+struct kvm_apf_reason {
+	u32 reason;
+	u64 faulting_gva;
+};
+
 #define KVM_PV_EOI_BIT 0
 #define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b5e9e3fa82df..42d17e8c0135 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,6 +75,7 @@ struct kvm_task_sleep_node {
 	int cpu;
 	bool halted;
 	bool is_err;
+	unsigned long fault_addr;
 };
 
 static struct kvm_task_sleep_head {
@@ -97,10 +98,10 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-static void handle_async_pf_error(int user_mode)
+static void handle_async_pf_error(int user_mode, unsigned long fault_addr)
 {
 	if (user_mode)
-		send_sig_info(SIGBUS, SEND_SIG_PRIV, current);
+		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)fault_addr);
 }
 
 /*
@@ -121,7 +122,7 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	if (e) {
 		/* dummy entry exist -> wake up was delivered ahead of PF */
 		if (e->is_err)
-			handle_async_pf_error(!interrupt_kernel);
+			handle_async_pf_error(!interrupt_kernel, e->fault_addr);
 		hlist_del(&e->link);
 		kfree(e);
 		raw_spin_unlock(&b->lock);
@@ -166,7 +167,7 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 		finish_swait(&n.wq, &wait);
 
 	if (n.is_err)
-		handle_async_pf_error(!interrupt_kernel);
+		handle_async_pf_error(!interrupt_kernel, n.fault_addr);
 
 	rcu_irq_exit();
 	return;
@@ -200,7 +201,7 @@ static void apf_task_wake_all(void)
 	}
 }
 
-void kvm_async_pf_task_wake(u32 token, bool is_err)
+void kvm_async_pf_task_wake(u32 token, bool is_err, unsigned long fault_addr)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -232,10 +233,12 @@ void kvm_async_pf_task_wake(u32 token, bool is_err)
 		n->token = token;
 		n->cpu = smp_processor_id();
 		n->is_err = is_err;
+		n->fault_addr = fault_addr;
 		init_swait_queue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
 	} else {
 		n->is_err = is_err;
+		n->fault_addr = fault_addr;
 		apf_task_wake_one(n);
 	}
 	raw_spin_unlock(&b->lock);
@@ -243,16 +246,16 @@ void kvm_async_pf_task_wake(u32 token, bool is_err)
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
 
-u32 kvm_read_and_reset_pf_reason(void)
+void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *apf)
 {
-	u32 reason = 0;
-
 	if (__this_cpu_read(apf_reason.enabled)) {
-		reason = __this_cpu_read(apf_reason.reason);
+		apf->reason = __this_cpu_read(apf_reason.reason);
+		apf->faulting_gva = __this_cpu_read(apf_reason.faulting_gva);
 		__this_cpu_write(apf_reason.reason, 0);
+		__this_cpu_write(apf_reason.faulting_gva, 0);
+	} else {
+		apf->reason = 0;
 	}
-
-	return reason;
 }
 EXPORT_SYMBOL_GPL(kvm_read_and_reset_pf_reason);
 NOKPROBE_SYMBOL(kvm_read_and_reset_pf_reason);
@@ -260,7 +263,11 @@ NOKPROBE_SYMBOL(kvm_read_and_reset_pf_reason);
 dotraplinkage void
 do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	switch (kvm_read_and_reset_pf_reason()) {
+	struct kvm_apf_reason apf_data;
+
+	kvm_read_and_reset_pf_reason(&apf_data);
+
+	switch (apf_data.reason) {
 	default:
 		do_page_fault(regs, error_code, address);
 		break;
@@ -270,12 +277,13 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address, false);
+		kvm_async_pf_task_wake((u32)address, false, 0);
 		rcu_irq_exit();
 		break;
 	case KVM_PV_REASON_PAGE_FAULT_ERROR:
 		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address, true);
+		kvm_async_pf_task_wake((u32)address, true,
+				       apf_data.faulting_gva);
 		rcu_irq_exit();
 		break;
 	}
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7c6e081bade1..e3337c5f73e0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4082,6 +4082,8 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	arch.direct_map = vcpu->arch.mmu->direct_map;
 	arch.cr3 = vcpu->arch.mmu->get_cr3(vcpu);
+	arch.gva_available = vcpu->arch.gva_available;
+	arch.gva_val = vcpu->arch.gva_val;
 
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
@@ -4193,7 +4195,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;
-	switch (vcpu->arch.apf.host_apf_reason) {
+	switch (vcpu->arch.apf.host_apf_reason.reason) {
 	default:
 		trace_kvm_page_fault(fault_address, error_code);
 
@@ -4203,15 +4205,15 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				insn_len);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address, 0);
+		kvm_async_pf_task_wake(fault_address, 0, 0);
 		local_irq_enable();
 		break;
 	}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9750e590c89d..e8b026ec4acc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5560,7 +5560,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 		if (is_nmi(intr_info))
 			return false;
 		else if (is_page_fault(intr_info))
-			return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
+			return !vmx->vcpu.arch.apf.host_apf_reason.reason && enable_ept;
 		else if (is_debug(intr_info) &&
 			 vcpu->guest_debug &
 			 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26f8f31563e9..80dffc7375b6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4676,7 +4676,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_page_fault(intr_info)) {
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		/* EPT won't cause page fault directly */
-		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
+		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason.reason &&
+			     enable_ept);
 		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
 	}
 
@@ -5159,6 +5160,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	unsigned long exit_qualification;
 	gpa_t gpa;
 	u64 error_code;
+	gva_t gva;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
@@ -5195,6 +5197,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
 
 	vcpu->arch.exit_qualification = exit_qualification;
+	if (exit_qualification & EPT_VIOLATION_GLA_VALID) {
+		gva = vmcs_readl(GUEST_LINEAR_ADDRESS);
+		vcpu->arch.gva_available = true;
+		vcpu->arch.gva_val = gva;
+	}
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
@@ -6236,7 +6243,7 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 
 	/* if exit due to PF check for async PF */
 	if (is_page_fault(vmx->exit_intr_info))
-		vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		kvm_read_and_reset_pf_reason(&vmx->vcpu.arch.apf.host_apf_reason);
 
 	/* Handle machine checks before interrupts are enabled */
 	if (is_machine_check(vmx->exit_intr_info))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cd388f1891a..f3c79baf4998 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2627,7 +2627,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u32)))
+				      sizeof(struct kvm_arch_async_pf_shared)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -10261,12 +10261,18 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 	}
 }
 
-static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+static int apf_put_user_u32(struct kvm_vcpu *vcpu, u32 val)
 {
-
 	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
 				      sizeof(val));
 }
+static int apf_put_user(struct kvm_vcpu *vcpu,
+			struct kvm_arch_async_pf_shared *val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, val,
+				      sizeof(*val));
+}
 
 static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
 {
@@ -10309,12 +10315,16 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
+	struct kvm_arch_async_pf_shared apf_shared;
 
 	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
+	memset(&apf_shared, 0, sizeof(apf_shared));
+	apf_shared.reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
+
 	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+	    !apf_put_user(vcpu, &apf_shared)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
@@ -10339,15 +10349,21 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val, async_pf_event = KVM_PV_REASON_PAGE_READY;
+	u32 val;
+	struct kvm_arch_async_pf_shared asyncpf_shared;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
 	else
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
 
-	if (work->error_code && vcpu->arch.apf.send_pf_error)
-		async_pf_event = KVM_PV_REASON_PAGE_FAULT_ERROR;
+	memset(&asyncpf_shared, 0, sizeof(asyncpf_shared));
+	asyncpf_shared.reason = KVM_PV_REASON_PAGE_READY;
+	if (work->error_code && vcpu->arch.apf.send_pf_error) {
+		asyncpf_shared.reason = KVM_PV_REASON_PAGE_FAULT_ERROR;
+		if (work->arch.gva_available)
+			asyncpf_shared.faulting_gva = work->arch.gva_val;
+	}
 
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
@@ -10356,7 +10372,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
 		    vcpu->arch.exception.pending &&
 		    vcpu->arch.exception.nr == PF_VECTOR &&
-		    !apf_put_user(vcpu, 0)) {
+		    !apf_put_user_u32(vcpu, 0)) {
 			vcpu->arch.exception.injected = false;
 			vcpu->arch.exception.pending = false;
 			vcpu->arch.exception.nr = 0;
@@ -10364,7 +10380,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.error_code = 0;
 			vcpu->arch.exception.has_payload = false;
 			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, async_pf_event)) {
+		} else if (!apf_put_user(vcpu, &asyncpf_shared)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;

From patchwork Tue Mar 31 19:40:10 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivek Goyal
X-Patchwork-Id: 11468321
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A849992A for ; Tue, 31 Mar 2020 19:44:14 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 86C5820776 for ; Tue, 31 Mar 2020 19:44:14 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FDGCzUwU"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728091AbgCaToN (ORCPT ); Tue, 31 Mar 2020 15:44:13 -0400
Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:55068 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727768AbgCaToN (ORCPT ); Tue, 31 Mar 2020 15:44:13 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1585683852; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mcpjr22EubvpvBv5U1Nr7UCjm49Gd0MVWUusU3L0BO0=; b=FDGCzUwU6vM5CNSPQQe1037cgWkeeXubKUqlK6RE5HzlETqTctOcXCaPGiWO3+4w9DIZRK
R6LaZhDPNjbFF0YA8RtUXpk/J8FKhfY1LHjso8dDmj03cDPO15kzaEgYwsiTsmBdNRyXZr JR62L4ez/QlYklQyzoeMnl7xkq3GKzc=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-282-mi-6gqLsPvy_dhd7NAlIvA-1; Tue, 31 Mar 2020 15:40:30 -0400
X-MC-Unique: mi-6gqLsPvy_dhd7NAlIvA-1
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E35C11084426; Tue, 31 Mar 2020 19:40:28 +0000 (UTC)
Received: from horse.redhat.com (ovpn-118-184.phx2.redhat.com [10.3.118.184]) by smtp.corp.redhat.com (Postfix) with ESMTP id 38C945C1BB; Tue, 31 Mar 2020 19:40:21 +0000 (UTC)
Received: by horse.redhat.com (Postfix, from userid 10451) id 6A6F42202D7; Tue, 31 Mar 2020 15:40:20 -0400 (EDT)
From: Vivek Goyal
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, vgoyal@redhat.com, aarcange@redhat.com, dhildenb@redhat.com
Subject: [PATCH 3/4] kvm: Always get async page notifications
Date: Tue, 31 Mar 2020 15:40:10 -0400
Message-Id: <20200331194011.24834-4-vgoyal@redhat.com>
In-Reply-To: <20200331194011.24834-1-vgoyal@redhat.com>
References: <20200331194011.24834-1-vgoyal@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Right now, we seem to get async pf related notifications only if the
guest is in user mode, or if it is in kernel mode and CONFIG_PREEMPTION
is enabled. I think the idea is that if CONFIG_PREEMPTION is enabled, it
gives us an opportunity to schedule something else if the page is not
ready.

If KVM_ASYNC_PF_SEND_ALWAYS is not set, then the host will not send
notifications of PAGE_NOT_PRESENT/PAGE_READY.
Instead, once the page has been installed, the guest will run.

Now we are adding the capability to report errors as part of the async
pf protocol. That means we need async pf related notifications so that
we can make a task wait and, when an error is reported, either send
SIGBUS to the user process or search through the exception tables for a
possible error handler. Hence enable async pf notifications always.

Not sure if this will have a noticeable performance implication, though.

Signed-off-by: Vivek Goyal
---
 arch/x86/kernel/kvm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 42d17e8c0135..97753a648133 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -336,9 +336,7 @@ static void kvm_guest_cpu_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
 		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

-#ifdef CONFIG_PREEMPTION
 		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
-#endif
 		pa |= KVM_ASYNC_PF_ENABLED;

 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))

From patchwork Tue Mar 31 19:40:11 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vivek Goyal
X-Patchwork-Id: 11468317
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0370992A for ; Tue, 31 Mar 2020 19:40:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CC3602072E for ; Tue, 31 Mar 2020 19:40:38 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="N9aeXCmU"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730081AbgCaTkf (ORCPT ); Tue, 31 Mar 2020 15:40:35 -0400
Received: from us-smtp-1.mimecast.com ([205.139.110.61]:59434 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by
vger.kernel.org with ESMTP id S1729368AbgCaTkf (ORCPT ); Tue, 31 Mar 2020 15:40:35 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1585683634; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U/VGYhnEFcRnnck5kQgK5opKWmK9mE8bqE7Og2ysZDk=; b=N9aeXCmUoJ1Bh3umbszM8mHSdlKAJ61nS9nW8A5IorEQYYicQzNaETMvQG2SOdliwR3Ck8 azlSbP2edT50lPX50lHlwz8Plnxg6IfY1A0X65A1rlsB46oayM83r1bjIy/yvY2jK+IMo/ HkweENkJ5dpDBzUuD2s210OiU5+Z5Nw=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-296-beujuJUjPmCUp7d7ml0qUA-1; Tue, 31 Mar 2020 15:40:32 -0400
X-MC-Unique: beujuJUjPmCUp7d7ml0qUA-1
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5D3E48014CC; Tue, 31 Mar 2020 19:40:31 +0000 (UTC)
Received: from horse.redhat.com (ovpn-118-184.phx2.redhat.com [10.3.118.184]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4829319C6A; Tue, 31 Mar 2020 19:40:21 +0000 (UTC)
Received: by horse.redhat.com (Postfix, from userid 10451) id 6E7D62202E3; Tue, 31 Mar 2020 15:40:20 -0400 (EDT)
From: Vivek Goyal
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, vgoyal@redhat.com, aarcange@redhat.com, dhildenb@redhat.com
Subject: [PATCH 4/4] kvm,x86,async_pf: Search exception tables in case of error
Date: Tue, 31 Mar 2020 15:40:11 -0400
Message-Id: <20200331194011.24834-5-vgoyal@redhat.com>
In-Reply-To: <20200331194011.24834-1-vgoyal@redhat.com>
References: <20200331194011.24834-1-vgoyal@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on
10.5.11.23
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

If an error happens during a page fault and kernel code was executing at
the time of the fault, search the exception tables and jump to the
corresponding handler, if there is one.

This is useful when virtiofs DAX code is doing a memcpy and the page
fault returns an error because the corresponding page has been truncated
on the host. In that case, we want to return that error to guest user
space instead of retrying infinitely.

This does not take care of nested KVM. Exit into L1 does not have a
notion of passing "struct pt_regs" to the handler. That needs to be
fixed first.

Signed-off-by: Vivek Goyal
---
 arch/x86/include/asm/kvm_para.h |  5 +++--
 arch/x86/kernel/kvm.c           | 24 ++++++++++++++++++------
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2d464e470325..2c9e7c852b40 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -88,7 +88,8 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
-void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
+void kvm_async_pf_task_wait(u32 token, int interrupt_kernel,
+			    struct pt_regs *regs, unsigned long error_code);
 void kvm_async_pf_task_wake(u32 token, bool is_err, unsigned long addr);
 void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason);
 extern void kvm_disable_steal_time(void);
@@ -103,7 +104,7 @@ static inline void kvm_spinlock_init(void)
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */

 #else /* CONFIG_KVM_GUEST */
-#define kvm_async_pf_task_wait(T, I) do {} while(0)
+#define kvm_async_pf_task_wait(T, I, R, E) do {} while(0)
 #define kvm_async_pf_task_wake(T, I, A) do {} while(0)

 static inline bool kvm_para_available(void)
diff --git a/arch/x86/kernel/kvm.c
b/arch/x86/kernel/kvm.c
index 97753a648133..387ef0aa323b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -98,17 +98,23 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }

-static void handle_async_pf_error(int user_mode, unsigned long fault_addr)
+static inline void handle_async_pf_error(int user_mode,
+					 unsigned long fault_addr,
+					 struct pt_regs *regs,
+					 unsigned long error_code)
 {
 	if (user_mode)
 		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)fault_addr);
+	else
+		fixup_exception(regs, X86_TRAP_PF, error_code, fault_addr);
 }

 /*
  * @interrupt_kernel: Is this called from a routine which interrupts the kernel
  *		      (other than user space)?
  */
-void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
+void kvm_async_pf_task_wait(u32 token, int interrupt_kernel,
+			    struct pt_regs *regs, unsigned long error_code)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -120,13 +126,17 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	raw_spin_lock(&b->lock);
 	e = _find_apf_task(b, token);
 	if (e) {
+		bool is_err = e->is_err;
+		unsigned long fault_addr = e->fault_addr;
+
 		/* dummy entry exist -> wake up was delivered ahead of PF */
-		if (e->is_err)
-			handle_async_pf_error(!interrupt_kernel, e->fault_addr);
 		hlist_del(&e->link);
 		kfree(e);
 		raw_spin_unlock(&b->lock);

+		if (is_err)
+			handle_async_pf_error(!interrupt_kernel, fault_addr,
+					      regs, error_code);
 		rcu_irq_exit();
 		return;
 	}
@@ -167,7 +177,8 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 		finish_swait(&n.wq, &wait);

 	if (n.is_err)
-		handle_async_pf_error(!interrupt_kernel, n.fault_addr);
+		handle_async_pf_error(!interrupt_kernel, n.fault_addr, regs,
+				      error_code);

 	rcu_irq_exit();
 	return;
@@ -273,7 +284,8 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host.
*/
-		kvm_async_pf_task_wait((u32)address, !user_mode(regs));
+		kvm_async_pf_task_wait((u32)address, !user_mode(regs), regs,
+				       error_code);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e3337c5f73e0..a9b707fb5861 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4207,7 +4207,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wait(fault_address, 0);
+		kvm_async_pf_task_wait(fault_address, 0, NULL, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
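
Note on patch 4/4: the user/kernel split in handle_async_pf_error() can be
modelled in user space roughly as below. This is only a sketch under
simplifying assumptions. All sketch_* names, the flat {insn, fixup} entry
layout, and the binary search over a sorted table are hypothetical and
chosen for illustration; the kernel's fixup_exception() and its real
exception table format are more involved and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much simplified stand-in for the saved register state. */
struct sketch_regs {
	uintptr_t ip;		/* address of the faulting instruction */
};

/* One table entry: if the instruction at 'insn' faults, resume at 'fixup'. */
struct sketch_extable_entry {
	uintptr_t insn;
	uintptr_t fixup;
};

enum sketch_outcome {
	SKETCH_SIGBUS,		/* fault came from user mode: signal the task */
	SKETCH_FIXED_UP,	/* kernel fault with a handler: ip redirected */
	SKETCH_NO_HANDLER,	/* kernel fault without a registered handler */
};

/* Binary search over a table sorted by ascending 'insn'. */
const struct sketch_extable_entry *
sketch_search_extable(const struct sketch_extable_entry *tbl, size_t n,
		      uintptr_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].insn == ip)
			return &tbl[mid];
		if (tbl[mid].insn < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Models the decision only: user mode gets SIGBUS, kernel mode a fixup. */
enum sketch_outcome
sketch_handle_error(bool user_mode, struct sketch_regs *regs,
		    const struct sketch_extable_entry *tbl, size_t n)
{
	const struct sketch_extable_entry *e;

	if (user_mode)
		return SKETCH_SIGBUS;

	assert(regs != NULL);	/* the kernel-mode path needs register state */
	e = sketch_search_extable(tbl, n, regs->ip);
	if (!e)
		return SKETCH_NO_HANDLER;

	regs->ip = e->fixup;	/* resume at the recovery code */
	return SKETCH_FIXED_UP;
}
```

A caller would build a table sorted by insn, fill in sketch_regs.ip with
the faulting address, and call sketch_handle_error(). In the sketch only
the kernel-mode path reads regs, which is why a NULL regs pointer is
acceptable when user_mode is true.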