From patchwork Wed Jan 16 17:59:05 2013
X-Patchwork-Submitter: Christoffer Dall
X-Patchwork-Id: 1992931
Subject: [PATCH v6 12/15] KVM: ARM: Handle guest faults in KVM
To: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.cs.columbia.edu
From: Christoffer Dall
Cc: Marc Zyngier, Marcelo Tosatti, Will Deacon
Date: Wed, 16 Jan 2013 12:59:05 -0500
Message-ID: <20130116175905.29147.17824.stgit@ubuntu>
In-Reply-To: <20130116175716.29147.15348.stgit@ubuntu>
References: <20130116175716.29147.15348.stgit@ubuntu>
User-Agent: StGit/0.15

Handle guest faults in KVM by mapping in the corresponding user pages
in the 2nd stage page tables.

We invalidate the instruction cache by MVA whenever we map a page to the
guest (no, we cannot only do it when we have an iabt, because the guest may
happily read/write a page before ever hitting the icache) if the hardware
uses VIPT or PIPT.  In the PIPT case we can invalidate only that physical
page; in the VIPT case all bets are off and we simply must invalidate the
whole icache.  Note that VIVT icaches are tagged with VMIDs, so we are out
of the woods on that one.  Alexander Graf was nice enough to remind us of
this massive pain.
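For reference, the invalidation policy described above boils down to roughly
the following (a minimal sketch only; the function name is made up for
illustration, and it simply mirrors the coherent_icache_guest_page() helper
added by this patch, using the same kernel helpers):

static void icache_coherency_policy_sketch(struct kvm *kvm, gfn_t gfn)
{
	if (icache_is_pipt()) {
		/* PIPT: invalidate only the page just mapped, by MVA range. */
		unsigned long hva = gfn_to_hva(kvm, gfn);

		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
	} else if (!icache_is_vivt_asid_tagged()) {
		/* Any VIPT cache: aliases are unknown, flush the whole icache. */
		__flush_icache_all();
	}
	/* VIVT tagged with ASID+VMID: no maintenance needed. */
}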
Reviewed-by: Will Deacon
Reviewed-by: Marcelo Tosatti
Signed-off-by: Marc Zyngier
Signed-off-by: Christoffer Dall
---
 arch/arm/include/asm/kvm_asm.h |    2 +
 arch/arm/include/asm/kvm_mmu.h |   12 +++
 arch/arm/kvm/mmu.c             |  153 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h           |   26 +++++++
 4 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index f6652f6..5e06e81 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -71,6 +71,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 499e7b0..421a20b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -35,4 +35,16 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
+
+static inline bool kvm_is_write_fault(unsigned long hsr)
+{
+	unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
+	if (hsr_ec == HSR_EC_IABT)
+		return false;
+	else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
+		return false;
+	else
+		return true;
+}
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 4347d68..a4b7b0f 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -21,9 +21,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
@@ -488,9 +490,158 @@ out:
 	return ret;
 }
 
+static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
+{
+	/*
+	 * If we are going to insert an instruction page and the icache is
+	 * either VIPT or PIPT, there is a potential problem where the host
+	 * (or another VM) may have used the same page as this guest, and we
+	 * read incorrect data from the icache.  If we're using a PIPT cache,
+	 * we can invalidate just that page, but if we are using a VIPT cache
+	 * we need to invalidate the entire icache - damn shame - as written
+	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
+	 *
+	 * VIVT caches are tagged using both the ASID and the VMID and don't
+	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 */
+	if (icache_is_pipt()) {
+		unsigned long hva = gfn_to_hva(kvm, gfn);
+		__cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
+	} else if (!icache_is_vivt_asid_tagged()) {
+		/* any kind of VIPT cache */
+		__flush_icache_all();
+	}
+}
+
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot,
+			  unsigned long fault_status)
+{
+	pte_t new_pte;
+	pfn_t pfn;
+	int ret;
+	bool write_fault, writable;
+	unsigned long mmu_seq;
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+
+	write_fault = kvm_is_write_fault(vcpu->arch.hsr);
+	if (fault_status == FSC_PERM && !write_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	/* We need minimum second+third level pages */
+	ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
+	if (ret)
+		return ret;
+
+	mmu_seq = vcpu->kvm->mmu_notifier_seq;
+	/*
+	 * Ensure the read of mmu_notifier_seq happens before we call
+	 * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
+	 * the page we just got a reference to getting unmapped before we have
+	 * a chance to grab the mmu_lock, which ensures that if the page gets
+	 * unmapped afterwards, the call to kvm_unmap_hva will take it away
+	 * from us again properly.  This smp_rmb() interacts with the smp_wmb()
+	 * in kvm_mmu_notifier_invalidate_.
+	 */
+	smp_rmb();
+
+	pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
+	if (is_error_pfn(pfn))
+		return -EFAULT;
+
+	new_pte = pfn_pte(pfn, PAGE_S2);
+	coherent_icache_guest_page(vcpu->kvm, gfn);
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
+		goto out_unlock;
+	if (writable) {
+		pte_val(new_pte) |= L_PTE_S2_RDWR;
+		kvm_set_pfn_dirty(pfn);
+	}
+	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte, false);
+
+out_unlock:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+	kvm_release_pfn_clean(pfn);
+	return 0;
+}
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ * @run:	the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either
+ * the guest simply needs more memory and we must allocate an appropriate
+ * page, or that the guest tried to access I/O memory, which is emulated by
+ * user space.  The distinction is based on the IPA causing the fault and
+ * whether this memory region has been registered as standard RAM by user
+ * space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	return -EINVAL;
+	unsigned long hsr_ec;
+	unsigned long fault_status;
+	phys_addr_t fault_ipa;
+	struct kvm_memory_slot *memslot;
+	bool is_iabt;
+	gfn_t gfn;
+	int ret, idx;
+
+	hsr_ec = vcpu->arch.hsr >> HSR_EC_SHIFT;
+	is_iabt = (hsr_ec == HSR_EC_IABT);
+	fault_ipa = ((phys_addr_t)vcpu->arch.hpfar & HPFAR_MASK) << 8;
+
+	trace_kvm_guest_fault(*vcpu_pc(vcpu), vcpu->arch.hsr,
+			      vcpu->arch.hxfar, fault_ipa);
+
+	/* Check the stage-2 fault is trans. fault or write fault */
+	fault_status = (vcpu->arch.hsr & HSR_FSC_TYPE);
+	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
+		kvm_err("Unsupported fault status: EC=%#lx DFCS=%#lx\n",
+			hsr_ec, fault_status);
+		return -EFAULT;
+	}
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	gfn = fault_ipa >> PAGE_SHIFT;
+	if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+		if (is_iabt) {
+			/* Prefetch Abort on I/O address */
+			kvm_inject_pabt(vcpu, vcpu->arch.hxfar);
+			ret = 1;
+			goto out_unlock;
+		}
+
+		if (fault_status != FSC_FAULT) {
+			kvm_err("Unsupported fault status on io memory: %#lx\n",
+				fault_status);
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+
+		kvm_pr_unimpl("I/O address abort...");
+		ret = 0;
+		goto out_unlock;
+	}
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	if (!memslot->user_alloc) {
+		kvm_err("non user-alloc memslots not supported\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ret = user_mem_abort(vcpu, fault_ipa, gfn, memslot, fault_status);
+	if (ret == 0)
+		ret = 1;
+out_unlock:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+	return ret;
 }
 
 static void handle_hva_to_gpa(struct kvm *kvm,
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 022305b..624b5a4 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -39,6 +39,32 @@ TRACE_EVENT(kvm_exit,
 	TP_printk("PC: 0x%08lx", __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_guest_fault,
+	TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
+		 unsigned long hxfar,
+		 unsigned long long ipa),
+	TP_ARGS(vcpu_pc, hsr, hxfar, ipa),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,		vcpu_pc	)
+		__field(	unsigned long,		hsr	)
+		__field(	unsigned long,		hxfar	)
+		__field(	unsigned long long,	ipa	)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc	= vcpu_pc;
+		__entry->hsr		= hsr;
+		__entry->hxfar		= hxfar;
+		__entry->ipa		= ipa;
+	),
+
+	TP_printk("guest fault at PC %#08lx (hxfar %#08lx, "
+		  "ipa %#16llx, hsr %#08lx)",
+		  __entry->vcpu_pc, __entry->hxfar,
+		  __entry->ipa, __entry->hsr)
+);
+
 TRACE_EVENT(kvm_irq_line,
 	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
 	TP_ARGS(type, vcpu_idx, irq_num, level),