From patchwork Thu Mar 15 19:09:44 2018
From: James Morse
Date: Thu, 15 Mar 2018 19:09:44 +0000
To: Marc Zyngier
Subject: Re: [PATCH v6 12/26] KVM: arm/arm64: Move HYP IO VAs to the "idmap" range
Message-ID: <5AAAC4F8.9050608@arm.com>
References: <20180314165049.30105-1-marc.zyngier@arm.com>
 <20180314165049.30105-13-marc.zyngier@arm.com>
In-Reply-To: <20180314165049.30105-13-marc.zyngier@arm.com>
Cc: Mark Rutland, Peter Maydell, Andrew Jones, Christoffer Dall,
 kvm@vger.kernel.org, Steve Capper, Catalin Marinas, Will Deacon,
 Kristina Martsenko, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org
Hi Marc,

On 14/03/18 16:50, Marc Zyngier wrote:
> We so far mapped our HYP IO (which is essentially the GICv2 control
> registers) using the same method as for memory. It recently appeared
> that is a bit unsafe:
>
> We compute the HYP VA using the kern_hyp_va helper, but that helper
> is only designed to deal with kernel VAs coming from the linear map,
> and not from the vmalloc region... This could in turn cause some bad
> aliasing between the two, amplified by the upcoming VA randomisation.
>
> A solution is to come up with our very own basic VA allocator for
> MMIO. Since half of the HYP address space only contains a single
> page (the idmap), we have plenty to borrow from. Let's use the idmap
> as a base, and allocate downwards from it. GICv2 now lives on the
> other side of the great VA barrier.

> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 9946f8a38b26..a9577ecc3e6a 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -43,6 +43,9 @@ static unsigned long hyp_idmap_start;
>  static unsigned long hyp_idmap_end;
>  static phys_addr_t hyp_idmap_vector;
>
> +static DEFINE_MUTEX(io_map_lock);
> +static unsigned long io_map_base;
> +
>  #define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
>  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>
> @@ -518,27 +521,37 @@ static void unmap_hyp_idmap_range(pgd_t *pgdp, phys_addr_t start, u64 size)
>   *
>   * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>   * therefore contains either mappings in the kernel memory area (above
> - * PAGE_OFFSET), or device mappings in the vmalloc range (from
> - * VMALLOC_START to VMALLOC_END).
> + * PAGE_OFFSET), or device mappings in the idmap range.
>   *
> - * boot_hyp_pgd should only map two pages for the init code.
> + * boot_hyp_pgd should only map the idmap range, and is only used in
> + * the extended idmap case.
>   */
>  void free_hyp_pgds(void)
>  {
> +	pgd_t *id_pgd;
> +
>  	mutex_lock(&kvm_hyp_pgd_mutex);
>
> +	id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
> +
> +	if (id_pgd) {
> +		mutex_lock(&io_map_lock);

This takes kvm_hyp_pgd_mutex then io_map_lock ...

> +		/* In case we never called hyp_mmu_init() */
> +		if (!io_map_base)
> +			io_map_base = hyp_idmap_start;
> +		unmap_hyp_idmap_range(id_pgd, io_map_base,
> +				      hyp_idmap_start + PAGE_SIZE - io_map_base);
> +		mutex_unlock(&io_map_lock);
> +	}
> +
>  	if (boot_hyp_pgd) {
> -		unmap_hyp_idmap_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>  		boot_hyp_pgd = NULL;
>  	}
>
>  	if (hyp_pgd) {
> -		unmap_hyp_idmap_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>  		unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
>  				(uintptr_t)high_memory - PAGE_OFFSET);
> -		unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
> -				VMALLOC_END - VMALLOC_START);
>
>  		free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
>  		hyp_pgd = NULL;

> @@ -747,11 +761,43 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>  		return 0;
>  	}

> +	mutex_lock(&io_map_lock);
> -	start = kern_hyp_va((unsigned long)*kaddr);
> -	end = kern_hyp_va((unsigned long)*kaddr + size);
> -	ret = __create_hyp_mappings(hyp_pgd, PTRS_PER_PGD, start, end,
> -				    __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);

> +	/*
> +	 * This assumes that we we have enough space below the idmap
> +	 * page to allocate our VAs. If not, the check below will
> +	 * kick. A potential alternative would be to detect that
> +	 * overflow and switch to an allocation above the idmap.
> +	 *
> +	 * The allocated size is always a multiple of PAGE_SIZE.
> +	 */
> +	size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> +	base = io_map_base - size;
> +
> +	/*
> +	 * Verify that BIT(VA_BITS - 1) hasn't been flipped by
> +	 * allocating the new area, as it would indicate we've
> +	 * overflowed the idmap/IO address range.
> +	 */
> +	if ((base ^ io_map_base) & BIT(VA_BITS - 1)) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	if (__kvm_cpu_uses_extended_idmap())
> +		pgd = boot_hyp_pgd;
> +
> +	ret = __create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(),
> +				    base, base + size,
> +				    __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);

And here we have io_map_lock, but __create_hyp_mappings takes
kvm_hyp_pgd_mutex. There is another example of this in
create_hyp_exec_mappings.

I think making this the required order makes sense: if you are mapping
into the KVM-IO region, you take the io_map_lock before taking the pgd
lock to write in your allocated location. (free_hyp_pgds() is always
going to need to take both.)

The deadlock is never going to happen in practice, but it generates a
lockdep splat[1]. The fixup diff[0] below fixes it for me.
> +	if (ret)
> +		goto out;
> +
> +	*haddr = (void __iomem *)base + offset_in_page(phys_addr);
> +	io_map_base = base;
> +
> +out:
> +	mutex_unlock(&io_map_lock);
>
>  	if (ret) {
>  		iounmap(*kaddr);

Thanks,

James

[0]
--------------------------%<--------------------------
--------------------------%<--------------------------

[1]
[    2.795711] kvm [1]: 8-bit VMID
[    2.800456]
[    2.801944] ======================================================
[    2.808086] WARNING: possible circular locking dependency detected
[    2.814230] 4.16.0-rc5-00027-gca4a12e92d2d-dirty #9471 Not tainted
[    2.820371] ------------------------------------------------------
[    2.826513] swapper/0/1 is trying to acquire lock:
[    2.831274]  (io_map_lock){+.+.}, at: [<00000000cf644f15>] free_hyp_pgds+0x44/0x148
[    2.838914]
[    2.838914] but task is already holding lock:
[    2.844713]  (kvm_hyp_pgd_mutex){+.+.}, at: [<00000000e786face>] free_hyp_pgds+0x20/0x148
[    2.852863]
[    2.852863] which lock already depends on the new lock.
[    2.852863]
[    2.860995]
[    2.860995] the existing dependency chain (in reverse order) is:
[    2.868433]
[    2.868433] -> #1 (kvm_hyp_pgd_mutex){+.+.}:
[    2.874174]        __mutex_lock+0x70/0x8d0
[    2.878249]        mutex_lock_nested+0x1c/0x28
[    2.882671]        __create_hyp_mappings+0x48/0x3e0
[    2.887523]        __create_hyp_private_mapping+0xa4/0xf8
[    2.892893]        create_hyp_exec_mappings+0x28/0x58
[    2.897918]        kvm_arch_init+0x524/0x6e8
[    2.902167]        kvm_init+0x20/0x2d8
[    2.905897]        arm_init+0x1c/0x28
[    2.909542]        do_one_initcall+0x38/0x128
[    2.913884]        kernel_init_freeable+0x1e0/0x284
[    2.918737]        kernel_init+0x10/0x100
[    2.922726]        ret_from_fork+0x10/0x18
[    2.926797]
[    2.926797] -> #0 (io_map_lock){+.+.}:
[    2.932020]        lock_acquire+0x6c/0xb0
[    2.936008]        __mutex_lock+0x70/0x8d0
[    2.940082]        mutex_lock_nested+0x1c/0x28
[    2.944502]        free_hyp_pgds+0x44/0x148
[    2.948663]        teardown_hyp_mode+0x34/0x90
[    2.953084]        kvm_arch_init+0x3b8/0x6e8
[    2.957332]        kvm_init+0x20/0x2d8
[    2.961061]        arm_init+0x1c/0x28
[    2.964704]        do_one_initcall+0x38/0x128
[    2.969039]        kernel_init_freeable+0x1e0/0x284
[    2.973891]        kernel_init+0x10/0x100
[    2.977880]        ret_from_fork+0x10/0x18
[    2.981950]
[    2.981950] other info that might help us debug this:
[    2.981950]
[    2.989910]  Possible unsafe locking scenario:
[    2.989910]
[    2.995796]        CPU0                    CPU1
[    3.000297]        ----                    ----
[    3.004798]   lock(kvm_hyp_pgd_mutex);
[    3.008533]                                lock(io_map_lock);
[    3.014252]                                lock(kvm_hyp_pgd_mutex);
[    3.020488]   lock(io_map_lock);
[    3.023705]
[    3.023705]  *** DEADLOCK ***
[    3.023705]
[    3.029597] 1 lock held by swapper/0/1:
[    3.033409]  #0:  (kvm_hyp_pgd_mutex){+.+.}, at: [<00000000e786face>] free_hyp_pgds+0x20/0x148
[    3.041993]
[    3.041993] stack backtrace:
[    3.046331] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-00027-gca4a12e92d2d-dirty #9471
[    3.055062] Hardware name: ARM Juno development board (r1) (DT)
[    3.060945] Call trace:
[    3.063383]  dump_backtrace+0x0/0x190
[    3.067027]  show_stack+0x14/0x20
[    3.070326]  dump_stack+0xac/0xe8
[    3.073626]  print_circular_bug.isra.19+0x1d4/0x2e0
[    3.078476]  __lock_acquire+0x19f4/0x1a50
[    3.082464]  lock_acquire+0x6c/0xb0
[    3.085934]  __mutex_lock+0x70/0x8d0
[    3.089491]  mutex_lock_nested+0x1c/0x28
[    3.093394]  free_hyp_pgds+0x44/0x148
[    3.097037]  teardown_hyp_mode+0x34/0x90
[    3.100940]  kvm_arch_init+0x3b8/0x6e8
[    3.104670]  kvm_init+0x20/0x2d8
[    3.107882]  arm_init+0x1c/0x28
[    3.111007]  do_one_initcall+0x38/0x128
[    3.114823]  kernel_init_freeable+0x1e0/0x284
[    3.119158]  kernel_init+0x10/0x100
[    3.122628]  ret_from_fork+0x10/0x18

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 554ad5493e7d..f7642c96fc56 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -43,6 +43,7 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
 
+/* Take io_map_lock before kvm_hyp_pgd_mutex */
 static DEFINE_MUTEX(io_map_lock);
 static unsigned long io_map_base;
 
@@ -530,18 +531,17 @@ void free_hyp_pgds(void)
 {
 	pgd_t *id_pgd;
 
+	mutex_lock(&io_map_lock);
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
 	id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
 
 	if (id_pgd) {
-		mutex_lock(&io_map_lock);
 		/* In case we never called hyp_mmu_init() */
 		if (!io_map_base)
 			io_map_base = hyp_idmap_start;
 		unmap_hyp_idmap_range(id_pgd, io_map_base,
 				      hyp_idmap_start + PAGE_SIZE - io_map_base);
-		mutex_unlock(&io_map_lock);
 	}
 
 	if (boot_hyp_pgd) {
@@ -563,6 +563,7 @@ void free_hyp_pgds(void)
 	}
 
 	mutex_unlock(&kvm_hyp_pgd_mutex);
+	mutex_unlock(&io_map_lock);
 }
 
 static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,