From patchwork Mon Mar 12 14:02:09 2018
X-Patchwork-Submitter: Marc Zyngier
X-Patchwork-Id: 10276365
Subject: Re: [PATCH v5 10/23] KVM: arm/arm64: Move HYP IO VAs to the
 "idmap" range
To: James Morse
Cc: linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
    kvmarm@lists.cs.columbia.edu,
    Christoffer Dall, Mark Rutland, Catalin Marinas, Will Deacon,
    Steve Capper, Peter Maydell, Kristina Martsenko
References: <20180301155538.26860-1-marc.zyngier@arm.com>
 <20180301155538.26860-11-marc.zyngier@arm.com>
 <5AA2D97C.70503@arm.com>
From: Marc Zyngier
Organization: ARM Ltd
Message-ID: <89ded6ea-5b08-e296-3595-08776704f847@arm.com>
Date: Mon, 12 Mar 2018 14:02:09 +0000
In-Reply-To: <5AA2D97C.70503@arm.com>
X-Mailing-List: kvm@vger.kernel.org

[+Kristina for the extended idmap stuff]

Hi James,

On 09/03/18 18:59, James Morse wrote:
> Hi Marc,
>
> On 01/03/18 15:55, Marc Zyngier wrote:
>> We so far mapped our HYP IO (which is essencially the GICv2 control
>
> (Nit: essentially)
>
>> registers) using the same method as for memory. It recently appeared
>> that is a bit unsafe:
>>
>> We compute the HYP VA using the kern_hyp_va helper, but that helper
>> is only designed to deal with kernel VAs coming from the linear map,
>> and not from the vmalloc region... This could in turn cause some bad
>> aliasing between the two, amplified by the upcoming VA randomisation.
>>
>> A solution is to come up with our very own basic VA allocator for
>> MMIO. Since half of the HYP address space only contains a single
>> page (the idmap), we have plenty to borrow from. Let's use the idmap
>> as a base, and allocate downwards from it. GICv2 now lives on the
>> other side of the great VA barrier.
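For readers following along, the "allocate downwards from the idmap" scheme in the commit message can be modelled in a few lines of standalone C. This is only an illustrative sketch of the arithmetic (the harness and helper names are hypothetical, not the kernel code), assuming power-of-two sized, naturally-aligned blocks:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Round the requested size up to a power of two, minimum one page. */
static unsigned long roundup_size(unsigned long size)
{
	unsigned long s = PAGE_SIZE;

	while (s < size)
		s <<= 1;
	return s;
}

/*
 * Model of the downward allocator: carve a naturally-aligned block
 * of the (rounded-up) size immediately below the current base, and
 * move the base down past it.
 */
static unsigned long alloc_io_va(unsigned long *io_map_base,
				 unsigned long size)
{
	unsigned long base;

	size = roundup_size(size);
	base = *io_map_base - size;
	base &= ~(size - 1);	/* natural alignment for the block */
	*io_map_base = base;
	return base;
}
```

Starting the base at the idmap page and only ever subtracting means every allocation stays below the idmap, on the opposite side of the VA_BITS-1 boundary from the kernel mappings.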
>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 0e5cfffb4c21..3074544940dc 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -502,27 +505,31 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
>>   *
>>   * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>>   * therefore contains either mappings in the kernel memory area (above
>> - * PAGE_OFFSET), or device mappings in the vmalloc range (from
>> - * VMALLOC_START to VMALLOC_END).
>> + * PAGE_OFFSET), or device mappings in the idmap range.
>>   *
>> - * boot_hyp_pgd should only map two pages for the init code.
>> + * boot_hyp_pgd should only map the idmap range, and is only used in
>> + * the extended idmap case.
>>   */
>>  void free_hyp_pgds(void)
>>  {
>> +	pgd_t *id_pgd;
>> +
>>  	mutex_lock(&kvm_hyp_pgd_mutex);
>>
>> +	id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
>
>> +
>> +	if (id_pgd)
>> +		unmap_hyp_range(id_pgd, io_map_base,
>> +				hyp_idmap_start + PAGE_SIZE - io_map_base);
>
> Even if kvm_mmu_init() fails before it sets io_map_base, this will still unmap
> the idmap. It just starts from 0, so it may take out the flipped PAGE_OFFSET
> range too...

Yup, definitely worth fixing.

>
> (using io_map_base without taking io_map_lock makes me nervous ... in practice,
> its fine)

I'm not too worried about that, as we only have a single CPU performing
the teardown. But better safe than sorry, so I'll take it anyway.
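To make James's point concrete: the size handed to unmap_hyp_range() is computed from io_map_base, so an io_map_base that is still zero (kvm_mmu_init() failed early) makes the unmap span from VA 0 all the way to the end of the idmap page. A tiny model of just that arithmetic, with hypothetical example addresses:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/*
 * Size passed to unmap_hyp_range() in free_hyp_pgds():
 * everything from io_map_base up to the end of the idmap page.
 */
static unsigned long teardown_unmap_size(unsigned long io_map_base,
					 unsigned long hyp_idmap_start)
{
	return hyp_idmap_start + PAGE_SIZE - io_map_base;
}
```

With io_map_base properly initialised at (or below) the idmap, the range is small and bounded; with io_map_base left at 0, the range covers the whole lower VA space up to the idmap, which is how the flipped PAGE_OFFSET mappings can get taken out too.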
>
>
>> +
>>  	if (boot_hyp_pgd) {
>> -		unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>>  		boot_hyp_pgd = NULL;
>>  	}
>>
>>  	if (hyp_pgd) {
>> -		unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  		unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
>>  				(uintptr_t)high_memory - PAGE_OFFSET);
>> -		unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
>> -				VMALLOC_END - VMALLOC_START);
>>
>>  		free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
>>  		hyp_pgd = NULL;
>> @@ -719,7 +726,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>  			   void __iomem **kaddr,
>>  			   void __iomem **haddr)
>>  {
>> -	unsigned long start, end;
>> +	pgd_t *pgd = hyp_pgd;
>> +	unsigned long base;
>>  	int ret;
>>
>>  	*kaddr = ioremap(phys_addr, size);
>> @@ -731,11 +739,42 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>  		return 0;
>>  	}
>>
>> +	mutex_lock(&io_map_lock);
>> +
>> +	/*
>> +	 * This assumes that we we have enough space below the idmap
>> +	 * page to allocate our VAs. If not, the check below will
>> +	 * kick. A potential alternative would be to detect that
>> +	 * overflow and switch to an allocation above the idmap.
>> +	 */
>> +	size = max(PAGE_SIZE, roundup_pow_of_two(size));
>> +	base = io_map_base - size;
>> +	base &= ~(size - 1);
>>
>> -	start = kern_hyp_va((unsigned long)*kaddr);
>> -	end = kern_hyp_va((unsigned long)*kaddr + size);
>> -	ret = __create_hyp_mappings(hyp_pgd, PTRS_PER_PGD, start, end,
>> -				    __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
>> +	/*
>> +	 * Verify that BIT(VA_BITS - 1) hasn't been flipped by
>> +	 * allocating the new area, as it would indicate we've
>> +	 * overflowed the idmap/IO address range.
>> +	 */
>> +	if ((base ^ io_map_base) & BIT(VA_BITS - 1)) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	if (__kvm_cpu_uses_extended_idmap())
>> +		pgd = boot_hyp_pgd;
>> +
>> +	ret = __create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(),
>> +				    base, base + size,
>> +				    __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
>
> (/me winces, that's subtle....)
> This __kvm_idmap_ptrs_per_pgd() change is because the hyp_pgd
> extended-idmap top-level page may be a pgd bigger than the 64-entries that linux
> believes it has for 64K/48bit VA?

Yes, together with the 52bit PA madness.

> Doesn't unmap_hyp_range() need to know about this too? Otherwise its
> pgd_index(hyp_idmap_start) is going to mask out too much of the address, and
> pgd_addr_end() will never reach the end address we provided...

Hmmm, that's a good point. Kristina, what do you think?

> Trying to boot a 64K config, and forcing it into teardown_hyp_mode() leads to
> some fireworks: It looks like an unaligned end address is getting into
> unmap_hyp_ptes() and its escaping the loop to kvm_set_pte() other kernel data...
>
> My local changes are below [0], the config is defconfig + 64K pages, this is on
> Juno. 4K pages is quite happy with this 'force teardown_hyp_mode()' debug hack.
>
> Bisects to patch 4: "arm64: KVM: Dynamically patch the kernel/hyp VA mask"
>
> I'll keep digging on Monday,

Nice one! I've reproduced it on a model. It turns out that we only
mandate a 4kB alignment for the idmap, whilst we assume page alignment
everywhere in the code... Not great. I've fixed it as such:

Thanks,

	M.
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7059c4f34c37..ae4eabd2005d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1891,7 +1891,9 @@ int kvm_mmu_init(void)
 	int err;
 
 	hyp_idmap_start = kvm_virt_to_phys(__hyp_idmap_text_start);
+	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
 	hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
+	hyp_idmap_end = ALIGN(hyp_idmap_end, PAGE_SIZE);
 	hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
 
 	/*
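For anyone unfamiliar with the kernel's alignment helpers used in the fix above: for a power-of-two boundary, ALIGN_DOWN() rounds an address down to the previous multiple and ALIGN() rounds it up to the next. A minimal sketch of those semantics (simplified versions of the macros, valid for power-of-two alignments only):

```c
#include <assert.h>

/*
 * Simplified power-of-two rounding, matching how the kernel macros
 * behave for power-of-two 'a' (such as PAGE_SIZE).
 */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1UL))
#define ALIGN(x, a)		ALIGN_DOWN((x) + (a) - 1UL, (a))
```

So a 4kB-aligned idmap start inside a 64kB page gets rounded down to the enclosing page boundary, and the end gets rounded up, restoring the page alignment the rest of the code assumes.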