From patchwork Tue Nov 17 18:15:40 2020
X-Patchwork-Submitter: Quentin Perret
X-Patchwork-Id: 11913147
Date: Tue, 17 Nov 2020 18:15:40 +0000
Message-Id: <20201117181607.1761516-1-qperret@google.com>
Subject: [RFC PATCH 00/27] KVM/arm64: A stage 2 for the host
From: Quentin Perret <qperret@google.com>
To: Catalin Marinas, Will Deacon, Marc Zyngier, James Morse, Julien Thierry,
 Suzuki K Poulose, Rob Herring, Frank Rowand
Cc: "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE", Quentin Perret,
 android-kvm@google.com, open list, kernel-team@android.com,
 "open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)",
 "moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)"

Hi all,

This RFC series provides the infrastructure needed to wrap the host kernel
with a stage 2 page-table when running KVM in nVHE mode. This can be useful
for several use-cases, but the primary motivation is to (eventually) be able
to protect guest memory from the host kernel.
More details about the overall idea, design, and motivations can be found in
Will's talk at KVM Forum 2020 [1], or the pKVM talk at the Android uconf
during LPC 2020 [2].

This series essentially gets us to a point where the 'VM' bit is set in the
host's HCR_EL2 when running in nVHE with 'kvm-arm.protected' set on the
kernel command line. The EL2 object directly handles memory aborts from the
host and fully manages its stage 2 page-table. However, this series does
_not_ provide any real user for this (yet) and simply idmaps everything into
the host stage 2 as RWX cacheable. This is all about the infrastructure for
now, so it is clearly not ready for inclusion upstream yet (hence the RFC
tag), but the basics are there and I thought it'd be useful to start a
discussion with the community early, as this is a rather intrusive change.
So, here goes.

One of the interesting requirements that comes with this series is that
managing page-tables requires some sort of memory allocator at EL2 to
allocate, refcount and free memory pages. Clearly, none of that is currently
possible in nVHE, so a significant chunk of the series is dedicated to
solving that problem. The proposed EL2 memory allocator mimics Linux's buddy
system in principle, and re-uses some of the arm64 mm design choices.
Specifically, it uses a vmemmap at EL2 which contains a set of struct
hyp_page entries to hold page metadata.

To support this, I extended the EL2 object to make it manage its own stage 1
page-table in addition to the host stage 2. This simplifies the hyp_vmemmap
creation and was going to be required anyway for the protected VM use-case
-- the threat model implies the host cannot be trusted after boot, and it
will thus be crucial to ensure it cannot map arbitrary code at EL2. The pool
of memory pages used by the EL2 allocator is reserved by the host early
during boot (while it is still trusted) using the memblock API, and is
donated to EL2 during KVM init.
The current assumption is that the host reserves enough pages to allow the
EL2 object to map all of memory at page granularity, for both the hyp stage
1 and the host stage 2, plus some extra pages for device mappings.

On top of that, the series introduces a few smaller features that are needed
along the way, but hopefully all of those are detailed properly in the
relevant commit messages.

As a last note, I'd like to point out that there are, at this point, trivial
ways for the host to circumvent its stage 2 protection. It still owns the
guests' stage 2 for example, meaning that nothing would prevent a malicious
host from using a guest as a proxy to access protected memory, _yet_. This
series lays the ground for future work to address these things, which will
clearly require a stage 2 over the host at some point, so I just wanted to
set the expectations right.

With all that in mind, the series is organized as follows:

 - patches 01-03 provide EL2 with some utility libraries needed for memory
   management and synchronization;
 - patches 04-09 mostly refactor small portions of the code to ease the EL2
   memory management;
 - patches 10-17 add the actual EL2 memory management code, as well as the
   setup/bootstrap code on the KVM init path;
 - patches 18-24 refactor the existing stage 2 management code to make it
   re-usable from the EL2 object;
 - and finally patches 25-27 introduce the host stage 2 and the trap
   handling logic at EL2.

This work is based on the latest kvmarm/queue (which includes Marc's host
EL2 entry rework [3], as well as Will's guest vector refactoring [4]) +
David's PSCI proxying series [5]. And if you'd like a branch that has all
the bits and pieces:

    https://android-kvm.googlesource.com/linux qperret/host-stage2

Boot-tested (host and guest) using qemu in VHE and nVHE, and on real
hardware on an AML-S905X-CC (Le Potato).
Thanks,
Quentin

[1] https://kvmforum2020.sched.com/event/eE24/virtualization-for-the-masses-exposing-kvm-on-android-will-deacon-google
[2] https://youtu.be/54q6RzS9BpQ?t=10859
[3] https://lore.kernel.org/kvmarm/20201109175923.445945-1-maz@kernel.org/
[4] https://lore.kernel.org/kvmarm/20201113113847.21619-1-will@kernel.org/
[5] https://lore.kernel.org/kvmarm/20201116204318.63987-1-dbrazdil@google.com/

Quentin Perret (24):
  KVM: arm64: Initialize kvm_nvhe_init_params early
  KVM: arm64: Avoid free_page() in page-table allocator
  KVM: arm64: Factor memory allocation out of pgtable.c
  KVM: arm64: Introduce a BSS section for use at Hyp
  KVM: arm64: Make kvm_call_hyp() a function call at Hyp
  KVM: arm64: Allow using kvm_nvhe_sym() in hyp code
  KVM: arm64: Introduce an early Hyp page allocator
  KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
  KVM: arm64: Introduce a Hyp buddy page allocator
  KVM: arm64: Enable access to sanitized CPU features at EL2
  KVM: arm64: Factor out vector address calculation
  of/fdt: Introduce early_init_dt_add_memory_hyp()
  KVM: arm64: Prepare Hyp memory protection
  KVM: arm64: Elevate Hyp mappings creation at EL2
  KVM: arm64: Use kvm_arch for stage 2 pgtable
  KVM: arm64: Use kvm_arch in kvm_s2_mmu
  KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
  KVM: arm64: Refactor kvm_arm_setup_stage2()
  KVM: arm64: Refactor __load_guest_stage2()
  KVM: arm64: Refactor __populate_fault_info()
  KVM: arm64: Make memcache anonymous in pgtable allocator
  KVM: arm64: Reserve memory for host stage 2
  KVM: arm64: Sort the memblock regions list
  KVM: arm64: Wrap the host with a stage 2

Will Deacon (3):
  arm64: lib: Annotate {clear,copy}_page() as position-independent
  KVM: arm64: Link position-independent string routines into .hyp.text
  KVM: arm64: Add standalone ticket spinlock implementation for use at hyp

 arch/arm64/include/asm/cpufeature.h           |   1 +
 arch/arm64/include/asm/hyp_image.h            |   4 +
 arch/arm64/include/asm/kvm_asm.h              |  13 +-
 arch/arm64/include/asm/kvm_cpufeature.h       |  19 ++
 arch/arm64/include/asm/kvm_host.h             |  17 +-
 arch/arm64/include/asm/kvm_hyp.h              |   8 +
 arch/arm64/include/asm/kvm_mmu.h              |  69 +++++-
 arch/arm64/include/asm/kvm_pgtable.h          |  41 +++-
 arch/arm64/include/asm/sections.h             |   1 +
 arch/arm64/kernel/asm-offsets.c               |   3 +
 arch/arm64/kernel/cpufeature.c                |  14 +-
 arch/arm64/kernel/image-vars.h                |  35 +++
 arch/arm64/kernel/vmlinux.lds.S               |   7 +
 arch/arm64/kvm/arm.c                          | 136 +++++++++--
 arch/arm64/kvm/hyp/Makefile                   |   2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |  36 +--
 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h |  14 ++
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |  32 +++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  33 +++
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  55 +++++
 arch/arm64/kvm/hyp/include/nvhe/mm.h          | 107 +++++++++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  95 ++++++++
 arch/arm64/kvm/hyp/include/nvhe/util.h        |  25 ++
 arch/arm64/kvm/hyp/nvhe/Makefile              |   9 +-
 arch/arm64/kvm/hyp/nvhe/cache.S               |  13 ++
 arch/arm64/kvm/hyp/nvhe/cpufeature.c          |   8 +
 arch/arm64/kvm/hyp/nvhe/early_alloc.c         |  60 +++++
 arch/arm64/kvm/hyp/nvhe/hyp-init.S            |  39 ++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  50 ++++
 arch/arm64/kvm/hyp/nvhe/hyp.lds.S             |   1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 191 ++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 175 ++++++++++++++
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          | 185 +++++++++++++++
 arch/arm64/kvm/hyp/nvhe/psci-relay.c          |   7 +-
 arch/arm64/kvm/hyp/nvhe/setup.c               | 214 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/stub.c                |  22 ++
 arch/arm64/kvm/hyp/nvhe/switch.c              |  12 +-
 arch/arm64/kvm/hyp/nvhe/tlb.c                 |   4 +-
 arch/arm64/kvm/hyp/pgtable.c                  |  98 ++++----
 arch/arm64/kvm/hyp/reserved_mem.c             |  95 ++++++++
 arch/arm64/kvm/mmu.c                          | 114 +++++++++-
 arch/arm64/kvm/reset.c                        |  42 +---
 arch/arm64/lib/clear_page.S                   |   4 +-
 arch/arm64/lib/copy_page.S                    |   4 +-
 arch/arm64/mm/init.c                          |   3 +
 drivers/of/fdt.c                              |   5 +
 46 files changed, 1971 insertions(+), 151 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/util.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c
 create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c