From patchwork Wed Sep 11 14:33:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800673
From: Fares Mehanna
CC: Fares Mehanna, Marc Zyngier, Oliver Upton, James Morse,
 Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
 Andrew Morton, Kemeng Shi, Pierre-Clément Tosi, Ard Biesheuvel,
 Mark Rutland, Javier Martinez Canillas, Arnd Bergmann, Fuad Tabba,
 Mark Brown, Joey Gouly, Kristina Martsenko, Randy Dunlap,
 Bjorn Helgaas, Jean-Philippe Brucker, Mike Rapoport (IBM),
 David Hildenbrand, Roman Kagan,
 moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64),
 open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64),
 open list, open list:MEMORY MANAGEMENT
Subject: [RFC PATCH 0/7] support for mm-local memory allocations and use it
Date: Wed, 11 Sep 2024 14:33:59 +0000
Message-ID: <20240911143421.85612-1-faresx@amazon.de>
X-Mailer: git-send-email 2.40.1
Sender: owner-linux-mm@kvack.org
Precedence: bulk

In a series posted a few years ago [1], a proposal was put forward to
allow the kernel to allocate memory local to a mm and thus push it out
of reach for current and future speculation-based cross-process attacks.

We still believe this is a nice thing to have.
However, in the time passed since that post Linux mm has grown quite a
few new goodies, so we'd like to explore possibilities to implement this
functionality with less effort and churn, leveraging the now available
facilities.

An RFC was posted a few months back [2] to show the proof of concept and
a simple test driver.

In this RFC, we're using the same approach of implementing mm-local
allocations piggy-backing on memfd_secret(), using regular user
addresses but pinning the pages and flipping the user/supervisor flag on
the respective PTEs to make them directly accessible from the kernel.
In addition to that, we are submitting 5 patches that use the secret
memory to hide the vCPU gp-regs and fp-regs on arm64 VHE systems.

The generic drawbacks of using user virtual addresses mentioned in the
previous RFC [2] still hold, in addition to a more specific one:

- While the user virtual addresses allocated for kernel secret memory
  are not directly accessible by userspace, as the PTEs restrict that,
  copy_from_user() and copy_to_user() can operate on those ranges, so
  that e.g. usermode can guess the address and pass it as the target
  buffer for read(), making the kernel overwrite it with user-controlled
  content. This effectively leaves the secret memory in the current
  implementation without confidentiality and integrity guarantees.

In the specific case of vCPU registers, this is fine because the owner
process can read and write them using KVM IOCTLs anyway. But in the
general case this represents a security concern and needs to be
addressed.

A possible way forward for the arch-agnostic implementation is to limit
the user virtual addresses used for the kernel to a specific range that
can be checked against in copy_from_user() and copy_to_user().

For an arch-specific implementation, using a separate PGD is the way to
go.
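To make the arch-agnostic idea a bit more concrete, here is a rough
sketch of the kind of range check copy_from_user() and copy_to_user()
could apply. It is plain C that builds in userspace, not code from this
series: the window bounds MM_LOCAL_START / MM_LOCAL_END and the helpers
overlaps_mm_local() and user_range_allowed() are hypothetical names made
up for illustration, and where such a check would actually hook into the
usercopy paths is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical window of user virtual addresses reserved for mm-local
 * kernel allocations. The bounds are illustrative assumptions only.
 */
#define MM_LOCAL_START	UINT64_C(0x0000700000000000)
#define MM_LOCAL_END	UINT64_C(0x0000700040000000)	/* exclusive */

/* Return true if [addr, addr + len) intersects the reserved window. */
static bool overlaps_mm_local(uint64_t addr, size_t len)
{
	uint64_t end;

	if (len == 0)
		return false;

	/* Treat address wrap-around as an overlap so callers reject it. */
	if (addr > UINT64_MAX - (uint64_t)len)
		return true;

	end = addr + (uint64_t)len;
	return addr < MM_LOCAL_END && end > MM_LOCAL_START;
}

/*
 * Hypothetical gate a usercopy path could apply before touching a
 * user-supplied buffer: refuse anything reaching into the window.
 */
static bool user_range_allowed(uint64_t addr, size_t len)
{
	return !overlaps_mm_local(addr, len);
}
```

With a check of this shape, a read() whose target buffer was guessed to
lie inside the reserved window would be refused before any copy happens,
while buffers that merely end at the window's lower bound or start at
its upper bound would still be accepted.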
[1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/
[2] https://lore.kernel.org/lkml/20240621201501.1059948-1-rkagan@amazon.de/

Fares Mehanna / Roman Kagan (2):
  mseal: expose interface to seal / unseal user memory ranges
  mm/secretmem: implement mm-local kernel allocations

Fares Mehanna (5):
  arm64: KVM: Refactor C-code to access vCPU gp-registers through macros
  KVM: Refactor Assembly-code to access vCPU gp-registers through a
    macro
  arm64: KVM: Allocate vCPU gp-regs dynamically on VHE and
    KERNEL_SECRETMEM enabled systems
  arm64: KVM: Refactor C-code to access vCPU fp-registers through macros
  arm64: KVM: Allocate vCPU fp-regs dynamically on VHE and
    KERNEL_SECRETMEM enabled systems

 arch/arm64/include/asm/kvm_asm.h              |  50 ++--
 arch/arm64/include/asm/kvm_emulate.h          |   2 +-
 arch/arm64/include/asm/kvm_host.h             |  41 +++-
 arch/arm64/kernel/asm-offsets.c               |   1 +
 arch/arm64/kernel/image-vars.h                |   2 +
 arch/arm64/kvm/arm.c                          |  90 +++++++-
 arch/arm64/kvm/fpsimd.c                       |   2 +-
 arch/arm64/kvm/guest.c                        |  14 +-
 arch/arm64/kvm/hyp/entry.S                    |  15 ++
 arch/arm64/kvm/hyp/include/hyp/switch.h       |   6 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |  10 +-
 .../arm64/kvm/hyp/include/nvhe/trap_handler.h |   2 +-
 arch/arm64/kvm/hyp/nvhe/host.S                |  20 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   4 +-
 arch/arm64/kvm/reset.c                        |   2 +-
 arch/arm64/kvm/va_layout.c                    |  38 ++++
 include/linux/secretmem.h                     |  29 +++
 mm/Kconfig                                    |  10 +
 mm/gup.c                                      |   4 +-
 mm/internal.h                                 |   7 +
 mm/mseal.c                                    |  81 ++++---
 mm/secretmem.c                                | 213 ++++++++++++++++++
 22 files changed, 559 insertions(+), 84 deletions(-)