From patchwork Tue Jun 13 00:10:27 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277716
From: Rick Edgecombe
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H. J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang, "Kirill A. Shutemov", John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org
Cc: rick.p.edgecombe@intel.com, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, Michal Simek, Dinh Nguyen, linux-mips@vger.kernel.org, openrisc@lists.librecores.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, Linus Torvalds
Subject: [PATCH v9 01/42] mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma()
Date: Mon, 12 Jun 2023 17:10:27 -0700
Message-Id: <20230613001108.3040476-2-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite().

Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable memory or shadow stack memory. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA.

Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same pattern for pmd_mkwrite().
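To make the refactor concrete for readers skimming the archive, here is a condensed, compilable sketch of the wrapper pattern this patch introduces. It is an illustration only: pte_t and _PAGE_WRITE below are stand-in definitions, not the kernel's, and the real hunk is in the include/linux/pgtable.h diff further down. Each arch supplies pte_mkwrite_novma(), and a generic pte_mkwrite() simply falls back to it unless the arch defines its own; the pmd variant follows the same shape, guarded by the config symbol described next.

/*
 * Condensed sketch of the wrapper pattern (illustration only).
 * pte_t and _PAGE_WRITE here are stand-ins, not the kernel's types.
 */
#include <stdio.h>

typedef struct { unsigned long pte; } pte_t;
#define _PAGE_WRITE 0x2UL

/* Arch-provided helper: conventional writable encoding. */
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
        pte.pte |= _PAGE_WRITE;
        return pte;
}

/* Generic fallback, used only when the arch did not define pte_mkwrite. */
#ifndef pte_mkwrite
static inline pte_t pte_mkwrite(pte_t pte)
{
        return pte_mkwrite_novma(pte);
}
#endif

int main(void)
{
        pte_t pte = { 0 };

        pte = pte_mkwrite(pte);
        printf("pte after pte_mkwrite(): %#lx\n", pte.pte);
        return 0;
}

An architecture that later needs mapping-aware behavior (as x86 will for shadow stack) defines its own pte_mkwrite, and the #ifndef keeps the generic fallback out of the way.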
Since not all archs have a pmd_mkwrite_novma(), create a new arch config HAS_HUGE_PAGE that can be used to tell if pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the compiler would not be able to find pmd_mkwrite_novma(). No functional change. Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: loongarch@lists.linux.dev Cc: linux-m68k@lists.linux-m68k.org Cc: Michal Simek Cc: Dinh Nguyen Cc: linux-mips@vger.kernel.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Suggested-by: Linus Torvalds Signed-off-by: Rick Edgecombe Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ Acked-by: Geert Uytterhoeven Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- Hi Non-x86 Arch’s, x86 has a feature that allows for the creation of a special type of writable memory (shadow stack) that is only writable in limited specific ways. Previously, changes were proposed to core MM code to teach it to decide when to create normally writable memory or the special shadow stack writable memory, but David Hildenbrand suggested[0] to change pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be moved into x86 code. Later Linus suggested a less error-prone way[1] to go about this after the first attempt had a bug. Since pXX_mkwrite() is defined in every arch, it requires some tree-wide changes. So that is why you are seeing some patches out of a big x86 series pop up in your arch mailing list. There is no functional change. After this refactor, the shadow stack series goes on to use the arch helpers to push arch memory details inside arch/x86 and other arch's with upcoming shadow stack features. Testing was just 0-day build testing. Hopefully that is enough context. Thanks! 
[0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/ [1] https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ --- Documentation/mm/arch_pgtable_helpers.rst | 6 ++++++ arch/Kconfig | 3 +++ arch/alpha/include/asm/pgtable.h | 2 +- arch/arc/include/asm/hugepage.h | 2 +- arch/arc/include/asm/pgtable-bits-arcv2.h | 2 +- arch/arm/include/asm/pgtable-3level.h | 2 +- arch/arm/include/asm/pgtable.h | 2 +- arch/arm64/include/asm/pgtable.h | 4 ++-- arch/csky/include/asm/pgtable.h | 2 +- arch/hexagon/include/asm/pgtable.h | 2 +- arch/ia64/include/asm/pgtable.h | 2 +- arch/loongarch/include/asm/pgtable.h | 4 ++-- arch/m68k/include/asm/mcf_pgtable.h | 2 +- arch/m68k/include/asm/motorola_pgtable.h | 2 +- arch/m68k/include/asm/sun3_pgtable.h | 2 +- arch/microblaze/include/asm/pgtable.h | 2 +- arch/mips/include/asm/pgtable.h | 6 +++--- arch/nios2/include/asm/pgtable.h | 2 +- arch/openrisc/include/asm/pgtable.h | 2 +- arch/parisc/include/asm/pgtable.h | 2 +- arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +- arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++-- arch/powerpc/include/asm/nohash/32/pgtable.h | 4 ++-- arch/powerpc/include/asm/nohash/32/pte-8xx.h | 4 ++-- arch/powerpc/include/asm/nohash/64/pgtable.h | 2 +- arch/riscv/include/asm/pgtable.h | 6 +++--- arch/s390/include/asm/hugetlb.h | 2 +- arch/s390/include/asm/pgtable.h | 4 ++-- arch/sh/include/asm/pgtable_32.h | 4 ++-- arch/sparc/include/asm/pgtable_32.h | 2 +- arch/sparc/include/asm/pgtable_64.h | 6 +++--- arch/um/include/asm/pgtable.h | 2 +- arch/x86/include/asm/pgtable.h | 4 ++-- arch/xtensa/include/asm/pgtable.h | 2 +- include/asm-generic/hugetlb.h | 2 +- include/linux/pgtable.h | 14 ++++++++++++++ 36 files changed, 70 insertions(+), 47 deletions(-) diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index af3891f895b0..69ce1f2aa4d1 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -48,6 +48,9 @@ PTE Page Table Helpers +---------------------------+--------------------------------------------------+ | pte_mkwrite | Creates a writable PTE | +---------------------------+--------------------------------------------------+ +| pte_mkwrite_novma | Creates a writable PTE, of the conventional type | +| | of writable. | ++---------------------------+--------------------------------------------------+ | pte_wrprotect | Creates a write protected PTE | +---------------------------+--------------------------------------------------+ | pte_mkspecial | Creates a special PTE | @@ -120,6 +123,9 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_mkwrite | Creates a writable PMD | +---------------------------+--------------------------------------------------+ +| pmd_mkwrite_novma | Creates a writable PMD, of the conventional type | +| | of writable. 
| ++---------------------------+--------------------------------------------------+ | pmd_wrprotect | Creates a write protected PMD | +---------------------------+--------------------------------------------------+ | pmd_mkspecial | Creates a special PMD | diff --git a/arch/Kconfig b/arch/Kconfig index 205fd23e0cad..3bc11c9a2ac1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -919,6 +919,9 @@ config HAVE_ARCH_HUGE_VMALLOC config ARCH_WANT_HUGE_PMD_SHARE bool +config HAS_HUGE_PAGE + def_bool HAVE_ARCH_HUGE_VMAP || TRANSPARENT_HUGEPAGE || HUGETLBFS + config HAVE_ARCH_SOFT_DIRTY bool diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h index ba43cb841d19..af1a13ab3320 100644 --- a/arch/alpha/include/asm/pgtable.h +++ b/arch/alpha/include/asm/pgtable.h @@ -256,7 +256,7 @@ extern inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; extern inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_FOW; return pte; } extern inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~(__DIRTY_BITS); return pte; } extern inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~(__ACCESS_BITS); return pte; } -extern inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_FOW; return pte; } +extern inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) &= ~_PAGE_FOW; return pte; } extern inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= __DIRTY_BITS; return pte; } extern inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= __ACCESS_BITS; return pte; } diff --git a/arch/arc/include/asm/hugepage.h b/arch/arc/include/asm/hugepage.h index 5001b796fb8d..ef8d4166370c 100644 --- a/arch/arc/include/asm/hugepage.h +++ b/arch/arc/include/asm/hugepage.h @@ -21,7 +21,7 @@ static inline pmd_t pte_pmd(pte_t pte) } #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h index 6e9f8ca6d6a1..5c073d9f41c2 100644 --- a/arch/arc/include/asm/pgtable-bits-arcv2.h +++ b/arch/arc/include/asm/pgtable-bits-arcv2.h @@ -87,7 +87,7 @@ PTE_BIT_FUNC(mknotpresent, &= ~(_PAGE_PRESENT)); PTE_BIT_FUNC(wrprotect, &= ~(_PAGE_WRITE)); -PTE_BIT_FUNC(mkwrite, |= (_PAGE_WRITE)); +PTE_BIT_FUNC(mkwrite_novma, |= (_PAGE_WRITE)); PTE_BIT_FUNC(mkclean, &= ~(_PAGE_DIRTY)); PTE_BIT_FUNC(mkdirty, |= (_PAGE_DIRTY)); PTE_BIT_FUNC(mkold, &= ~(_PAGE_ACCESSED)); diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 106049791500..71c3add6417f 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -202,7 +202,7 @@ static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; } PMD_BIT_FUNC(wrprotect, |= L_PMD_SECT_RDONLY); PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF); -PMD_BIT_FUNC(mkwrite, &= ~L_PMD_SECT_RDONLY); +PMD_BIT_FUNC(mkwrite_novma, &= ~L_PMD_SECT_RDONLY); PMD_BIT_FUNC(mkdirty, |= L_PMD_SECT_DIRTY); PMD_BIT_FUNC(mkclean, &= ~L_PMD_SECT_DIRTY); PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF); diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index a58ccbb406ad..f37ba2472eae 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -227,7 +227,7 @@ static 
inline pte_t pte_wrprotect(pte_t pte) return set_pte_bit(pte, __pgprot(L_PTE_RDONLY)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return clear_pte_bit(pte, __pgprot(L_PTE_RDONLY)); } diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 0bd18de9fd97..7a3d62cb9bee 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -180,7 +180,7 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot) return pmd; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte = set_pte_bit(pte, __pgprot(PTE_WRITE)); pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY)); @@ -487,7 +487,7 @@ static inline int pmd_trans_huge(pmd_t pmd) #define pmd_cont(pmd) pte_cont(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h index d4042495febc..aa0cce4fc02f 100644 --- a/arch/csky/include/asm/pgtable.h +++ b/arch/csky/include/asm/pgtable.h @@ -176,7 +176,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h index 59393613d086..fc2d2d83368d 100644 --- a/arch/hexagon/include/asm/pgtable.h +++ b/arch/hexagon/include/asm/pgtable.h @@ -300,7 +300,7 @@ static inline pte_t pte_wrprotect(pte_t pte) } /* pte_mkwrite - mark page as writable */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h index 21c97e31a28a..f80aba7cad99 100644 --- a/arch/ia64/include/asm/pgtable.h +++ b/arch/ia64/include/asm/pgtable.h @@ -268,7 +268,7 @@ ia64_phys_addr_valid (unsigned long addr) * access rights: */ #define pte_wrprotect(pte) (__pte(pte_val(pte) & ~_PAGE_AR_RW)) -#define pte_mkwrite(pte) (__pte(pte_val(pte) | _PAGE_AR_RW)) +#define pte_mkwrite_novma(pte) (__pte(pte_val(pte) | _PAGE_AR_RW)) #define pte_mkold(pte) (__pte(pte_val(pte) & ~_PAGE_A)) #define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_A)) #define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D)) diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h index d28fb9dbec59..8245cf367b31 100644 --- a/arch/loongarch/include/asm/pgtable.h +++ b/arch/loongarch/include/asm/pgtable.h @@ -390,7 +390,7 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) @@ -490,7 +490,7 @@ static inline int pmd_write(pmd_t pmd) return !!(pmd_val(pmd) & _PAGE_WRITE); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pmd_val(pmd) |= _PAGE_WRITE; if (pmd_val(pmd) & _PAGE_MODIFIED) diff --git a/arch/m68k/include/asm/mcf_pgtable.h 
b/arch/m68k/include/asm/mcf_pgtable.h index d97fbb812f63..42ebea0488e3 100644 --- a/arch/m68k/include/asm/mcf_pgtable.h +++ b/arch/m68k/include/asm/mcf_pgtable.h @@ -211,7 +211,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= CF_PAGE_WRITABLE; return pte; diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h index ec0dc19ab834..ba28ca4d219a 100644 --- a/arch/m68k/include/asm/motorola_pgtable.h +++ b/arch/m68k/include/asm/motorola_pgtable.h @@ -155,7 +155,7 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_RONLY; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~_PAGE_DIRTY; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_RONLY; return pte; } +static inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) &= ~_PAGE_RONLY; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mknocache(pte_t pte) diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h index e582b0484a55..4114eaff7404 100644 --- a/arch/m68k/include/asm/sun3_pgtable.h +++ b/arch/m68k/include/asm/sun3_pgtable.h @@ -143,7 +143,7 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & SUN3_PAGE_ACCESS static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_MODIFIED; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= SUN3_PAGE_WRITEABLE; return pte; } +static inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) |= SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= SUN3_PAGE_MODIFIED; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= SUN3_PAGE_ACCESSED; return pte; } static inline pte_t pte_mknocache(pte_t pte) { pte_val(pte) |= SUN3_PAGE_NOCACHE; return pte; } diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h index d1b8272abcd9..9108b33a7886 100644 --- a/arch/microblaze/include/asm/pgtable.h +++ b/arch/microblaze/include/asm/pgtable.h @@ -266,7 +266,7 @@ static inline pte_t pte_mkread(pte_t pte) \ { pte_val(pte) |= _PAGE_USER; return pte; } static inline pte_t pte_mkexec(pte_t pte) \ { pte_val(pte) |= _PAGE_USER | _PAGE_EXEC; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) \ +static inline pte_t pte_mkwrite_novma(pte_t pte) \ { pte_val(pte) |= _PAGE_RW; return pte; } static inline pte_t pte_mkdirty(pte_t pte) \ { pte_val(pte) |= _PAGE_DIRTY; return pte; } diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h index 574fa14ac8b2..40a54fd6e48d 100644 --- a/arch/mips/include/asm/pgtable.h +++ b/arch/mips/include/asm/pgtable.h @@ -309,7 +309,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte.pte_low |= _PAGE_WRITE; if (pte.pte_low & 
_PAGE_MODIFIED) { @@ -364,7 +364,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) @@ -627,7 +627,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) return pmd; } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pmd_val(pmd) |= _PAGE_WRITE; if (pmd_val(pmd) & _PAGE_MODIFIED) diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h index 0f5c2564e9f5..cf1ffbc1a121 100644 --- a/arch/nios2/include/asm/pgtable.h +++ b/arch/nios2/include/asm/pgtable.h @@ -129,7 +129,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h index 3eb9b9555d0d..828820c74fc5 100644 --- a/arch/openrisc/include/asm/pgtable.h +++ b/arch/openrisc/include/asm/pgtable.h @@ -250,7 +250,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h index e715df5385d6..79d1cef2fd7c 100644 --- a/arch/parisc/include/asm/pgtable.h +++ b/arch/parisc/include/asm/pgtable.h @@ -331,7 +331,7 @@ static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; retu static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_WRITE; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; } +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; } static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; return pte; } /* diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 7bf1fe7297c6..67dfb674a4c1 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -498,7 +498,7 @@ static inline pte_t pte_mkpte(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 4acc9690f599..0328d917494a 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -600,7 +600,7 @@ static inline pte_t pte_mkexec(pte_t pte) return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { /* * write implies read, hence set both @@ -1071,7 +1071,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))) #ifdef 
CONFIG_HAVE_ARCH_SOFT_DIRTY #define pmd_soft_dirty(pmd) pte_soft_dirty(pmd_pte(pmd)) diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h index fec56d965f00..33213b31fcbb 100644 --- a/arch/powerpc/include/asm/nohash/32/pgtable.h +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -170,8 +170,8 @@ void unmap_kernel_page(unsigned long va); #define pte_clear(mm, addr, ptep) \ do { pte_update(mm, addr, ptep, ~0, 0, 0); } while (0) -#ifndef pte_mkwrite -static inline pte_t pte_mkwrite(pte_t pte) +#ifndef pte_mkwrite_novma +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h index 1a89ebdc3acc..21f681ee535a 100644 --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -101,12 +101,12 @@ static inline int pte_write(pte_t pte) #define pte_write pte_write -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) & ~_PAGE_RO); } -#define pte_mkwrite pte_mkwrite +#define pte_mkwrite_novma pte_mkwrite_novma static inline bool pte_user(pte_t pte) { diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h index 287e25864ffa..abe4fd82721e 100644 --- a/arch/powerpc/include/asm/nohash/64/pgtable.h +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h @@ -85,7 +85,7 @@ #ifndef __ASSEMBLY__ /* pte_clear moved to later in this file */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 2258b27173b0..b38faec98154 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -379,7 +379,7 @@ static inline pte_t pte_wrprotect(pte_t pte) /* static inline pte_t pte_mkread(pte_t pte) */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_WRITE); } @@ -665,9 +665,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pte_pmd(pte_mkyoung(pmd_pte(pmd))); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { - return pte_pmd(pte_mkwrite(pmd_pte(pmd))); + return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))); } static inline pmd_t pmd_wrprotect(pmd_t pmd) diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h index ccdbccfde148..f07267875a19 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -104,7 +104,7 @@ static inline int huge_pte_dirty(pte_t pte) static inline pte_t huge_pte_mkwrite(pte_t pte) { - return pte_mkwrite(pte); + return pte_mkwrite_novma(pte); } static inline pte_t huge_pte_mkdirty(pte_t pte) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 6822a11c2c8a..699406036f30 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1005,7 +1005,7 @@ static inline pte_t pte_wrprotect(pte_t pte) return set_pte_bit(pte, __pgprot(_PAGE_PROTECT)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte = set_pte_bit(pte, __pgprot(_PAGE_WRITE)); if (pte_val(pte) & _PAGE_DIRTY) @@ -1488,7 +1488,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) return set_pmd_bit(pmd, 
__pgprot(_SEGMENT_ENTRY_PROTECT)); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pmd = set_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_WRITE)); if (pmd_val(pmd) & _SEGMENT_ENTRY_DIRTY) diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h index 21952b094650..165b4fd08152 100644 --- a/arch/sh/include/asm/pgtable_32.h +++ b/arch/sh/include/asm/pgtable_32.h @@ -359,11 +359,11 @@ static inline pte_t pte_##fn(pte_t pte) { pte.pte_##h op; return pte; } * kernel permissions), we attempt to couple them a bit more sanely here. */ PTE_BIT_FUNC(high, wrprotect, &= ~(_PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE)); -PTE_BIT_FUNC(high, mkwrite, |= _PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE); +PTE_BIT_FUNC(high, mkwrite_novma, |= _PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE); PTE_BIT_FUNC(high, mkhuge, |= _PAGE_SZHUGE); #else PTE_BIT_FUNC(low, wrprotect, &= ~_PAGE_RW); -PTE_BIT_FUNC(low, mkwrite, |= _PAGE_RW); +PTE_BIT_FUNC(low, mkwrite_novma, |= _PAGE_RW); PTE_BIT_FUNC(low, mkhuge, |= _PAGE_SZHUGE); #endif diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h index d4330e3c57a6..a2d909446539 100644 --- a/arch/sparc/include/asm/pgtable_32.h +++ b/arch/sparc/include/asm/pgtable_32.h @@ -241,7 +241,7 @@ static inline pte_t pte_mkold(pte_t pte) return __pte(pte_val(pte) & ~SRMMU_REF); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | SRMMU_WRITE); } diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 5563efa1a19f..4dd4f6cdc670 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -517,7 +517,7 @@ static inline pte_t pte_mkclean(pte_t pte) return __pte(val); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { unsigned long val = pte_val(pte), mask; @@ -772,11 +772,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return __pmd(pte_val(pte)); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { pte_t pte = __pte(pmd_val(pmd)); - pte = pte_mkwrite(pte); + pte = pte_mkwrite_novma(pte); return __pmd(pte_val(pte)); } diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h index a70d1618eb35..46f59a8bc812 100644 --- a/arch/um/include/asm/pgtable.h +++ b/arch/um/include/asm/pgtable.h @@ -207,7 +207,7 @@ static inline pte_t pte_mkyoung(pte_t pte) return(pte); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { if (unlikely(pte_get_bits(pte, _PAGE_RW))) return pte; diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 15ae4d6ba476..112e6060eafa 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -352,7 +352,7 @@ static inline pte_t pte_mkyoung(pte_t pte) return pte_set_flags(pte, _PAGE_ACCESSED); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { return pte_set_flags(pte, _PAGE_RW); } @@ -453,7 +453,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_ACCESSED); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_RW); } diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h index fc7a14884c6c..27e3ae38a5de 100644 --- a/arch/xtensa/include/asm/pgtable.h +++ 
b/arch/xtensa/include/asm/pgtable.h @@ -262,7 +262,7 @@ static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite_novma(pte_t pte) { pte_val(pte) |= _PAGE_WRITABLE; return pte; } #define pgprot_noncached(prot) \ diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index d7f6335d3999..4da02798a00b 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -22,7 +22,7 @@ static inline unsigned long huge_pte_dirty(pte_t pte) static inline pte_t huge_pte_mkwrite(pte_t pte) { - return pte_mkwrite(pte); + return pte_mkwrite_novma(pte); } #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index c5a51481bbb9..ae271a307584 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -507,6 +507,20 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, pud_t *pudp); #endif +#ifndef pte_mkwrite +static inline pte_t pte_mkwrite(pte_t pte) +{ + return pte_mkwrite_novma(pte); +} +#endif + +#if defined(CONFIG_HAS_HUGE_PAGE) && !defined(pmd_mkwrite) +static inline pmd_t pmd_mkwrite(pmd_t pmd) +{ + return pmd_mkwrite_novma(pmd); +} +#endif + #ifndef __HAVE_ARCH_PTEP_SET_WRPROTECT struct mm_struct; static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
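One detail worth calling out in the hunk above: the #ifndef guards depend on the usual kernel idiom, also visible in the powerpc 8xx hunk earlier, where an architecture advertises its helper by defining the symbol as a macro of itself. The following is a minimal stand-alone illustration of that opt-out mechanism, with dummy types and a stand-in _PAGE_RO bit; it is a simplification written for this archive, not the real headers.

/*
 * Sketch of the "#define name name" opt-out idiom (illustration only).
 * The "arch" section defines the helper and the macro; the "generic"
 * section is then skipped by the preprocessor.
 */
#include <stdio.h>

typedef struct { unsigned long pte; } pte_t;
#define _PAGE_RO 0x1UL

/* --- arch header --- */
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
        pte.pte &= ~_PAGE_RO;   /* writable by clearing a read-only bit */
        return pte;
}
#define pte_mkwrite_novma pte_mkwrite_novma     /* advertise the override */

/* --- generic header --- */
#ifndef pte_mkwrite_novma
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
        return pte;             /* never compiled in: the arch version won */
}
#endif

int main(void)
{
        pte_t pte = { _PAGE_RO };

        pte = pte_mkwrite_novma(pte);
        printf("read-only bit cleared: %#lx\n", pte.pte);
        return 0;
}

Because the override is detected purely by the preprocessor, the series can add generic wrappers without disturbing architectures whose pte_mkwrite() was itself generated by macros.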
From patchwork Tue Jun 13 00:10:28 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277717
From: Rick Edgecombe
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H. J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang, "Kirill A. Shutemov", John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org
Cc: rick.p.edgecombe@intel.com, linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH v9 02/42] mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()
Date: Mon, 12 Jun 2023 17:10:28 -0700
Message-Id: <20230613001108.3040476-3-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite().

Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable memory or shadow stack memory. But there are a couple of challenges to this.
Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Previous patches have done the first step, so next move the callers that don't have a VMA to pte_mkwrite_novma(). Also do the same for pmd_mkwrite(). This will be ok for the shadow stack feature, as these callers are on kernel memory which will not need to be made shadow stack, and the other architectures only currently support one type of memory in pte_mkwrite() Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Rick Edgecombe Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- Hi Non-x86 Arch’s, x86 has a feature that allows for the creation of a special type of writable memory (shadow stack) that is only writable in limited specific ways. Previously, changes were proposed to core MM code to teach it to decide when to create normally writable memory or the special shadow stack writable memory, but David Hildenbrand suggested[0] to change pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be moved into x86 code. Later Linus suggested a less error-prone way[1] to go about this after the first attempt had a bug. Since pXX_mkwrite() is defined in every arch, it requires some tree-wide changes. So that is why you are seeing some patches out of a big x86 series pop up in your arch mailing list. There is no functional change. After this refactor, the shadow stack series goes on to use the arch helpers to push arch memory details inside arch/x86 and other arch's with upcoming shadow stack features. Testing was just 0-day build testing. Hopefully that is enough context. Thanks! [0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/ [1] https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ --- arch/arm64/mm/trans_pgd.c | 4 ++-- arch/s390/mm/pageattr.c | 4 ++-- arch/x86/xen/mmu_pv.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c index 4ea2eefbc053..a01493f3a06f 100644 --- a/arch/arm64/mm/trans_pgd.c +++ b/arch/arm64/mm/trans_pgd.c @@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) * read only (code, rodata). Clear the RDONLY bit from * the temporary mappings we use during restore. 
*/ - set_pte(dst_ptep, pte_mkwrite(pte)); + set_pte(dst_ptep, pte_mkwrite_novma(pte)); } else if (debug_pagealloc_enabled() && !pte_none(pte)) { /* * debug_pagealloc will removed the PTE_VALID bit if @@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) */ BUG_ON(!pfn_valid(pte_pfn(pte))); - set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte))); + set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte))); } } diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c index 5ba3bd8a7b12..6931d484d8a7 100644 --- a/arch/s390/mm/pageattr.c +++ b/arch/s390/mm/pageattr.c @@ -97,7 +97,7 @@ static int walk_pte_level(pmd_t *pmdp, unsigned long addr, unsigned long end, if (flags & SET_MEMORY_RO) new = pte_wrprotect(new); else if (flags & SET_MEMORY_RW) - new = pte_mkwrite(pte_mkdirty(new)); + new = pte_mkwrite_novma(pte_mkdirty(new)); if (flags & SET_MEMORY_NX) new = set_pte_bit(new, __pgprot(_PAGE_NOEXEC)); else if (flags & SET_MEMORY_X) @@ -155,7 +155,7 @@ static void modify_pmd_page(pmd_t *pmdp, unsigned long addr, if (flags & SET_MEMORY_RO) new = pmd_wrprotect(new); else if (flags & SET_MEMORY_RW) - new = pmd_mkwrite(pmd_mkdirty(new)); + new = pmd_mkwrite_novma(pmd_mkdirty(new)); if (flags & SET_MEMORY_NX) new = set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_NOEXEC)); else if (flags & SET_MEMORY_X) diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index b3b8d289b9ab..63fced067057 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -150,7 +150,7 @@ void make_lowmem_page_readwrite(void *vaddr) if (pte == NULL) return; /* vaddr missing */ - ptev = pte_mkwrite(*pte); + ptev = pte_mkwrite_novma(*pte); if (HYPERVISOR_update_va_mapping(address, ptev, 0)) BUG();
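The next patch completes the third step by making pte_mkwrite() take the VMA. As a rough, compilable toy model of where the series ends up, consider the sketch below; all of the types, the VM_FAKE_SHSTK flag, and the "arch" override are illustrative stand-ins invented for this note, not code from these patches. The point it shows is that the generic helper can keep ignoring the VMA, while an architecture that distinguishes memory types inspects it.

/*
 * Toy model of the end state: pte_mkwrite() takes a VMA so an
 * architecture override can choose the PTE encoding per mapping.
 * All types and the VMA flag below are illustrative stand-ins.
 */
#include <stdio.h>

typedef struct { unsigned long pte; } pte_t;
struct vm_area_struct { unsigned long vm_flags; };

#define _PAGE_RW        0x2UL
#define _PAGE_DIRTY     0x4UL
#define VM_FAKE_SHSTK   0x100UL /* illustrative flag, not a real kernel flag */

static inline pte_t pte_mkwrite_novma(pte_t pte)
{
        pte.pte |= _PAGE_RW;            /* conventional writable encoding */
        return pte;
}

/* "Arch" override: choose the encoding based on the mapping. */
static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
        if (vma->vm_flags & VM_FAKE_SHSTK) {
                pte.pte &= ~_PAGE_RW;   /* stand-in for a special write encoding */
                pte.pte |= _PAGE_DIRTY;
                return pte;
        }
        return pte_mkwrite_novma(pte);
}

int main(void)
{
        struct vm_area_struct normal = { 0 };
        struct vm_area_struct special = { VM_FAKE_SHSTK };
        pte_t a = { 0 }, b = { 0 };

        a = pte_mkwrite(a, &normal);
        b = pte_mkwrite(b, &special);
        printf("normal mapping: %#lx, special mapping: %#lx\n", a.pte, b.pte);
        return 0;
}

Core mm callers such as maybe_mkwrite() then only need to pass the VMA through, which is what the diff in the patch below does.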
From patchwork Tue Jun 13 00:10:29 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277718
From: Rick Edgecombe
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H. J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang, "Kirill A. Shutemov", John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v9 03/42] mm: Make pte_mkwrite() take a VMA
Date: Mon, 12 Jun 2023 17:10:29 -0700
Message-Id: <20230613001108.3040476-4-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite().

Future patches will make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable memory or shadow stack memory. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma().
And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. In a previous patches, pte_mkwrite() was renamed pte_mkwrite_novma() and callers that don't have a VMA were changed to use pte_mkwrite_novma(). So now change pte_mkwrite() to take a VMA and change the remaining callers to pass a VMA. Apply the same changes for pmd_mkwrite(). No functional change. Suggested-by: David Hildenbrand Signed-off-by: Rick Edgecombe Reviewed-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- Documentation/mm/arch_pgtable_helpers.rst | 6 ++++-- include/linux/mm.h | 2 +- include/linux/pgtable.h | 4 ++-- mm/debug_vm_pgtable.c | 12 ++++++------ mm/huge_memory.c | 10 +++++----- mm/memory.c | 4 ++-- mm/migrate.c | 2 +- mm/migrate_device.c | 2 +- mm/mprotect.c | 2 +- mm/userfaultfd.c | 2 +- 10 files changed, 24 insertions(+), 22 deletions(-) diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index 69ce1f2aa4d1..c82e3ee20e51 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -46,7 +46,8 @@ PTE Page Table Helpers +---------------------------+--------------------------------------------------+ | pte_mkclean | Creates a clean PTE | +---------------------------+--------------------------------------------------+ -| pte_mkwrite | Creates a writable PTE | +| pte_mkwrite | Creates a writable PTE of the type specified by | +| | the VMA. | +---------------------------+--------------------------------------------------+ | pte_mkwrite_novma | Creates a writable PTE, of the conventional type | | | of writable. | @@ -121,7 +122,8 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_mkclean | Creates a clean PMD | +---------------------------+--------------------------------------------------+ -| pmd_mkwrite | Creates a writable PMD | +| pmd_mkwrite | Creates a writable PMD of the type specified by | +| | the VMA. | +---------------------------+--------------------------------------------------+ | pmd_mkwrite_novma | Creates a writable PMD, of the conventional type | | | of writable. 
| diff --git a/include/linux/mm.h b/include/linux/mm.h index 27ce77080c79..43701bf223d3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1284,7 +1284,7 @@ void free_compound_page(struct page *page); static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index ae271a307584..0f3cf726812a 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -508,14 +508,14 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, #endif #ifndef pte_mkwrite -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return pte_mkwrite_novma(pte); } #endif #if defined(CONFIG_HAS_HUGE_PAGE) && !defined(pmd_mkwrite) -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { return pmd_mkwrite_novma(pmd); } diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index c54177aabebd..107e293904d3 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -109,10 +109,10 @@ static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx) WARN_ON(!pte_same(pte, pte)); WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte)))); WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte)))); - WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte)))); + WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte), args->vma))); WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte)))); WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte)))); - WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte)))); + WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte, args->vma)))); WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte)))); WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte)))); } @@ -153,7 +153,7 @@ static void __init pte_advanced_tests(struct pgtable_debug_args *args) pte = pte_mkclean(pte); set_pte_at(args->mm, args->vaddr, args->ptep, pte); flush_dcache_page(page); - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, args->vma); pte = pte_mkdirty(pte); ptep_set_access_flags(args->vma, args->vaddr, args->ptep, pte, 1); pte = ptep_get(args->ptep); @@ -199,10 +199,10 @@ static void __init pmd_basic_tests(struct pgtable_debug_args *args, int idx) WARN_ON(!pmd_same(pmd, pmd)); WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd)))); WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd)))); - WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd)))); + WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd), args->vma))); WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd)))); WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd)))); - WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd)))); + WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd, args->vma)))); WARN_ON(pmd_dirty(pmd_wrprotect(pmd_mkclean(pmd)))); WARN_ON(!pmd_dirty(pmd_wrprotect(pmd_mkdirty(pmd)))); /* @@ -253,7 +253,7 @@ static void __init pmd_advanced_tests(struct pgtable_debug_args *args) pmd = pmd_mkclean(pmd); set_pmd_at(args->mm, vaddr, args->pmdp, pmd); flush_dcache_page(page); - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, args->vma); pmd = pmd_mkdirty(pmd); pmdp_set_access_flags(args->vma, vaddr, args->pmdp, pmd, 1); pmd = READ_ONCE(*args->pmdp); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 624671aaa60d..37dd56b7b3d1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -551,7 +551,7 @@ __setup("transparent_hugepage=", setup_transparent_hugepage); pmd_t maybe_pmd_mkwrite(pmd_t pmd, 
struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, vma); return pmd; } @@ -1572,7 +1572,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) pmd = pmd_modify(oldpmd, vma->vm_page_prot); pmd = pmd_mkyoung(pmd); if (writable) - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, vma); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); spin_unlock(vmf->ptl); @@ -1924,7 +1924,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, /* See change_pte_range(). */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && can_change_pmd_writable(vma, addr, entry)) - entry = pmd_mkwrite(entry); + entry = pmd_mkwrite(entry, vma); ret = HPAGE_PMD_NR; set_pmd_at(mm, addr, pmd, entry); @@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, } else { entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); if (write) - entry = pte_mkwrite(entry); + entry = pte_mkwrite(entry, vma); if (anon_exclusive) SetPageAnonExclusive(page + i); if (!young) @@ -3271,7 +3271,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) if (pmd_swp_soft_dirty(*pvmw->pmd)) pmde = pmd_mksoft_dirty(pmde); if (is_writable_migration_entry(entry)) - pmde = pmd_mkwrite(pmde); + pmde = pmd_mkwrite(pmde, vma); if (pmd_swp_uffd_wp(*pvmw->pmd)) pmde = pmd_mkuffd_wp(pmde); if (!is_migration_entry_young(entry)) diff --git a/mm/memory.c b/mm/memory.c index f69fbc251198..c1b6fe944c20 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4100,7 +4100,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(&folio->page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = pte_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4796,7 +4796,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (writable) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); diff --git a/mm/migrate.c b/mm/migrate.c index 01cac26a3127..8b46b722f1a4 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -219,7 +219,7 @@ static bool remove_migration_pte(struct folio *folio, if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) pte = pte_mkdirty(pte); if (is_writable_migration_entry(entry)) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); else if (pte_swp_uffd_wp(*pvmw.pte)) pte = pte_mkuffd_wp(pte); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index d30c9de60b0d..df3f5e9d5f76 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -646,7 +646,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } entry = mk_pte(page, vma->vm_page_prot); if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = pte_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index 92d3d3ca390a..afdb6723782e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -198,7 +198,7 @@ static long change_pte_range(struct mmu_gather *tlb, if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pte_write(ptent) && can_change_pte_writable(vma, addr, ptent)) - ptent = 
pte_mkwrite(ptent);
+				ptent = pte_mkwrite(ptent, vma);
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			if (pte_needs_flush(oldpte, ptent))

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e97a0b4889fc..6dea7f57026e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -72,7 +72,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
 	if (page_in_cache && !vm_shared)
 		writable = false;
 	if (writable)
-		_dst_pte = pte_mkwrite(_dst_pte);
+		_dst_pte = pte_mkwrite(_dst_pte, dst_vma);
 	if (flags & MFILL_ATOMIC_WP)
 		_dst_pte = pte_mkuffd_wp(_dst_pte);

From patchwork Tue Jun 13 00:10:30 2023
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Peter Collingbourne, Pengfei Xu
Subject: [PATCH v9 04/42] mm: Re-introduce vm_flags to do_mmap()
Date: Mon, 12 Jun 2023 17:10:30 -0700
Message-Id: <20230613001108.3040476-5-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
From: Yu-cheng Yu

There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").

There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap().

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Peter Collingbourne
Reviewed-by: Kees Cook
Reviewed-by: Kirill A.
Shutemov Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Acked-by: David Hildenbrand Reviewed-by: Mark Brown Tested-by: Mark Brown --- fs/aio.c | 2 +- include/linux/mm.h | 3 ++- ipc/shm.c | 2 +- mm/mmap.c | 10 +++++----- mm/nommu.c | 4 ++-- mm/util.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index b0b17bd098bb..4a7576989719 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -558,7 +558,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 43701bf223d3..9ec20cbb20c1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3133,7 +3133,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf, bool downgrade); diff --git a/ipc/shm.c b/ipc/shm.c index 60e45e7045d4..576a543b7cff 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1662,7 +1662,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 13678edaa22c..afdf5f78432b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1221,11 +1221,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; validate_mm(mm); @@ -1286,7 +1286,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. 
*/ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2903,7 +2903,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, 0, pgoff, &populate, NULL); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index f670d9979a26..138826c4a872 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1002,6 +1002,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1009,7 +1010,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; VMA_ITERATOR(vmi, current->mm, 0); @@ -1029,7 +1029,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ diff --git a/mm/util.c b/mm/util.c index dd12b9531ac4..8e7fc6cacab4 100644 --- a/mm/util.c +++ b/mm/util.c @@ -540,7 +540,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Tue Jun 13 00:10:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277720 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50E3FC88CBB for ; Tue, 13 Jun 2023 00:12:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B33F58E0002; Mon, 12 Jun 2023 20:12:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AB2D68E0007; Mon, 12 Jun 2023 20:12:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 77FC38E0002; Mon, 12 Jun 2023 20:12:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 590688E0003 for ; Mon, 12 Jun 2023 20:12:16 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 3214B80350 for ; Tue, 13 Jun 2023 00:12:16 +0000 (UTC) X-FDA: 80895797472.01.A31B464 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id 13E8640008 for ; Tue, 13 Jun 2023 00:12:13 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=atX2Nv1m; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of 
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Axel Rasmussen, Peter Xu, Pengfei Xu
Subject: [PATCH v9 05/42] mm: Move VM_UFFD_MINOR_BIT from 37 to 38
Date: Mon, 12 Jun 2023 17:10:31 -0700
Message-Id: <20230613001108.3040476-6-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly.

Future patches will introduce a new VM flag VM_SHADOW_STACK that will be VM_HIGH_ARCH_BIT_5. VM_HIGH_ARCH_BIT_0 through VM_HIGH_ARCH_BIT_4 are bits 32-36, and bit 37 is the unrelated VM_UFFD_MINOR_BIT. For the sake of order, make all VM_HIGH_ARCH_BITs stay together by moving VM_UFFD_MINOR_BIT from 37 to 38. This will allow VM_SHADOW_STACK to be introduced as bit 37.
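As an illustrative aside (not part of this patch): the intended bit layout after this change can be sketched as below. VM_HIGH_ARCH_BIT_5 and VM_SHADOW_STACK do not exist yet at this point in the series, so their definitions here are assumptions about the later patch that introduces them.

/*
 * Sketch only -- assumed layout once a later patch adds the new flag.
 * Bits 32-36 remain VM_HIGH_ARCH_BIT_0..4; bit 37 is freed up by this
 * patch; the userfaultfd minor-fault bit moves up to 38.
 */
#define VM_HIGH_ARCH_BIT_5	37				/* assumed: added by a later patch */
#define VM_SHADOW_STACK		BIT(VM_HIGH_ARCH_BIT_5)		/* assumed: added by a later patch */
#define VM_UFFD_MINOR_BIT	38				/* moved from 37 by this patch */
#define VM_UFFD_MINOR		BIT(VM_UFFD_MINOR_BIT)		/* definition unchanged */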
Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Reviewed-by: Axel Rasmussen
Acked-by: Mike Rapoport (IBM)
Acked-by: Peter Xu
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Reviewed-by: David Hildenbrand
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9ec20cbb20c1..6f52c1e7c640 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -370,7 +370,7 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT 37
+# define VM_UFFD_MINOR_BIT 38
 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */
 #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 # define VM_UFFD_MINOR VM_NONE

From patchwork Tue Jun 13 00:10:32 2023
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Pengfei Xu
Subject: [PATCH v9 06/42] x86/shstk: Add Kconfig option for shadow stack
Date: Mon, 12 Jun 2023 17:10:32 -0700
Message-Id: <20230613001108.3040476-7-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

Shadow stack provides protection for applications against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_USER_SHADOW_STACK enabled, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-shadow stack applications continue to work, but without protection.

Since there is another feature that utilizes CET (Kernel IBT) that will share implementation with shadow stacks, create CONFIG_X86_CET to signify that at least one CET feature is configured.
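As an illustrative aside (not part of this patch): a minimal sketch of how later code can be gated on the two new options. The declarations below are hypothetical; the real users arrive in later patches (for example, cet.c is built under CONFIG_X86_CET by the next patch).

/* Sketch only: gating code on the new Kconfig symbols. */
#ifdef CONFIG_X86_CET
void cet_handle_cp_fault(void);			/* hypothetical: shared by shadow stack and kernel IBT */
#endif

#ifdef CONFIG_X86_USER_SHADOW_STACK
long shstk_setup(void);				/* hypothetical here: user shadow stack only */
#else
static inline long shstk_setup(void) { return -EOPNOTSUPP; }
#endif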
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/Kconfig | 24 ++++++++++++++++++++++++ arch/x86/Kconfig.assembler | 5 +++++ 2 files changed, 29 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 53bab123a8ee..ce460d6b4e25 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1852,6 +1852,11 @@ config CC_HAS_IBT (CC_IS_CLANG && CLANG_VERSION >= 140000)) && \ $(as-instr,endbr64) +config X86_CET + def_bool n + help + CET features configured (Shadow stack or IBT) + config X86_KERNEL_IBT prompt "Indirect Branch Tracking" def_bool y @@ -1859,6 +1864,7 @@ config X86_KERNEL_IBT # https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f depends on !LD_IS_LLD || LLD_VERSION >= 140000 select OBJTOOL + select X86_CET help Build the kernel with support for Indirect Branch Tracking, a hardware support course-grain forward-edge Control Flow Integrity @@ -1952,6 +1958,24 @@ config X86_SGX If unsure, say N. +config X86_USER_SHADOW_STACK + bool "X86 userspace shadow stack" + depends on AS_WRUSS + depends on X86_64 + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + help + Shadow stack protection is a hardware feature that detects function + return address corruption. This helps mitigate ROP attacks. + Applications must be enabled to use it, and old userspace does not + get protection "for free". + + CPUs supporting shadow stacks were first released in 2020. + + See Documentation/x86/shstk.rst for more information. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler index b88f784cb02e..8ad41da301e5 100644 --- a/arch/x86/Kconfig.assembler +++ b/arch/x86/Kconfig.assembler @@ -24,3 +24,8 @@ config AS_GFNI def_bool $(as-instr,vgf2p8mulb %xmm0$(comma)%xmm1$(comma)%xmm2) help Supported by binutils >= 2.30 and LLVM integrated assembler + +config AS_WRUSS + def_bool $(as-instr,wrussq %rax$(comma)(%rbx)) + help + Supported by binutils >= 2.31 and LLVM integrated assembler From patchwork Tue Jun 13 00:10:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277722 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 846CCCA9EB3 for ; Tue, 13 Jun 2023 00:12:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E85058E0009; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DBF998E0008; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BEB618E0009; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id A7A538E0003 for ; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 78D7B40385 for ; Tue, 13 Jun 2023 00:12:18 +0000 (UTC) X-FDA: 80895797556.15.082A9AD Received: from mga03.intel.com 
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Pengfei Xu
Subject: [PATCH v9 07/42] x86/traps: Move control protection handler to separate file
Date: Mon, 12 Jun 2023 17:10:33 -0700
Message-Id: <20230613001108.3040476-8-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

Today the control protection handler is defined in traps.c and used only for the kernel IBT feature. To reduce ifdeffery, move it to its own file. In future patches, functionality will be added to make this handler also handle user shadow stack faults. So name the file cet.c.

No functional change.
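As an illustrative aside (not part of this patch): once the handler lives in cet.c, a later patch in the series can split it by privilege level, roughly along these lines. The two helper names are hypothetical at this point in the series.

/* Sketch only: assumed future shape of the #CP handler, not this patch's code. */
DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
{
	if (user_mode(regs))
		do_user_cp_fault(regs, error_code);	/* assumed: user shadow stack #CP handling */
	else
		do_kernel_cp_fault(regs, error_code);	/* assumed: today's kernel IBT handling */
}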
Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/cet.c | 76 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 75 --------------------------------------- 3 files changed, 78 insertions(+), 75 deletions(-) create mode 100644 arch/x86/kernel/cet.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4070a01c11b7..abee0564b750 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,8 @@ obj-$(CONFIG_CFI_CLANG) += cfi.o obj-$(CONFIG_CALL_THUNKS) += callthunks.o +obj-$(CONFIG_X86_CET) += cet.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c new file mode 100644 index 000000000000..7ad22b705b64 --- /dev/null +++ b/arch/x86/kernel/cet.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include + +static __ro_after_init bool ibt_fatal = true; + +extern void ibt_selftest_ip(void); /* code label defined in asm below */ + +enum cp_error_code { + CP_EC = (1 << 15) - 1, + + CP_RET = 1, + CP_IRET = 2, + CP_ENDBR = 3, + CP_RSTRORSSP = 4, + CP_SETSSBSY = 5, + + CP_ENCL = 1 << 15, +}; + +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (!cpu_feature_enabled(X86_FEATURE_IBT)) { + pr_err("Unexpected #CP\n"); + BUG(); + } + + if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) + return; + + if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { + regs->ax = 0; + return; + } + + pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs)); + if (!ibt_fatal) { + printk(KERN_DEFAULT CUT_HERE); + __warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL); + return; + } + BUG(); +} + +/* Must be noinline to ensure uniqueness of ibt_selftest_ip. 
*/ +noinline bool ibt_selftest(void) +{ + unsigned long ret; + + asm (" lea ibt_selftest_ip(%%rip), %%rax\n\t" + ANNOTATE_RETPOLINE_SAFE + " jmp *%%rax\n\t" + "ibt_selftest_ip:\n\t" + UNWIND_HINT_FUNC + ANNOTATE_NOENDBR + " nop\n\t" + + : "=a" (ret) : : "memory"); + + return !ret; +} + +static int __init ibt_setup(char *str) +{ + if (!strcmp(str, "off")) + setup_clear_cpu_cap(X86_FEATURE_IBT); + + if (!strcmp(str, "warn")) + ibt_fatal = false; + + return 1; +} + +__setup("ibt=", ibt_setup); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 58b1f208eff5..6f666dfa97de 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -213,81 +213,6 @@ DEFINE_IDTENTRY(exc_overflow) do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL); } -#ifdef CONFIG_X86_KERNEL_IBT - -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - -enum cp_error_code { - CP_EC = (1 << 15) - 1, - - CP_RET = 1, - CP_IRET = 2, - CP_ENDBR = 3, - CP_RSTRORSSP = 4, - CP_SETSSBSY = 5, - - CP_ENCL = 1 << 15, -}; - -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) -{ - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); - } - - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) - return; - - if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { - regs->ax = 0; - return; - } - - pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs)); - if (!ibt_fatal) { - printk(KERN_DEFAULT CUT_HERE); - __warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL); - return; - } - BUG(); -} - -/* Must be noinline to ensure uniqueness of ibt_selftest_ip. */ -noinline bool ibt_selftest(void) -{ - unsigned long ret; - - asm (" lea ibt_selftest_ip(%%rip), %%rax\n\t" - ANNOTATE_RETPOLINE_SAFE - " jmp *%%rax\n\t" - "ibt_selftest_ip:\n\t" - UNWIND_HINT_FUNC - ANNOTATE_NOENDBR - " nop\n\t" - - : "=a" (ret) : : "memory"); - - return !ret; -} - -static int __init ibt_setup(char *str) -{ - if (!strcmp(str, "off")) - setup_clear_cpu_cap(X86_FEATURE_IBT); - - if (!strcmp(str, "warn")) - ibt_fatal = false; - - return 1; -} - -__setup("ibt=", ibt_setup); - -#endif /* CONFIG_X86_KERNEL_IBT */ - #ifdef CONFIG_X86_F00F_BUG void handle_invalid_op(struct pt_regs *regs) #else From patchwork Tue Jun 13 00:10:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277724 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57DF9C88CBE for ; Tue, 13 Jun 2023 00:12:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F6FC8E0003; Mon, 12 Jun 2023 20:12:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id ED1718E000A; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C87788E0003; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id B5CC08E0008 for ; Mon, 12 Jun 2023 20:12:18 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 7DE85A0348 for ; Tue, 13 Jun 2023 
From: Rick Edgecombe
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 08/42] x86/cpufeatures: Add CPU feature flags for shadow stacks Date: Mon, 12 Jun 2023 17:10:34 -0700 Message-Id: <20230613001108.3040476-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 73E2440004 X-Stat-Signature: 9a3qyihkr67sixx7xbgiq3aj1xtt1t7u X-Rspam-User: X-HE-Tag: 1686615136-222365 X-HE-Meta: U2FsdGVkX1/GwObkrjJucV0ItMr8J0CTOHhBF9A+PNOAwLoR9zkBNZ2EFeuQjPhm87XZl4/3x5D8E0B/rzdbUiy34AKWEkvMwLqBB1d60bK9z6DjGdELNP5VstI0bvza12nDJEIiJ4xvfaj8hTnGtHE/e6UM600ZpInTrbm5ZVamt39qOL/YKeNlrPYc3fSpaCJsFp8CrClY3DpnUxHIinZgRVtSebW92ed/O9WOUHzvetEEi45XzmVemKlmnwxJiJ7OtSYGFP8tcSEY2W7LxHKfoT84RkVzqDfNMd/jvmPUVuWkjk/DTlOTJ3oPFvtkXeldyjYoAoGozY1M0Xx0sC/+u8l1TlRpbOfZey+W3jfNy0zl6eTGAzabX8UHuqAPRFaHJjGumYROwP4F3QQCtYxFfc/exGaxf2iBaLQ7cn7r7PNVKpQsWtqMiN4PfX1RfsUv3M+x2dTuig0xFEqNHbhvVXETjTpBQoSrJMXjaQKxbfimd7EEe3Qal+FDdCuHNe43cLdmZFVfcXOqOkuNz20CXHStuN5ZSqutl8B2iEKKEjPfQnsJgn9OWvWe/mIxZZ/67IVpJ5I3z5lVRnryVipfp4pvqdB48sSAqw8WvlKSpJD99YRxkAPB09kvomrZYjFpJXUO2FuInRbExKOW2Fh6eWDSS+KQsoVpIFsRTAUxy/5JgtHk6YqZpAUD0GEd2mNt+Ib60gVDGsLwFN4VlRQFWZQJ45rSWmnU7OOUpkfP/6iwq36lEEddpC0WYk5zc2Cu9t8zvRM5HUQ5OhciMtYhnrb22TO6u/gOnE69k5IEWwX8i0xJujt0AEpk76aFp0VMuv09X3WSqYgUVGZ1/HkKTOAVUICGqpvattEt8P0ExhQDZ29rp6QJp84/1JTLEHCR+wz8yodP2ulGdkOjkPJB8WwA281PKutf/rgTNkEXZimvU2Y3RmUXIvmj1nXL9uXYPnZ2+kY9ZGbQ7Xx CPRQNWya 7hHCkMMnwfob2kaFgu7tIDcMQ4sGOrmquc4aFl3Hd3ieUcLkTIG/MESSfs1olOQ0D8aLQShOubqQnFVj0YRH+j/pmyk4A9HTP+CTx/jWu2WA0Bh1rmjktwoXET0LfwK8qsR3+XZeD5apVZTcdL4f4RKnl4PTC3k5goEoMzXinhWoJnk838sW0Z4QPFYGtM7HndVIhZ8MmHlU0xTg7HtwmDaz5pvZrbITz0UTApXA03pVR7N/TLBEM8bSKSRCDh6zJYQOdreWp0q0U0EBGjZfLa0fkR29vTtzJeGLcqKg7w6w8pWSLUNtRXqL8zqw+O0HO/lQP85xPUPu9v35ucSgDfdFQCp8/U8P2qUlY8QuM05uoMuX8sZxxHHiPfg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The Control-Flow Enforcement Technology contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add a CPU feature flags for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]). To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES. The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This user shadow stack bit should depend on the HW "shstk" capability and that logic will be implemented in future patches. 
Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 arch/x86/include/asm/cpufeatures.h       | 2 ++
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..d7215c8b7923 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
@@ -380,6 +381,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* "" Shadow stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..b9c7eae2e70f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK	0
+#else
+#define DISABLE_USER_SHSTK	(1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -120,7 +126,7 @@
 #define DISABLED_MASK9	(DISABLE_SGX)
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
-			 DISABLE_CALL_DEPTH_TRACKING)
+			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
 #define DISABLED_MASK12	(DISABLE_LAM)
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{}
 };

From patchwork Tue Jun 13 00:10:35 2023
From: Rick Edgecombe
Subject: [PATCH v9 09/42] x86/mm: Move pmd_write(), pud_write() up in the file
Date: Mon, 12 Jun 2023 17:10:35 -0700
Message-Id: <20230613001108.3040476-10-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

To prepare the introduction of _PAGE_SAVED_DIRTY, move pmd_write() and
pud_write() up in the file, so that they can be used by other helpers below.
No functional changes.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 112e6060eafa..768ee46782c9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -160,6 +160,18 @@ static inline int pte_write(pte_t pte)
 	return pte_flags(pte) & _PAGE_RW;
 }
 
+#define pmd_write pmd_write
+static inline int pmd_write(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_RW;
+}
+
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+	return pud_flags(pud) & _PAGE_RW;
+}
+
 static inline int pte_huge(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_PSE;
@@ -1120,12 +1132,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
 				  unsigned long address, pmd_t *pmdp);
 
-#define pmd_write pmd_write
-static inline int pmd_write(pmd_t pmd)
-{
-	return pmd_flags(pmd) & _PAGE_RW;
-}
-
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr,
 					    pmd_t *pmdp)
@@ -1155,12 +1161,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-#define pud_write pud_write
-static inline int pud_write(pud_t pud)
-{
-	return pud_flags(pud) & _PAGE_RW;
-}
-
 #ifndef pmdp_establish
 #define pmdp_establish pmdp_establish
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,

From patchwork Tue Jun 13 00:10:36 2023
From: Rick Edgecombe
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 10/42] x86/mm: Introduce _PAGE_SAVED_DIRTY Date: Mon, 12 Jun 2023 17:10:36 -0700 Message-Id: <20230613001108.3040476-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: B237A160011 X-Stat-Signature: imzefi47qayj71aqkne9f39gkespqad1 X-HE-Tag: 1686615138-606952 X-HE-Meta: U2FsdGVkX19NayeMe0jJ+Eb3rpe7wyt+Vei6VZ3fV6uR0PP2fzCMS3ir+4E2HAsvGB5LdAVOaZUtVALGxY0upkKuxzw1yBPYFRn6S2lr25x7B2Itc5rxtFwh/lq/k9y9uG2W7ZdlaMZGANhDFRJHYxQHkY79i6zN2WShxOe7KtiP0ZP3StXTHGVhNzwitn3Ydv/ZufA5mLpth7mcDV3gedult19gmELqt2V+cqe2ixPliiXWhFAdQts3gBeV5y2K8FEQO3Sh028mDiXZuE2x2TFTRKo6IEwxqt42qJ/wnXtzWhvsgNgxxEgrP0a5VRechB4bzzU3/BE5IRAWjpVGYdja4X/DLY8u7FyPhCwPqNxjck6IRujFOBYPZgZ+NKwrvpsLchFaW8M1xofPK1q1o89Aj/SwLG/kgf9IVHUuOuFPmdA42PyCf34KFI70Jm1K1fmcZnp/G6FmmY6c2PGt1zqRgXCUsN2BaOsqDseZjy84y1DfiuH1/N3mMtTBs05FQFlJ8RLuZBdTVXC4IwVNHpSEjOfTMOyiEKp8c6rwLoHsr9W+MQvvVPjwpxpWq07wB6uetkq/79BUiVL03mDq+EnwI8PnMkNSKRd+GpFUsBSH8NfTaVYqDHbLBovZaIUrbkxt/qsaXkBMtVDfsIshQF0ACDEnZq94evG1Z3vNmq9suN3RpsUgrpfcf6qqNthLu5PLj7b5uWytzghVqxylqzeVhVGzImcHr1Yb9pa71CjgnVUqyVlLHvg/lPZK3026N++pfG3i4RB8Bu5AKAyCaQQbyGxMUuchxKaOAF5cfZz0eGzHWLyVfJelyF6uGHzi/g9mtYSOlpt+uWsUtnsLeMQe9LKr/mYAw2MKLJL6reDIQDVOnv5BaIiwxwr6T9RIMLyRgG62ifz/pIw1o/24tntfO88dPsR/gLYoVuMThdVlFlt9zUIUi9dZ0ITMRikl2mOdciLnihlyV5ztdi0 c6a3TW5M QXbaCSdPglToUvsm40H6eeKGSAbGlwIy9gJwFWAK099/BRKEnKl+OybWDEhoSYFbl192XmuZOHTQLmmIXMoJoxvQrBC9xBeq8a/5/eroTQsxjFatXIGlT+OXRjaHrBkNpIiBSBfjT6uIAM5sipii2bA5HeupcVhV40vCIMO7ZVpZ6Rx/WjS8SbUBNU6Q4QyIJwKpvfhiYE7GY3r5TbiGVACs4o3wtTRg+Jo2JV+V1pv/diiMT5dGD8oIm7RL9sIz0djqgj4q6CGbVGIARC144m2d6MSPtTJpYJ+QeWzEXQaXdOJvurlQ3/+1gyIfjNFyIrOZs3Kv0mZs6NmZnAenut5KMjWC5cUHfEIJrpiYOuxgSqqw= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Some OSes have a greater dependence on software available bits in PTEs than Linux. That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. So in order to support shadow stack memory, Linux should avoid creating memory with this PTE bit combination unless it intends for it to be shadow stack. The reason it's lightly used is that Dirty=1 is normally set by HW _before_ a write. A write with a Write=0 PTE would typically only generate a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and* generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which supports shadow stacks will no longer exhibit this oddity. So that leaves Write=0,Dirty=1 PTEs created in software. To avoid inadvertently created shadow stack memory, in places where Linux normally creates Write=0,Dirty=1, it can use the software-defined _PAGE_SAVED_DIRTY in place of the hardware _PAGE_DIRTY. 
In other words, whenever Linux needs to create Write=0,Dirty=1, it instead creates Write=0,SavedDirty=1 except for shadow stack, which is Write=0,Dirty=1. There are six bits left available to software in the 64-bit PTE after consuming a bit for _PAGE_SAVED_DIRTY. For 32 bit, the same bit as _PAGE_BIT_UFFD_WP is used, since user fault fd is not supported on 32 bit. This leaves one unused software bit on 32 bit (_PAGE_BIT_SOFT_DIRTY, as this is also not supported on 32 bit). Implement only the infrastructure for _PAGE_SAVED_DIRTY. Changes to actually begin creating _PAGE_SAVED_DIRTY PTEs will follow once other pieces are in place. Since this SavedDirty shifting is done for all x86 CPUs, this leaves the possibility for the hardware oddity to still create Write=0,Dirty=1 PTEs in rare cases. Since these CPUs also don't support shadow stack, this will be harmless as it was before the introduction of SavedDirty. Implement the shifting logic to be branchless. Embed the logic of whether to do the shifting (including checking the Write bits) so that it can be called by future callers that would otherwise need additional branching logic. This efficiency allows the logic of when to do the shifting to be centralized, making the code easier to reason about. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Use bit shifting instead of conditionals (Linus) - Make saved dirty bit unconditional (Linus) - Add 32 bit support to make it extra unconditional - Don't re-order PAGE flags (Dave) --- arch/x86/include/asm/pgtable.h | 83 ++++++++++++++++++++++++++++ arch/x86/include/asm/pgtable_types.h | 38 +++++++++++-- arch/x86/include/asm/tlbflush.h | 3 +- 3 files changed, 119 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 768ee46782c9..a95f872c7429 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -301,6 +301,53 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear) return native_make_pte(v & ~clear); } +/* + * Write protection operations can result in Dirty=1,Write=0 PTEs. But in the + * case of X86_FEATURE_USER_SHSTK, these PTEs denote shadow stack memory. So + * when creating dirty, write-protected memory, a software bit is used: + * _PAGE_BIT_SAVED_DIRTY. The following functions take a PTE and transition the + * Dirty bit to SavedDirty, and vice-vesra. + * + * This shifting is only done if needed. In the case of shifting + * Dirty->SavedDirty, the condition is if the PTE is Write=0. In the case of + * shifting SavedDirty->Dirty, the condition is Write=1. 
+ */ +static inline unsigned long mksaveddirty_shift(unsigned long v) +{ + unsigned long cond = !(v & (1 << _PAGE_BIT_RW)); + + v |= ((v >> _PAGE_BIT_DIRTY) & cond) << _PAGE_BIT_SAVED_DIRTY; + v &= ~(cond << _PAGE_BIT_DIRTY); + + return v; +} + +static inline unsigned long clear_saveddirty_shift(unsigned long v) +{ + unsigned long cond = !!(v & (1 << _PAGE_BIT_RW)); + + v |= ((v >> _PAGE_BIT_SAVED_DIRTY) & cond) << _PAGE_BIT_DIRTY; + v &= ~(cond << _PAGE_BIT_SAVED_DIRTY); + + return v; +} + +static inline pte_t pte_mksaveddirty(pte_t pte) +{ + pteval_t v = native_pte_val(pte); + + v = mksaveddirty_shift(v); + return native_make_pte(v); +} + +static inline pte_t pte_clear_saveddirty(pte_t pte) +{ + pteval_t v = native_pte_val(pte); + + v = clear_saveddirty_shift(v); + return native_make_pte(v); +} + static inline pte_t pte_wrprotect(pte_t pte) { return pte_clear_flags(pte, _PAGE_RW); @@ -413,6 +460,24 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +/* See comments above mksaveddirty_shift() */ +static inline pmd_t pmd_mksaveddirty(pmd_t pmd) +{ + pmdval_t v = native_pmd_val(pmd); + + v = mksaveddirty_shift(v); + return native_make_pmd(v); +} + +/* See comments above mksaveddirty_shift() */ +static inline pmd_t pmd_clear_saveddirty(pmd_t pmd) +{ + pmdval_t v = native_pmd_val(pmd); + + v = clear_saveddirty_shift(v); + return native_make_pmd(v); +} + static inline pmd_t pmd_wrprotect(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_RW); @@ -484,6 +549,24 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) return native_make_pud(v & ~clear); } +/* See comments above mksaveddirty_shift() */ +static inline pud_t pud_mksaveddirty(pud_t pud) +{ + pudval_t v = native_pud_val(pud); + + v = mksaveddirty_shift(v); + return native_make_pud(v); +} + +/* See comments above mksaveddirty_shift() */ +static inline pud_t pud_clear_saveddirty(pud_t pud) +{ + pudval_t v = native_pud_val(pud); + + v = clear_saveddirty_shift(v); + return native_make_pud(v); +} + static inline pud_t pud_mkold(pud_t pud) { return pud_clear_flags(pud, _PAGE_ACCESSED); diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 447d4bee25c4..ee6f8e57e115 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -21,7 +21,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -34,6 +35,13 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +#ifdef CONFIG_X86_64 +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit */ +#else +/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */ +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit */ +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +125,22 @@ #define _PAGE_SOFTW4 (_AT(pteval_t, 0)) #endif +/* + * The hardware requires shadow stack to be Write=0,Dirty=1. 
However, + * there are valid cases where the kernel might create read-only PTEs that + * are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty tracking). In + * this case, the _PAGE_SAVED_DIRTY bit is used instead of the HW-dirty bit, + * to avoid creating a wrong "shadow stack" PTEs. Such PTEs have + * (Write=0,SavedDirty=1,Dirty=0) set. + */ +#ifdef CONFIG_X86_64 +#define _PAGE_SAVED_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SAVED_DIRTY) +#else +#define _PAGE_SAVED_DIRTY (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_SAVED_DIRTY) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* @@ -125,9 +149,9 @@ * instance, and is *not* included in this mask since * pte_modify() does modify it. */ -#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ +#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_BITS | \ + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ _PAGE_UFFD_WP) #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) @@ -188,10 +212,16 @@ enum page_cache_mode { #define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) #define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) + +/* + * Page tables needs to have Write=1 in order for any lower PTEs to be + * writable. This includes shadow stack memory (Write=0, Dirty=1) + */ #define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) + #define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) #define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 75bfaa421030..965659d2c965 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -293,7 +293,8 @@ static inline bool pte_flags_need_flush(unsigned long oldflags, const pteval_t flush_on_clear = _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED; const pteval_t software_flags = _PAGE_SOFTW1 | _PAGE_SOFTW2 | - _PAGE_SOFTW3 | _PAGE_SOFTW4; + _PAGE_SOFTW3 | _PAGE_SOFTW4 | + _PAGE_SAVED_DIRTY; const pteval_t flush_on_change = _PAGE_RW | _PAGE_USER | _PAGE_PWT | _PAGE_PCD | _PAGE_PSE | _PAGE_GLOBAL | _PAGE_PAT | _PAGE_PAT_LARGE | _PAGE_PKEY_BIT0 | _PAGE_PKEY_BIT1 | From patchwork Tue Jun 13 00:10:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E9F0CA9EC0 for ; Tue, 13 Jun 2023 00:12:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4E8468E000C; Mon, 12 Jun 2023 20:12:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4969F8E000B; Mon, 12 Jun 2023 20:12:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2EB258E000C; Mon, 12 Jun 2023 20:12:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) 
From: Rick Edgecombe
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 11/42] x86/mm: Update ptep/pmdp_set_wrprotect() for _PAGE_SAVED_DIRTY Date: Mon, 12 Jun 2023 17:10:37 -0700 Message-Id: <20230613001108.3040476-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: A929740004 X-Stat-Signature: ha6e67g4iecmhj1ixhi51totfympzuou X-Rspam-User: X-HE-Tag: 1686615138-695957 X-HE-Meta: U2FsdGVkX1/2zdlDpdZ25Y5eNzqUbWj8oHRf97N2P5GbYGYlxKJImF9dAjMTnxpDZPwed1xTW/jl09Ji+yRjLQlExFWkcPFUxhXMdOqOdh2cQumybXS/CFJ2UVpmraxvX23fb80bzpuJKQYsfg9Oght/No1IC8Urwz8poJck5u+EtWcCAuoiUsMSiivru2mJbNQ9CMONxhH7uP4HwbgTIFlGpl5xe9blRH+HKQnFyNI/cl1H119CKw46M0NsmvTiOrTkrZtMeZPsRCJE/OjaPDRniMPH7FgnSOSD4fjWJtq3e7LWNwTX1b8Lsgt1Mh9nmsSgo1pHw01v3Jc0egTROZGDxLvVrNJONyW87oxfD0My0zpNV8qv7PqPzhrDL4sef1gufXL50RdEXXak9t0BzrGzLi5hmCZ94mZnQJZFP4VLKtc/HeHM1U+YAnjx6tTAXqS65mPLkhQzM+n+iUTUpbLpsOIYnuQok6zeUTabGeGuQLPkPn6H+TU0Ka+1+Pi7/JeXqkAIXpsU/B6pTfU4QN17MoNg0NaExr3Y79DLZV740o9JdvF7aDKahAePibY0vFyNq9XYkXEAv4M8U846PUwpW/AAIlNsR07tSTpMX8RyL4eqs1YhDu3W0OEX895CNRzRRLXI01Ib2tvbDXcBMg//Hkhu4/FfOfiN7SHtMguVvT40A70c1iblSt0sF3KO3GUu1uBoiFJZIwbuoldGlm8nGsjLZKCDptT43oPvoeXV/ygr5CdUZctWd9VfaNWIQ30DIigeRWVCVw+ednek47Kzrk6T0II559GtD3dzLVVSTBQSp22Y1xHn0XWD5mvv78m3gnIq5417s9wH8kJ3CI+u8iCqVZ6XM6YZz2x13g+9ZH8mHit3Z+MGEFggntcJ8UBBdGBKnUiw29YeSycCB9U8Q7qyUr+Id1GA++gG3j+70rBPFDy6WNbPvk56s3yQtky57RVFxs7v/vnyBvy AzbL4XPx UhzDhokIFUmG9DXw0ItcGh6Xo3kiPs6bccMxGQdy1NIgzj2H5HRI8UJ8NqnfcD89Osk4T/2afJVDHyKnjrctV1kB0mVfw6DJ74/JBgOU57mo1jiSI8JhLckHUG5AWO3BBJepej2HYjJ4aXJqbwM0Tvew59SxAJ3aUHZKbqlpN2tkvIOUKBrg6SR+IidmmLIMSsc4YJrqB8kblbxDjRert1TCIZ9oH8v3GE2nUqy3YRRRtDUll58zoPSX/dzcfBVmagMIk77TY+4ORnq/KWsFc50urYL6xBX/6yswC3sVuYdYXo2N5FxeJqaJp0nihqH9oAH1SoKzzLonmel7d4dPdkH0PzSLhXMA4BGnu4QhxxObgNCJ6QuTM5OnNXIkmb73XLUEw2uIB9a99RnkU0Z8PjF8KyYvprwTiSdFdGmJ0n4gGbK0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When shadow stack is in use, Write=0,Dirty=1 PTE are preserved for shadow stack. Copy-on-write PTEs then have Write=0,SavedDirty=1. When a PTE goes from Write=1,Dirty=1 to Write=0,SavedDirty=1, it could become a transient shadow stack PTE in two cases: 1. Some processors can start a write but end up seeing a Write=0 PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting shadow stack, and a TLB flush is not necessary. 2. 
2. When _PAGE_DIRTY is replaced with _PAGE_SAVED_DIRTY non-atomically, a
   transient shadow stack PTE can be created as a result.

Prevent the second case by doing the write protection and the
Dirty->SavedDirty shift at the same time inside a CMPXCHG loop. The first case
needs no handling here since, as noted above, it does not occur on processors
that support shadow stack.

Note, in the PAE case CMPXCHG will need to operate on 8 bytes, but
try_cmpxchg() will not use CMPXCHG8B, so it cannot operate on a full PAE PTE.
However the existing logic is not operating on a full 8 byte region either,
and relies on the fact that the Write bit is in the first 4 bytes when doing
the clear_bit(). Since the Dirty, SavedDirty and Write bits are all in the
first 4 bytes, casting to a long will be similar to the existing behavior
which also casts to a long.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue. Jann Horn provided the CMPXCHG solution.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v9:
 - Use bit shifting helpers that don't need any extra conditional logic. (Linus)
 - Always do the SavedDirty shifting (Linus)
---
 arch/x86/include/asm/pgtable.h | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a95f872c7429..99b54ab0a919 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1189,7 +1189,17 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
-	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	pte_t old_pte, new_pte;
+
+	old_pte = READ_ONCE(*ptep);
+	do {
+		new_pte = pte_wrprotect(old_pte);
+	} while (!try_cmpxchg((long *)&ptep->pte, (long *)&old_pte, *(long *)&new_pte));
 }
 
 #define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
 
@@ -1241,7 +1251,17 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
-	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	pmd_t old_pmd, new_pmd;
+
+	old_pmd = READ_ONCE(*pmdp);
+	do {
+		new_pmd = pmd_wrprotect(old_pmd);
+	} while (!try_cmpxchg((long *)pmdp, (long *)&old_pmd, *(long *)&new_pmd));
 }
 
 #ifndef pmdp_establish

From patchwork Tue Jun 13 00:10:38 2023
From: Rick Edgecombe
Subject: [PATCH v9 12/42] x86/mm: Start actually marking _PAGE_SAVED_DIRTY
Date: Mon, 12 Jun 2023 17:10:38 -0700
Message-Id: <20230613001108.3040476-13-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
The recently introduced _PAGE_SAVED_DIRTY should be used instead of the HW
Dirty bit whenever a PTE is Write=0, in order to not inadvertently create
shadow stack PTEs. Update pte_mk*() helpers to do this, and apply the same
changes to pmd and pud.

Since there is no x86 version of pte_mkwrite() to hold this arch specific
logic, create one. Add it to x86/mm/pgtable.c instead of
x86/include/asm/pgtable.h as future patches will require it to live in
pgtable.c and it will make the diff easier for reviewers.

Since CPUs without shadow stack support could create Write=0,Dirty=1 PTEs,
only return true for pte_shstk() if the CPU also supports shadow stack. This
will prevent these HW-created PTEs from showing as true for pte_write().

For pte_modify() this is a bit trickier. It takes a "raw" pgprot_t which was
not necessarily created with any of the existing PTE bit helpers. That means
that it can return a pte_t with Write=0,Dirty=1, a shadow stack PTE, when it
did not intend to create one. Modify it to also move _PAGE_DIRTY to
_PAGE_SAVED_DIRTY.

To avoid creating Write=0,Dirty=1 PTEs, pte_modify() needs to avoid:
1. Marking Write=0 PTEs Dirty=1
2. Marking Dirty=1 PTEs Write=0

The first case cannot happen as the existing behavior of pte_modify() is to
filter out any Dirty bit passed in newprot. Handle the second case by shifting
_PAGE_DIRTY=1 to _PAGE_SAVED_DIRTY=1 if the PTE was write protected by the
pte_modify() call. Apply the same changes to pmd_modify().

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v9:
 - Use bit shifting helpers that don't need any extra conditional logic. (Linus)
 - Handle the harmless Write=0->Write=1 pte_modify() case with the shifting helpers.
 - Don't ever return true for pte_shstk() if the CPU does not support shadow stack.
---
 arch/x86/include/asm/pgtable.h | 151 ++++++++++++++++++++++++++++-----
 arch/x86/mm/pgtable.c          |  14 +++
 2 files changed, 144 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 99b54ab0a919..d8724f5b1202 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -124,9 +124,15 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
*/ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_BITS; +} + +static inline bool pte_shstk(pte_t pte) +{ + return cpu_feature_enabled(X86_FEATURE_SHSTK) && + (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; } static inline int pte_young(pte_t pte) @@ -134,9 +140,16 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; +} + +static inline bool pmd_shstk(pmd_t pmd) +{ + return cpu_feature_enabled(X86_FEATURE_SHSTK) && + (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) == + (_PAGE_DIRTY | _PAGE_PSE); } #define pmd_young pmd_young @@ -145,9 +158,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -157,13 +170,21 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte); } #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd); } #define pud_write pud_write @@ -350,7 +371,14 @@ static inline pte_t pte_clear_saveddirty(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - return pte_clear_flags(pte, _PAGE_RW); + pte = pte_clear_flags(pte, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (Write=0,Dirty=1). Move the hardware + * dirty value to the software bit, if present. 
+ */ + return pte_mksaveddirty(pte); } #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -388,7 +416,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -403,7 +431,16 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pte = pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + + return pte_mksaveddirty(pte); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + pte = pte_clear_flags(pte, _PAGE_RW); + + return pte_set_flags(pte, _PAGE_DIRTY); } static inline pte_t pte_mkyoung(pte_t pte) @@ -416,6 +453,10 @@ static inline pte_t pte_mkwrite_novma(pte_t pte) return pte_set_flags(pte, _PAGE_RW); } +struct vm_area_struct; +pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma); +#define pte_mkwrite pte_mkwrite + static inline pte_t pte_mkhuge(pte_t pte) { return pte_set_flags(pte, _PAGE_PSE); @@ -480,7 +521,14 @@ static inline pmd_t pmd_clear_saveddirty(pmd_t pmd) static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_RW); + pmd = pmd_clear_flags(pmd, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. + */ + return pmd_mksaveddirty(pmd); } #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -507,12 +555,21 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pmd = pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + + return pmd_mksaveddirty(pmd); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + pmd = pmd_clear_flags(pmd, _PAGE_RW); + + return pmd_set_flags(pmd, _PAGE_DIRTY); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -535,6 +592,9 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_RW); } +pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +#define pmd_mkwrite pmd_mkwrite + static inline pud_t pud_set_flags(pud_t pud, pudval_t set) { pudval_t v = native_pud_val(pud); @@ -574,17 +634,26 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { - return pud_clear_flags(pud, _PAGE_RW); + pud = pud_clear_flags(pud, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. 
+ */ + return pud_mksaveddirty(pud); } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pud = pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + + return pud_mksaveddirty(pud); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -604,7 +673,9 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { - return pud_set_flags(pud, _PAGE_RW); + pud = pud_set_flags(pud, _PAGE_RW); + + return pud_clear_saveddirty(pud); } #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY @@ -721,6 +792,7 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pteval_t val = pte_val(pte), oldval = val; + pte_t pte_result; /* * Chop off the NX bit (if present), and add the NX portion of @@ -729,17 +801,54 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) val &= _PAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); - return __pte(val); + + pte_result = __pte(val); + + /* + * To avoid creating Write=0,Dirty=1 PTEs, pte_modify() needs to avoid: + * 1. Marking Write=0 PTEs Dirty=1 + * 2. Marking Dirty=1 PTEs Write=0 + * + * The first case cannot happen because the _PAGE_CHG_MASK will filter + * out any Dirty bit passed in newprot. Handle the second case by + * going through the mksaveddirty exercise. Only do this if the old + * value was Write=1 to avoid doing this on Shadow Stack PTEs. + */ + if (oldval & _PAGE_RW) + pte_result = pte_mksaveddirty(pte_result); + else + pte_result = pte_clear_saveddirty(pte_result); + + return pte_result; } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { pmdval_t val = pmd_val(pmd), oldval = val; + pmd_t pmd_result; - val &= _HPAGE_CHG_MASK; + val &= (_HPAGE_CHG_MASK & ~_PAGE_DIRTY); val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); - return __pmd(val); + + pmd_result = __pmd(val); + + /* + * To avoid creating Write=0,Dirty=1 PMDs, pte_modify() needs to avoid: + * 1. Marking Write=0 PMDs Dirty=1 + * 2. Marking Dirty=1 PMDs Write=0 + * + * The first case cannot happen because the _PAGE_CHG_MASK will filter + * out any Dirty bit passed in newprot. Handle the second case by + * going through the mksaveddirty exercise. Only do this if the old + * value was Write=1 to avoid doing this on Shadow Stack PTEs. 
+	 */
+	if (oldval & _PAGE_RW)
+		pmd_result = pmd_mksaveddirty(pmd_result);
+	else
+		pmd_result = pmd_clear_saveddirty(pmd_result);
+
+	return pmd_result;
 }

 /*
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e4f499eb0f29..0ad2c62ac0a8 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -880,3 +880,17 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)

 #endif /* CONFIG_X86_64 */
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	pte = pte_mkwrite_novma(pte);
+
+	return pte_clear_saveddirty(pte);
+}
+
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	pmd = pmd_mkwrite_novma(pmd);
+
+	return pmd_clear_saveddirty(pmd);
+}
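The Write/Dirty juggling in the hunks above can be summarized with a small standalone model. This is a minimal userspace sketch, not the kernel code: pteval_t, the flag values and the _PAGE_SAVED_DIRTY bit position are assumed stand-ins, but the transformation mirrors the pte_wrprotect()/pte_mksaveddirty() pairing shown above.

/* Standalone model of the saved-dirty idea; bit positions are illustrative. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pteval_t;

#define _PAGE_RW          (1ull << 1)
#define _PAGE_DIRTY       (1ull << 6)
#define _PAGE_SAVED_DIRTY (1ull << 58)	/* assumed software bit, not the real layout */

/* Shadow stack encoding: Write=0, Dirty=1 */
static int pte_shstk(pteval_t pte)
{
	return (pte & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
}

/* If the PTE is not writable, park the hardware Dirty bit in a software bit */
static pteval_t pte_mksaveddirty(pteval_t pte)
{
	if (!(pte & _PAGE_RW) && (pte & _PAGE_DIRTY)) {
		pte &= ~_PAGE_DIRTY;
		pte |= _PAGE_SAVED_DIRTY;
	}
	return pte;
}

static pteval_t pte_wrprotect(pteval_t pte)
{
	return pte_mksaveddirty(pte & ~_PAGE_RW);
}

int main(void)
{
	pteval_t pte = _PAGE_RW | _PAGE_DIRTY;	/* ordinary writable, dirty PTE */

	pte = pte_wrprotect(pte);
	assert(!pte_shstk(pte));		/* no accidental shadow stack PTE */
	assert(pte & _PAGE_SAVED_DIRTY);	/* dirty information preserved in software */
	printf("wrprotect kept the dirty state without creating Write=0,Dirty=1\n");
	return 0;
}

Built as a normal userspace program, the asserts pass: write-protecting a dirty entry in this model never lands on the Write=0,Dirty=1 encoding.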
From patchwork Tue Jun 13 00:10:39 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277728
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Pengfei Xu
Subject: [PATCH v9 13/42] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
Date: Mon, 12 Jun 2023 17:10:39 -0700
Message-Id: <20230613001108.3040476-14-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as shadow stack pages. In normal cases, it can be helpful to create Write=1 PTEs as also Dirty=1 if HW dirty tracking is not needed, because if the Dirty bit is not already set the CPU has to set Dirty=1 when the memory gets written to. This creates additional work for the CPU. So traditional wisdom was to simply set the Dirty bit whenever you didn't care about it. However, it was never really very helpful for read-only kernel memory.

When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some instructions can write to such supervisor memory. The kernel does not set IA32_S_CET.SH_STK_EN, so avoiding kernel Write=0,Dirty=1 memory is not strictly needed for any functional reason. But having Write=0,Dirty=1 kernel memory doesn't have any functional benefit either, so to reduce ambiguity between shadow stack and regular Write=0 pages, remove Dirty=1 from any kernel Write=0 PTEs.
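To make the ambiguity concrete, here is a standalone sketch (the flag values follow the usual x86 bit positions but are only stand-ins, not the real pgtable_types.h encodings) showing how a Dirty=1 read-only protection decodes exactly like a shadow stack page:

/* Sketch: why Dirty=1 on a read-only mapping is ambiguous with shadow stack. */
#include <stdint.h>
#include <stdio.h>

#define _PAGE_PRESENT (1u << 0)
#define _PAGE_RW      (1u << 1)
#define _PAGE_DIRTY   (1u << 6)

static const char *classify(uint32_t prot)
{
	if ((prot & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
		return "shadow stack (Write=0, Dirty=1)";
	if (prot & _PAGE_RW)
		return "writable";
	return "read-only";
}

int main(void)
{
	uint32_t kernel_ro_old = _PAGE_PRESENT | _PAGE_DIRTY;	/* pre-patch habit */
	uint32_t kernel_ro_new = _PAGE_PRESENT;			/* after this patch */

	printf("old RO-style prot decodes as: %s\n", classify(kernel_ro_old));
	printf("new RO-style prot decodes as: %s\n", classify(kernel_ro_new));
	return 0;
}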
Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 arch/x86/include/asm/pgtable_types.h | 8 +++++---
 arch/x86/mm/pat/set_memory.c         | 4 ++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index ee6f8e57e115..26f07d6d5758 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -222,10 +222,12 @@ enum page_cache_mode {
 #define _PAGE_TABLE_NOENC	(__PP|__RW|_USR|___A| 0|___D| 0| 0)
 #define _PAGE_TABLE		(__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC)
-#define __PAGE_KERNEL_RO	(__PP| 0| 0|___A|__NX|___D| 0|___G)
-#define __PAGE_KERNEL_ROX	(__PP| 0| 0|___A| 0|___D| 0|___G)
+#define __PAGE_KERNEL_RO	(__PP| 0| 0|___A|__NX| 0| 0|___G)
+#define __PAGE_KERNEL_ROX	(__PP| 0| 0|___A| 0| 0| 0|___G)
+#define __PAGE_KERNEL		(__PP|__RW| 0|___A|__NX|___D| 0|___G)
+#define __PAGE_KERNEL_EXEC	(__PP|__RW| 0|___A| 0|___D| 0|___G)
 #define __PAGE_KERNEL_NOCACHE	(__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC)
-#define __PAGE_KERNEL_VVAR	(__PP| 0|_USR|___A|__NX|___D| 0|___G)
+#define __PAGE_KERNEL_VVAR	(__PP| 0|_USR|___A|__NX| 0| 0|___G)
 #define __PAGE_KERNEL_LARGE	(__PP|__RW| 0|___A|__NX|___D|_PSE|___G)
 #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G)
 #define __PAGE_KERNEL_WP	(__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 7159cf787613..fc627acfe40e 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2073,12 +2073,12 @@ int set_memory_nx(unsigned long addr, int numpages)

 int set_memory_ro(unsigned long addr, int numpages)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
 }

 int set_memory_rox(unsigned long addr, int numpages)
 {
-	pgprot_t clr = __pgprot(_PAGE_RW);
+	pgprot_t clr = __pgprot(_PAGE_RW | _PAGE_DIRTY);

 	if (__supported_pte_mask & _PAGE_NX)
 		clr.pgprot |= _PAGE_NX;
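The set_memory_ro()/set_memory_rox() hunks above clear Dirty together with Write. A rough standalone model of that conversion shows why clearing only Write would leave an already-dirty mapping on the shadow stack encoding; change_page_attr_clear() here is just an assumed bitmask helper standing in for the real CPA machinery.

/* Sketch: making an already-dirty kernel mapping read-only. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define _PAGE_PRESENT (1u << 0)
#define _PAGE_RW      (1u << 1)
#define _PAGE_DIRTY   (1u << 6)

static uint32_t change_page_attr_clear(uint32_t prot, uint32_t clr)
{
	return prot & ~clr;	/* stand-in for the real change_page_attr path */
}

int main(void)
{
	uint32_t prot = _PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY;

	/* Clearing only Write lands on the shadow stack encoding */
	uint32_t ro_write_only = change_page_attr_clear(prot, _PAGE_RW);
	assert((ro_write_only & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY);

	/* Clearing Write and Dirty together avoids it */
	uint32_t ro_both = change_page_attr_clear(prot, _PAGE_RW | _PAGE_DIRTY);
	assert((ro_both & (_PAGE_RW | _PAGE_DIRTY)) == 0);

	printf("only RO conversions that also drop Dirty avoid Write=0,Dirty=1\n");
	return 0;
}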
From patchwork Tue Jun 13 00:10:40 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277730
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Pengfei Xu
Subject: [PATCH v9 14/42] mm: Introduce VM_SHADOW_STACK for shadow stack memory
Date: Mon, 12 Jun 2023 17:10:40 -0700
Message-Id: <20230613001108.3040476-15-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to identify these areas, for example, to be used to properly indicate shadow stack PTEs to the hardware.

Shadow stack VMA creation will be tightly controlled and limited to anonymous memory to keep the implementation simple, and because that is all that is required. The solution will rely on pte_mkwrite() to create the shadow stack PTEs, so it will not be required for vm_get_page_prot() to learn how to create shadow stack memory. For this reason, document that VM_SHADOW_STACK should not be mixed with VM_SHARED.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Acked-by: David Hildenbrand
Reviewed-by: Mark Brown
Tested-by: Mark Brown
---
 Documentation/filesystems/proc.rst | 1 +
 fs/proc/task_mmu.c                 | 3 +++
 include/linux/mm.h                 | 8 ++++++++
 3 files changed, 12 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 7897a7dafcbc..6ccb57089a06 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -566,6 +566,7 @@ encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
+    ss    shadow stack page
     ==    =======================================

 Note that there is no guarantee that every flag and associated mnemonic will

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 420510f6a545..38b19a757281 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+		[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
 	};
 	size_t i;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f52c1e7c640..fb17cbd531ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -319,11 +319,13 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif	/* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */

 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -339,6 +341,12 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */

+#ifdef CONFIG_X86_USER_SHADOW_STACK
+# define VM_SHADOW_STACK	VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */
+#else
+# define VM_SHADOW_STACK	VM_NONE
+#endif
+
 #if defined(CONFIG_X86)
 # define VM_PAT	VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
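A compact sketch of how the new flag is consumed follows. This is an assumed userspace model, not mm code: struct vma_model stands in for struct vm_area_struct, the VM_HIGH_ARCH_BIT_5 value comes from the hunk above, and the VM_SHARED value is the commonly used kernel constant.

/* Model of testing the new VMA flag against vm_flags. */
#include <assert.h>
#include <stdint.h>

#define BIT(n)             (1ull << (n))
#define VM_SHARED          BIT(3)	/* assumed: the usual kernel value */
#define VM_HIGH_ARCH_BIT_5 37
#define VM_SHADOW_STACK    BIT(VM_HIGH_ARCH_BIT_5)

struct vma_model {
	uint64_t vm_flags;
};

static int is_shadow_stack(const struct vma_model *vma)
{
	return !!(vma->vm_flags & VM_SHADOW_STACK);
}

int main(void)
{
	struct vma_model shstk = { .vm_flags = VM_SHADOW_STACK };
	struct vma_model file  = { .vm_flags = VM_SHARED };

	assert(is_shadow_stack(&shstk));
	assert(!is_shadow_stack(&file));

	/* The changelog documents that this flag is never combined with VM_SHARED */
	assert(!(shstk.vm_flags & VM_SHARED));
	return 0;
}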
From patchwork Tue Jun 13 00:10:41 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277729
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Pengfei Xu
Subject: [PATCH v9 15/42] x86/mm: Check shadow stack page fault errors
Date: Mon, 12 Jun 2023 17:10:41 -0700
Message-Id: <20230613001108.3040476-16-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

The CPU performs "shadow stack accesses" when it expects to encounter shadow stack mappings. These accesses can be implicit (via CALL/RET instructions) or explicit (instructions like WRSS). Shadow stack accesses to shadow stack mappings can result in faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features, such as delayed allocation, swap and copy-on-write, and the kernel needs to use faults to implement those features.

The architecture has concepts of both shadow stack reads and shadow stack writes.
Any shadow stack access to non-shadow stack memory will generate a fault with the shadow stack error code bit set. This means that, unlike normal write protection, the fault handler needs to create a type of memory that can be written to (with instructions that generate shadow stack writes), even to fulfill a read access. So in the case of COW memory, the COW needs to take place even with a shadow stack read. Otherwise the page will be left (shadow stack) writable in userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE for shadow stack accesses, even if the access was a shadow stack read. For the purpose of making this clearer, consider the following example. If a process has a shadow stack, and forks, the shadow stack PTEs will become read-only due to COW. If the CPU in one process performs a shadow stack read access to the shadow stack, for example executing a RET and causing the CPU to read the shadow stack copy of the return address, then in order for the fault to be resolved the PTE will need to be set with shadow stack permissions. But then the memory would be changeable from userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger COW, otherwise the shared page would be changeable from both processes. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. Also, generate the errors for invalid shadow stack accesses. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index e4399983c50c..fe68119ce2cc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1118,8 +1118,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. + */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1311,6 +1325,14 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * Read-only permissions can not be expressed in shadow stack PTEs. + * Treat all shadow stack accesses as WRITE faults. 
This ensures + * that the MM will prepare everything (e.g., break COW) such that + * maybe_mkwrite() can create a proper shadow stack PTE. + */ + if (error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR) From patchwork Tue Jun 13 00:10:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277732 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BA9CC88CB2 for ; Tue, 13 Jun 2023 00:12:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 165A38E0011; Mon, 12 Jun 2023 20:12:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 003FC8E000B; Mon, 12 Jun 2023 20:12:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DBF918E0011; Mon, 12 Jun 2023 20:12:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id CE5208E000B for ; Mon, 12 Jun 2023 20:12:25 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id A212C140375 for ; Tue, 13 Jun 2023 00:12:25 +0000 (UTC) X-FDA: 80895797850.29.AF9419C Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id 8107E4000E for ; Tue, 13 Jun 2023 00:12:23 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=E24lY+sY; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615143; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=hAQJJAeb9RDs/WyI+bBNY0zKz2ihMNwhX4fY+cexkp8=; b=JLwTbv+bUxksai0foSbzI1mYZwKfOf6nR0jKzfbsbD+Jq0/XpyzVxKq4u7UiGRnR4fSjKg tPVHXofEoDEma5xwtX9XSs75viZIjJJ7u/0xfrFozfBn0zU0dDDGqSJKxt7ZDTuAj0khmd 5N1TEBwmENiv5SYG4lHkFm3k6FR/p/Y= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=E24lY+sY; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615143; a=rsa-sha256; cv=none; b=Du1JXBnFbivfS8Ki3DNk0OwtYhaduk9gl0KjECT3MtYvqsB7Arm/5G53zYxwQmNAKu+/91 AvmPOeH6rWWc22dTU0uYkKBIHGqelPpfVAQbPKit7TZ9ZzuxAXkhdc6+VVVMet4EJR2u6i iigjjB3wUJ8Zus34KnW5f4r+nXJDSNE= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615143; x=1718151143; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oZdKPqbrR7Qk+U72in77o4gzn1lGdOYqXWRfhhqI1mg=; 
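The access_error() rules added above can be checked in isolation. The sketch below is a userspace approximation rather than the kernel function: the error-code bits follow the trap_pf.h values in this patch, and VM_WRITE/VM_SHADOW_STACK are assumed to carry their usual flag values.

/* Model of the shadow stack branches added to access_error(). */
#include <assert.h>
#include <stdint.h>

#define X86_PF_WRITE  (1u << 1)
#define X86_PF_SHSTK  (1u << 6)

#define VM_WRITE        (1ull << 1)	/* assumed: usual kernel value */
#define VM_SHADOW_STACK (1ull << 37)	/* from the VM_HIGH_ARCH_BIT_5 patch */

/* Returns 1 on access error, 0 if the fault may be handled. */
static int access_error(uint32_t error_code, uint64_t vm_flags)
{
	if (error_code & X86_PF_SHSTK) {
		/* shadow stack accesses only resolve against shadow stack VMAs */
		if (!(vm_flags & VM_SHADOW_STACK))
			return 1;
		if (!(vm_flags & VM_WRITE))
			return 1;
		return 0;
	}
	if (error_code & X86_PF_WRITE) {
		/* ordinary writes never resolve against shadow stack VMAs */
		if (vm_flags & VM_SHADOW_STACK)
			return 1;
		if (!(vm_flags & VM_WRITE))
			return 1;
		return 0;
	}
	return 0;	/* read and instruction-fetch cases elided in this sketch */
}

int main(void)
{
	assert(access_error(X86_PF_SHSTK, VM_WRITE) == 1);                   /* not a shadow stack VMA */
	assert(access_error(X86_PF_SHSTK, VM_SHADOW_STACK | VM_WRITE) == 0); /* allowed */
	assert(access_error(X86_PF_WRITE, VM_SHADOW_STACK | VM_WRITE) == 1); /* normal write to shstk */
	return 0;
}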
From patchwork Tue Jun 13 00:10:42 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277732
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Pengfei Xu
Subject: [PATCH v9 16/42] mm: Add guard pages around a shadow stack.
Date: Mon, 12 Jun 2023 17:10:42 -0700
Message-Id: <20230613001108.3040476-17-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly.

The architecture of shadow stack constrains the ability of userspace to move the shadow stack pointer (SSP) in order to prevent corrupting or switching to other shadow stacks. The RSTORSSP instruction can move the SSP to different shadow stacks, but it requires a specially placed token in order to do this. However, the architecture does not prevent incrementing the stack pointer to wander onto an adjacent shadow stack. To prevent this in software, enforce guard pages at the beginning of shadow stack VMAs, such that there will always be a gap between adjacent shadow stacks. Make the gap big enough so that no userspace SSP-changing operation (besides RSTORSSP) can move the SSP from one stack to the next.

The SSP can be incremented or decremented by CALL, RET and INCSSP. CALL and RET can move the SSP by a maximum of 8 bytes, at which point the shadow stack would be accessed.

The INCSSP instruction can also increment the shadow stack pointer. It is the shadow stack analog of an instruction like:

	addq $0x80, %rsp

However, there is one important difference between an ADD on %rsp and INCSSP. In addition to modifying SSP, INCSSP also reads from the memory of the first and last elements that were "popped".
It can be thought of as acting like this: READ_ONCE(ssp); // read+discard top element on stack ssp += nr_to_pop * 8; // move the shadow stack READ_ONCE(ssp-8); // read+discard last popped stack element The maximum distance INCSSP can move the SSP is 2040 bytes, before it would read the memory. Therefore, a single page gap will be enough to prevent any operation from shifting the SSP to an adjacent stack, since it would have to land in the gap at least once, causing a fault. This could be accomplished by using VM_GROWSDOWN, but this has a downside. The behavior would allow shadow stacks to grow, which is unneeded and adds a strange difference to how most regular stacks work. In the maple tree code, there is some logic for retrying the unmapped area search if a guard gap is violated. This retry should happen for shadow stack guard gap violations as well. This logic currently only checks for VM_GROWSDOWN for start gaps. Since shadow stacks also have a start gap as well, create an new define VM_STARTGAP_FLAGS to hold all the VM flag bits that have start gaps, and make mmap use it. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Mark Brown --- v9: - Add logic needed to still have guard gaps with maple tree. --- include/linux/mm.h | 54 ++++++++++++++++++++++++++++++++++++++++------ mm/mmap.c | 4 ++-- 2 files changed, 50 insertions(+), 8 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index fb17cbd531ac..535c58d3b2e4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -342,7 +342,36 @@ extern unsigned int kobjsize(const void *objp); #endif /* CONFIG_ARCH_HAS_PKEYS */ #ifdef CONFIG_X86_USER_SHADOW_STACK -# define VM_SHADOW_STACK VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */ +/* + * This flag should not be set with VM_SHARED because of lack of support + * core mm. It will also get a guard page. This helps userspace protect + * itself from attacks. The reasoning is as follows: + * + * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The + * INCSSP instruction can increment the shadow stack pointer. It is the + * shadow stack analog of an instruction like: + * + * addq $0x80, %rsp + * + * However, there is one important difference between an ADD on %rsp + * and INCSSP. In addition to modifying SSP, INCSSP also reads from the + * memory of the first and last elements that were "popped". It can be + * thought of as acting like this: + * + * READ_ONCE(ssp); // read+discard top element on stack + * ssp += nr_to_pop * 8; // move the shadow stack + * READ_ONCE(ssp-8); // read+discard last popped stack element + * + * The maximum distance INCSSP can move the SSP is 2040 bytes, before + * it would read the memory. Therefore a single page gap will be enough + * to prevent any operation from shifting the SSP to an adjacent stack, + * since it would have to land in the gap at least once, causing a + * fault. + * + * Prevent using INCSSP to move the SSP between shadow stacks by + * having a PAGE_SIZE guard gap. 
+ */
+# define VM_SHADOW_STACK	VM_HIGH_ARCH_5
 #else
 # define VM_SHADOW_STACK	VM_NONE
 #endif
@@ -405,6 +434,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
 #endif

+#define VM_STARTGAP_FLAGS (VM_GROWSDOWN | VM_SHADOW_STACK)
+
 #ifdef CONFIG_STACK_GROWSUP
 #define VM_STACK	VM_GROWSUP
 #else
@@ -3235,15 +3266,26 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
 	return mtree_load(&mm->mm_mt, addr);
 }

+static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_GROWSDOWN)
+		return stack_guard_gap;
+
+	/* See reasoning around the VM_SHADOW_STACK definition */
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		return PAGE_SIZE;
+
+	return 0;
+}
+
 static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 {
+	unsigned long gap = stack_guard_start_gap(vma);
 	unsigned long vm_start = vma->vm_start;

-	if (vma->vm_flags & VM_GROWSDOWN) {
-		vm_start -= stack_guard_gap;
-		if (vm_start > vma->vm_start)
-			vm_start = 0;
-	}
+	vm_start -= gap;
+	if (vm_start > vma->vm_start)
+		vm_start = 0;
 	return vm_start;
 }

diff --git a/mm/mmap.c b/mm/mmap.c
index afdf5f78432b..d4793600a8d4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1570,7 +1570,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
 		gap = mas.index;
 		gap += (info->align_offset - gap) & info->align_mask;
 		tmp = mas_next(&mas, ULONG_MAX);
-		if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */
+		if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
 			if (vm_start_gap(tmp) < gap + length - 1) {
 				low_limit = tmp->vm_end;
 				mas_reset(&mas);
@@ -1622,7 +1622,7 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
 		gap -= (gap - info->align_offset) & info->align_mask;
 		gap_end = mas.last;
 		tmp = mas_next(&mas, ULONG_MAX);
-		if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */
+		if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
 			if (vm_start_gap(tmp) <= gap_end) {
 				high_limit = vm_start_gap(tmp);
 				mas_reset(&mas);
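The arithmetic behind the single-page guard gap is easy to restate in isolation. The sketch below only encodes the reasoning from the changelog; a 4096-byte PAGE_SIZE and the 255-slot INCSSPQ limit are the assumed values.

/* Why one guard page is enough against INCSSP. */
#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE        4096u
#define SSP_SLOT_SIZE    8u	/* one 64-bit shadow stack entry */
#define INCSSP_MAX_SLOTS 255u	/* largest count INCSSPQ accepts (assumed) */

int main(void)
{
	/*
	 * Per the changelog, INCSSP can move the SSP by at most
	 * 255 * 8 = 2040 bytes before it would read the memory of the
	 * last "popped" element.
	 */
	unsigned int max_skip = INCSSP_MAX_SLOTS * SSP_SLOT_SIZE;

	printf("max INCSSP displacement: %u bytes\n", max_skip);

	/* A PAGE_SIZE gap can therefore never be stepped over without a fault. */
	assert(max_skip < PAGE_SIZE);
	return 0;
}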
From patchwork Tue Jun 13 00:10:43 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277731
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Pengfei Xu
Subject: [PATCH v9 17/42] mm: Warn on shadow stack memory in wrong vma
Date: Mon, 12 Jun 2023 17:10:43 -0700
Message-Id: <20230613001108.3040476-18-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly.

One sharp edge is that PTEs that are both Write=0 and Dirty=1 are treated as shadow stack by the CPU, but this combination used to be created by the kernel on x86. Previous patches have changed the kernel to now avoid creating these PTEs unless they are for shadow stack memory. In case any missed corners of the kernel are still creating PTEs like this for non-shadow stack memory, and to catch any re-introductions of the logic, warn if any shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs when they are being zapped. This won't catch transient cases but should have decent coverage.

In order to check if a PTE is shadow stack in core mm code, add two arch breakouts arch_check_zapped_pte/pmd(). This will allow shadow stack specific code to be kept in arch/x86.
Only do the check if shadow stack is supported by the CPU and configured, because in rare cases older CPUs may set Dirty=1 on a Write=0 PTE. This check is handled in pte_shstk()/pmd_shstk(). Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Mark Brown --- v9: - Add comments about not doing the check on non-shadow stack CPUs --- arch/x86/include/asm/pgtable.h | 6 ++++++ arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++ include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 1 + mm/memory.c | 1 + 5 files changed, 42 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index d8724f5b1202..89cfa93d0ad6 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1664,6 +1664,12 @@ static inline bool arch_has_hw_pte_young(void) return true; } +#define arch_check_zapped_pte arch_check_zapped_pte +void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte); + +#define arch_check_zapped_pmd arch_check_zapped_pmd +void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd); + #ifdef CONFIG_XEN_PV #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young static inline bool arch_has_hw_nonleaf_pmd_young(void) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 0ad2c62ac0a8..101e721d74aa 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -894,3 +894,23 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd_clear_saveddirty(pmd); } + +void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte) +{ + /* + * Hardware before shadow stack can (rarely) set Dirty=1 + * on a Write=0 PTE. So the below condition + * only indicates a software bug when shadow stack is + * supported by the HW. This checking is covered in + * pte_shstk(). 
+ */ + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pte_shstk(pte)); +} + +void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd) +{ + /* See note in arch_check_zapped_pte() */ + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && + pmd_shstk(pmd)); +} diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0f3cf726812a..feb1fd2c814f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -291,6 +291,20 @@ static inline bool arch_has_hw_pte_young(void) } #endif +#ifndef arch_check_zapped_pte +static inline void arch_check_zapped_pte(struct vm_area_struct *vma, + pte_t pte) +{ +} +#endif + +#ifndef arch_check_zapped_pmd +static inline void arch_check_zapped_pmd(struct vm_area_struct *vma, + pmd_t pmd) +{ +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 37dd56b7b3d1..c3cc20c1b26c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1681,6 +1681,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, */ orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, tlb->fullmm); + arch_check_zapped_pmd(vma, orig_pmd); tlb_remove_pmd_tlb_entry(tlb, pmd, addr); if (vma_is_special_huge(vma)) { if (arch_needs_pgtable_deposit()) diff --git a/mm/memory.c b/mm/memory.c index c1b6fe944c20..40c0b233b61d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1412,6 +1412,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + arch_check_zapped_pte(vma, ptent); tlb_remove_tlb_entry(tlb, pte, addr); zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent); From patchwork Tue Jun 13 00:10:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277733 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E636CA9EA0 for ; Tue, 13 Jun 2023 00:12:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 536D08E0013; Mon, 12 Jun 2023 20:12:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 422468E000B; Mon, 12 Jun 2023 20:12:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 200108E0013; Mon, 12 Jun 2023 20:12:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 05BF78E000B for ; Mon, 12 Jun 2023 20:12:27 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id D12CCAF34C for ; Tue, 13 Jun 2023 00:12:26 +0000 (UTC) X-FDA: 80895797892.21.D842B65 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id C5B891A0008 for ; Tue, 13 Jun 2023 00:12:24 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=nhNCy4eJ; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; 
a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615145; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xbWC9KPiSk6+i/caVwFEQQ8XwmMbX5MPJXovsyxTZuw=; b=d6O0o0DsLhbKDWe+2ZrTaxkNXXQKQhQ1jnfvZBAXlBuQQrSjlEmUJjY8gkzR31wpNYW7cd HhZyAlkvwjlNW22WqQOYo0fN88MsoAUg8bVRz2zHgCmOFkh9u0TIdZZrOXHWnFH2Ntn+2y 0GQ5FEoPwmTEvey30/NS8EB9emnMWRA= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=nhNCy4eJ; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615145; a=rsa-sha256; cv=none; b=tLVek+BwJH4Kc8CcWlGHHq2O7DSf22iAnPTeXIYVq1U9slkToV9A7w230fbJqGLpr5zgIJ 1skYOzb1gsKllRdh6lQesNDmrRweXEmlvWqyAlOLbpVNYOzuxWFFGitegdt/BBPDg9NUpH 6n6YIU47J7+R5ToR7ZJQLChKl6WyBHU= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615144; x=1718151144; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ioNZyjx88vkgqQos5HBjwuXN8Qld1af278Y586V7kGQ=; b=nhNCy4eJLgxgW6pn2LQeRSYTdSG8t7tbZikqsEceHQ1cTwpzf900cVV1 c69UlzN8dKw8Jinq7fUaRDw/mP6sWl0G5fW03MEl5X56tWB6o3QJr3EHA wHA5Y9lbQzzUB6ws1b1rxT4pfUSosbR6jAWJxSuvXXpKoNBfMROT4MwVX ZPL2qtRR4iqOZ0gaDPiUm4MTe+hcwy3KFIp6aCCXIxw9oVrfhOxyuqRAi wzXygGMDKHFZR/9NJzVN4m9yXKHbnVCxqPKv7jTgdGDvOCVEVENjmZYWj mSRjGrbnl26fQIv639mlw6dS0UaaWIHzFcxCgMqdVZAk2q49S5bfZSpYe g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557082" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557082" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671040" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671040" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:22 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 18/42] x86/mm: Warn if create Write=0,Dirty=1 with raw prot Date: Mon, 12 Jun 2023 17:10:44 -0700 Message-Id: <20230613001108.3040476-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: C5B891A0008 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: e6qs8aub848ug37fpd1hotounfhhoi6h X-HE-Tag: 1686615144-121254 X-HE-Meta: U2FsdGVkX18ZNAuWH26XI370FSP/xAzxuYuJbMWwb/SolbxDK6YFayYxy0HX4pmJXgBIy2m1XzcDG8VEz3DWh9AHAfSufD/PmtCJmDDnBMAHTaAfLBk3C66Br+JS6/37/u+sf+R+7qGFngzErz5ccsJP5Uvu0NNofp36Q4AKV+3qK2Fm20+RnUrTskdq7DBodZbaRoB9NTipQ4/YfOcabTx7dtWpgY5+cxEV4ouHT1hACWordtvHXOHYgdHQNjfkrEml7cxfqix/a0U/e51r2HsTHl8Roj9Au0FFsvAtbB3FX4MXoaZAp42dN6P+m8mw8IY4jUr0ydcV+OSv/XIUbzg2jdJklCKcvWQeFXim256bk0dZVvdRuOpIKmBD+2218icoREpGoHIPmEQwBAx0dr3XDz7gGQ57GhK7tkgTPPutJ3tQjqj8BRBUepse2+Gkx8p6aNKX4NPotB9Qe+S1Rw06PZ932XlNXPT/vEUUTxu1ZRfkAI+qNVeMp6HFoZWk3+nGT4sWG3wwp2yQa0eEFeC8HqAxleaV/4F+8MNOqrknRjjZAaNy9VSKRXOPpOSBnZkTHYi+mbwJb+C07I9B+dYxkjNtuWOVZ74psTUtvUHed0ZuFQ/LRiwvUO7nukEOOKIZP7JSDU8Zdm/DlX126G7aGP1gk/1NJIU3Wbb6VNIvS9St7p2icxF7FbDthFFVPC4eIbtpIE3bkmkQA4QjXkRY8wRCq2lVkmLxF+96WJuCfkE3taPLUPi3hTuK4WtsGm8JFX7tiN8RL0TjFdayVp2fM0nzooyyuFx5Y2rEbjkR5FjfBcvobOVn0/ol4n/yBz45izNRNEn86RyHWXusVA+I2ihcWRUeHcNLb3O+hPDEo6R0OzGaE3HXtMDYS0ryalM32Wt6nuAODpfrgz8V4+ZXfnmASoAvuIktfaVdzsIUSFYaJIMYEefJNv/TCRcPT3gU9roLFspb8oRpLUJ SUCp1cf7 cG3ED+WDZyJd6BUPRIVceQX34sLQfkoIkCZOWtnqLyhtZkfnbJd+ga7xMdX1kN1BQS0pO6GMOf5dwy74LbGCKNnAWqQy/DSDt9DHPvmhV+Lc3YYKXw1PZvEqGOCCKLZqANXu+h6xVIzpK76x9WYIzUa5knjEvyvCy08rvaE+qyEPUK9zRvQI98cq07mpXOBzjSiIjqeQc2oGo2hXlgXy4BrAUXOxvCtC8pMM9u1SehoeWIr6gBIbd34xcjWowSaf5miQaIWY5qYdciLC//1Yf75OLt4np4jmbvgMTU5xqSkzPWzlG3Pn3M+ZC1g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When user shadow stack is in use, Write=0,Dirty=1 is treated by the CPU as shadow stack memory. So for shadow stack memory this bit combination is valid, but when Dirty=1,Write=1 (conventionally writable) memory is being write protected, the kernel has been taught to transition the Dirty=1 bit to SavedDirty=1, to avoid inadvertently creating shadow stack memory. It does this inside pte_wrprotect() because it knows the PTE is not intended to be a writable shadow stack entry, it is supposed to be write protected. However, when a PTE is created by a raw prot using mk_pte(), mk_pte() can't know whether to adjust Dirty=1 to SavedDirty=1. It can't distinguish between the caller intending to create a shadow stack PTE or needing the SavedDirty shift. The kernel has been updated to not do this, and so Write=0,Dirty=1 memory should only be created by the pte_mkfoo() helpers. Add a warning to make sure no new mk_pte() start doing this, like, for example, set_memory_rox() did. 
Signed-off-by: Rick Edgecombe Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Always do the check since 32 bit now supports SavedDirty --- arch/x86/include/asm/pgtable.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 89cfa93d0ad6..5383f7282f89 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1032,7 +1032,14 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd) * (Currently stuck as a macro because of indirect forward reference * to linux/mm.h:page_to_nid()) */ -#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) +#define mk_pte(page, pgprot) \ +({ \ + pgprot_t __pgprot = pgprot; \ + \ + WARN_ON_ONCE((pgprot_val(__pgprot) & (_PAGE_DIRTY | _PAGE_RW)) == \ + _PAGE_DIRTY); \ + pfn_pte(page_to_pfn(page), __pgprot); \ +}) static inline int pmd_bad(pmd_t pmd) { From patchwork Tue Jun 13 00:10:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277734 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8381DCA9EA2 for ; Tue, 13 Jun 2023 00:12:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 845338E0014; Mon, 12 Jun 2023 20:12:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7CCCE8E000B; Mon, 12 Jun 2023 20:12:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 55C3D8E0014; Mon, 12 Jun 2023 20:12:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 358BB8E000B for ; Mon, 12 Jun 2023 20:12:28 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id EB4521203A3 for ; Tue, 13 Jun 2023 00:12:27 +0000 (UTC) X-FDA: 80895797934.18.F8ED8A5 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id D57D94000B for ; Tue, 13 Jun 2023 00:12:25 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=M8hE4MuI; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615146; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JK9+ERy4F36GfmVdpqghccuIPuJ16HTtI5Wly0MCYFI=; b=dPomzj7BUnprQiU3VcIZtKcMS65uOtb7rRluPs9EVtyvAV2CqhEYOuNI06GyJELsxVwu0p eb4nW7o7Imvh1+pKwEdemg/Xn6MrqMG1aSU1LFZzXOw0cR989osJM84201b5sd0lCSpB0x ER1FNOcwZB5cAD0lkKnfaQIsWfKIhxE= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=M8hE4MuI; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as 
permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615146; a=rsa-sha256; cv=none; b=lV2Esk8Ycu6szVkJZzOv7oSXf3HW5RS7/ZkRSwtebGuL7RRrSdYcwYHf1SHSM0wAAes6pn H9Df+OhVuNuM0Gln6apN0Lp3mtQDyRusKzouhtJ6gsmvuO5hyUa41rro8zRiYQDQ5MieRD +kWmi/7UfnQhs4Bn6ZrKDL3zOQIFgZ8= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615146; x=1718151146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0w0eM3eQB07Ass1qJGPwxV0mmwv4IZmjBMBIbi82ENo=; b=M8hE4MuIYDYGWzsfUHcrDSkj+IAl5g8+jYV1VKdduJa3fWngXvbPkoOj 9TqQY2/qjBaF8tM4WKdJOe4tjOekFLTbUx7cfrEafHXE2VIRFOrs75GLd MYUKe8JmKatGeY/PLGBz3Ov0QdmpN6qduIFJak29hAsRNFMj5D6ZEi8LB l/uJ3ZdLr6LIQWHTRNY7k3b207CeCzrq4Uyr7ZsXaN0nYcrvA2SDvhZX4 25NLS+rlxd3DXk+eJqMzbFNRCu7qvuR/lJvVTldOSNZ5IatXXqrr0JUlq ValndaW/kKjD+K1YK45b1r1Frz82/Ohft+yK2ydg3arnKSSt+3oQrWGFg w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557104" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557104" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:24 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671045" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671045" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:23 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 19/42] mm/mmap: Add shadow stack pages to memory accounting Date: Mon, 12 Jun 2023 17:10:45 -0700 Message-Id: <20230613001108.3040476-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: D57D94000B X-Stat-Signature: u4k8nhia3ta36qb11g4nbf8yeati5jk5 X-Rspam-User: X-HE-Tag: 1686615145-444143 X-HE-Meta: U2FsdGVkX1/hMVcmQJMCw8aArlYKvTT8ojAloik7N3qptzc4HSzzbTD9H3YwfgiAdWX0oJMGWQNGs3izhgnhdlsazHAST8X+6wUlQe9aIHFVzACdrMZ4AOU0L8BM4zd+gGflZRF0W6dHRmnvwVc93RicokoGSHeotZx2QVBo+Wn6xgUewSToYVfnR+vbWl5h/t2FeHQJ/ecc6BBf0gLdCYG/Jb2h/QAh9rTF55104DgeRnXm5n6FRB1nv9p8fYOTUk2HxXi56c8XpvbkO+z/UKD7yvceHzggl3hgN+96x1vI1fuk97rtKZitMNDyRKLfupCVrSL4ohdO7F9Ct1EQXYdPfqq3bSxP9VpSzFnDXKs3q0PuAZ9QKOrjJ+MFhrZRyyHeWOdQH8wNnodeRw50PmL4y6SWH4lw9nor/2vx4/C7C0uiQjdJC0+p7yMDpPCG3zQyXDMoQJmGqhxuz7Z7L0yDq8QybPouGhcP66ZX1vcOIMKVhNGYYfv1k6q4BuUn8CjnF9GF0Opf/VQ4GMj0J+meXipmhqWfCddlhl1bG4Stzhfy01wOV3F3eo+OVh0YxOfmDMvQbkB2pjg1sl7mXxHYU4ORgcS4fQLJksoJWe5P4RyuZXZNKyIVvQeBaUdTgmKsp67VDJxg5On4Sz2Eia226ojYEZ1g5EV6KUxYM3drOtdAZ24iWgUCT8YTo3vGbfYS9wAZgko+yUPVFGfppSwuagv/yVTILwGmEZCePaIPYBVjVPEn7Pivq6q79ynY+wOWQxBrV+fLpA/12zH7Xho9gREOfca6WLv0DDzyWQ8VjCjw1mi9D4aY1xDhqXfALhBrXmu+W8RpoeQr4pvu8t44/AUDnkl80cZ8oho94XKTKrKpHamoVjDwQmO3NJNCgsP0C58VAvCor/n8gh4AephAvlbtOIHaFcjEbZCnPaDiHYhsEk1m6PUjE2txpgGwIVduaMyO/187jan5PZx lJTYaUPX jw9/OnkG5VH0vVpo/J35rt0sm3i8gRLyIkElXjjmy7pZmlupaXBucs2ZhsXBxhHqTU+YAil8ooy85KrD14GLklK8oEDsaI7BawqAHjTu0Z/ZQ0mQpEq32iDvGq5ji7KIzIFxFjiCj26opBNxY7aVi4a4DGkRgxx8FmBbnOc3HxsiGWshCCYvdOYCNXy3vonYEWEqqiesSI+NZLknla5auPNzbUgL/w3dOCQ5wpu5TDaUXIbx7gC43TBGHgV91Dvqa3/yXATz4J+D720zYWo5vQzWwWNlRhWW6XjR4J3NEVjWrWSuzWyFE4YnqfQHRjzOj2EomjONt0pVAML/HDlFJ6S4++KheMvmMOYH6htjWKg4AGkBEJHNFRXHOdibCZHFQsb7KsjZnV+bXlC0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- mm/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 68410c6d97ac..dd2ded32d3d5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -535,14 +535,14 @@ static inline bool is_exec_mapping(vm_flags_t flags) } /* - * Stack area - automatically grows in one direction + * Stack area (including shadow stacks) * * VM_GROWSUP / VM_GROWSDOWN VMAs are always private anonymous: * do_mmap() forbids all other combinations. 
*/ static inline bool is_stack_mapping(vm_flags_t flags) { - return (flags & VM_STACK) == VM_STACK; + return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK); } /* From patchwork Tue Jun 13 00:10:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277735 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BE25C88CBA for ; Tue, 13 Jun 2023 00:12:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 18C458E0015; Mon, 12 Jun 2023 20:12:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C8A18E000B; Mon, 12 Jun 2023 20:12:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE7498E0015; Mon, 12 Jun 2023 20:12:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id BAF868E000B for ; Mon, 12 Jun 2023 20:12:28 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 9325014035B for ; Tue, 13 Jun 2023 00:12:28 +0000 (UTC) X-FDA: 80895797976.10.8807531 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id 62B3C16001F for ; Tue, 13 Jun 2023 00:12:26 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=QIwfNVHS; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615146; a=rsa-sha256; cv=none; b=wOxJbmPs6Z4bhb8JqhrdUGll2RY4WpxQCl7ZtFaUH4hcBKvtvTv8qdCMgm0pq8ZhUWZpxi qjSbQ1zztbKRpW1nDQmCVNTaXTOk8a5AvmBpZ88WV7R36JU3Ry3A/2jJKQYxVUebuCSu55 UFq8knrIrXJCF6Dkln/rVsWJbPPM0cA= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=QIwfNVHS; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615146; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=eSpfbzj81RJHeXXJwuREGCXBeYOUsrYFIe5C3KzovZA=; b=pkXOhsT8nxlZW3byPQmagOqKWJQwvH5IuyAmbN9SnOhJqwasRGpuPMzjzIJMraI/GfjYgd sfTkFF9Ub9J9ccMfXK8xUqJIMugPWBLKRkh6ZOVMjNrX2XMymQo+zGM0BdDnnyx2FCfhG4 ntt1sh84nO07LYYTMs/NgHMqfcT8TKA= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615146; x=1718151146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EOoWyIFrsGRkjg2/O3hP5P9SjQgryxLlQpl2aAV/598=; b=QIwfNVHSnmdVrL/U9bd6je3EgR5A31QT9ua6exiMSxeJFOJnv+XTXOb4 EYNcpsoFqeepxMtWjAsClgGWdPWjH9JQjxTuMsqmfO5pBHtFHM/MOSSfL 
PwKeq+TFOH1PVTojZUlh96pKXpKkLO7NBaTVh7iAEItsJtl4soI6nf1yC e/6Leo3s+LGZS3rv6RxpsOnQgKvoViDNlfGNi6ezlgCYFL6ttgtSOzcrU Xb+XzHAK31LHLPN6fjdtBnSswVhTlpUuLnZNgIN9PtkuUeraLKvigY/dC UVPAmHaNGTI7zEtXqhkr/YA2jIgo4nhYve03F8aQ2dVbUF8xw5Fs6bMOh w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557128" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557128" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671048" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671048" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:24 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 20/42] x86/mm: Introduce MAP_ABOVE4G Date: Mon, 12 Jun 2023 17:10:46 -0700 Message-Id: <20230613001108.3040476-21-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 62B3C16001F X-Stat-Signature: rtde1eb878gg39kuufibu3nyapqn5jb1 X-HE-Tag: 1686615146-607166 X-HE-Meta: U2FsdGVkX1++MOhOb9gDgAcKXvBTHLpI4TSPAb2wG4c3SF9cLJDR/sFgR0iRnamaxyNeN6EbMW9BXI4zooQQgpvepjmYPMo1dWeIXvzWYmL+2ak5fGPWuf0rWP012yclvdHQezILdf9HExQ3gzjG+V8+yal1k7+T9EBerEbTSvg6d0ulVya1qMFVkTSt6X5vPdMMK+7Dw5QvyimdfZM+36rn08ZKut+jLFRRKzxIAvDPJCDsVE+H/NcbVIGEioy/Rs/Ws+jtsrpVfcsO/mFgEgmWd/x4VknaGqkNBtp0lvX951g7cToUyFoDK71hmTydphyq+645AYixsf0BaYdZLulJQz7qpoKOrSK0vJRq4fcecvndYBpuQUgUh/gd4lM/8twdWHt0gnfG30IS+kgd2xJzZT7TGRMOofsHa0Crjj9SWa9luTyWYFPrn6huOh0QT+haEpvfozWJWV1hgf9tokoPk8Y6xMZid1c3HBYj9RPDlCTCaOAuvjDOUz1Y5DWmYm43PasYKV6JoevRGpAnLGa8DpSOvyD6vBtrcnF5R4t80d7KiO4QTDGP0c5ehLuR6mbA1Amer3hmR3+Z1cGuD5ZFXixNhgoQfq1leVocBCDdxJq/tOVzPjyfPetb6x93XqW4MnDfPoUa+dZ7Ay8/UYD+n8M5EUJK1a2NTTXzato3G6ynq3Wk7Xhq2eiT0q/PE7ikigGFW5qKy+zHadEGh3CCSCkUjhW0Mav8xani58059FWFH1ORSbye3XjOO0jL/tBYS0QxI5K9VvZCL5UygWKb5UIjuynazcKVnsJfSsBZ2zAgHRGtG/l1CHhV/3uTkctRY9Nez4JczT4wuSxAbO95kO5jyHJqTbe7Xufy5eXCZ0snm4mihWNd15Y0MgkoZQKwgcL4eTmMP107nUL0Z7qIpY6pQyBPxgHQmhS+TFwx46VyNLLMO+t8/B+2hm09BjlJAdw/d883Oq4FxNE Flo6DVDo 
gab5wvYY4GsbAhNbu2FVNrKApoREkqq5G0OhvXF20eDX51VEkjlfHQJnYD/SmOmkPBOtB0avT0Ds5MOkF/nngu+X3B7yFOz5xH9prKd23QMZ6NXr8UasaKx+dS39TSqq5Ny8V7B+fIbmtwL9QhKBKb6RbTrjQESeiEBixoPG1qAIvyJpsNI0K1/J2SbvOQYJ6UIGeiNW7zig7e2Z3hTb3rK5voZx3oBBdxNO0jE+Dn/UEmkRG58lyw+SB26tYiYyqJ/+XpdWtX+Jln98Mkf9nJKCnYqKSx1OUojQxwz+AnMdbgnqH9VQRiESyv80m094MHUAteIr8VO9kFBsQMkNn37hvUsKSCDBGi8+AdV0iRuKBQ3o7xgUjvu+nHEC9xHJ7HgsQyW2ACuadc0znSb9MNsEEaWyTztHcUNqczQpf4awc250= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly. One of the properties is that the shadow stack pointer (SSP), which is a CPU register that points to the shadow stack like the stack pointer points to the stack, can't be pointing outside of the 32 bit address space when the CPU is executing in 32 bit mode. It is desirable to prevent executing in 32 bit mode when shadow stack is enabled because the kernel can't easily support 32 bit signals. On x86 it is possible to transition to 32 bit mode without any special interaction with the kernel, by doing a "far call" to a 32 bit segment. So the shadow stack implementation can use this address space behavior as a feature, by enforcing that shadow stack memory is always mapped outside of the 32 bit address space. This way userspace will trigger a general protection fault which will in turn trigger a segfault if it tries to transition to 32 bit mode with shadow stack enabled. This provides a clean error generating border for the user if they try attempt to do 32 bit mode shadow stack, rather than leave the kernel in a half working state for userspace to be surprised by. So to allow future shadow stack enabling patches to map shadow stacks out of the 32 bit address space, introduce MAP_ABOVE4G. The behavior is pretty much like MAP_32BIT, except that it has the opposite address range. The are a few differences though. If both MAP_32BIT and MAP_ABOVE4G are provided, the kernel will use the MAP_ABOVE4G behavior. Like MAP_32BIT, MAP_ABOVE4G is ignored in a 32 bit syscall. Since the default search behavior is top down, the normal kaslr base can be used for MAP_ABOVE4G. This is unlike MAP_32BIT which has to add its own randomization in the bottom up case. For MAP_32BIT, only the bottom up search path is used. For MAP_ABOVE4G both are potentially valid, so both are used. In the bottomup search path, the default behavior is already consistent with MAP_ABOVE4G since mmap base should be above 4GB. Without MAP_ABOVE4G, the shadow stack will already normally be above 4GB. So without introducing MAP_ABOVE4G, trying to transition to 32 bit mode with shadow stack enabled would usually segfault anyway. This is already pretty decent guard rails. But the addition of MAP_ABOVE4G is some small complexity spent to make it make it more complete. 
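As a rough illustration of the new flag from a userspace point of view, the sketch below requests an anonymous mapping that the kernel must place above 4GB. MAP_ABOVE4G is defined as 0x80 by this patch (see the diff below); it is repeated here only because installed uapi headers may not have it yet, and note that shadow stacks themselves are normally allocated by the kernel rather than by a direct mmap() like this:

    /* Userspace sketch (x86-64 only): ask for a mapping above 4GB. */
    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MAP_ABOVE4G
    #define MAP_ABOVE4G 0x80    /* from arch/x86/include/uapi/asm/mman.h in this series */
    #endif

    int main(void)
    {
            void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_ABOVE4G, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            printf("mapped at %p, expected to be at or above 4GB\n", p);
            return 0;
    }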
Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/uapi/asm/mman.h | 1 + arch/x86/kernel/sys_x86_64.c | 6 +++++- include/linux/mman.h | 4 ++++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 775dbd3aff73..5a0256e73f1e 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -3,6 +3,7 @@ #define _ASM_X86_MMAN_H #define MAP_32BIT 0x40 /* only give out 32bit addresses */ +#define MAP_ABOVE4G 0x80 /* only map above 4GB */ #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS #define arch_calc_vm_prot_bits(prot, key) ( \ diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c index 8cc653ffdccd..c783aeb37dce 100644 --- a/arch/x86/kernel/sys_x86_64.c +++ b/arch/x86/kernel/sys_x86_64.c @@ -193,7 +193,11 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; - info.low_limit = PAGE_SIZE; + if (!in_32bit_syscall() && (flags & MAP_ABOVE4G)) + info.low_limit = SZ_4G; + else + info.low_limit = PAGE_SIZE; + info.high_limit = get_mmap_base(0); /* diff --git a/include/linux/mman.h b/include/linux/mman.h index cee1e4b566d8..40d94411d492 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -15,6 +15,9 @@ #ifndef MAP_32BIT #define MAP_32BIT 0 #endif +#ifndef MAP_ABOVE4G +#define MAP_ABOVE4G 0 +#endif #ifndef MAP_HUGE_2MB #define MAP_HUGE_2MB 0 #endif @@ -50,6 +53,7 @@ | MAP_STACK \ | MAP_HUGETLB \ | MAP_32BIT \ + | MAP_ABOVE4G \ | MAP_HUGE_2MB \ | MAP_HUGE_1GB) From patchwork Tue Jun 13 00:10:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277736 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 529E7C7EE2E for ; Tue, 13 Jun 2023 00:13:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8E0118E0016; Mon, 12 Jun 2023 20:12:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 840CD8E000B; Mon, 12 Jun 2023 20:12:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6447F8E0016; Mon, 12 Jun 2023 20:12:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 495908E000B for ; Mon, 12 Jun 2023 20:12:29 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 24DC780377 for ; Tue, 13 Jun 2023 00:12:29 +0000 (UTC) X-FDA: 80895798018.05.CE38F26 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id 15E671A0004 for ; Tue, 13 Jun 2023 00:12:26 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=G3+JPdSA; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; 
c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615147; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=PXl5GqZ7V0Lo7jE7kAx6s21MXFeCFmlgpAurMlsy448=; b=4ztzTCaACvJVPmQztU6kkdjslZ6C0UIMkiri0COtm4tyGYCPvV0hR77jcs9ayohc+EuYVT tteSATi2pCH3EOqjK42jGz+7awz2MYOH0EYWiXb9QsK7MygEP0mPYWLJ8+QM3XHIPcKBlX G5Iw7aXRRlq5yV2sUL28efYLlaBKjpI= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=G3+JPdSA; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615147; a=rsa-sha256; cv=none; b=b5sYO5i3fEi7Tk2p14LjR77MpWazGB3WOWgcrPwBrJPpgLncN6jlDK2bd4gvZsFg/I3PQ4 TvLN3O1ncuEx+jW4AIhUXEldlTzKUdlKWTdemPlCEhv07+iye0kVttg+OYFTjT4jY8yKMh 6IqrwdWj6MsM7/hJfgXhKgh7PyHypCQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615147; x=1718151147; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=c7sunRbrNdSwt/1KmOuIzzKZkXv73NcNzDYIVaoBldo=; b=G3+JPdSAEnkWz/gFPtktw1qrrlC2s417q0jflR6qZp69DHmX0VDNpkoW L+wteZl+laaiDuFsqg1f1zS4hweuEw3/u2im+tNwwEfP+jua5Iph0a4bl HTXg5GmfolqaEKwKeyDqLAeDAN1DVjWFkE8s/51ffXlH6I8Q4VO6VZHMB /B0HyzFX6e9u75LzJl/2j5TQJLofy6vnOxyhAbagTgOV98Nb59PEHjPis R2chFeMzpQdADb3copseGtC262ZMHWg9M4747mpnMp3TbyJDCQw0C9th4 ylLytvDBAV9w+4YzFpMmz3XN94Ljf/jdLVzUAvM+dolQECnsBKOqAcaMu w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557155" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557155" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671052" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671052" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:25 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 21/42] x86/mm: Teach pte_mkwrite() about stack memory Date: Mon, 12 Jun 2023 17:10:47 -0700 Message-Id: <20230613001108.3040476-22-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 15E671A0004 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: un5roqkit9ftmka67qqo16h518hcyyio X-HE-Tag: 1686615146-734073 X-HE-Meta: U2FsdGVkX19m4Rju6ob/M3mj89nqiIoT7c+oCR/aQrBxtumbp6laTVeI8UKezk22pV4/XtZy7iObqLGOB8I+A12AlHPa7S6BBtXM2wffc8f2QcQ22LuNXzc8+/c/qo0l9xJX/vBbw3geQZd/xXqckzW4hiPCQGteWZVtFrd51xFoksvDhAPn22IJSXTDVG9+JgMVqooSIcyveHVVxp8BlnPPykj+neSZpiB04ylXxE+Sk2wbZsBmjMEurtr8Pj957eXRVd+8r/htaI+KFp1E9o3qNB0ZHnIaHChl1JONWHGYU6AJmbz2q1I4sBZGSU2AuZ51KYm6cJMxwiiqQqf1o663JU+YjWc4dSQudKtwP42jjxO8G8wEHVrBa51Qu1Rh0XTVT32CjNuAFwIC8qVN8rgCQ+GEk/Wv4dKllF15l7Qopu2bhHqBnbzQFnmMYfVEUhGMLtlThzOezsGY55aHtLHQGCB7Nb8yaz+lnAPQqKoyzx3S/6gOa/A4rr/q3U19qn6BnqKJJP5LT4caOmWzk6dBX6pRRl/srJ3xHMLPf8ORV0IeHSkFeJy3mcFo0TJumDiuRHXa/FZGGHQk+JR+lmAcMdrnYqaPqJJF7mL17Vww3nPf+r26+czAarNDbGkAIIQoRavDFYeRWCYKq93+X+qSjWH0npWtTk/U4kuFlPhiVJWZ6VNhzWcfiCV2Y3SPh5se1DIy78G+6wh3O01BLbfyiYWxBOyOiNokYqKxSogBfFoe99znQj2AY0pl+g8Yee16eOfh1tpt9YLesC84W0gsLILbePfr4OyQOVXzU+UPScKxKckcFcQ4KSkWcTt4EQNJGQnzQDoBpoUfca7qnSjmAgWMAf60Ttse2OhIh/vnn7PdrqfCPot9WVE3dP/DHVBEY/+CvZJnJ6M2hv0UB+qrOrNBl1mF82dD6ScBstosu4Xs9Ry5kqc3WOUWHawc/hKK5UPgLnjpaQRI0cq aik532M1 HWxm+uEPkRmPrvh1Axy+dLaNMGQOBoYg0ZPDCjC29739b8Hj6s63NwSv4NHUawY4vo8H995vDCXR4u6+4ql2wODyfoDIW33puOKK/HsdQsGsh/wvzmcWMwJ1WGdKPYtVoFjKHSn5sXEZnT4SoFQSl6LGWtVmgTuWngT11CGO5tfNxmbPAhcBoWC5H3D+4br+T0pnKhgzkcsey7eL8byh4/gwcIP3hbrUSQn563gTluNB9W6e4oXduiHlPgVme+NngjjWFXGcMh5vVB6/qSGfjMM/BSpwq/OC53ilAK14+RrdwUBd6Ca7d3gbFsXkERMpUzpL1/NQaSpJ/bgtB6l7dQnkdmQN7R7Tx61XHCINQyCdDOJ7CtZuc085It1uKv5gMRDmNRwj7E6HqtmcMCn8ldSjeZUlblEG0UDKevDNfkDd4nzU/ANfcCsJ/aV2uf9x9Q4XkT9hhBeDcrM7rsIib+BVVPWSV89lSuH8h X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If a VMA has the VM_SHADOW_STACK flag, it is shadow stack memory. So when it is made writable with pte_mkwrite(), it should create shadow stack memory, not conventionally writable memory. Now that all the places where shadow stack memory might be created pass a VMA into pte_mkwrite(), it can know when it should do this. So make pte_mkwrite() create shadow stack memory when the VMA has the VM_SHADOW_STACK flag. Do the same thing for pmd_mkwrite(). This requires referencing VM_SHADOW_STACK in these functions, which are currently defined in pgtable.h, however mm.h (where VM_SHADOW_STACK is located) can't be pulled in without causing problems for files that reference pgtable.h. So also move pte/pmd_mkwrite() into pgtable.c, where they can safely reference VM_SHADOW_STACK. 
Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Acked-by: Deepak Gupta Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/mm/pgtable.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 101e721d74aa..c4b222d3b1b4 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -883,6 +883,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHADOW_STACK) + return pte_mkwrite_shstk(pte); + pte = pte_mkwrite_novma(pte); return pte_clear_saveddirty(pte); @@ -890,6 +893,9 @@ pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHADOW_STACK) + return pmd_mkwrite_shstk(pmd); + pmd = pmd_mkwrite_novma(pmd); return pmd_clear_saveddirty(pmd); From patchwork Tue Jun 13 00:10:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277737 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA24FC88CB2 for ; Tue, 13 Jun 2023 00:13:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C894F8E0017; Mon, 12 Jun 2023 20:12:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C11FA8E000B; Mon, 12 Jun 2023 20:12:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9EF408E0017; Mon, 12 Jun 2023 20:12:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 7D3CD8E000B for ; Mon, 12 Jun 2023 20:12:30 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 564C1AF4BD for ; Tue, 13 Jun 2023 00:12:30 +0000 (UTC) X-FDA: 80895798060.29.BEED0B6 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id 19A1D40014 for ; Tue, 13 Jun 2023 00:12:27 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=HM6sQ3mE; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615148; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=OTL6GqoGz5oweSC99z4UnMl2uF9TwgX18y0erkZrvzI=; b=2/y7N2qvJ4R+xPvqCaSjsec860OhHg5QQlDopgNzIxm1uGLpFt+q9D/ckm17wfMBzA1HvY AEZvPi0a+aHRB9fSz8I3X+juW9aE+2QY88/c5WssCkjTaxDTtW0H56DnX7MVvI4Z1KXedh QFc3tf5x7dnv8hT8m1dtl3SJ/Us8774= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=HM6sQ3mE; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com 
designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615148; a=rsa-sha256; cv=none; b=I6evev0R0ZIOPRjXg+Iiws5+Z4Amj1bWSnWqRrP1JeKxEaJvY01fAk7cV3dDVZWKihdAQI ZblJtzbf77JncZiTDjxiKO+5RIfi1I6VJVMuURWBDdgln1IGKlTvoe0gqEQ1SIIg5KNS+6 pK5BpwByPIzR2HdeTuNMYaFMDEG5YxM= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615148; x=1718151148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nq8FP2aUY00edE8aipvtg7A+Bog2Zlh3+3MlR+vKuX8=; b=HM6sQ3mESelD7KURIqapsbVU0BbBVdWDWt607qArtFWA3GQwGTO6cys1 5SSKvrhfKqajuaC1v7/9weg/Sfa4oMw++CVfWenRiLkKKreD5PbuPI3wl YOB1DAHxFm0ueAMYhG2QMmozoiNCjvPl4eCJbbeo7e1Vil+u2EwivtszP UwxRt9jRae4xgZzK+29s4mlprBf3i90RdXbQGMWSjlCL6dcD1SahKeTxF n9Yy0RzYQyrlSLBZeqWP+w5v6vwwtq983RyWma3S7KDZ8NcXsZaR+r5vw f/iOfvQ60yUkRcZXsedEcRS63DlBt15b1Wvz8gsATDZdPOZt7mXO+CqoX g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557180" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557180" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671056" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671056" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:26 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 22/42] mm: Don't allow write GUPs to shadow stack memory Date: Mon, 12 Jun 2023 17:10:48 -0700 Message-Id: <20230613001108.3040476-23-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 19A1D40014 X-Stat-Signature: bxcmewcuhewarjozm8zxwz3eahpmxyop X-Rspam-User: X-HE-Tag: 1686615147-583021 X-HE-Meta: U2FsdGVkX1+QEKEjpUzdrWd+2qpQHR43YijoYhGjQFHxrEkn402JKCxI+pUQ+THbhoYKmSL7pcweZqemetrP7f7iA4VtG8S650rlV9zwHEBxryLuuASYev+8xEuvNU38npK5DeOyG3xwLZvGZuGviy8M0numPopmdeC0zzuJnsdXDJZlLxr61PemXH71pafO/hlkneUX6hfN6Pic8lpNZMMEDxebS6/IMtsVrZHH8sVyLYQroNBMfF//KsaCzEYiOj5tg3/xLbfhJ9pPH5vCzUYDTOtU01D4Yjw4FzuT3OkofQO1kxlPnlRz6J4i/joPemmWqGI+XBDz5o9Muo2WeNSjVnErmMujPFcpYbO7PsHe44KSf2+hUabM4lBBt7KdUWoxGl8Gq04DRhlZz+jcaIbWI2n2GSgFtEcV4l7ZY/rJSQHYQbxIl1/+FNuBP5+lWXrX+qSPlRM2sabf13eWVua66E3Tf4/cDvL2AzCnimHRBHjV1Fof/uqUXOCCdJbip0bSzXwBbIhs4wp1fkyztpaKLjO4pXetOPkAdokkMH2MnTYOzbV6HPEwDvGlvB9HvQuDwgBohiVtEfZbyFrWhQJeXK9YjFfrV49UEuEZKnq63fzBCDB4pUg36QXaSABKUbYawbb0cMhV4F293/n60etvjgZGiTBMLRBBjeHZ7QagtXNk33BGeaeWawHS6okp45gEppyVY9VEB0xuOiSwxKznD9bXP1/UFjz8uwsstgCzEOHip3m347zv8NZGHzRU+4xC820Y2gt/fS3mTbAoLm2Tnfe4uHNCX89zyM/OmRz99/VCS0ABvIIrgnIY1tAysyNfJ1oIiNm6XRiVc6zyiSW5Kovo+OjUw6NF95xhPfUn5m9zOThCcUBF83qT68IJzI7QOhja1I8TZteq2AZ1laKMiWBr/Mm4rl+RhOvowLL22FpRAxNvLFEwJxjXjAbp02f3fVN88Ak3RHOmfJF vPCWdFEL dx2yyovGZqPos5/2NAV5wUnBKB9lp4e60pSBA6Rgtyz3JyE5u+z5u88UQnOyTt0HTns2HUqUGvOuWSDsQh8WWYxCSMnaL4UL5oCYmKR1LONqL++QfUOPyIYUKNewoTfS4He4Hz37zEAW5KzROd++NR+lplRKlWetEfrKRKAwT9uUrKvw22/3sKuW2uCCbERIt+MUH+osIItEU+gaCFkKVOXSi3LoYJr0v2RUk6T4ONMpcUMOwOQu/Hq4TljTcasnHWHfaUFHWICYD4HzMz6oPDgnhjhOo+jix6O1EM9665KaInbPDrfJtqVzdUYVb5rxezSkYquZQ9ITZK9+nsoB6d7OKSvz3c5BzaV9SFDHKAPZfcC2Sd7BXvlCZW+5oydxBkPjzfxIpc0NeFt4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. In userspace, shadow stack memory is writable only in very specific, controlled ways. However, since userspace can, even in the limited ways, modify shadow stack contents, the kernel treats it as writable memory. As a result, without additional work there would remain many ways for userspace to trigger the kernel to write arbitrary data to shadow stacks via get_user_pages(, FOLL_WRITE) based operations. To help userspace protect their shadow stacks, make this a little less exposed by blocking writable get_user_pages() operations for shadow stack VMAs. Still allow FOLL_FORCE to write through shadow stack protections, as it does for read-only protections. This is required for debugging use cases. 
Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/pgtable.h | 5 +++++ mm/gup.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5383f7282f89..fce35f5d4a4e 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1630,6 +1630,11 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write) { unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER; + /* + * Write=0,Dirty=1 PTEs are shadow stack, which the kernel + * shouldn't generally allow access to, but since they + * are already Write=0, the below logic covers both cases. + */ if (write) need_pte_bits |= _PAGE_RW; diff --git a/mm/gup.c b/mm/gup.c index bbe416236593..cc0dd5267509 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -978,7 +978,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) return -EFAULT; if (write) { - if (!(vm_flags & VM_WRITE)) { + if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */ From patchwork Tue Jun 13 00:10:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277738 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9AD68C7EE2E for ; Tue, 13 Jun 2023 00:13:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3D8358E0018; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 338028E000B; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 13D238E0018; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id E553F8E000B for ; Mon, 12 Jun 2023 20:12:31 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B7AF71203A0 for ; Tue, 13 Jun 2023 00:12:31 +0000 (UTC) X-FDA: 80895798102.21.F79CD5A Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id 21B8C16001F for ; Tue, 13 Jun 2023 00:12:28 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=JPrJdNqc; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615149; a=rsa-sha256; cv=none; b=htEa0gpUpU1PP+aONLndSKs54Oo/ci0fo/MFtJmKuAio1MJ0azZ+O66SBvF6XUxm4IAQxt wCjMnzMKpDi6XRJjNwsnr9wrSnSNXe+X9wKefFeBJ+dKK/Axg/5Ih/1MuGWi/GggsE022G 2nZRXkUXHJ2Rm15Hpm0g9qT5n+TxOeA= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=JPrJdNqc; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: 
domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615149; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ngsNHJkwJ8rpoiE9+OlXKRzL3wHKwvwi/lLZa6Im/hc=; b=FCtQ8POPmvwFYoLISgUtATZlamqmXBIQXQ76dsqUHyy2Wkegr3TZiQ89P/STAfIl8d4P49 kaYVlqcKoDxRQ9gTZHRlxoHmYuZkEb+zoyIy0M1EGu04zWC9NU4KNviWvLWT/fDqjf07hh xv2B1W42ic7wQE1NB+6rK+eZzq/2J/Q= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615149; x=1718151149; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QMw9iVo5bgNcqwrdr19MOUpW+koohCohKti/401XfIw=; b=JPrJdNqcBodz9lL6K9CvDGG0Mb94roKf0lSeHyauFbErtdMGpeHH9+hK hiMaacwvhz+4wPDu1OztkRNnwb+TptCDJa1GSSrpmAQTcSAKDFagkuHKt 7R5g69mgVmZV9JQCXqoOoeLvOOqoZ/DRpen45BEXk1gCpq79YqIcMxcPs 0c/i2U4b6lyy2c58YP68B6A5UPnbk7wtYIvc32fV6AnP0qLW7+1QYDx40 CN9DoXEGQP27Xj36JFKpfvbythUFpxdRxTw7QjcT6e5muZvGG3amVA/gs Q+NZowMWi+Fr6xHRKuAARopZsvbELZ3K+PmaQecDXAWLqSWP+x5NaFmEk Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557206" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557206" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671062" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671062" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:27 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description Date: Mon, 12 Jun 2023 17:10:49 -0700 Message-Id: <20230613001108.3040476-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 21B8C16001F X-Stat-Signature: nxh16a8cue8dh4r3ty58qr7zxta71rtj X-HE-Tag: 1686615148-433028 X-HE-Meta: U2FsdGVkX1+KPlXf4hYBTa4uy2IfQGlvIr7v0RYgMaFZtlfjpHtQENZjz7DayFRQser08J1BsWYKrxc/folkmq3ZUCq4+Jb2pU783f82VMxrL6NUX779uagz5ROhzEZ2BuxXEwS6vj6i5X+1h0i824gFD1w74Sb5OYNHgv9wtuNUzZB8xGFkatuFNMCb8k9Pwx6bOQ/lrACSBC8dR5VOnZSJPHw/LJasMUbNOcOwgXwEpysp4WThz6vLrOuESkpoJp4HTOkDJbnWla07y0A/aWqD0AtbCSUn/FMRl1GHQd8D9Noz6ixnAvnyc5dl/Og0hAJ303N6lsbeIgsMquP7foTWfbe53MwEyAM6JhGO3vCaSRI12wY09GP0wfCTu+ijZio1d8NZBh1JZC8MoRii+rB2F0MQdjSQm/jS6kgSDabyKn0YundVPUQixilHW++F7kofJ07LpNjqZfUGZ4u5DmWmNtKr2eL0nMNw0uK46xOLjISukr1mQORy7tn6d2qxSwpLdR/nFTUyUVU+NYokkA7smWz/0DqtQ7a4HFcXRAW1ocW5CybTCvYa5ahrZGXex7zZXIq2mit30OA4A4hNYgHB1DO29YFowz+nZLsz3CCrs7ClBHGB66Dq8vAEMpMZLfTBVoDH9tODeAn3EIO0Zv+TSfL/BaF8cedXvw5rVjxTFtcZKMRBuJXRpYaws4f+cz0jiHziLuo9qi3UepgNwBUyXfV+eL0uBugZKwjI0csnHxLL1iK+Cu9IvcUbrmRfm77e+fPjT6g/u22xagD+SJihqoLjZ0Mu5sm0XGUiMbx3u7N7vRPpJeO1RS6F0zNLHxPR6ONa45Eh6mIDJwDZjJpll2uUH05gm5kR8/+waL4aXS51Dgyy5Cyok9+GI8/f4JRmpUsVfoEiBSD/K3MvwgnLszpKYAG9scaWeVRXjDtj3lpl794aMY+Pd3IfYSrszwrm8+Rn7zgpLOhG7uz ute+QNZk yXHd6/4lcEbaQrgKMYvYbbBdjIhO9ZSnXLcmJPmg1Fmx/HPOi6qC6fa5fSQ0MCz4YbVDC+TXrJlEZlGTHahFUgJ3LdYd1i7Ycp5+J5hbMJLOPptgup95B+lqGaRTddmq1JIGwqX0fdGtsY/vKPrNWFZ/Moxk9Mrl3I1t+jhlg73JqeukC0vgi9WuY4zV9vBPIQrsW0/94qT1cgyIY8MgXXHz0wmMrwc9V54CuyZEg+dsxcV4f+AYClJYw7u6RPGb5i5ePAma1v6u+xLDaWIqpFfjO69K4MALgMoGGIxoWD+RgC5Y0zgEBGsVgq7fTKzrTbDOQaSTTAw9ChHz5vAJiBb6Cf1/b3ri7HGcLs+YjF49OXGgAJON+I9Dc7A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Introduce a new document on Control-flow Enforcement Technology (CET). Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: Szabolcs Nagy --- Documentation/arch/x86/index.rst | 1 + Documentation/arch/x86/shstk.rst | 169 +++++++++++++++++++++++++++++++ 2 files changed, 170 insertions(+) create mode 100644 Documentation/arch/x86/shstk.rst diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst index c73d133fd37c..8ac64d7de4dc 100644 --- a/Documentation/arch/x86/index.rst +++ b/Documentation/arch/x86/index.rst @@ -22,6 +22,7 @@ x86-specific Documentation mtrr pat intel-hfi + shstk iommu intel_txt amd-memory-encryption diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst new file mode 100644 index 000000000000..f09afa504ec0 --- /dev/null +++ b/Documentation/arch/x86/shstk.rst @@ -0,0 +1,169 @@ +.. 
SPDX-License-Identifier: GPL-2.0
+
+======================================================
+Control-flow Enforcement Technology (CET) Shadow Stack
+======================================================
+
+CET Background
+==============
+
+Control-flow Enforcement Technology (CET) covers several related x86 processor
+features that provide protection against control flow hijacking attacks. CET
+can protect both applications and the kernel.
+
+CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
+is a secondary stack allocated from memory which cannot be directly modified by
+applications. When executing a CALL instruction, the processor pushes the
+return address to both the normal stack and the shadow stack. Upon
+function return, the processor pops the shadow stack copy and compares it
+to the normal stack copy. If the two differ, the processor raises a
+control-protection fault. IBT verifies that indirect CALL/JMP targets are
+intended, as marked by the compiler with 'ENDBR' opcodes. Not all CPUs have
+both Shadow Stack and Indirect Branch Tracking. Today in the 64-bit kernel,
+only userspace shadow stack and kernel IBT are supported.
+
+Requirements to use Shadow Stack
+================================
+
+To use userspace shadow stack you need HW that supports it, a kernel
+configured with it, and userspace libraries compiled with it.
+
+The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow
+stacks can be disabled at runtime with the kernel parameter: nousershstk.
+
+To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later
+are required.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET. "user_shstk" means that userspace shadow stack is supported on the current
+kernel and HW.
+
+Application Enabling
+====================
+
+An application's CET capability is marked in its ELF note and can be verified
+from readelf/llvm-readelf output::
+
+    readelf -n <application> | grep -a SHSTK
+        properties: x86 feature: SHSTK
+
+The kernel does not process these application markers directly. Applications
+or loaders must enable CET features using the arch_prctl() interface described
+in the next section. Typically this would be done in the dynamic loader or
+static runtime objects, as is the case in glibc.
+
+Enabling arch_prctl()s
+=======================
+
+ELF features should be enabled by the loader using the below arch_prctl()s.
+They are only supported in 64-bit user applications. They operate on the
+features on a per-thread basis. The enablement status is inherited on clone,
+so if the feature is enabled on the first thread, it will propagate to all
+the threads in an app.
+
+arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
+    Enable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
+    Disable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
+    Lock in features at their current enabled or disabled status. 'features'
+    is a mask of all features to lock. All bits set are processed, unset bits
+    are ignored. The mask is ORed with the existing value. So any feature bits
+    set here cannot be enabled or disabled afterwards.
+
+The return values are as follows. On success, return 0. On error, errno can
+be::
+
+    -EPERM if any of the passed features are locked.
+    -ENOTSUPP if the feature is not supported by the hardware or
+     kernel.
+    -EINVAL invalid arguments (non-existent feature, etc.)
+
+The feature bits supported are::
+
+    ARCH_SHSTK_SHSTK - Shadow stack
+    ARCH_SHSTK_WRSS  - WRSS
+
+Currently shadow stack and WRSS are supported via this interface. WRSS
+can only be enabled with shadow stack, and is automatically disabled
+if shadow stack is disabled.
+
+Proc Status
+===========
+
+To check if an application is actually running with shadow stack, the
+user can read /proc/$PID/status. It will report "wrss" or "shstk"
+depending on what is enabled. The lines look like this::
+
+    x86_Thread_features: shstk wrss
+    x86_Thread_features_locked: shstk wrss
+
+Implementation of the Shadow Stack
+==================================
+
+Shadow Stack Size
+-----------------
+
+A task's shadow stack is allocated from memory to a fixed size of
+MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB. In the case
+of the clone3 syscall, there is a stack size passed in and the shadow stack
+uses this instead of the rlimit.
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack. Because
+the shadow stack stores only return addresses, a large shadow stack covers
+the condition that both the program stack and the signal alternate stack run
+out.
+
+When a signal happens, the old pre-signal state is pushed on the stack. When
+shadow stack is enabled, the shadow stack specific state is pushed onto the
+shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
+in a special format with bit 63 set. On sigreturn this old SSP token is
+verified and restored by the kernel. The kernel will also push the normal
+restorer address to the shadow stack to help userspace avoid a shadow stack
+violation on the sigreturn path that goes through the restorer.
+
+So the shadow stack signal frame format is as follows::
+
+    |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
+                    (bit 63 set to 1)
+    |        ...| - Other state may be added in the future
+
+32-bit ABI signals are not supported in shadow stack processes. Linux prevents
+32-bit execution while shadow stack is enabled by allocating shadow stacks
+outside of the 32-bit address space. When execution enters 32-bit mode, either
+via far call or returning to userspace, a #GP is generated by the hardware,
+which will be delivered to the process as a segfault. When transitioning to
+userspace the register state will be as if the userspace IP being returned to
+caused the segfault.
+
+Fork
+----
+
+The shadow stack's vma has the VM_SHADOW_STACK flag set; its PTEs are required
+to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
+shadow access triggers a page fault with the shadow stack access bit set
+in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread. New shadow stack creation behaves like mmap() with respect
+to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
+disabled.
+
+Exec
+----
+
+On exec, shadow stack features are disabled by the kernel, at which point
+userspace can choose to re-enable or lock them.
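For illustration, a minimal userspace sketch of the arch_prctl() interface documented above (not part of this series): the ARCH_SHSTK_ENABLE/DISABLE values follow the uapi header added later in the series, while the ARCH_SHSTK_SHSTK bit value and the raw-syscall wrapper are assumptions made only for this example.

/*
 * Hedged sketch, not part of this series: enable userspace shadow stack for
 * the current thread via the arch_prctl() interface described above.
 */
#include <stdio.h>
#include <sys/syscall.h>

#define ARCH_SHSTK_ENABLE	0x5001
#define ARCH_SHSTK_DISABLE	0x5002
#define ARCH_SHSTK_SHSTK	(1ULL << 0)	/* feature bit value assumed */

/*
 * Use a raw syscall with no function call around it: enabling shadow stack
 * inside a helper function and then returning from that helper would pop an
 * entry that was never pushed to the brand-new shadow stack and fault.
 * Returns 0 on success or a negative errno value.
 */
#define ARCH_PRCTL(op, arg)						\
({									\
	long _ret;							\
	register long _rdi asm("rdi") = (long)(op);			\
	register long _rsi asm("rsi") = (long)(arg);			\
	asm volatile ("syscall"						\
		      : "=a" (_ret)					\
		      : "a" (__NR_arch_prctl), "r" (_rdi), "r" (_rsi)	\
		      : "rcx", "r11", "memory");			\
	_ret;								\
})

static void work(void)
{
	/* CALL/RET pairs entered after enabling are tracked and succeed. */
	puts("running with a shadow stack");
}

int main(void)
{
	if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) {
		puts("shadow stack not available");
		return 1;
	}

	work();

	/*
	 * main()'s own frame predates enablement, so disable again before
	 * returning through it; otherwise the final RET would not find a
	 * matching shadow stack entry.
	 */
	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
	return 0;
}

The design point worth noting is the raw-syscall macro: the enable has to happen without an intervening CALL/RET, because any function entered before enablement cannot safely return while shadow stack is active.

The Proc Status lines above can also be checked programmatically; a short hedged sketch (again not part of the series) that parses the x86_Thread_features line:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if /proc/self/status lists 'name' in x86_Thread_features. */
static bool thread_feature_enabled(const char *name)
{
	char line[256];
	bool found = false;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return false;

	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "x86_Thread_features:", 20)) {
			found = strstr(line, name) != NULL;
			break;
		}
	}
	fclose(f);
	return found;
}

int main(void)
{
	printf("shstk %s\n", thread_feature_enabled("shstk") ? "enabled" : "disabled");
	return 0;
}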
From patchwork Tue Jun 13 00:10:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277739 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D73A1C7EE43 for ; Tue, 13 Jun 2023 00:13:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C62258E0019; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B9DDE8E000B; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9049B8E0019; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 6C0F18E000B for ; Mon, 12 Jun 2023 20:12:32 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 43E6C40396 for ; Tue, 13 Jun 2023 00:12:32 +0000 (UTC) X-FDA: 80895798144.15.89BB2FA Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id EA8341A0007 for ; Tue, 13 Jun 2023 00:12:29 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=PcWZ3ywZ; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615150; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=caRAaSBciErwlEFgMtwnmdwmBKoQ4mAGZdQm0SayhA8=; b=4R04pwGTJX0HvsUcCkr1nrozvxbkPh7dgyBwwDwMeQlt3qeY8odWLE8AJaw3CZVfZ28C8n ABwbE9mTL3rWAT+JBjOGRzMfkvRS1hzZxX3V4B3SMnT0AAloJiYJDQs3Uj+b8Yk8W3cc2g A3bim8Y01Hr24Pb2MfkZ5O4Brzcqo7A= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=PcWZ3ywZ; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615150; a=rsa-sha256; cv=none; b=nVjMnKlR9TDa3dUuxNpPR6ysNVXVqdu/AF2RXHg538tr4dJ4c0nQie91wNQNtEZSiWgHOV pGTbKoJEoGk/T8nPkbE60TSD9uQx3qWuDnnXUfRGe2h/CXduYpgayu6zCKQwCmRNizoIb+ DVADruoqFT5iMPKKd87R3bMiIDAYpeE= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615150; x=1718151150; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=68vHo4+oSqhV8kF+lXavlD4FeK2AToHr3tcBPqqVDm8=; b=PcWZ3ywZIVEllSFguaCOQfTSCYCjJFYjKQT7l4yRvjWkQ7+0mDXScpNG /FG0YOcxpwBCUharqchPz7KWkNXJfQOnjru+NX8ampfpqDxoeLKnyrgtc RrYQMk4WYtDrHUyrRf7GluklWCnD5e2iqODHH3s1zGxHndUsa2qgqngSE z/CSHgrDnMLTkzO5jIc5FR8QndWT7qOzuos4m32vNTCzB/NLnluWScClx Y8xkYg/wvD78Awhs4t6W2AdLxMBmpgP4NvPM/3Q33pSkZ2yiVQ0/r8Gza 
NuBct+myVGO77X64M4/rgHfqEm/W8yXtisqmhJD3T59EaCqo6uqSS92e9 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557230" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557230" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671067" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671067" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:28 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 24/42] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Mon, 12 Jun 2023 17:10:50 -0700 Message-Id: <20230613001108.3040476-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: EA8341A0007 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: 5m55azrnf6bzzedut6gf9o94xxaqsa3o X-HE-Tag: 1686615149-720484 X-HE-Meta: U2FsdGVkX1+6unO8BahKpOB3iK8lg/+xSpn4wtcYCTpqNvNQ37CYu3DGYjzp2s5Xha0lakRfI7o4FaRDTXiRLuhLY2szoedcvmmHBS+vvViYtN2iXCXjn4ISY39/rdltSQFsUryYXkrNBZdK3MjnNPx2ce+VR/BNuKDevWCOIxShmJazSuNutFmoofzTNvyuicBIOXFssBF1mfM+XMjmC/Z6KbNhBofTas/bSfeDMWfDJ/asxe+q2qAKgVA6BR0QaiXJBMyEahxylqXujrazL/UOjWbjqRFqrN2A2S6fZqSDcniz1mL0sBRy3ZmCEuw9QY3aMiLNBMQX5poN/xLzHrCWu8/dua8xXJFmc++QjZ20SfSm3A6v6exncuEWyc2ucieS+1AxM+W8MisFHjzfx0FMP7qhX8Vp2VVhY94qFkFGfceELXhEVSemNkfWdLmtatRsYoCBv803owlQJV1rfBmJUswRbTJXkXLzYgDzHgCseMYW9SdR42TGyNueW4w7y66X9njIGA3jq1/S3adKOtyY0LJqeemH+rYbKy6TtNiNguxjBcGrdF4EqwSHnDDoA/dozyw7G6lM+iL0o352OCm2IzVgC3AiC1NRO4v5nJC630xNBh2CF8UNBAZveTenL5xF6D7TqtksX5oKk/xFK355/BGjQ2Wo0B+R1LQN7XIK43GuYk8KiKTzyq7UXAMwcCyXL8ocpz+/zWhoJPiOi6OO48f/sP8XSNh1X/6Vy9a32I1Qv449Jh3i8bgnI5glYmhU1lTjAiCUAULAd/o5kSiR+CwAeYKN0DODn66Hn8stFhMF90C2ytXjorYXs9O0Mmx7yh9AD74Hg6oi9Li9zDruPf74NbGyy8RqOn4cUqGgQRSOycJCEZSc3C8wNm3Ri8Wsu071OboHBukmQVbhO5XNQX2yrf1RNHUbmHvWFsVfXxU6iNwCZ5r6iJuB2XcTFdXaUIkRoP1VvVb1QMp sfeIflYH YKU5aljz1A/Hiaj/vglCP5/jTzZomjvhI63VCXE1TcuCKrAqzv5Dv4xEjVeVc4emMc9LX56klY4bOJPEmUuGiXTkYSiMJguutByKzl3XgiSx/wpTfu43l0gM5fT9S/XTrRND+cZeRPyFLboL+MpMSxEGqUkO2DCzY3TlbzNQppgfvD6IRGvh4Jo40W66HJEHvnFFkyozJS8uZakBsEur53QVZYhT8f5zqPCF356pv0VYAk1dl4jbySCZJHhe3Q5bYzUrK7mod2a6X2AvcvxNIK+g3XtQbzTA/kQ2KBUbOglMGGDalAyHx7NEm09QKQNJOofHaT0lS9JaKwMmgC1tna1HpZHzlveoSvcpEOGxsqok7x5NicPtBzlzcqg== X-Bogosity: Ham, 
tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups:

 * Registers controlling user-mode operation
 * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of those groups of registers. This lets an OS manage them separately if it chooses. Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers. This state will add 16 bytes to the xsave buffer size.

Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about unimplemented xfeatures. So refactor these checks by having XCHECK_SZ() set a bool when it actually checks the xfeature. This ends up exceeding 80 chars, but was better on balance than other options explored. Pass the bool as pointer to make it clear that XCHECK_SZ() can change the variable.

While configuring user-mode XSAVE, clarify that kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT. This serves more of a documentation-as-code purpose and, functionally, only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs, and the bits defined in those MSRs relevant to future shadow stack enablement patches.
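To make the "16 bytes" above concrete, here is a hedged, userspace-style sketch (not part of the patch) of the user-mode CET component layout and the kind of size check that check_xstate_against_struct()/XCHECK_SZ() perform against the CPU-reported component size; the MSR names in the comments are taken from later patches in this series.

/*
 * Hedged sketch: the new XSAVE component for user-mode CET is two 64-bit
 * registers, i.e. 16 bytes added to the XSAVE buffer.
 */
#include <stdint.h>
#include <assert.h>

struct cet_user_state {
	uint64_t user_cet;	/* user control-flow settings (MSR_IA32_U_CET) */
	uint64_t user_ssp;	/* user shadow stack pointer (MSR_IA32_PL3_SSP) */
};

/* Mirrors the software-vs-CPU size check done by XCHECK_SZ(). */
static_assert(sizeof(struct cet_user_state) == 16,
	      "CET user state component is expected to be 16 bytes");

int main(void) { return 0; }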
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/fpu/types.h | 16 +++++- arch/x86/include/asm/fpu/xstate.h | 6 ++- arch/x86/kernel/fpu/xstate.c | 90 +++++++++++++++---------------- 3 files changed, 61 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index 7f6d858ff47a..eb810074f1e7 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL_UNUSED, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -138,6 +138,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -252,6 +254,16 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + /* user control-flow settings */ + u64 user_cet; + /* user shadow stack pointer */ + u64 user_ssp; +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..d4427b88ee12 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -50,7 +50,8 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -77,7 +78,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. 
*/ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 0bab497c9436..4fa4751912d9 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -39,26 +39,26 @@ */ static const char *xfeature_names[] = { - "x87 floating point registers" , - "SSE registers" , - "AVX registers" , - "MPX bounds registers" , - "MPX CSR" , - "AVX-512 opmask" , - "AVX-512 Hi256" , - "AVX-512 ZMM_Hi256" , - "Processor Trace (unused)" , + "x87 floating point registers", + "SSE registers", + "AVX registers", + "MPX bounds registers", + "MPX CSR", + "AVX-512 opmask", + "AVX-512 Hi256", + "AVX-512 ZMM_Hi256", + "Processor Trace (unused)", "Protection Keys User registers", "PASID state", - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "AMX Tile config" , - "AMX Tile data" , - "unknown xstate feature" , + "Control-flow User registers", + "Control-flow Kernel registers (unused)", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "AMX Tile config", + "AMX Tile data", + "unknown xstate feature", }; static unsigned short xsave_cpuid_features[] __initdata = { @@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = { [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT, [XFEATURE_PKRU] = X86_FEATURE_PKU, [XFEATURE_PASID] = X86_FEATURE_ENQCMD, + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE, }; @@ -276,6 +277,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); print_xstate_feature(XFEATURE_MASK_XTILE_CFG); print_xstate_feature(XFEATURE_MASK_XTILE_DATA); } @@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate) XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER | \ XFEATURE_MASK_XTILE) /* @@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void) } \ } while (0) -#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \ - if ((nr == nr_macro) && \ - WARN_ONCE(sz != sizeof(__struct), \ - "%s: struct is %zu bytes, cpu state %d bytes\n", \ - __stringify(nr_macro), sizeof(__struct), sz)) { \ +#define XCHECK_SZ(sz, nr, __struct) ({ \ + if (WARN_ONCE(sz != sizeof(__struct), \ + "[%s]: struct is %zu bytes, cpu state %d bytes\n", \ + xfeature_names[nr], sizeof(__struct), sz)) { \ __xstate_dump_leaves(); \ } \ -} while (0) + true; \ +}) + /** * check_xtile_data_against_struct - Check tile data state size. @@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr) * Ask the CPU for the size of the state. */ int sz = xfeature_size(nr); + /* * Match each CPU state with the corresponding software * structure. 
*/ - XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct); - XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); - XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); - XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); - XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); - XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); - XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); - XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); - XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); - - /* The tile data size varies between implementations. */ - if (nr == XFEATURE_XTILE_DATA) - check_xtile_data_against_struct(sz); - - /* - * Make *SURE* to add any feature numbers in below if - * there are "holes" in the xsave state component - * numbers. - */ - if ((nr < XFEATURE_YMM) || - (nr >= XFEATURE_MAX) || - (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) { + switch (nr) { + case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct); + case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state); + case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state); + case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state); + case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state); + case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state); + case XFEATURE_PKRU: return XCHECK_SZ(sz, nr, struct pkru_state); + case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state); + case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg); + case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state); + case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true; + default: XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr); return false; } + return true; } From patchwork Tue Jun 13 00:10:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277740 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3774DC88CB5 for ; Tue, 13 Jun 2023 00:13:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 780EE8E001A; Mon, 12 Jun 2023 20:12:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6E1A48E000B; Mon, 12 Jun 2023 20:12:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4C0038E001A; Mon, 12 Jun 2023 20:12:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 325488E000B for ; Mon, 12 Jun 2023 20:12:33 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 0FA53A0354 for ; Tue, 13 Jun 2023 00:12:33 +0000 (UTC) X-FDA: 80895798186.15.18A1E73 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id B0AAA40008 for ; Tue, 13 Jun 2023 00:12:30 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=ka2BS7hH; dmarc=pass (policy=none) header.from=intel.com; 
spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615151; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=F2RW2ABIPgY9zcBNGEtdjwLLlBLz/ha0Ddgc0k6oYH0=; b=hB2eOfxmujkS4Sm6k5McbNwwyzW7/NhQtVhse69fM9BR+tn6TWHoyEPAFDSKXHl+sPNGjY 4pjqdwZwNYYQJb8I/de4j3XChMPKBQ7folg7wPdu9Dt14nnFg3n0o6u6JgbnmqZPhlJ5e5 OgS2oFIMCFAwjnCBhFjEndWkO+HGknA= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=ka2BS7hH; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615151; a=rsa-sha256; cv=none; b=pjs45xS4jhkqSFRTS/1O1H13PPOC6KdsG8Dppf25VF08HMqGB5uBmGkJ9+uQM/3sJK7Bvr v2LOV9joAkhTbTzOEaRAXnNHREaJ5pNK0qWCRIBOogmO/uas2/Daokndui0vBDaK8SBzv/ 8WjNLeXfaRTatvZfi1dYmuUWuODy5x8= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615151; x=1718151151; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=L2W4szzCW+qvvywxGcp6uHsWCFDTi4NqV9MxxSUxhGY=; b=ka2BS7hH5uQADWxEIOsIAyOLnrMWWFLwaWUm0789R+Lh5K+3+rPu55wj mVpZ/t7QJnfY8B7qMjG8szhDJ7JSvBbQ1Sp0d9jAC1q8TQ14M94P6TvBW bMtGTR/A1e8AkEAcS9+cFtI9Rqm2C6LL7PPIajYVuLfXO3VNBvm0PUTQM ky7Mtik0mo2Tbox018/i1lCh2mhNqGL2t33ah8lsNMS+y0qTx7c/lqH4E qHaY4SZpNp/6QWVbPo81JJ0A/vHAydafU5dd+jmspTUKck83aYGwvC3L6 9HA6DbUhwbxHBcJ3CBZ43W+kGzyqB6QXztEC9cZx5R5TcuWlpnaI9l0xu Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557254" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557254" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671072" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671072" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:28 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 25/42] x86/fpu: Add helper for modifying xstate Date: Mon, 12 Jun 2023 17:10:51 -0700 Message-Id: <20230613001108.3040476-26-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: B0AAA40008 X-Stat-Signature: boccz1r6a3x4wsjk71ezkrcpj67rnt8u X-Rspam-User: X-HE-Tag: 1686615150-453401 X-HE-Meta: U2FsdGVkX19JURuWLvIztohI9tY1RDu8pKVqCb7po2k/m6HmLOTtqI4VI6K+c1LEQM7TTIe4dW2K7W7dr9wjkDS0eb4sUDgj7w+0gPJ8t051iCpvTQfm20iVLOsfqyMsaOdubdwTdlfadeOQ++yaas0MaxBWChqlC0UytTY5S2xUa0bLLr2RXyaOdumw9s5jaBKLwWxExWXQNwU4/Vzt/G14TPlWjX+84A/u+MXM/BM1ggd2M8lHhOQGloWmZJKuUO3IZKZBOW2axQZ9485ZxZj2CluOoHAWKG7rhi8Vi0whuU9+p+L6QTwdiN0ZNdOgSRa+mEnumpiIsGxwaAzZwyrnRlOk8foRPP0xc4Cb6XbiQzK+oQL9FEl38LXtBPIpkGmUYx+ahWyUulgmkeznossMEpS/2R+w+THR6aEziH6qov8q+IKKe6ihTyA7DqBthbfkj8rURAseCtC0fYN0+g2OeWKTaHZ/wQc40RLNLCzmDfEuHrynWBryVFbbzhZiKHOGYAyiys6C63k2sznCm2qMmtFGWHKt9a3xMIYoDNNWALK3Fa0yvQTxTa4aB7iYVpMjZc9KouUOXxJpFp15sTNyIfjT/NP493ogHYlXP8bzJPAKIrEyxkA2OyqZjVAdxrYKDKwZ525pLBZttJZ/9j7ueNNlxSGHLTXtaF1rwPUukaRjEwNCUAAxt+SjPhvQEYEDpRSi6uO/Il6QZR0t8/omYoYkycaZRZBwjFPEEqnHXiSAaHwpVC+0+jD/HsAEarvip8Y5MlZJn9DuGwT3lTylyVJJZyWhvipuiIVfoBg/FC6fl8rp4bdQXaAu9bG/WTtkJ2aKHeIwwHaUOg+m62WPiT0mVX9LeYa+X+Nt1JJcrQxgNDk8V0D6p/jQjD3YQRuheWepWxbe1YpwB4k1DMXQ3jWlXANcPSdDe1LV7U2kkIkHrU0108h8vXEm9XAi35mNc7xk05H5qnbVcCv 7mBFyWr2 iWg26H50deXe5XnVNbbSF2QnBVunEf/n7yobmDJTbxJ5j5v+Q/9PAhDJxxo/Xc/TJfl4xa5kXuhcI0AqFjnkH4LL7sH3OhSg/nljjI1JTFrBtA/3kuoY0kh+hDsm5bVWzopN5KGAlmsjkbNHczednx2CO5IUXi8LrSnBJcy3B6mQQVM5rg9IR6QLPXc7I5bOxRsgx0z0Ki11y5WeFFYH9MHc2DxMVLhFEVCkvAfkdp1RkafZcctMiV5h9xygkS5Fx0LM+VxFhmg/6KZnWHWPdtsC0mvOcLMK9ZMfZgAfEXVfXpkvFKaJmg8ZTJA3+pMbUhWIAG1yQuGnwBpNnhJvhXZlGhliVHfMhRslsRNmv8isehmjKd6QqIm7J3g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Just like user xfeatures, supervisor xfeatures can be active in the registers or present in the task FPU buffer. If the registers are active, the registers can be modified directly. If the registers are not active, the modification must be performed on the task FPU buffer. When the state is not active, the kernel could perform modifications directly to the buffer. But in order for it to do that, it needs to know where in the buffer the specific state it wants to modify is located. Doing this is not robust against optimizations that compact the FPU buffer, as each access would require computing where in the buffer it is. The easiest way to modify supervisor xfeature data is to force restore the registers and write directly to the MSRs. Often times this is just fine anyway as the registers need to be restored before returning to userspace. Do this for now, leaving buffer writing optimizations for the future. Add a new function fpregs_lock_and_load() that can simultaneously call fpregs_lock() and do this restore. Also perform some extra sanity checks in this function since this will be used in non-fpu focused code. 
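A hedged sketch of the usage pattern this helper enables in later shadow stack patches (illustrative only: this exact function is not part of the patch, and the MSR and CET_SHSTK_EN names are drawn from, or assumed from, other patches in the series).

/*
 * Hedged sketch: how later shadow stack code is expected to use
 * fpregs_lock_and_load() to modify the current task's CET state.
 */
#include <asm/fpu/api.h>
#include <asm/msr.h>

static void example_enable_user_shstk(unsigned long ssp)
{
	/* Force the task's xstate into the registers and pin it there. */
	fpregs_lock_and_load();

	/* The live MSRs now hold this task's CET state; write them directly. */
	wrmsrl(MSR_IA32_PL3_SSP, ssp);		/* new shadow stack pointer */
	wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN);	/* enable shadow stack (bit name assumed) */

	fpregs_unlock();
}

This sidesteps the need to locate the CET component inside a possibly compacted XSAVE buffer, which is exactly the robustness concern the commit message describes.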
Suggested-by: Thomas Gleixner Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/fpu/api.h | 9 +++++++++ arch/x86/kernel/fpu/core.c | 18 ++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 503a577814b2..aadc6893dcaa 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -82,6 +82,15 @@ static inline void fpregs_unlock(void) preempt_enable(); } +/* + * FPU state gets lazily restored before returning to userspace. So when in the + * kernel, the valid FPU state may be kept in the buffer. This function will force + * restore all the fpu state to the registers early if needed, and lock them from + * being automatically saved/restored. Then FPU state can be modified safely in the + * registers, before unlocking with fpregs_unlock(). + */ +void fpregs_lock_and_load(void); + #ifdef CONFIG_X86_DEBUG_FPU extern void fpregs_assert_state_consistent(void); #else diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index caf33486dc5e..f851558b673f 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -753,6 +753,24 @@ void switch_fpu_return(void) } EXPORT_SYMBOL_GPL(switch_fpu_return); +void fpregs_lock_and_load(void) +{ + /* + * fpregs_lock() only disables preemption (mostly). So modifying state + * in an interrupt could screw up some in progress fpregs operation. + * Warn about it. + */ + WARN_ON_ONCE(!irq_fpu_usable()); + WARN_ON_ONCE(current->flags & PF_KTHREAD); + + fpregs_lock(); + + fpregs_assert_state_consistent(); + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + fpregs_restore_userregs(); +} + #ifdef CONFIG_X86_DEBUG_FPU /* * If current FPU state according to its tracking (loaded FPU context on this From patchwork Tue Jun 13 00:10:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277741 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEFC0C7EE43 for ; Tue, 13 Jun 2023 00:13:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6E80F8E001B; Mon, 12 Jun 2023 20:12:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 66D178E000B; Mon, 12 Jun 2023 20:12:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4996F8E001B; Mon, 12 Jun 2023 20:12:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 31CD68E000B for ; Mon, 12 Jun 2023 20:12:34 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 0E8DEAF4CD for ; Tue, 13 Jun 2023 00:12:34 +0000 (UTC) X-FDA: 80895798228.09.6D8E69D Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id BEA42160013 for ; Tue, 13 Jun 2023 00:12:31 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=YVS8qSma; dmarc=pass (policy=none) header.from=intel.com; spf=pass 
(imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615152; a=rsa-sha256; cv=none; b=rkAcqttigrgIDm4gGDE2mS3xyZ68W/vtj5i8xIrYX2EWeg6ULoFHR78e+XBKPbgJw1mHzg NWWuf5QQ70P/0zBzWyj03XOfdiENE/uts0ONX2Gw/1xqTvC3GxY0YYVuPblHiAKTLH+Cow 0rbnCslOpqYTy01Co0OJvln6hSqJMao= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=YVS8qSma; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615152; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=izcKFpB20xlWfjQ0sAp89VgdYkvyy6AEcctJjTNYnrk=; b=MFl4ZVY4q79VhgFhjO0haL29pcW0/cTI3SW1XAzO/PlTT7VLUSDspRlTMaqdm7EytuHJdV Ho9Og9LpXzUxHfa/hj914qmUKCLHlH6iIomigNHOx4qYn/SMQNPpS5NscefWgVP8e1YlOt se5nKbwuFJXFgIZS2m+tImLX3vbU2+4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615152; x=1718151152; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FpsnqDmskVtuVWeAVhUxFxlMXuwPJqlBrZTKKFCDs5Y=; b=YVS8qSmarnjno/wo29UmoxAb1YWIq92197YX+U7bXS6vDpvHS5xcOKZ4 bsmXIMMVgSYyL3yU+w/3ttAtinLnFW7xQTuNGrbFODVlpBW2gMMAUV3BI m48zHu0YDBQiE5CmbGJ873los2Yb81Jm/2c2YiV+cJSJaJshSSIUEGkvx 2xiokL1EKTWjDJPM8ZA8v4fU9YFT5Y3lFryT+q405r/z8jJa8XhCv+6px 2rSVBdmPQua/ZwQVvsPPFEJOhWT/0X0Ep01yYjRWskbbYi2JObLdBswG9 aZ204UvZDEFUWYdvf/T6vD65yqKwRols/zxRgaD36ENCV/a2Rm8viVamz A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557280" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557280" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671078" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671078" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:29 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 26/42] x86: Introduce userspace API for shadow stack Date: Mon, 12 Jun 2023 17:10:52 -0700 Message-Id: <20230613001108.3040476-27-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: BEA42160013 X-Stat-Signature: 83zy3e5froy99uqkcn8pm9zxee47c3f7 X-HE-Tag: 1686615151-472965 X-HE-Meta: U2FsdGVkX1/e4U6YY8WmPcUVijzt5RKitrtVUCWDa9Pshi16d/VoZd4HWssRzbXm9rQ9i/j/WLnd2REAFPfFmp5SOARQS0seSOHZkDroEn33yUuPc0Xk4oRUG510+lKyEThb2vBjh//XA0Jw+Dmvwaar9FcjXBTmyouFRfURYX0/UZSoSW25iGJhz5Of09dLwZOreIDZQtOZLFeQf5w3u1OhH5MnMBbelO/H2Vc2//NKMlL7d7MWODGLH+zIIMQG5rby0YBTEyHBhxO5po4bwqbA5nlqQnlAWcb/GzwgOwHbyQfJMPWBBqzEdHajKxGn0S4UmkrX0khOG+6PsDGHciFqjF3sstMw0e/QxQfDt4jvCW7ZIkZF+rwywr+EUpTKWe6EolhFHPt2RbBMylAJDEwzrWBxz8blahoqpSMNlV75uTJI3PvLsMsSuD4but2mLwKbSNCmBM0XZXfHvVSL0sb/RnBRlE2KwOHiO86gS4iE+hHVj0MbX7rNjgYjdsqCv115z8CqO0t6/r6DqH0qNwVpNjOI3KeqeveyGbLyY2r+Aor952TsWPct501LkRQANkh0aRcysCZ1o9t1bez2BxJ5fll0b6tuu45wy9/pJ2cGRpY4BTwCr8BdtUrXSuSEg36QxFehKd25OPlILcww+zS9THmo/YqmJciJiRZ21QMjIqLTTuxDG0Fck3o2AdlP5eoUFp72kO1bPuOmtp9Zn3VUVxckfcZnN67/Q/reVf3tP4crjA/4MdYJ8cw0q2oobzn1OiqlgkEwfWGYH/A5+8zPrIMANAxzLpGXL27cP3tWsbEZHhJoeG3D6+HxLmYwYMtDuiHW2ds1v5TU6nJD18kdmOD/iXTd3UMME2Q45VyS2/47f59tmxTq4A4cE2+iekWU5iph4uvfKpX8JeGHDzdO5d3odlVc3mcrFQV/sA3zvpcLJyTyska3eLNd259akys7etKdXTAKrDq0/Lq 147XjroO wDcoxhjO+MkuzJED5fhF83l6Nhr6cVoNpOUehQGysycNG5dxymD3wQoq0TlB1hD6lOsi2qmIhrX3yLtLucDa3IYU7UEMW9r2hfOwNh5pyfs5FuxOnfEM7QKfIum4TH/Un2eRsIi7fOygkJ5osomYQuFF5fn085YDxYEL57upGtFiBMXaRAZZ2cP2w2tNXIqqlvSTISuK6Y+XXq/5o25T3+IJB/mMT//pB6R1GClqzPMVOm5NWrFhdZvIhfpfCuYWH7x7W6avZ//VMV/OF++bfY1wPK8taHv+MLK6ywV65WQf+vX+HGwVz4T4fIRlSnzQsnSNbDCACjOFvRRz3l9VVfAgVRnzsoBSpBKdvRn7cJSDILhLROLhvBi0dpQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add three new arch_prctl() handles: - ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or a negative value on error. - ARCH_SHSTK_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or a negative value on error. The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec(). Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. 
Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/processor.h | 6 +++++ arch/x86/include/asm/shstk.h | 21 +++++++++++++++ arch/x86/include/uapi/asm/prctl.h | 6 +++++ arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/process_64.c | 6 +++++ arch/x86/kernel/shstk.c | 44 +++++++++++++++++++++++++++++++ 6 files changed, 85 insertions(+) create mode 100644 arch/x86/include/asm/shstk.h create mode 100644 arch/x86/kernel/shstk.c diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a1e4fa58b357..407d5551b6a7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -28,6 +28,7 @@ struct vm86; #include #include #include +#include #include #include @@ -475,6 +476,11 @@ struct thread_struct { */ u32 pkru; +#ifdef CONFIG_X86_USER_SHADOW_STACK + unsigned long features; + unsigned long features_locked; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h new file mode 100644 index 000000000000..ec753809f074 --- /dev/null +++ b/arch/x86/include/asm/shstk.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHSTK_H +#define _ASM_X86_SHSTK_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +long shstk_prctl(struct task_struct *task, int option, unsigned long features); +void reset_thread_features(void); +#else +static inline long shstk_prctl(struct task_struct *task, int option, + unsigned long arg2) { return -EINVAL; } +static inline void reset_thread_features(void) {} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_SHSTK_H */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index e8d7ebbca1a4..1cd44ecc9ce0 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -23,9 +23,15 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Don't use 0x3001-0x3004 because of old glibcs */ + #define ARCH_GET_UNTAG_MASK 0x4001 #define ARCH_ENABLE_TAGGED_ADDR 0x4002 #define ARCH_GET_MAX_TAG_BITS 0x4003 #define ARCH_FORCE_TAGGED_SVA 0x4004 +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index abee0564b750..6b6bf47652ee 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -147,6 +147,8 @@ obj-$(CONFIG_CALL_THUNKS) += callthunks.o obj-$(CONFIG_X86_CET) += cet.o +obj-$(CONFIG_X86_USER_SHADOW_STACK) += shstk.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 3d181c16a2f6..0f89aa0186d1 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -515,6 +515,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, load_gs_index(__USER_DS); } + reset_thread_features(); + loadsegment(fs, 0); loadsegment(es, _ds); loadsegment(ds, _ds); @@ -894,6 +896,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) else return put_user(LAM_U57_BITS, (unsigned long __user *)arg2); #endif + case ARCH_SHSTK_ENABLE: + case ARCH_SHSTK_DISABLE: + case ARCH_SHSTK_LOCK: + return shstk_prctl(task, option, arg2); default: 
ret = -EINVAL; break; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c new file mode 100644 index 000000000000..41ed6552e0a5 --- /dev/null +++ b/arch/x86/kernel/shstk.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * shstk.c - Intel shadow stack support + * + * Copyright (c) 2021, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include + +void reset_thread_features(void) +{ + current->thread.features = 0; + current->thread.features_locked = 0; +} + +long shstk_prctl(struct task_struct *task, int option, unsigned long features) +{ + if (option == ARCH_SHSTK_LOCK) { + task->thread.features_locked |= features; + return 0; + } + + /* Don't allow via ptrace */ + if (task != current) + return -EINVAL; + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + /* Only support enabling/disabling one feature at a time. */ + if (hweight_long(features) > 1) + return -EINVAL; + + if (option == ARCH_SHSTK_DISABLE) { + return -EINVAL; + } + + /* Handle ARCH_SHSTK_ENABLE */ + return -EINVAL; +} From patchwork Tue Jun 13 00:10:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277742 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E389C88CB2 for ; Tue, 13 Jun 2023 00:13:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2ED148E001C; Mon, 12 Jun 2023 20:12:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1B1FF8E000B; Mon, 12 Jun 2023 20:12:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EF7E48E001C; Mon, 12 Jun 2023 20:12:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CB9588E000B for ; Mon, 12 Jun 2023 20:12:35 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id AB43616037B for ; Tue, 13 Jun 2023 00:12:35 +0000 (UTC) X-FDA: 80895798270.24.1A391ED Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id 45C3E1A0007 for ; Tue, 13 Jun 2023 00:12:33 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=go5NbeOd; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615153; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qGs9rCBjozTM2AAPpzKdYfSTzD2KPWUcCiw9QM54jZM=; b=AZ3QH+3k7ulLI90JO8aczD+FWqJ43I2KSz7r2SSThsrKvB40XwHN5tWo53t/kCMzNajp9G 4EtrSSDFdcLFPtxxOa21qAbiO6yJ+cebMlHx9G0Pl3kLLCcmsp/43idDVT19DYnEYuttYU hlzlbAg3tVS4LT380/ksXnmvUSpedS8= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel 
header.b=go5NbeOd; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615153; a=rsa-sha256; cv=none; b=HUpWYr5Hg4MwV+6zjH8vux86w/vjtCd31/jJuQi0lb0+Yu3hXVDnVTziUBsfUC1HGBHljN SrU0ZKADxVQe1puuN9Mi7LkHwseJHSxR8V4YVJhDGn/8iEXGnqQtMsCTpl5oglAqhRUqwx dwU9/YLmp8XuJAUjExK1z6y+u3Ellr4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615153; x=1718151153; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WchKfqQNnYFXXNMxX3TqxAMzB4S9WkAXS/aRm3qIr7o=; b=go5NbeOdtvOOCr1i7N8NouBwKB+S8AT+xI7IaQXLuJ5cWkajFXlW5T3K qbCO3BpHAQXVVa+4y0i6U5viD/iKMxszijQyQ/y2zRYb8U6vYc0u9ybQc PHkMnJ5B2ZL6feOosIPe7UR/I2jJPGJqfV0F54nd4NzxvToKbRMOXp2U2 4KPqsuM960fXfecONTPqifbJa3m0/7eq7OFm7/Ar9Oc6FOHQyHClwbCcG cCTxuKYSM9TrGGpIT15L9UmJxam0UjyUnf4YF5qFu8l3kVeLTVtCVySam fdOFeatcGWDupI4aoS+q3f9L4W1lBnpbcOUW/vnU4ff3NNh8u7OejfmZH A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557303" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557303" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671087" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671087" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:30 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 27/42] x86/shstk: Add user control-protection fault handler Date: Mon, 12 Jun 2023 17:10:53 -0700 Message-Id: <20230613001108.3040476-28-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 45C3E1A0007 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: ynwy6useyyy41139k9fsp6o5as5kuh9h X-HE-Tag: 1686615153-577153 X-HE-Meta: U2FsdGVkX1/hJf7jcuiGqEmcDoMFRma2ZzqEuHpjRgsEGhAHNpKy5YigS0SEFQsSwIoz3U8oQpQ3F3oBwINWAXTUMxf/BZvTUY1iJla+4+Bb3w85ssSWey4qEKIYii7+N8S98iDRzV7F1uY3Bgv8Vz2/8xMJnZFexkQMhQxLufcrBecwu3YzjLGBuKq/q56eg5167+8pT6WS/6KGWsa6naIq7VsDDbEp6RRUiZfx7NMImVlu8MpdetXbQHnyfrhhIika0nnyvscMbedh87r1ALNaf+dI1BB7qLPQjGOgNHufCqllu1gfrjdkbgMb22yopIWgeZ/Uzpy8TeOgJELbeVWFTP0XhaNxD+xAp+bmXM5+ZrE3VJnsVdV6TUHGyFPpQVrSMfaHw2qIwM0cwX2UpyKq1z1LvYLIyVvAEtfNLOXTCn+LuDSLsKHW1g44H7NK3gUeboMKoZ4/KcqrHZP80a070OL+W+s8hhHeDxKLVxbe8E9o7mvKb1NQ3aXdg81lWHBL9g+JdfCDzJx1UUDZWc9Plj7pNn37Vgxf6TgVW72p3wusiMfMwdHlUJIdD5U07o4Qz7lULKPImwes3P4yBcQBJTr4f0O33mrduoPVT+utHpRDQUXUT5iiiHRCgMsc9UMQhtw4CWYbKAFSDrvCNq/FbiBI0n7gKVR8VV2Vh/xxe5OpVgm32aMlNnOrwULDDBU1LQe3QFuEim3veSHEl5ioil/+zGgjEQqy9GeXBw67djSVepdAuR50oE+Lvg/xOwaZBfeG5S/n7U3uZm74GIPFISC9/HuWSLeF8B3aD0QSw+XtmkDjqXeiJEMh6k4DRsysVMP8GeHYaEnEoOhVQecENMVnAqUPWPwmreesXK4brnIt13JBINsEADtqSlZ5ydEO/VC/uVrkDudXfgWMmsx5CqU9xq5+MQg2VrvmGbIfr4W59HuatoE28izpDOaXwNCXhSWweyxip9IZUSx F4CCtIf0 K828cTFr543rh99m8saM3+VVs3NALYp+clAMcvYekOimhwy0xdR0WDZJ6IVoCb7wLygr8MuQYmOmUeDJZwbND6skkXGnWBcQ+pF0tnbTsZxdE+PSI9xOCIM4iy6ehSbVBeidd4a1UHw9lFpHSGdx/fZcWR5ULTC1xtM1uKt76t60T4nabwzOyn9z6EgBx4YkKhBYklpw5VIgLE6ehFDVQBIDUix1E5KMqE7jfWaWIOK5f897jpqPCGeMietJYWIpzpfbXNhMjW5QTvzwJkl2Gsv5MFL9iwQgzEeofivtQ6Tq8EAKHtS+Ci6IoYO1y5p7Zk3NapHXzZ5C+Q1A40Ax/VMmkVQH4HwgKhaoJ08tk6fehTtfwX/UBNqgxpg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT faults. Refactor this fault handler into separate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. To avoid ifdeffery, put them both in a new file cet.c, which is compiled in the case of either of the two CET features supported in the kernel: kernel IBT or user mode shadow stack. Move some static inline functions from traps.c into a header so they can be used in cet.c. Opportunistically fix a comment in the kernel IBT part of the fault handler that is on the end of the line instead of preceding it. Keep the same behavior for the kernel side of the fault handler, except for converting a BUG to a WARN in the case of a #CP happening when the feature is missing. 
This unifies the behavior with the new shadow stack code, and also prevents the kernel from crashing under this situation which is potentially recoverable. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/arm/kernel/signal.c | 2 +- arch/arm64/kernel/signal.c | 2 +- arch/arm64/kernel/signal32.c | 2 +- arch/sparc/kernel/signal32.c | 2 +- arch/sparc/kernel/signal_64.c | 2 +- arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/idtentry.h | 2 +- arch/x86/include/asm/traps.h | 12 +++ arch/x86/kernel/cet.c | 94 +++++++++++++++++++++--- arch/x86/kernel/idt.c | 2 +- arch/x86/kernel/signal_32.c | 2 +- arch/x86/kernel/signal_64.c | 2 +- arch/x86/kernel/traps.c | 12 --- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm.S | 2 +- include/uapi/asm-generic/siginfo.h | 3 +- 16 files changed, 117 insertions(+), 34 deletions(-) diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index e07f359254c3..9a3c9de5ac5e 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 2cfc810d0a5b..06d31731c8ed 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1343,7 +1343,7 @@ void __init minsigstksz_setup(void) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index 4700f8522d27..bbd542704730 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c index dad38960d1a8..82da8a2d769d 100644 --- a/arch/sparc/kernel/signal32.c +++ b/arch/sparc/kernel/signal32.c @@ -751,7 +751,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c index 570e43e6fda5..b4e410976e0d 100644 --- a/arch/sparc/kernel/signal_64.c +++ b/arch/sparc/kernel/signal_64.c @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); 
static_assert(NSIGCHLD == 6); diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index b9c7eae2e70f..702d93fdd10e 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -111,6 +111,12 @@ #define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) #endif +#ifdef CONFIG_X86_KERNEL_IBT +#define DISABLE_IBT 0 +#else +#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -134,7 +140,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ DISABLE_ENQCMD) #define DISABLED_MASK17 0 -#define DISABLED_MASK18 0 +#define DISABLED_MASK18 (DISABLE_IBT) #define DISABLED_MASK19 0 #define DISABLED_MASK20 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index b241af4ce9b4..61e0e6301f09 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -614,7 +614,7 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault); #endif /* #CP */ -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); #endif diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 47ecfff2c83d..75e0dabf0c45 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -47,4 +47,16 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs, struct stack_info *info); #endif +static inline void cond_local_irq_enable(struct pt_regs *regs) +{ + if (regs->flags & X86_EFLAGS_IF) + local_irq_enable(); +} + +static inline void cond_local_irq_disable(struct pt_regs *regs) +{ + if (regs->flags & X86_EFLAGS_IF) + local_irq_disable(); +} + #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index 7ad22b705b64..cc10d8be9d74 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -4,10 +4,6 @@ #include #include -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - enum cp_error_code { CP_EC = (1 << 15) - 1, @@ -20,15 +16,80 @@ enum cp_error_code { CP_ENCL = 1 << 15, }; -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +static const char cp_err[][10] = { + [0] = "unknown", + [1] = "near ret", + [2] = "far/iret", + [3] = "endbranch", + [4] = "rstorssp", + [5] = "setssbsy", +}; + +static const char *cp_err_string(unsigned long error_code) { - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); + unsigned int cpec = error_code & CP_EC; + + if (cpec >= ARRAY_SIZE(cp_err)) + cpec = 0; + return cp_err[cpec]; +} + +static void do_unexpected_cp(struct pt_regs *regs, unsigned long error_code) +{ + WARN_ONCE(1, "Unexpected %s #CP, error_code: %s\n", + user_mode(regs) ? "user mode" : "kernel mode", + cp_err_string(error_code)); +} + +static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + +static void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code) +{ + struct task_struct *tsk; + unsigned long ssp; + + /* + * An exception was just taken from userspace. Since interrupts are disabled + * here, no scheduling should have messed with the registers yet and they + * will be whatever is live in userspace. So read the SSP before enabling + * interrupts so locking the fpregs to do it later is not required. 
+ */ + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + cond_local_irq_enable(regs); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + /* Ratelimit to prevent log spamming. */ + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + __ratelimit(&cpf_rate)) { + pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)%s", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + cp_err_string(error_code), + error_code & CP_ENCL ? " in enclave" : ""); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); } - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) + force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0); + cond_local_irq_disable(regs); +} + +static __ro_after_init bool ibt_fatal = true; + +/* code label defined in asm below */ +extern void ibt_selftest_ip(void); + +static void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code) +{ + if ((error_code & CP_EC) != CP_ENDBR) { + do_unexpected_cp(regs, error_code); return; + } if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { regs->ax = 0; @@ -74,3 +135,18 @@ static int __init ibt_setup(char *str) } __setup("ibt=", ibt_setup); + +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (user_mode(regs)) { + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + do_user_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } else { + if (cpu_feature_enabled(X86_FEATURE_IBT)) + do_kernel_cp_fault(regs, error_code); + else + do_unexpected_cp(regs, error_code); + } +} diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index a58c6bc1cd68..5074b8420359 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -107,7 +107,7 @@ static const __initconst struct idt_data def_idts[] = { ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE), #endif -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET INTG(X86_TRAP_CP, asm_exc_control_protection), #endif diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c index 9027fc088f97..c12624bc82a3 100644 --- a/arch/x86/kernel/signal_32.c +++ b/arch/x86/kernel/signal_32.c @@ -402,7 +402,7 @@ int ia32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c index 13a1e6083837..0e808c72bf7e 100644 --- a/arch/x86/kernel/signal_64.c +++ b/arch/x86/kernel/signal_64.c @@ -403,7 +403,7 @@ void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 6f666dfa97de..f358350624b2 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -77,18 +77,6 @@ DECLARE_BITMAP(system_vectors, NR_VECTORS); -static inline void cond_local_irq_enable(struct pt_regs *regs) -{ - if (regs->flags & X86_EFLAGS_IF) - local_irq_enable(); -} - -static inline void cond_local_irq_disable(struct pt_regs *regs) -{ - if (regs->flags & X86_EFLAGS_IF) - local_irq_disable(); -} - __always_inline int is_valid_bugaddr(unsigned long addr) { if (addr < TASK_SIZE_MAX) diff --git 
a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 093b78c8bbec..5a034a994682 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -640,7 +640,7 @@ static struct trap_array_entry trap_array[] = { TRAP_ENTRY(exc_coprocessor_error, false ), TRAP_ENTRY(exc_alignment_check, false ), TRAP_ENTRY(exc_simd_coprocessor_error, false ), -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET TRAP_ENTRY(exc_control_protection, false ), #endif }; diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index 08f1ceb9eb81..9e5e68008785 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -148,7 +148,7 @@ xen_pv_trap asm_exc_page_fault xen_pv_trap asm_exc_spurious_interrupt_bug xen_pv_trap asm_exc_coprocessor_error xen_pv_trap asm_exc_alignment_check -#ifdef CONFIG_X86_KERNEL_IBT +#ifdef CONFIG_X86_CET xen_pv_trap asm_exc_control_protection #endif #ifdef CONFIG_X86_MCE diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ffbe4cec9f32..0f52d0ac47c5 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -242,7 +242,8 @@ typedef struct siginfo { #define SEGV_ADIPERR 7 /* Precise MCD exception */ #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */ #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */ -#define NSIGSEGV 9 +#define SEGV_CPERR 10 /* Control protection fault */ +#define NSIGSEGV 10 /* * SIGBUS si_codes From patchwork Tue Jun 13 00:10:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277743
From: Rick Edgecombe
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 28/42] x86/shstk: Add user-mode shadow stack support Date: Mon, 12 Jun 2023 17:10:54 -0700 Message-Id: <20230613001108.3040476-29-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: D403340011 X-Stat-Signature: updpszdiq4c3or1y36fkjat7dgbu73s6 X-Rspam-User: X-HE-Tag: 1686615153-455673 X-HE-Meta: U2FsdGVkX1/eSKWBug5NehUbV0sd2AWumGE+B/eeSnvV0OUd/chrZDf/ABHOLe2K8YgsAK26AVVEe77J+wIz9/7noE+h8SSSK0GmV2/AKcFPNc0wCceCYQXBK/cHn1IEcGl/nhhUJ19aKWetZcqmzXfMLGmAyuro3ENU+3PezeTcykQNkzVKaqApdRAtUR1xRpOY0gf57erzIgHU6bMaEwieCt2dfbw+WJSS/xBTDiPQ44ideb7iyT4ig8DsFOAYaI6rsLsLBrXr/LR2qfM7jmOzGtmAJXychxSSsNrBjXP5AfRztYF2Ei3g/qEtpnQkC7/cOBNT7yiTERB8FllwDmTvrzQCUeEAPJdOHHXWwUYnDMG3Nv1Dde7kIS3uiwqZ7Hqas03/c/QATt93f2YB5q+ruU3yEX54ZXXIrRtOZ0D4pZ3aAC8695wvhFDe9WuSDCFvT7x6cjYyvtok5m8BKH7RChZT0/TIojDzqPE0WsWHb//Zpqg88e/vdo64EEQcifS+nKDLjjQ+kHmBPpkiTaq/0Z5kilQ0yl/oOCI+iE5vDBsEWXdvsThznQcYBl0WiT9fJx2HoG4zkmgZUjPpRgQNnJ7xlJ3yC4U6BXHt/w4O0VLO3z8z7gPzO53CYpF0cwdYDOyTOWeMlf7/ONbdAk4s6YgMLiCwkdV9wjdSfOjSMFBx9j75v3leRGUyu2JQ1gQAf5YG/A9p7FXwcZfvAX9DxIdGlmKpCLhBcWjfommz33mbxKRY1mmCl/5+Xggyqt/HbU3XysdWW7X0OGTJendS5hz6Fb9PAIzpxrK0sEi/UJIrVvqxbZANIP9tc+1Svg/L+dpiPHMHTz/Tb0Hw7DAev3h35Qay7Of8gZKhV/SExlbL1LTGP37DuXa6BrC8u8bQU4Nu0oSld8YNX0AzpSOQvKBGSvKVi9DA++Xlx7PfywjYFbfOUYR6LVvj32ZsPVIedLmUyQjdS7D0EcW Ctn4mD7S b5I680aIQWjA5QK4xbHoH5qaytAmip997c6O3ULrFEaCssjTKku4zWLMEoxi2lDHnOnunSo+FZ6YF+bZvBkrV2sHfd4xOtkRoAmPtN2RcpZhOYoOwvk+K+GymIchqo8jKcL9bPdnSJ8n0ue112ueJWKqmbF7PVgh59e/OoWzBjVkqB6pOwe9gOKwRLIC+kJGuEic+r4KhUh+flPPTqeyKbKj1qd6Fi9icJWHjtvzSFLaezEDmqDoqPHmwnUAeEtnava5H6Fd8C5hYsn2GR3jCGVtOIw+jEgP3mKYfalG5nEZpX+DtP6UkpDuGCWSX3OggtA4Zz0RaSx/qkNNTZDVTXV3Td0WzbOtOIMRQLvZvCoVjqAArOV3TdJURLA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. 32 bit shadow stack is not expected to have many users and it will complicate the signal implementation. So do not support IA32 emulation or x32. 
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/shstk.h | 7 ++ arch/x86/include/uapi/asm/prctl.h | 3 + arch/x86/kernel/shstk.c | 145 ++++++++++++++++++++++++++++++ 4 files changed, 157 insertions(+) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 407d5551b6a7..2a5ec5750ba7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -479,6 +479,8 @@ struct thread_struct { #ifdef CONFIG_X86_USER_SHADOW_STACK unsigned long features; unsigned long features_locked; + + struct thread_shstk shstk; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index ec753809f074..2b1f7c9b9995 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -8,12 +8,19 @@ struct task_struct; #ifdef CONFIG_X86_USER_SHADOW_STACK +struct thread_shstk { + u64 base; + u64 size; +}; + long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 1cd44ecc9ce0..6a8e0e1bff4a 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -34,4 +34,7 @@ #define ARCH_SHSTK_DISABLE 0x5002 #define ARCH_SHSTK_LOCK 0x5003 +/* ARCH_SHSTK_ features bits */ +#define ARCH_SHSTK_SHSTK (1ULL << 0) + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 41ed6552e0a5..3cb85224d856 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -8,14 +8,159 @@ #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +static bool features_enabled(unsigned long features) +{ + return current->thread.features & features; +} + +static void features_set(unsigned long features) +{ + current->thread.features |= features; +} + +static void features_clr(unsigned long features) +{ + current->thread.features &= ~features; +} + +static unsigned long alloc_shstk(unsigned long size) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; + struct mm_struct *mm = current->mm; + unsigned long addr, unused; + + mmap_write_lock(mm); + addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + + mmap_write_unlock(mm); + + return addr; +} + +static unsigned long adjust_shstk_size(unsigned long size) +{ + if (size) + return PAGE_ALIGN(size); + + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); +} + +static void unmap_shadow_stack(u64 base, u64 size) +{ + while (1) { + int r; + + r = vm_munmap(base, size); + + /* + * vm_munmap() returns -EINTR when mmap_lock is held by + * something else, and that lock should not be held for a + * long time. Retry it for the case. 
+ */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + /* + * For all other types of vm_munmap() failure, either the + * system is out of memory or there is bug. + */ + WARN_ON_ONCE(r); + break; + } +} + +static int shstk_setup(void) +{ + struct thread_shstk *shstk = ¤t->thread.shstk; + unsigned long addr, size; + + /* Already enabled */ + if (features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* Also not supported for 32 bit and x32 */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_32bit_syscall()) + return -EOPNOTSUPP; + + size = adjust_shstk_size(0); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + fpregs_unlock(); + + shstk->base = addr; + shstk->size = size; + features_set(ARCH_SHSTK_SHSTK); + + return 0; +} + void reset_thread_features(void) { + memset(¤t->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; } +void shstk_free(struct task_struct *tsk) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return; + + if (!tsk->mm) + return; + + unmap_shadow_stack(shstk->base, shstk->size); +} + +static int shstk_disable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* Already disabled? */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + fpregs_lock_and_load(); + /* Disable WRSS too when disabling shadow stack */ + wrmsrl(MSR_IA32_U_CET, 0); + wrmsrl(MSR_IA32_PL3_SSP, 0); + fpregs_unlock(); + + shstk_free(current); + features_clr(ARCH_SHSTK_SHSTK); + + return 0; +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { From patchwork Tue Jun 13 00:10:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277744 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB109C88CB2 for ; Tue, 13 Jun 2023 00:13:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4E71C8E001E; Mon, 12 Jun 2023 20:12:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 470078E000B; Mon, 12 Jun 2023 20:12:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2761B8E001E; Mon, 12 Jun 2023 20:12:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 0E6648E000B for ; Mon, 12 Jun 2023 20:12:37 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id D9227120394 for ; Tue, 13 Jun 2023 00:12:36 +0000 (UTC) X-FDA: 80895798312.16.B6D6F08 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id 8C9A7160011 for ; Tue, 13 Jun 2023 00:12:34 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=CJzusdv0; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of 
From: Rick Edgecombe
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 29/42] x86/shstk: Handle thread shadow stack Date: Mon, 12 Jun 2023 17:10:55 -0700 Message-Id: <20230613001108.3040476-30-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 8C9A7160011 X-Stat-Signature: ymjkqi3zkdkbp9g3qmrkd6d4ds1tjiix X-HE-Tag: 1686615154-813812 X-HE-Meta: U2FsdGVkX1+G23I6ECmMfHLGgPfx0G8vca4Se8kgG95fMAqvoSTYciFD/cCeRlCThW5wxMO+IfzYhc70YDpmT2W10YyqePvFRW3qTXcp8VqS0MU1YezvYyOxWyYlatoW99VX0JZE6pWdkQG1i4I8FsAta/AKrU0Jd6VRqUQouPmC5u6BogoyyC/e2iK3J98Mvc4B8doz4MruNUNKrExmhjPrKR1yI5sywsBMvp96h78aKTXudWr9xzob0Ki7r8lApBys5FPG1DCWEv9eQwgDgpDnWr9ONS8QhuLmmSOa5y+NkbWAlDHf+J6YsLRpb4w0NDPuELHp3obiuiOT40yqThr+jyJ5hiZ0wtSjrjqyHuMeq3BrEKgYQUG9eF47aongdLadzlFBBUMKQfRFDkke/JuyKko+R8ZqODDtxlka4RutaXuJQ9LAh6tzKo5VnX13PB2IbduHiH/G29+Kicy67JAtkb2A24tJf9IqnWW1fWMjoCCPsLaT318NsKrqXHHvEWJ0fRt3NFwVuFs4j0y6wXcmAlnU0C8F3x4NPbX19+BYECHqPyQ+2Mg0WKH6RAcJDAkcHc/uu1eUsr2jdlM+lhPYstHZpyO3UBd+KzmQNKeZF3bjzPFMnOHt0gCKbCQNsc8OyjRkMZJC+N3w+h/x+UbEcBeCgM0nyuPfsJixtbwxyqgl6+zRc3LgzMahJOhJ/3DedMsrEDG2E5enbZxaCLEkvSX0naWA5/ctmgv89utrB87IiMKv2CMZ/J8iN9X6MkbeRR9mf6i9XzlzBf49JMGvQMJ9UUG3c9Fqq9v0Wcco6L/LthPmY0cXI+Y4c+N7QyPJ/rZzbPZR+UDY3zZXuIqIITcy6vNhCorZr6GeGDN0LBYkjJnggNyapUlrAJKZW2lKW20h3fYPhzLK0gzAg/0NuKdWqFbasp345mblhDBoKMQf89DRPmSYjKs9JUT8455EW6u/ozepICFE1oZ 1e7xqugR o3n2iew+k3vBTqbgH0N8mKK/LgMS+pJ4XfH36492HpUH9I5sNC6J5cmDdFIwclRJyqC/D+6BKQwrymgTEd5zdCaQ5Kp8JYm0Q899ao350+hbNLk4IVh2gYFKZVu34GFG9ix+QL5P1H5RHzQkQ2V5JeN0vSnUL/fOnp0nArgVsjx7Ko71pOSxrkrMdSdm2oNjXvObmU4TvgwBPI1P+WDsHU9SuU0i5YCBaEMRhraPi8han9lX4Pq291VChyqcEX9A8K0ibkk8fLaZ3Yvd65cdIjChnfWUadY/XfIdFcZm/n9Ts3yodPt5LGtNEexBvyHcdNPKvIwzMKNLxrr4FjFYTwJ8hlbVAjL3+sGU/0xLb8ljmaWnLhdZYUiZS6A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-CET case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). 
In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent. During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner. Suggested-by: Thomas Gleixner Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/fpu/sched.h | 3 ++- arch/x86/include/asm/mmu_context.h | 2 ++ arch/x86/include/asm/shstk.h | 5 ++++ arch/x86/kernel/fpu/core.c | 36 +++++++++++++++++++++++++- arch/x86/kernel/process.c | 21 ++++++++++++++- arch/x86/kernel/shstk.c | 41 ++++++++++++++++++++++++++++-- 6 files changed, 103 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h index c2d6cd78ed0c..3c2903bbb456 100644 --- a/arch/x86/include/asm/fpu/sched.h +++ b/arch/x86/include/asm/fpu/sched.h @@ -11,7 +11,8 @@ extern void save_fpregs_to_fpstate(struct fpu *fpu); extern void fpu__drop(struct fpu *fpu); -extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal); +extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long shstk_addr); extern void fpu_flush_thread(void); /* diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 1d29dc791f5a..416901d406f8 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -186,6 +186,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + shstk_free(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index 2b1f7c9b9995..d4a5c7b10cb5 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -15,11 +15,16 @@ struct thread_shstk { long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, + unsigned long stack_size); void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } static inline void reset_thread_features(void) {} +static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p, + unsigned long clone_flags, + unsigned long stack_size) { return 0; } static inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c 
index f851558b673f..aa4856b236b8 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -552,8 +552,36 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) } } +/* A passed ssp of zero will not cause any update */ +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp) +{ +#ifdef CONFIG_X86_USER_SHADOW_STACK + struct cet_user_state *xstate; + + /* If ssp update is not needed. */ + if (!ssp) + return 0; + + xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave, + XFEATURE_CET_USER); + + /* + * If there is a non-zero ssp, then 'dst' must be configured with a shadow + * stack and the fpu state should be up to date since it was just copied + * from the parent in fpu_clone(). So there must be a valid non-init CET + * state location in the buffer. + */ + if (WARN_ON_ONCE(!xstate)) + return 1; + + xstate->user_ssp = (u64)ssp; +#endif + return 0; +} + /* Clone current's FPU state on fork */ -int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) +int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long ssp) { struct fpu *src_fpu = ¤t->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; @@ -613,6 +641,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) if (use_xsave()) dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID; + /* + * Update shadow stack pointer, in case it changed during clone. + */ + if (update_fpu_shstk(dst, ssp)) + return 1; + trace_x86_fpu_copy_src(src_fpu); trace_x86_fpu_copy_dst(dst_fpu); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index dac41a0072ea..3ab62ac98c2c 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -50,6 +50,7 @@ #include #include #include +#include #include "process.h" @@ -121,6 +122,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + shstk_free(tsk); fpu__drop(fpu); } @@ -142,6 +144,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) struct inactive_task_frame *frame; struct fork_frame *fork_frame; struct pt_regs *childregs; + unsigned long new_ssp; int ret = 0; childregs = task_pt_regs(p); @@ -179,7 +182,16 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->flags = X86_EFLAGS_FIXED; #endif - fpu_clone(p, clone_flags, args->fn); + /* + * Allocate a new shadow stack for thread if needed. If shadow stack, + * is disabled, new_ssp will remain 0, and fpu_clone() will know not to + * update it. + */ + new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size); + if (IS_ERR_VALUE(new_ssp)) + return PTR_ERR((void *)new_ssp); + + fpu_clone(p, clone_flags, args->fn, new_ssp); /* Kernel thread ? */ if (unlikely(p->flags & PF_KTHREAD)) { @@ -225,6 +237,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); + /* + * If copy_thread() if failing, don't leak the shadow stack possibly + * allocated in shstk_alloc_thread_stack() above. 
+ */ + if (ret) + shstk_free(p); + return ret; } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 3cb85224d856..bd9cdc3a7338 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -47,7 +47,7 @@ static unsigned long alloc_shstk(unsigned long size) unsigned long addr, unused; mmap_write_lock(mm); - addr = do_mmap(NULL, addr, size, PROT_READ, flags, + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); @@ -126,6 +126,37 @@ void reset_thread_features(void) current->thread.features_locked = 0; } +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, + unsigned long stack_size) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + unsigned long addr, size; + + /* + * If shadow stack is not enabled on the new thread, skip any + * switch to a new shadow stack. + */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* + * For CLONE_VM, except vfork, the child needs a separate shadow + * stack. + */ + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + size = adjust_shstk_size(stack_size); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return addr; + + shstk->base = addr; + shstk->size = size; + + return addr + size; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; @@ -134,7 +165,13 @@ void shstk_free(struct task_struct *tsk) !features_enabled(ARCH_SHSTK_SHSTK)) return; - if (!tsk->mm) + /* + * When fork() with CLONE_VM fails, the child (tsk) already has a + * shadow stack allocated, and exit_thread() calls this function to + * free it. In this case the parent (current) and the child share + * the same mm struct. + */ + if (!tsk->mm || tsk->mm != current->mm) return; unmap_shadow_stack(shstk->base, shstk->size); From patchwork Tue Jun 13 00:10:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277745 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9295EC7EE2E for ; Tue, 13 Jun 2023 00:13:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9A2FD8E001F; Mon, 12 Jun 2023 20:12:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 92C5D8E000B; Mon, 12 Jun 2023 20:12:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 694B38E001F; Mon, 12 Jun 2023 20:12:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 500218E000B for ; Mon, 12 Jun 2023 20:12:38 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 256FD14036E for ; Tue, 13 Jun 2023 00:12:38 +0000 (UTC) X-FDA: 80895798396.08.E6C15AB Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id 243791A0007 for ; Tue, 13 Jun 2023 00:12:35 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=DGyIIn1+; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 
From: Rick Edgecombe
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 30/42] x86/shstk: Introduce routines modifying shstk Date: Mon, 12 Jun 2023 17:10:56 -0700 Message-Id: <20230613001108.3040476-31-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 243791A0007 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: c5sfuzy4kojsnb38b6i1xz7jty8mafaw X-HE-Tag: 1686615155-973177 X-HE-Meta: U2FsdGVkX19khmEmqkDQPYXt4GSZcvP8AfL0PtM6W+qgMdl4e7ZSU1CFfG0t74OeXVxm5X8LQ+2Xsd+IQ3JO38mVZumAg948gmeSz2z/zbxunUMWG5uk3UDddenkNIb7/PwUdmhopeHL6oGqGw0qBm3OsBNl6SH/i6D4AjqecqfOHMkfrFrTSPBud7PT8fAJHAqxxTwcmFBFEpIDg3gSwOM94ob1hcQqLDnv0ucvgYpJuu8EHViZnBKXHl8MvEfsHcqW6WHU43+fO/9tdfzq3NFU5XcriBJZpoz8pv2V98TMqM+GK06cbtMQWzZ4KOKAShk71DbykSefoJ62ZlUA78jE4H0AfA5ZXbRIziiK6Rma/iqDzqU+M5NvgsQseaPJleqRto7d4hM16D64hI+ni7OFmmxCBSWvYMWiH/LloiAweaR+1PQ+G2zh2g1ERugtW6fjTjLLHiXGv7dBaOkWyI+6XxzU1CKBH5oFfa5srCiX3rvvFSpnFXAzor3bsgK+mrZ8M9ib/LWvKGf4smaly19P7r+TssqD0khax+0FJrBRAOWSF2z2MklOQnCBrWq5wTdK3NZ4UdFhj0DCfzh2rJR3lG1X2Tdvsk9pqKMIqCNcVIrxmAUv2zIEs/Y8F0H+252UIXo0xiBCukZUxejNYXzxqhdkID6+xVrWsmF3aDm+3CzoC58TLutgtSU54vG3a30uKHMgohkbp4v7lQpqFHzWUIiwnzRTz0Z/bcDxauELEXsMiEvSs+O7ZyP6zph4X9va/sFLlOnTEB533m5bXMwDlppXC2+8rgFUUS3G8dZwdAYstRJd1UmGWuT4TKQUd+Qugggvwc6DYV3ygdhyRVJWeHTVeaiycHCQL0p3om/hd3fbeenhLfyDIoF+0gNw1rRRnUW1fInXysB8PwkyP299XpiLaVZufCf3zXLqGQ7kkkPWbJHtUnhurBiI13OvRcONMegRcPIY8xV3nn6 9+YlXXHF R8N2Es//ERMzNAq8ZEze/RAXigTEa9HedHZRU2QPp2GN+PStb6LoOj1En41Cb+xokwHkuCYRRMfW4BkIrvjMZLk8k44DqZfm+idVeWbt1aiOj+3KqL9xZeDn/zptS4lLYeg48aV/u6X2csU5j7WVg85156l8H9MFsNoyj1lNvo8lcSeUCNEdow+FnG3hnwRflei17PgKe9ZjCf/RYI8NM/1GAAXUPVGIbd7i9rsKHju4nCLCde/HdUEy++DYgYha4c2xYQWvjvR6pVyN5AU2VuPhi+pcA+ZSXx7ejMnwo4FGEQ3bnbVoWE/EHYn7YrCLyzkMYyBxCb8Jv1KCJ4vfTd7yAFLQlEaJy1jYkoXrI0xU4Lm88dim6pk2U6Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stacks are normally written to via CALL/RET or specific CET instructions like RSTORSSP/SAVEPREVSSP. However, sometimes the kernel will need to write to the shadow stack directly using the ring-0 only WRUSS instruction. A shadow stack restore token marks a restore point of the shadow stack, and the address in a token must point directly above the token, which is within the same shadow stack. This is distinctively different from other pointers on the shadow stack, since those pointers point to executable code area. Introduce token setup and verify routines. Also introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. In future patches that enable shadow stack to work with signals, the kernel will need something to denote the point in the stack where sigreturn may be called. This will prevent attackers calling sigreturn at arbitrary places in the stack, in order to help prevent SROP attacks. To do this, something that can only be written by the kernel needs to be placed on the shadow stack. 
This can be accomplished by setting bit 63 in the frame written to the shadow stack. Userspace return addresses can't have this bit set as it is in the kernel range. It also can't be a valid restore token. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/special_insns.h | 13 +++++ arch/x86/kernel/shstk.c | 75 ++++++++++++++++++++++++++++ 2 files changed, 88 insertions(+) diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index de48d1389936..d6cd9344f6c7 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -202,6 +202,19 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_USER_SHADOW_STACK +static inline int write_user_shstk_64(u64 __user *addr, u64 val) +{ + asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index bd9cdc3a7338..e22928c63ffc 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -25,6 +25,8 @@ #include #include +#define SS_FRAME_SIZE 8 + static bool features_enabled(unsigned long features) { return current->thread.features & features; @@ -40,6 +42,35 @@ static void features_clr(unsigned long features) current->thread.features &= ~features; } +/* + * Create a restore token on the shadow stack. A token is always 8-byte + * and aligned to 8. + */ +static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{ + unsigned long addr; + + /* Token must be aligned */ + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + addr = ssp - SS_FRAME_SIZE; + + /* + * SSP is aligned, so reserved bits and mode bit are a zero, just mark + * the token 64-bit. + */ + ssp |= BIT(0); + + if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) + return -EFAULT; + + if (token_addr) + *token_addr = addr; + + return 0; +} + static unsigned long alloc_shstk(unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; @@ -157,6 +188,50 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl return addr + size; } +static unsigned long get_user_shstk_addr(void) +{ + unsigned long long ssp; + + fpregs_lock_and_load(); + + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + fpregs_unlock(); + + return ssp; +} + +#define SHSTK_DATA_BIT BIT(63) + +static int put_shstk_data(u64 __user *addr, u64 data) +{ + if (WARN_ON_ONCE(data & SHSTK_DATA_BIT)) + return -EINVAL; + + /* + * Mark the high bit so that the sigframe can't be processed as a + * return address. 
+ */ + if (write_user_shstk_64(addr, data | SHSTK_DATA_BIT)) + return -EFAULT; + return 0; +} + +static int get_shstk_data(unsigned long *data, unsigned long __user *addr) +{ + unsigned long ldata; + + if (unlikely(get_user(ldata, addr))) + return -EFAULT; + + if (!(ldata & SHSTK_DATA_BIT)) + return -EINVAL; + + *data = ldata & ~SHSTK_DATA_BIT; + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; From patchwork Tue Jun 13 00:10:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277747
From: Rick Edgecombe Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 31/42] x86/shstk: Handle signals for shadow stack Date: Mon, 12 Jun 2023 17:10:57 -0700 Message-Id: <20230613001108.3040476-32-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0
When a signal is handled, the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only tracks return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are visible to userspace and will be kernel ABI for shadow stacks. One is to make sure the restorer address is written to the shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer. The other thing to do is to place some type of checkable token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually, as in SROP-like attacks. For this token the shadow stack data format defined earlier can be used. Have the data pushed be the previous SSP. In the future, sigreturn might want to return back to a different stack. Storing the SSP (instead of a restore offset or similar) allows for future functionality that may want to restore to a different stack. So, when handling a signal, push:
 - the SSP, in the shadow stack data format
 - the restorer address, below the restore token.
In sigreturn, verify that the SSP is stored in the data format and pop the shadow stack.
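As a way to visualize the sigframe layout described above, here is a small standalone sketch that models the push/pop arithmetic on a plain array standing in for shadow stack memory. The constants mirror the patch (8-byte frames, bit 63 marking kernel-written data), but the helper names and the fake restorer address are invented for the sketch and are not kernel API.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SS_FRAME_SIZE	8
#define SHSTK_DATA_BIT	(1ULL << 63)	/* marks data the kernel wrote */

static uint64_t shstk[16];		/* stand-in for shadow stack memory */

/* Mirrors what setup_signal_shadow_stack() pushes: old SSP token, then restorer. */
static uint64_t push_sigframe(uint64_t ssp, uint64_t restorer)
{
	assert((ssp % 8) == 0);		/* token location must be 8-byte aligned */
	ssp -= SS_FRAME_SIZE;
	shstk[ssp / 8] = (ssp + SS_FRAME_SIZE) | SHSTK_DATA_BIT; /* previous SSP */
	ssp -= SS_FRAME_SIZE;
	shstk[ssp / 8] = restorer;	/* ordinary return address */
	return ssp;
}

/* Mirrors the shstk_pop_sigframe() check: the data bit must be present. */
static uint64_t pop_sigframe(uint64_t ssp)
{
	uint64_t token = shstk[ssp / 8];

	/* return addresses never have bit 63 set, so a forged frame fails here */
	assert(token & SHSTK_DATA_BIT);
	return token & ~SHSTK_DATA_BIT;	/* the previous SSP */
}

int main(void)
{
	uint64_t ssp = sizeof(shstk);	/* "top" of the model stack */
	uint64_t handler_ssp = push_sigframe(ssp, 0x401000 /* fake restorer */);
	/*
	 * Returning from the handler into the restorer consumes the restorer
	 * slot, so by sigreturn time the SSP points at the token again.
	 */
	uint64_t sigreturn_ssp = handler_ssp + SS_FRAME_SIZE;

	printf("ssp %#llx -> handler ssp %#llx -> restored ssp %#llx\n",
	       (unsigned long long)ssp,
	       (unsigned long long)handler_ssp,
	       (unsigned long long)pop_sigframe(sigreturn_ssp));
	return 0;
}

The kernel version additionally checks that the restored SSP is 8-byte aligned and below TASK_SIZE_MAX before writing it back through MSR_IA32_PL3_SSP, as the hunks below show.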
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/asm/shstk.h | 5 ++ arch/x86/kernel/shstk.c | 95 ++++++++++++++++++++++++++++++++++++ arch/x86/kernel/signal.c | 1 + arch/x86/kernel/signal_64.c | 6 +++ 4 files changed, 107 insertions(+) diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index d4a5c7b10cb5..ecb23a8ca47d 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct ksignal; #ifdef CONFIG_X86_USER_SHADOW_STACK struct thread_shstk { @@ -18,6 +19,8 @@ void reset_thread_features(void); unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size); void shstk_free(struct task_struct *p); +int setup_signal_shadow_stack(struct ksignal *ksig); +int restore_signal_shadow_stack(void); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } @@ -26,6 +29,8 @@ static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, unsigned long stack_size) { return 0; } static inline void shstk_free(struct task_struct *p) {} +static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } +static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index e22928c63ffc..f02e8ea4f1b5 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -232,6 +232,101 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr) return 0; } +static int shstk_push_sigframe(unsigned long *ssp) +{ + unsigned long target_ssp = *ssp; + + /* Token must be aligned */ + if (!IS_ALIGNED(target_ssp, 8)) + return -EINVAL; + + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((void *__user)*ssp, target_ssp)) + return -EFAULT; + + return 0; +} + +static int shstk_pop_sigframe(unsigned long *ssp) +{ + unsigned long token_addr; + int err; + + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); + if (unlikely(err)) + return err; + + /* Restore SSP aligned? */ + if (unlikely(!IS_ALIGNED(token_addr, 8))) + return -EINVAL; + + /* SSP in userspace? 
*/ + if (unlikely(token_addr >= TASK_SIZE_MAX)) + return -EINVAL; + + *ssp = token_addr; + + return 0; +} + +int setup_signal_shadow_stack(struct ksignal *ksig) +{ + void __user *restorer = ksig->ka.sa.sa_restorer; + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + if (!restorer) + return -EINVAL; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_push_sigframe(&ssp); + if (unlikely(err)) + return err; + + /* Push restorer address */ + ssp -= SS_FRAME_SIZE; + err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer); + if (unlikely(err)) + return -EFAULT; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + +int restore_signal_shadow_stack(void) +{ + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_pop_sigframe(&ssp); + if (unlikely(err)) + return err; + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 004cb30b7419..356253e85ce9 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -40,6 +40,7 @@ #include #include #include +#include static inline int is_ia32_compat_frame(struct ksignal *ksig) { diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c index 0e808c72bf7e..cacf2ede6217 100644 --- a/arch/x86/kernel/signal_64.c +++ b/arch/x86/kernel/signal_64.c @@ -175,6 +175,9 @@ int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs) frame = get_sigframe(ksig, regs, sizeof(struct rt_sigframe), &fp); uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(ksig)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -260,6 +263,9 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (restore_altstack(&frame->uc.uc_stack)) goto badframe; From patchwork Tue Jun 13 00:10:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277747 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7349C7EE43 for ; Tue, 13 Jun 2023 00:13:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CDCCE8E0021; Mon, 12 Jun 2023 20:12:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C868A8E000B; Mon, 12 Jun 2023 20:12:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A64588E0021; Mon, 12 Jun 2023 20:12:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 8A7C18E000B for ; Mon, 12 Jun 2023 20:12:39 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 320E44033D for ; Tue, 13 Jun 2023 
00:12:39 +0000 (UTC) X-FDA: 80895798438.29.2F46A49 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id 07F05160011 for ; Tue, 13 Jun 2023 00:12:36 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=eb5IGzFw; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615157; a=rsa-sha256; cv=none; b=VyaftKwI0I9n//x1TzaQj+sZ43KqO9MNZmqDb5GEUfB2AN6PHmUMzLncdvGYOJr+hH/K0l kp3V06l6xDP/+UIVdFnNKTKN7xJo1oWEOsO1dURVrn4pF/mCC4cuvsQa32nWTBSJe3DTQR Eeix8y7R17AWQMiV7LfQQnvPkx/UhnM= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=eb5IGzFw; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615157; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=VJQfD0Gy0U01zHXthnUY9VvfAHAiqWwVCIxa02ED0FQ=; b=LwFT/tPPaAiIhpnmk2IAnnrXz7xXRJxrZorOzcSiIvI2nPPufmsLnq/mn8pnJeYBdzs4hW SmfX9pyNbffij7F102ZpefPshDzs0w4hzXi7Fg5euxeXFzvA2DTwhJLZEe6AcOs5wAUBBz BlZBVgY/3zv1SiYD2WZfR9EnOvMaZAA= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615157; x=1718151157; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qj0G4UwqkNNGQlR4Z3X67pikfZ5WOGk4HokEGvo8sro=; b=eb5IGzFwghUhxxg5RCggzg8UdJIc673PHnjWpVHWN1qsJnhW8Ygro7kg A4PGBniAxNi0jb2RHIpkvqmjsCDZs0T9bz3g6h4ZUFmC39AyDlmUymb4x /EXSjQeh806eaNuVcDE54vP6iAYZp4DlhHVwIVh8UFgx5eqhDY48BczXl f3Po3lRVhMIGa4rD8BvUQq/sQWE8pTqqpNiYE11QPYhgWGmcPQBtSfvoG a5C62G1dbJlLsPiNsGHjyaZLZY8FZ660D79y1eDeXCEpKxtepmUlfoFg6 BH256noCPuqQztKiKCJk28ceI2EgL0z7DW8eRaal9Yjf8s+1bXhenE5EP Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557423" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557423" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671113" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671113" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:35 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v9 32/42] x86/shstk: Check that SSP is aligned on sigreturn Date: Mon, 12 Jun 2023 17:10:58 -0700 Message-Id: <20230613001108.3040476-33-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 07F05160011 X-Stat-Signature: rryuokfr461iwjqbksnma5mbsm3f13ut X-HE-Tag: 1686615156-190575 X-HE-Meta: U2FsdGVkX1+GcCI7DMYhhWg06SKl1+dPJEA8ywUzZSluQuGwt5LH47peKl1eIToMJxD3T76CvQygUHlJoryQaOezf1Pti/AowkpSRsAB58D/hxVQQKy3U7axaWD5RgAJCC1kSojKH/19Y+zCcRPsUOUg0U5iObM322rXQVjUwruEcX7kij7FRKSJkVSGcZOpX60/N1Md554jGaUbHWRbkGPMdOoA9dY3KV8nD5C/wwYIfQc1M33WTdO554Jw/uBc/bEvlQ9bwIzlAkPrwVQ2hBisnRwaCJQgQjE9buYLfRfIfaRllXoL+RYKlRNNoSFlpfJgxCBsKeFUz4ozpN8YX5Q1LVmmi4ZFWj66jxZLen2Q7aIwDwLaLy7hfbHuPVUEJaKw8ZgY6TFfM4sxseLPofAEfjW9em65Jv2Ycf4E3aIcZlH3KqRc1BGA579d6MMY78j1KlmTxH+iY3pK0W8X0Zl7gwigzNTXsuHmeDXdIFHfdVI4w3IEwl68Wkm4MJqv5qoGEtXHvBASmKHjRoKV/wn65+LFi2d/UAzPCM2d6w7gnEX7t3E/eU602sR2Zvb0GiP0K/H2o/mXIPRPy8cOcr9HB9fSkmk5wmRz6G4L0gOhaF/VZ5eU63Ur2WVEwAz1A9sbIEG74plUtIQwNYLUi4LtDkmeHTIdt8MGI92kG7Mz4ylXR8lTH6gXpljrbIosH1JJf2lHkj0qWIR8GqtVvjqtsiF4QW4UI6qBy/oy0Lrc8Ww4QUDq/wZ+GfxTp/BrBF5MOGD8etGl1ALGeO3pLC/KgFlM7qJg5MgbHy5QaGRgTLW3hGeWG91nBfN4344z5IzOhQhtJEYoFHUiMwwQA9qvCyJ415pxmVjY93xz/ZgAg3eauj22piN+Y7gErJwyklD3iXOjzypVpGYiZy3P2JHSmC3DsOvhTHLwvNpLR1qXNbHMyPs3CzEAJkB8l/5egUPls84opPkj7JKRqHG uYmkBE8+ f6dJE7NF8NGybUK92T6phcXCkHwwOeRPL4qCt/XXIvafw6ez8E1UeRrx+56XqWVI5pIgqp2deHHXF41GcnsuBH+AIQPTqg+rr67HgIiFR8t22Ot81m1a5iNCuNxNzzVPWFNHNFaFBuBnQUH+ibyD1XWxyCpQ2NFgEUrT5fRhiUEKDLM9DEF9ecIEnYIZB7CSxTugtICFRhuUQ+tShWxzvUyzmraYlgCuke8K8hpy/Y+W6Xyw= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The shadow stack signal frame is read by the kernel on sigreturn. It relies on shadow stack memory protections to prevent forgeries of this signal frame (which included the pre-signal SSP). It also relies on the shadow stack signal frame to have bit 63 set. Since this bit would not be set via typical shadow stack operations, so the kernel can assume it was a value it placed there. However, in order to support 32 bit shadow stack, the INCSSPD instruction can increment the shadow stack by 4 bytes. In this case SSP might be pointing to a region spanning two 8 byte shadow stack frames. It could confuse the checks described above. Since the kernel only supports shadow stack in 64 bit, just check that the SSP is 8 byte aligned in the sigreturn path. 
Signed-off-by: Rick Edgecombe --- v9: - New patch --- arch/x86/kernel/shstk.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index f02e8ea4f1b5..a8705f7d966c 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -252,6 +252,9 @@ static int shstk_pop_sigframe(unsigned long *ssp) unsigned long token_addr; int err; + if (!IS_ALIGNED(*ssp, 8)) + return -EINVAL; + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); if (unlikely(err)) return err; From patchwork Tue Jun 13 00:10:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277748 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 995EDC7EE2E for ; Tue, 13 Jun 2023 00:13:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3151B8E0022; Mon, 12 Jun 2023 20:12:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2785A8E000B; Mon, 12 Jun 2023 20:12:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 07AF78E0022; Mon, 12 Jun 2023 20:12:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D87268E000B for ; Mon, 12 Jun 2023 20:12:40 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B008012038F for ; Tue, 13 Jun 2023 00:12:40 +0000 (UTC) X-FDA: 80895798480.15.DD82501 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id 903EA4000B for ; Tue, 13 Jun 2023 00:12:38 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=TuMDuGQ+; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615158; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=j/S9+lJTj7lOJLdBjMtY21maw2CmhDPCo82KqZj1ygs=; b=Yw6FqlbkqCFg1edhG38t9NhDYkVIVlcQUCjBkE7n4rtXckPFZYS1SerOFDDB5VUzdzfh5O A9As9VeNU4uZL59JqnjxF1ovQZ8FbG0gQo6rImf6TqtEi4yHGXPO9xLo70btTBVDP+npvx s8O/1yytycT87AwL2NKt7rOeZKPMwmo= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=TuMDuGQ+; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615158; a=rsa-sha256; cv=none; b=SJJXPGGbX8xT2ct9zY/idSGFHy5M8LHRJ/0nwCAvzFBOVXnGDPdGvEqyVxxpfGQAttHR4L CXdgCHrMf0k9BQ7YuNRn6XUCVDIrNFm2hw1UT1PuggOmLgQWSrzw4t8qRUIsiOb9rOV6ZV 3W62JFaMrnsNBhRkqI8fyZ2re8WUj74= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; 
i=@intel.com; q=dns/txt; s=Intel; t=1686615158; x=1718151158; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=T7gc3JiJMbmruUv2dTeFaPrm+ED2TBUgFQFH5yJf4mA=; b=TuMDuGQ+bVKbHvVgiXg4Rptm7zpSz/mAAFVTQEHniVOERzkmCtxlOXNx wQjxP3J0blNOFqv/nYvjp6n2HiqHe1QRpq9z2AazPV4zJSd6LaajmSYNY i9uYb/L0fKZEYO01l0z0JYNcXdw/f9q+I3o0wcSCjNjWcvktWwOTkmvGD 22jl5N6e4k+lRfy9C66I/o/zJ8nNYzAfvA1z+Cx2CIHKnA2sCNpS+hKyC bvtGLP4xlAdAOQkyN9tBfoiXUMJs2WtDSi0zfLA/NlwHwP52+lytG4WdR ndTbKiCFOYZSids/h4TtrO6vD4Nid/VqcXkbDOzcfo8PEXczdL610DULM g==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557448" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557448" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671121" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671121" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:36 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com Subject: [PATCH v9 33/42] x86/shstk: Check that signal frame is shadow stack mem Date: Mon, 12 Jun 2023 17:10:59 -0700 Message-Id: <20230613001108.3040476-34-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 903EA4000B X-Stat-Signature: j5xcbqfjawidnqih3ebyuboui43zhzq5 X-Rspam-User: X-HE-Tag: 1686615158-309852 X-HE-Meta: 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 

The shadow stack signal frame is read by the kernel on sigreturn. It relies on shadow stack memory protections to prevent forgeries of this signal frame (which includes the pre-signal SSP). This behavior helps userspace protect itself.

However, using the INCSSP instruction, userspace can adjust the SSP to 8 bytes beyond the end of a shadow stack. INCSSP performs shadow stack reads to make sure it doesn't increment off of the shadow stack, but at the end position it actually reads 8 bytes below the new SSP. For the shadow stack HW operations, this situation (INCSSP off the end of a shadow stack by 8 bytes) would be fine. If a RET is executed there, the shadow stack pop would fail, since the read at the SSP would not be to shadow stack memory. If a CALL is executed, the SSP will be moved back onto the shadow stack and the return address will be written successfully to the very end. That is expected behavior around shadow stack underflow.

However, the kernel doesn't have a way to read shadow stack memory using shadow stack accesses. WRUSS can write to shadow stack memory with a shadow stack access, which ensures the access is to shadow stack memory. But unfortunately for this case, there is no equivalent instruction for shadow stack reads. So when reading the shadow stack signal frames, the kernel currently assumes the SSP is pointing to the shadow stack and uses a normal read. The SSP pointing to shadow stack memory will be true in most cases, but as described above, it can be untrue by 8 bytes.

So look up the VMA of the shadow stack sigframe being read to verify it is shadow stack memory. Since the SSP can only be beyond the shadow stack by 8 bytes, and shadow stack memory is page aligned, this check only needs to be done when this type of relative position to a page boundary is encountered. So skip the extra work otherwise.
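The boundary condition can be sketched as follows (illustration only; PAGE_SIZE and the addresses are stand-ins, and the real code expresses the test as PAGE_ALIGN(*ssp) == *ssp):

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SIZE 4096ULL   /* x86 base page size */

  /*
   * The SSP can be at most 8 bytes past the end of a shadow stack, and shadow
   * stacks are page aligned, so only a page aligned SSP can possibly point at
   * non shadow stack memory and need the extra VMA check.
   */
  static bool need_vma_check(uint64_t ssp)
  {
          return (ssp & (PAGE_SIZE - 1)) == 0;
  }

  int main(void)
  {
          assert(need_vma_check(0x7ffd40002000ULL));   /* on a page boundary: might be past the stack */
          assert(!need_vma_check(0x7ffd40001ff8ULL));  /* safely inside a shadow stack page */
          return 0;
  }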
Signed-off-by: Rick Edgecombe --- v9: - New patch --- arch/x86/kernel/shstk.c | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index a8705f7d966c..50733a510446 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -249,15 +249,38 @@ static int shstk_push_sigframe(unsigned long *ssp) static int shstk_pop_sigframe(unsigned long *ssp) { + struct vm_area_struct *vma; unsigned long token_addr; - int err; + bool need_to_check_vma; + int err = 1; + /* + * It is possible for the SSP to be off the end of a shadow stack by 4 + * or 8 bytes. If the shadow stack is at the start of a page or 4 bytes + * before it, it might be this case, so check that the address being + * read is actually shadow stack. + */ if (!IS_ALIGNED(*ssp, 8)) return -EINVAL; + need_to_check_vma = PAGE_ALIGN(*ssp) == *ssp; + + if (need_to_check_vma) + mmap_read_lock_killable(current->mm); + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); if (unlikely(err)) - return err; + goto out_err; + + if (need_to_check_vma) { + vma = find_vma(current->mm, *ssp); + if (!vma || !(vma->vm_flags & VM_SHADOW_STACK)) { + err = -EFAULT; + goto out_err; + } + + mmap_read_unlock(current->mm); + } /* Restore SSP aligned? */ if (unlikely(!IS_ALIGNED(token_addr, 8))) @@ -270,6 +293,10 @@ static int shstk_pop_sigframe(unsigned long *ssp) *ssp = token_addr; return 0; +out_err: + if (need_to_check_vma) + mmap_read_unlock(current->mm); + return err; } int setup_signal_shadow_stack(struct ksignal *ksig) From patchwork Tue Jun 13 00:11:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277749 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 619FEC7EE2E for ; Tue, 13 Jun 2023 00:13:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B05F58E000B; Mon, 12 Jun 2023 20:12:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A18CE8E0023; Mon, 12 Jun 2023 20:12:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7573F8E000B; Mon, 12 Jun 2023 20:12:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 5B26B8E0023 for ; Mon, 12 Jun 2023 20:12:41 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 42C681C7C9E for ; Tue, 13 Jun 2023 00:12:41 +0000 (UTC) X-FDA: 80895798522.09.F73C054 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id E6D071A000D for ; Tue, 13 Jun 2023 00:12:38 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="ODQgyXN/"; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615159; h=from:from:sender:reply-to:subject:subject:date:date: 
message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7bqbFXf7e8tAHfT5l+M8gmkEcZ4WmmRLHaSbxb3zw8o=; b=MmalItiVZcVJqk+cGBiLugk7ms7/9kvZYasW1HBvRameiacbNbKRWpwjn4veYrrZ6fFM+i EWXwH8fHMZGrGiJ7/635F6Jjd4Ab1F8raEwtzBcuwtjtI23xKk1sH0aAZRpTsPv5lUBcx/ WBZ/CsdvFE+gE26DJApiQgRS70hvNhg= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="ODQgyXN/"; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615159; a=rsa-sha256; cv=none; b=H6+bzqv6f9YJRWhm49SVMzdMx47M5KjMr8P1jEj8G3AzuPSLHaQ0Qjf4y2+yYjx0N68o2U 1R/3BQ4S1ZU+97dghxfDEjWC/ymJUvuuPZihqseKAOJ8Tc5Mc+lzdYv9TrjCclxn1DyBKU VTJmtSnaqiZmIifw89HC5BL265sDX8E= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615159; x=1718151159; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q/smTC4JyzbSbIdcW3lB/5FNt1IRicAxT9isRFlZelE=; b=ODQgyXN/SbldUxlz6q5sqeUPH5B1lpOdFbwjTrt9Act871Lx+rfDcNvy XRSLivut1TyVXcG2y/AebqehyGOzrD0Cn4STyw31bIYRdHzaOLeeeQOzB QCKpPL6CMboMaSglG7aKDlr7AsLgtjkgMy+NgQ/UUBeIqM7zRcsNfDmVS dueDHKGSAR3xIy187mQfOrtkL/7vcr09a2+kefVNvx2y4bbc05qNpoZj5 Lf8Ao06MWEzKEOeliNjORSdLnguS2IsApGupTQThxqF5emj6wp0Ii8GQ+ Ikb8AoGyoW/NJFwnfVg4S4M6yct4IyVR9pYmnd2JkmlmofJ9/QyKPObqr Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557475" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557475" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671125" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671125" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:36 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 34/42] x86/shstk: Introduce map_shadow_stack syscall Date: Mon, 12 Jun 2023 17:11:00 -0700 Message-Id: <20230613001108.3040476-35-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: E6D071A000D X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: gpbgc1y7psdipnfqdwc4u6astt1tce8x X-HE-Tag: 1686615158-925107 X-HE-Meta: U2FsdGVkX18L2Jcth/FZin86SfNZv6weUOp2LZNJQltgxXqEe+Fgq0imAfupfLuXKAbeZt37bTvGO/omRvdoUJdF0hYfTNiGiQmzlrS34ISBbY6GJ6uGkqMC8e9l/3iwcCVAUyalm9TwupfgwDYS+E7mnDjaRXE2DQ+U32mwJFlbf5Cc4mBn4RpZRy7dRlN78mlPLXdhnKmdsL6SO04La83RGFAgLjM6LejhwP57IWMG/uG9Snm8ccio3MmY6zJ7DCOB0zBGyXnS3elSR4JrGg5idIlNuxTbv/NhIHnQMZ2u59OWT6/9aWx+tdI5q4NhAVevlocdpTaNSEwDDrrgfOCqVt018w3VNGU573BOG0wpSWq1IUrtPLOomdyvh/Ub/k9FTwfWVPsyOr3gD3fVG+vVZHN33nH/419XE3mfkYGI6jhoDD7fa1HsgPsLv2uOmoqaJGYSd6ZvocgYnlAHlE86EYOBMBO9dIxO59ZxDFooI9c7ABnutDjkm0ZzvpKn6io5L1E/nnrBNjqeXQZjg3kfbRjwpUGTAR9Njgu+vx/pVm1D3YFs+6qMk97Q+no7R1DGOVE5ogf6PGQNLpXf4xY02Gxklnu2nklpuLfJyJdmbc/Ff3/3z5ddArJKVPbhWp8a6B4xzGUmNONgRuIM9rW2hfXdXxlcbe9SVGeNDvYAY4YU/usWdTaqMsiKyggZH8QpqFlkQH3m4QoNkCTuuDtf1M6pNtqVX3n8xb7Ppd8TLI8BveO+MsyXuuxjWEHL96fgKtXUdkf1LFz+Z7eHAy1+hIxrTsOJ7rAQbdpkIbu3R52A9Hmqr01IUTqewOBytsrcsOMBp8tH6eJ6ohDt9BS0v8r57R8iaq2Hqz0gjqYAa8SzxDeyOSlCuL5mhSMYMeKdmu3tTEYO1hZ05MUq2lomRzkcSGtp1pvHyByenALJmnUUHJBvzr/F/J0YKjW82mEHuOq/oJ7OA4r4AOJ GN7BfqBX 7TbmSv9pjmqCM6MNRaDGk0SCIFCJMpBYXjuK7tcjQJ721dT4UWWCQfdoN8b4f2vok3KjyOWoENFolwTYo/7m8J3ry0suz8IZ58sin3ZT+js2Lgrv4QQMjZ9NgWtRWbc+0oJ89CmRMykfsddvTSB/HX4C57OEoGWx6JKpm2Rtxp0CmvzkmZu1Hr+jtIOOZHedUNYKOHbXFeZa7yCxHjKWgeop054KNvGj0ls72Yxbkl3xvDsvLR538UKOYWpmhGO3pqbH3AiWUwPhYjQ/6GZ7ZpuGTgMshVtGY05lUD51dblA7KCnrOsi+uAjWWBGwGVyHv8lcKxyazZQPBSMnFJVF71gthtG2+d7S2XAWzMa2/VZR0G/yxS+MP9FB+IdTTA49zqyxQ155GPa90VkOa7m30f+pTY22U34wpI0ujccVREM8npA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other threads could write to the shadow stack during the writable window. 
The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a couple of downsides: 1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK. 2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to. 3. It stood out from the rest of the madvise flags, as more of direct action than a hint at future desired behavior. So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and setup new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to setup its own shadow stacks using the WRSS instruction. Towards this provide a flag so that stacks can be optionally setup securely for the common case of ucontext without enabling WRSS. Or potentially have the kernel set up the shadow stack in some new way. The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN); Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 3 ++ arch/x86/kernel/shstk.c | 59 ++++++++++++++++++++++---- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 6 files changed, 58 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..f65c671ce3b1 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 64 map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 5a0256e73f1e..8148bdddbd2c 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -13,6 +13,9 @@ ((key) & 0x8 ? 
VM_PKEY_BIT3 : 0)) #endif +/* Flags for map_shadow_stack(2) */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 50733a510446..04c37b33a625 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -71,19 +72,31 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; struct mm_struct *mm = current->mm; - unsigned long addr, unused; + unsigned long mapped_addr, unused; + + if (addr) + flags |= MAP_FIXED_NOREPLACE; mmap_write_lock(mm); - addr = do_mmap(NULL, 0, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); - + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); - return addr; + if (!set_res_tok || IS_ERR_VALUE(mapped_addr)) + goto out; + + if (create_rstor_token(mapped_addr + token_offset, NULL)) { + vm_munmap(mapped_addr, size); + return -EINVAL; + } + +out: + return mapped_addr; } static unsigned long adjust_shstk_size(unsigned long size) @@ -134,7 +147,7 @@ static int shstk_setup(void) return -EOPNOTSUPP; size = adjust_shstk_size(0); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -178,7 +191,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl return 0; size = adjust_shstk_size(stack_size); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return addr; @@ -398,6 +411,36 @@ static int shstk_disable(void) return 0; } +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + if (flags & ~SHADOW_STACK_SET_TOKEN) + return -EINVAL; + + /* If there isn't space for a token */ + if (set_tok && size < 8) + return -ENOSPC; + + if (addr && addr < SZ_4G) + return -ERANGE; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. 
+ */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(addr, aligned_size, size, set_tok); +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 33a0ee3bcb2e..392dc11e3556 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1058,6 +1058,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..b12940ec5926 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..cb9aebd34646 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -381,6 +381,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ COND_SYSCALL(s390_pci_mmio_read); From patchwork Tue Jun 13 00:11:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277750 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 881C5C7EE2E for ; Tue, 13 Jun 2023 00:13:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 86D348E0024; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7CD108E0023; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F8EE8E0024; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 434FA8E0023 for ; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 233C0C035E for ; Tue, 13 Jun 2023 00:12:43 +0000 (UTC) X-FDA: 80895798606.26.C9DB393 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id EBD79160014 for ; Tue, 13 Jun 2023 00:12:40 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=iNOBbaHw; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615161; a=rsa-sha256; cv=none; b=Ft2ZqW9xSF/Z027r6LM551TNYB27IKq3axv5e0tqEIlCytKodx32SvdavOO4I/ppFf71mp 
De4Y8fzok2v1G5L4nZGAVpojEsI7727ll9tYy9gRwJTJShrk816uLxSrcX+JJKxroLVTBS HTTJiOLDJABYBlVK5frx4iUbVZ17ZvY= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=iNOBbaHw; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615161; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=DsGV+FArnCWzPmUm3Iwim6nAv4YxSXJQs9pogdjIv6c=; b=PvD/4Bwyz2u+aaN4VanVu/GU9qsVxWR7s1YMQLsR9Ja8EfO307EydVozLE4qgIxdZIpinT IAyj966BuX6SrxUX/9u+8jvMokV92TOpfz467IPgvWDXiqMRx1rbuj37+gClpTlbyfiz0r gBvhlQpnytF9wA6Ndsren2nPhAvWUCM= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615161; x=1718151161; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Src9+kIB+Tfk1ReeJh4vEiYrNG1G01r+lIh28P15+jA=; b=iNOBbaHwNQrzF+4PxtzNVYv3oBDRPYqyfLVXxcAdFFt/AxsR8UPl9Hr2 OfmHWQdW7C10GMla5xcv6Eqad5f5hHZnQD2LEC1qRahe8sc3EFvz4Abwg KKCnYFmss2sXot/jim+bAUFLm7fdRA333OyZoRPZl8CDtDbZtlIevJ2NU +oDC9o2L51LAi4xZWgGReX+2r7+QJlGFzJnODFOOf/Sppq8wBM+fmJCqX 1pld9n9en+YqLitnavc7+TIwsOaQbQLcASclNBhMr4anotuDY8LEmqhvq kQPgF42ZzHbzLuPPARlqQHXTOmBE6KLRqrgnL3FpuLAvkhS32AtJBoNpj Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557498" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557498" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:38 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671131" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671131" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:37 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 35/42] x86/shstk: Support WRSS for userspace Date: Mon, 12 Jun 2023 17:11:01 -0700 Message-Id: <20230613001108.3040476-36-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: EBD79160014 X-Stat-Signature: irojqmsyaaro64xbdid4ojhnekpk99xe X-HE-Tag: 1686615160-633247 X-HE-Meta: U2FsdGVkX1/Xd/u/TXkjyRxNQ8KWDfc8gY0Kh1NJqiqr6psKgyaaza2MgZZ12wv96xNi7cJsPT7WIwYg0KRAoAZ2OrFMAkuP42LzyS1FTtGmzLiKcMJZ/id4q0xk+5F/NxWsRaOQdW5d8YcobtEEpv+0jiyg4H0DMUl32TqMt3ovgUxipi8ExAFxeSznP7+DTg8IsBjvDls7bNwuy1MpaIR80t1jRfZJ7K7aVJePXQWGk8eN0br+ndro0Ed41jt2eoFKkkaXGwAxCZcT338WO8xNIOPd63wV/RVCODivKI0OTZtYomaen3RsjBDXFvCSBDj/iBQM1cdMRKzRaY24ptFCc5xeZ8uzre4AyOzM/5aPKiOrk+QwcL3o8nlgL5Mdg+fnaxpZRWR9Xc1NNiiwcQtmootbpe12aqNBgAIZntZENCPh8UCk8ne1ZmFbRtQKJpoXts32p31apwdWoUiSCAWsmxwzg4O17Z8TwfN9IySCkkoyAan/YNLHPoFnCq0kQhw11kgi/OQjEO8oZpEHJNG4LOEAti4QAPBKlcvgDRhsC8id6qEcupjtOXl9sVsiA05J/qitYKXbHvwi0Z71zB+0JLn8AYlOSONWYgt0EfsWhKHl33jsHSHz4B2pT8d7USVK+G78iHCTum1HG7siWNVNOeTYbzcttSOT87OGy3PCiONrWJS4QGMw3He1QORudy1hWg80Lpl053YWPt7kfxvFQWctH1LmSlWTPFg/gBITP7R37fXmdNb4iBN7y12sphH0Ktc2CuQp5zA5sn9UnBLESwIM+KCL8ijHKgzbxEzsDVCChpJqLlU6Mr1xEAYzjmpx+Jv0ZUXwtq1CsTMMnw22VrKY15Rc6cRHgrqMNph2Njyz3vSG41J/AZaiev8RPaAo99oV83LwnwAhohlcT2KgW+BKpTfTUoPewsyafU95rtaMrv5no3kyL2yrZt0z4kDaq8OmtA+TXXL0pB1 +hR74agE BKvbXMfP6X2KL9vmgO4Y8OeHKb9HKwt9n9J5K+6VmKj8QnNRmlJDEJEkHUCkGE88X65Nlxbp73bKZs+uU5q3bNYQ5w96bGoi0/Etao9u9OBbpQeuJSBSeFf4SUZHNwoato4QQJEd4bxcnlp6CINk4x/BScqgl6OeBsrRPSYNU74YADyYvDgh2LRpeXYdN/PX4VMu9GoHyxh8CA8mK3+705K7Y7V3pdfF7n/EELjDzNQN6qgkIFaclkwZN+cFmDhowwYd1L0aqP2zIrWA84H1BzJXvQaW7tbdeVN5k8fBgdTnM/IWZUBFmqMyd7D+UGbakyRfgmbCqNpsEmuDR9B31Oc0T6QIxaOnMcP+CJtsuf3BHe6uH2sMzTLu9lA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For the current shadow stack implementation, shadow stacks contents can't easily be provisioned with arbitrary data. This property helps apps protect themselves better, but also restricts any potential apps that may want to do exotic things at the expense of a little security. The x86 shadow stack feature introduces a new instruction, WRSS, which can be enabled to write directly to shadow stack memory from userspace. Allow it to get enabled via the prctl interface. Only enable the userspace WRSS instruction, which allows writes to userspace shadow stacks from userspace. Do not allow it to be enabled independently of shadow stack, as HW does not support using WRSS when shadow stack is disabled. >From a fault handler perspective, WRSS will behave very similar to WRUSS, which is treated like a user access from a #PF err code perspective. 
Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/shstk.c | 43 ++++++++++++++++++++++++++++++- 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 6a8e0e1bff4a..eedfde3b63be 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -36,5 +36,6 @@ /* ARCH_SHSTK_ features bits */ #define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 04c37b33a625..ea0bf113f9cf 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -390,6 +390,47 @@ void shstk_free(struct task_struct *tsk) unmap_shadow_stack(shstk->base, shstk->size); } +static int wrss_control(bool enable) +{ + u64 msrval; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* + * Only enable WRSS if shadow stack is enabled. If shadow stack is not + * enabled, WRSS will already be disabled, so don't bother clearing it + * when disabling. + */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return -EPERM; + + /* Already enabled/disabled? */ + if (features_enabled(ARCH_SHSTK_WRSS) == enable) + return 0; + + fpregs_lock_and_load(); + rdmsrl(MSR_IA32_U_CET, msrval); + + if (enable) { + features_set(ARCH_SHSTK_WRSS); + msrval |= CET_WRSS_EN; + } else { + features_clr(ARCH_SHSTK_WRSS); + if (!(msrval & CET_WRSS_EN)) + goto unlock; + + msrval &= ~CET_WRSS_EN; + } + + wrmsrl(MSR_IA32_U_CET, msrval); + +unlock: + fpregs_unlock(); + + return 0; +} + static int shstk_disable(void) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) @@ -406,7 +447,7 @@ static int shstk_disable(void) fpregs_unlock(); shstk_free(current); - features_clr(ARCH_SHSTK_SHSTK); + features_clr(ARCH_SHSTK_SHSTK | ARCH_SHSTK_WRSS); return 0; } From patchwork Tue Jun 13 00:11:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35652C7EE2E for ; Tue, 13 Jun 2023 00:13:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F0E18E0025; Mon, 12 Jun 2023 20:12:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 02A918E0023; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D99A88E0025; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id B4DCA8E0023 for ; Mon, 12 Jun 2023 20:12:43 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 8A6AA160376 for ; Tue, 13 Jun 2023 00:12:43 +0000 (UTC) X-FDA: 80895798606.18.B15503A Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id 614D94000B for ; Tue, 13 Jun 2023 00:12:41 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass 
header.d=intel.com header.s=Intel header.b=nfWpif70; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615161; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0WlSnIOpYVVB1S6QKfwhyiJstnml2qma1WgJwX0ae3s=; b=NnUW3tpk6fE4CuU1QuLrfHV1ZRdICt8iGpDsp29pD/yRT/hEHorjJ6zIks3bZkFGT3wD5Z eTIfpsBUfRPq7Que7UO0JmMGcBnckOTcfFegwVxSohK4eX1BNqAzz6fqao50vj5rVp5FFo N2OT+iiSsUqQmOSXRZNNAQ42ztE91Sw= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=nfWpif70; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615161; a=rsa-sha256; cv=none; b=Or9pwnxJU8R+08CMuBDAY5DIho+5d9MsfxMl3L/gxbHvbESqPyrlO+q5Q+Nv8QVCd9CWT8 36K/6bPfj+I/ISmMey2utMFMk3zgPSyG/UBYijyj/ulQJ23jG8ni06Lq6AiNAMYN1qqR6w 8XhtCCoCR+f3kECy2Er50ou9DUCHj7k= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615161; x=1718151161; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x/pco2Xet/ihWAT7hysHs+5knwsBlg8L8hAHOHmdlgo=; b=nfWpif70VUQQPe19vDAk/ide0uLj1/h5JF21vLyRMjJYby46kd2xeAI9 IjwwIL5jJ2cJyB2gxNjWlTenEtjd8JPE3WCx/NO7IkpWR83Wte8YnPV5X PzfJ8+hjO5zDSBdNCpZeBUW+8dw49EWH13y0kchl+yrNnugk3YN7HyGli D/YVkk4ywxvuWR7Kuo5bqysp+qAI3nM5PaJZVmah4mv5fcYzoFVDWqEjA BgWwuq7IssL+RPMDEclw3etW4K2MbOja0jyhlSOYd5qNC5FhPGEbDwLrd /j86nAHW+bxOKCiOb4X0HEaKQ+QGAYMtDAzBIT+QwIxwWZC+zVtM0ouDX w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557522" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557522" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671140" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671140" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:38 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 36/42] x86: Expose thread features in /proc/$PID/status Date: Mon, 12 Jun 2023 17:11:02 -0700 Message-Id: <20230613001108.3040476-37-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 614D94000B X-Stat-Signature: uianbegcxzb8fo77s7qpjdyzzf71bd3a X-Rspam-User: X-HE-Tag: 1686615161-727282 X-HE-Meta: U2FsdGVkX1+j/l7Ax8NJahfEUI8ap1HPXHyKj7TPTLhibhf1vOb3w284NDlf2cJGxHtQvHNInWj/dPEB+qTPnLYGDAw814jlZ+8NDCWMtGoBfo0f6WqTK9NqwD7XmCOktfbeb4TdGPcD/SetxY/ao3uTDWBDsVvu0q6rjrUbeDC4NLGTQsjKAZNZRItVJKK/2LVNTSaONs7G2NzHBE3OQL9g4Hk467eqXzY1WH8BhI8FSuV0XW1CUnNRrYvifZAj67k7UQNoYa/hYdaJITTqDYupjmu6KVTlPNTmEF6OtfgyhFe3wHLM4BxRTxr9EPbl2vs+LnOEy/l5OXPppk/ThKLV3zH7/4PR+pIgdV4GySkgwu3tP9JxHteTvuyo/35Ve3f11CG7jCEW+oRbYSf0CP1jB+K1xb3bXjlFyZ8Zi5eRzDzLPlikMygZvjo2J3uO3do5zJ6laDeHxteX4swYE97PnVXk7Qw9ENcRZeA9FxqVhLDOoKZjy/Gkz8yzi8HEtTAs6mdnyHS28W4qwtOUTqzInsGeOn33zVlPXkYqgTRQls1Op1TaLSvfvhCMM6YGMTBvuEzj3yFHH/Kp1BW6v03uxUloGwB6ndN568Suun1wyLQoUJy4D2JYdPCd3a9iuMWmz05sYOIooedEQKswxw5cFDTozmhOh5llknZW7+RQdvxiBYttlyEF4WHwb2wc4L3eEp8J2aMudE2D0W9eVVv8gKtZ28ejpQVuxXuf3RgLoqheVMeB0fLdP38QbIxrcYjH9vkif/ms3LCPxoO1IXa3Sm3yykw/Cc5kMsrV9Z1DTvW3p0KwwOszpgQkNK3kqv6DWHmEzUZPrVSe6Js9jV0AIDVSkUqSFJJUBSTONi2Twv/lchnNHzWJAWKI4E8WXwhUERdle3wCqxejCwnHyXcY6vFyYmIjB7nW4h8G7oOPJ+RcJ5PKzQMsRgBei69sobHuUdV97tTrhXIJTlX H8jkE02M 8tSAc/Er2weUdSpTJie9/VNd33gHPRacHlW5OwgLgzd2jsUk0ZuYQthvt6JtedUfXUldjTEk8cecdxbF1OtdmtyC6OjrAu8N9hyhO6WJc0a7NDkYrOQTuiRwMY39s/CTlGpHadJ4DPsTkSkwe+vJg2JYnhLdxFKSCNknwJFchtLRMhjKRMaxpQgn/5dtjq7RufghbJL/gLorwAagkVn9CoFLdLBKGFrBNPK2W61OQZ5HYfy33panlk/BYPT/EULKoMbz/JwESTGjNsVQmWkt7swfGoNLILmOV8cd7g5BqwFWuTJX/idtU9NlfTN8ZYkYig0wPg/YE+fEKLtrOa8P1RG1UbrS+hAsjVUu38GU5nACAMp+tPcwrCB8h0LNNSCLBQ7R/ol4vNXBxQZMjW4PvJlZHxFBEPJhtQtt6 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. [Switched to CET, added to commit log] Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. 
Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/cpu/proc.c | 23 +++++++++++++++++++++++ fs/proc/array.c | 6 ++++++ include/linux/proc_fs.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 099b6f0d96bd..31c0e68f6227 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include "cpu.h" @@ -175,3 +177,24 @@ const struct seq_operations cpuinfo_op = { .stop = c_stop, .show = show_cpuinfo, }; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static void dump_x86_features(struct seq_file *m, unsigned long features) +{ + if (features & ARCH_SHSTK_SHSTK) + seq_puts(m, "shstk "); + if (features & ARCH_SHSTK_WRSS) + seq_puts(m, "wrss "); +} + +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task) +{ + seq_puts(m, "x86_Thread_features:\t"); + dump_x86_features(m, task->thread.features); + seq_putc(m, '\n'); + + seq_puts(m, "x86_Thread_features_locked:\t"); + dump_x86_features(m, task->thread.features_locked); + seq_putc(m, '\n'); +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/fs/proc/array.c b/fs/proc/array.c index d35bbf35a874..2c2efbe685d8 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -431,6 +431,11 @@ static inline void task_untag_mask(struct seq_file *m, struct mm_struct *mm) seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(mm)); } +__weak void arch_proc_pid_thread_features(struct seq_file *m, + struct task_struct *task) +{ +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -455,6 +460,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, task_cpus_allowed(m, task); cpuset_task_status_allowed(m, task); task_context_switch_counts(m, task); + arch_proc_pid_thread_features(m, task); return 0; } diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 0260f5ea98fe..80ff8e533cbd 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -158,6 +158,8 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); #endif /* CONFIG_PROC_PID_ARCH_STATUS */ +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task); + #else /* CONFIG_PROC_FS */ static inline void proc_root_init(void) From patchwork Tue Jun 13 00:11:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277752 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4CFBFC7EE43 for ; Tue, 13 Jun 2023 00:13:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EAC358E0026; Mon, 12 Jun 2023 20:12:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E0CF38E0023; Mon, 12 Jun 2023 20:12:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C5FA78E0026; Mon, 12 Jun 2023 20:12:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 
A40E78E0023 for ; Mon, 12 Jun 2023 20:12:44 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 71C23AF63E for ; Tue, 13 Jun 2023 00:12:44 +0000 (UTC) X-FDA: 80895798648.16.A91CB6B Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id 3FC8C1A000A for ; Tue, 13 Jun 2023 00:12:42 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Q4AVEeq3; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615162; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1F1/TF3ynb464OSVq+cAxAGFjNLdRbyulxUg251OjE0=; b=0HZ71NeWuxVAWebQNTEr6eGgGVat2qQFDJ6KuXow/vNqJ1iFUVB+yv0ryUwX3XiyALU4+R Zsc616EZvX8vLtYCB6g8PmWKzm3WmMy02oNo8tVa6M+NbBRH00/6f8NbuZtWf85KAwVm8E D/fTIm0FgNBo3HQGPRyLJoExPHg7cDs= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Q4AVEeq3; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615162; a=rsa-sha256; cv=none; b=OwG5l9dSbafsGp5Kk+XPDrKBTo5t6hTqR8lVmgLQB0rTWXsRjb41nwe4g6G7VMuhiBuPz1 OMeHQhoVtdabFLRM25nxqYOF8RbYywmRm9+xg5AwXmGI36cBiwzkTXiKx19tx70LZ1qNax cv/c7PPOfZGZ9bPry8KBb4cbpK6Kk8I= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615162; x=1718151162; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xI4pbs3PiOhFQEbuDUq+4GewOhti2Nle1TQJwXiYqGs=; b=Q4AVEeq3aopZ432IPSpI0EqU0JJ14KJVzEp3IU7QemWyssNa7wJB2wz0 8zZZfoDTYtfuWgT+7NqKCWIglNcOMrBHbKikcy0Bk9oA8fuydNzjhrUXC LXE9jkUNs4dAy9KemE+iPV8g6o4ZYMQs2OkE0T3hGKlMZCtM8F/QOM8WG zUAmJ9g40iit7iZQuDgo+s3E6JKYl53bU4fceNOOZnT+0KKHYyyKxkABu 2KLf3molqHFANaqE7tyV1FvapxIObJ/J2PUd/R0YNrXkt/U8Bj/Ih60v2 PDSGhq5mCRPhCdlisseCaizpvguRfeCrfLTOHaSoVc50a4wXYApo2Jwsl w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557551" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557551" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671152" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671152" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:39 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Pengfei Xu Subject: [PATCH v9 37/42] x86/shstk: Wire in shadow stack interface Date: Mon, 12 Jun 2023 17:11:03 -0700 Message-Id: <20230613001108.3040476-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 3FC8C1A000A X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: 96ekcgqht1d6x3oz5tw3bokq6e3y6tjc X-HE-Tag: 1686615162-48834 X-HE-Meta: U2FsdGVkX1+7Frz4JDC4wIMb9/oTrpie4Ddnrxr5vVWd3yVdkm7NavwitUlpF8m9FeZAe3jXOETo2LkPkcAX7/LYeFQaMT2IXxkiBao0mtz8ZLPy7MvH1Jw19YZR8AsTosdnjAJ/YPujk1lFBcfzQbTjzUBqAweXqSgrogiUSvs1E5KItbRHpJYA+y3O6o4QM35FJZ0k5DXTx98iE5J9j7wuua2nKVNisl3yclawRx53CLGoQwSour0uf77Ca8WS6SCyPtY9zL0QFb9cv95phIqDJLMyrc8s1eet0bCLzUs2i0E+LWkHanJN5WNchvJRpfcHptMwhtbeaq/wxgCI6yDEXlIuxqbtARjKv4NsSTLNyaeh1Q1KouAennVgABPHVJ5prz6IAsPjBEoRKLYFzuBkFKyO183e1df4jpR53UsNt8jOmqdWiLkpMdO/sYMtYbIMRSv1BECrjH36SBDYI/ayQuVBXVhH7oBRByIB3XRyVCSD2n3qrozrHfQ+yp0KVkyUxdFz6npK3lP2PxwI7E8qZcRUPJ+h4DkOAKtjOIUM/3/J74xrx9i9yxnARvuDMPCUidqsRw87ogo3r2rJICzcfwCL26OWXFpS11NrAmVb9ZCCsCFY87y/3NSQViqTEFfoC0/pVRtUqfnXoh3rvWvsN9jPm1b+fwcWdMUQ/w4ppew+8JVSw0J25cBcH1JzBF4k5VihINKQYDwKOQZf+oGC0rQ2Q2ORiRJ21surLDa66H5JJZ9dSoBhNdjE0kxHDJl2UxcucnUQw9iYU0c5rH2rMhAceGDnNvzqxg65KT57hYFFvm5W2QcAzG243ZzH7mnAlW3FifXpbWfUMNAB+Y4/GP3Kg2uKM4aU1QNPqyIWHShBjc5anervluTHijtDw6pGG0Lr15/9lURyMo/C/A2D57D/EyaVMwYryYl44Y7lRXlDJn1icZ1Y5RwEpgFuxVqiq2xMX4FjHjED8ou iOxCaGsL H1ND9TD2LoYs7hYx3Kkuy6nAGKbWer8WiMoDKreEchowvM9Cr0pcwNjNdlU2z6otapCa4cES3lXngUw6zI7g6VKzVbNGCu5hAJ9j8iwCdI1UFsZkwIf50cvWIexwrZarIMzXm1owPQ81Pp8PE0MmkzZET/2N9kuC1McBZ5YjheN44VHPVA3BW/77AzL+lxqfY5roje3R9dcLn0y3H2ZUJuCTzbvvfzbNN0VDz9RvdLITx0ZZLEQLHef9yAUoUIO9ViKBw6QUKnEH3+YFhi7sCLNgAzVAYdaVKyux9fzTn7JUSORUN/wPwvbta8yvAocCSovs8PkjDrdt7WEyGryr5EjLoqhmxgJTivkuJHlGKGdKui8qaa3lxg5YBkg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The kernel now has the main shadow stack functionality to support applications. Wire in the WRSS and shadow stack enable/disable functions into the existing shadow stack API skeleton. 
Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/shstk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index ea0bf113f9cf..d723cdc93474 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -502,9 +502,17 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features) return -EINVAL; if (option == ARCH_SHSTK_DISABLE) { + if (features & ARCH_SHSTK_WRSS) + return wrss_control(false); + if (features & ARCH_SHSTK_SHSTK) + return shstk_disable(); return -EINVAL; } /* Handle ARCH_SHSTK_ENABLE */ + if (features & ARCH_SHSTK_SHSTK) + return shstk_setup(); + if (features & ARCH_SHSTK_WRSS) + return wrss_control(true); return -EINVAL; } From patchwork Tue Jun 13 00:11:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277753 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D5F4C7EE2E for ; Tue, 13 Jun 2023 00:13:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E868E8E0027; Mon, 12 Jun 2023 20:12:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E0D798E0023; Mon, 12 Jun 2023 20:12:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B77208E0027; Mon, 12 Jun 2023 20:12:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 948008E0023 for ; Mon, 12 Jun 2023 20:12:45 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 6E6861A0385 for ; Tue, 13 Jun 2023 00:12:45 +0000 (UTC) X-FDA: 80895798690.03.CBF809E Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf08.hostedemail.com (Postfix) with ESMTP id 5C75416001F for ; Tue, 13 Jun 2023 00:12:43 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=IilS7xHE; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615163; a=rsa-sha256; cv=none; b=m1LsbP+8WGv1D9tTmMLWRmHfn7jkwmOa9klRubSq3GtwzjWear77MyCrhyNgD9k9HJnlkH 69I9Pa6OiOv8Yw431ABcPkKLNE3KdNHmUuniz3+T8TGaWe4xuNLGHM0yEJZlqYhfTZ781n 8Xiv/XkootMISJ+NRxg5IZQpnaxbEqg= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=IilS7xHE; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615163; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references:dkim-signature; bh=ggbaxEIISLmgT7I/30ANB4iH1d2bu4GVx9rE6ZdVsLQ=; b=Lhyp2aydDvqXJlFeeezNJEfyJheqylO4KmBDbbriRSVnda78bWCkpMqklAIEsaeWEvlJ/G P22TnPVRFoieO9Fff1TwWbvlzjnqEA4pFS+ixR8GFAw83Wiwv0EKf96HGh3HThZpQv6t+T JzyC1/Nw89BhplpzL1Bzi+uBg7YTmPI= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615163; x=1718151163; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Umo485lkaXi+aij5vBJ9ejxN8cdryMJacLabCk9dyEI=; b=IilS7xHEjltuwXw5JsVVZ8tmX8m40aOvTJAOsF5FRHNLAdIFUjq3GcV4 eLqwY21FkcZIXIpC1iI7LTIrxkG4XYuoD4X2hjtcuICOo6mz5DeS5R5XN cZG8rqDcX1+gUXJTH9VC3idr0ovPxlnrzAKeI+2F0LBm61lthYxPMSLWa tZgzKIxuHUJElDxIDlF2OAa33dyrz696BB8jOCzJnMzjPT3q16OvKrYAv sdAVMYOHxnuVmAxgPAp5ZkZbONjTdf15bOB5oSm6Iuctz27CiLqN9ss0+ uz5/jQcuComrHrhzcOsiE/riBp80rNeYljIkDIP5ojU2NPqGB+I+dXfGk A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557575" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557575" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671162" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671162" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:40 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 38/42] x86/cpufeatures: Enable CET CR4 bit for shadow stack Date: Mon, 12 Jun 2023 17:11:04 -0700 Message-Id: <20230613001108.3040476-39-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 5C75416001F X-Stat-Signature: zqgrgq8z89j1hbxorq99g1rm4zgujcfs X-HE-Tag: 1686615163-258029 X-HE-Meta: U2FsdGVkX18MKf1XKg/DLl587E28v5jzjqPYCLprFtMT7GMc3j6biPu1YPYOU2vD5qIO/g1F7v4dPIT7dG/0bIyVmLm83pM2ai8xBb8pR8/TXk3TDsk95gye/CBnJG+zZD452GBTfPqOlBSgho1+FCr2vBnOlNZELD0u8Hh9W2q5QO/2SnxTiUmE/AwFjpY7DJQX91/UmwocS5uJGtzfFn9lae/Xx1dvvLFagiVYWAtBDtaHSELQEKoxErNMlIdmn7swLQ4TRgqt6j4BiS1vvxk4Zi1Eiv6zoYdatF239l23enmAhvI4rcDNhRL0KI2BT0qisa5R/UdBj9UaPHke07lUPbdxwAeBadPi+xciNbMLg1A0kwtTZ1iNLQMrLWjUOcm07pCO7flytN7JscsnapoaH+llKNc9SP9XuzBlR5q3Ow5z3a47UFWKz3oa5CZlQ6So+2B8QveHcJpmv5k51Ya62iZdg3dVa0gXnmZRGs6Yhtj+oqy0Yn6q6VycoTHxcCfW4u89QwDjW5c3yUYT0RKYawF0jlH308982BCtzKrZyo66wUZqpX2sHEtwdIVv0v0qY2SJvJLjXojSvkysNFSlK3uSnqA9C07WVn2dW1XYfQrN7Ai72K9PM89/mOk4XnG86k5v/wXncYVkEqzBrvndeT7J6vSKV97/muWJIVUq1rNMXdLR/7vLPHQ4sw2YwDWxtN32pEWHzuqSIQssPKgrsJDQq/WG4SNIEuYGjvJTacxda7ihbNkwRmooi3i7yL3Hm+J3E5Y1KKAoA9F8e33CW8S/wWp0lCks/xjadS5jlMJtUfEowLAcBJKsTgKBJ1CT0AxTDF2+vb++vcmPmjmaTitZB5LjNyiT3ThT95IrL3jf16e/CvFnTnl6aN+ZBwU+wjf7D8T7Z32PL2Bw0xDw3lyhg1bUiMixw+ufmi56TGMwwbCXJtLduYhX//xP+LhaM+Yg1IpJwoPyG+Q xPRacY6U YV3cMljkZGPOuOHucwusFQ04A2Sx//gV7Zi8DdQnAUMSUfs33AK4LKzkYdVnV1LwT9105KEpyWruBsNIzEeOFfmq5FXgV1j0BdoBGrGuU9vvFvpuMthwxY9ckHuPfQGT5a/aEipCyyJrK6oQlg1aXabzrCr2GUI+z9lZBG3X5GKEIqO16SKaRNPaj3PAhQYGNeF2rslYU2BVu8FAYhueGJKZzlybVEjQN+z2+C2BXwHJZkszIh7X0xaV1D5Iyb/6okh/fv/WD+F8DmRCyuIIJluWxhR6Toat3UxDWXWWbXrj9ujTL5N6BB+g1qDiU7gRwSU/6AKZav+grRjz874+8LiDRw14FqYj4eZm50rvmxhYwHR5KFdXx8nTa3A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Setting CR4.CET is a prerequisite for utilizing any CET features, most of which also require setting MSRs. Kernel IBT already enables the CET CR4 bit when it detects IBT HW support and is configured with kernel IBT. However, future patches that enable userspace shadow stack support will need the bit set as well. So change the logic to enable it in either case. Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT. 
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- arch/x86/kernel/cpu/common.c | 35 +++++++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 80710a68ef7d..3ea06b0b4570 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -601,27 +601,43 @@ __noendbr void ibt_restore(u64 save) static __always_inline void setup_cet(struct cpuinfo_x86 *c) { - u64 msr = CET_ENDBR_EN; + bool user_shstk, kernel_ibt; - if (!HAS_KERNEL_IBT || - !cpu_feature_enabled(X86_FEATURE_IBT)) + if (!IS_ENABLED(CONFIG_X86_CET)) return; - wrmsrl(MSR_IA32_S_CET, msr); + kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT); + user_shstk = cpu_feature_enabled(X86_FEATURE_SHSTK) && + IS_ENABLED(CONFIG_X86_USER_SHADOW_STACK); + + if (!kernel_ibt && !user_shstk) + return; + + if (user_shstk) + set_cpu_cap(c, X86_FEATURE_USER_SHSTK); + + if (kernel_ibt) + wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN); + else + wrmsrl(MSR_IA32_S_CET, 0); + cr4_set_bits(X86_CR4_CET); - if (!ibt_selftest()) { + if (kernel_ibt && !ibt_selftest()) { pr_err("IBT selftest: Failed!\n"); wrmsrl(MSR_IA32_S_CET, 0); setup_clear_cpu_cap(X86_FEATURE_IBT); - return; } } __noendbr void cet_disable(void) { - if (cpu_feature_enabled(X86_FEATURE_IBT)) - wrmsrl(MSR_IA32_S_CET, 0); + if (!(cpu_feature_enabled(X86_FEATURE_IBT) || + cpu_feature_enabled(X86_FEATURE_SHSTK))) + return; + + wrmsrl(MSR_IA32_S_CET, 0); + wrmsrl(MSR_IA32_U_CET, 0); } /* @@ -1483,6 +1499,9 @@ static void __init cpu_parse_early_param(void) if (cmdline_find_option_bool(boot_command_line, "noxsaves")) setup_clear_cpu_cap(X86_FEATURE_XSAVES); + if (cmdline_find_option_bool(boot_command_line, "nousershstk")) + setup_clear_cpu_cap(X86_FEATURE_USER_SHSTK); + arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg)); if (arglen <= 0) return; From patchwork Tue Jun 13 00:11:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277755 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44BDEC7EE2E for ; Tue, 13 Jun 2023 00:13:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 165AE8E0028; Mon, 12 Jun 2023 20:12:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0FD988E0029; Mon, 12 Jun 2023 20:12:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E0D008E0028; Mon, 12 Jun 2023 20:12:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id C66F08E0023 for ; Mon, 12 Jun 2023 20:12:46 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 7EBC340395 for ; Tue, 13 Jun 2023 00:12:46 +0000 (UTC) X-FDA: 80895798732.27.1683256 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf01.hostedemail.com (Postfix) with ESMTP id 24AAC40008 for ; Tue, 13 Jun 2023 00:12:43 +0000 
(UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=CEHnvXG6; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615164; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XIDUjPughZjFOXgJfHaTyjXWpD03lk8bxIhB3OEpD8I=; b=Jy5HhL+qByG4UeVwPPR6OE6SJvIIKKmOE5aSt4vu9dbJWfS07QYSnJN7w8WKptMEH73HpW mp4Hw9ARnHKgSTL27adXBzrKOcvDUYyTtnBjnvpLgh4gNG0njS9QbNsYJoONDQMEqDLO9i 6YF0Oj/Nw9rVu5jyQX5NK2PpsvhSEfg= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=CEHnvXG6; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615164; a=rsa-sha256; cv=none; b=47Lq8b+FXiOW7h3Vl2Q0Y6LthrPjF/R6slHrP/jkn/slg9ANbxr97f+7F6XpcVpDZPMXJp eyre+MTcLhqztcn/To/W//FHrGpIwvWDh9/HSEPjJwSO6eOFPJigjT1NIBUaYtWy4ZvtKg eK52PVmJMcRY9LmfSGfB0RX6skfRKyg= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615164; x=1718151164; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BODs5YgrIYiUG9wa8tbCiXPQczkVVrNo7gO7I9Fm7W8=; b=CEHnvXG6nzfNYqG9F9pRweiNfaL4ot0J+UbKIeVD92oxB9+sMHOnJ/+a S4MGA2+ga/AhA62KTNoducfoGKLKql7yXETOGWBSp1CONwRf3yH3qVc2L TjhG2O2TyazZW7vKqqhVwXRL/aQz8GRWWUIrMNqXr7etsI/TF1ZnKCqm/ Y51DsqC8ptrb232KmfrFCCeSL1uabItn3ep26v/Pr/PzBLnPwWs1TmkGY BQlhXpgTtEoJL8EIgi4H/7V2p5tLWlK3jx0M3BSjvITNb+GzBGhTJxrRF 2d3tmYnK8+8kW4GjuH3xeFlwYpNBqxd8Nujs7Q7BW8UlKq7j96TLvwBhU w==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557599" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557599" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671173" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671173" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:41 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 39/42] selftests/x86: Add shadow stack test Date: Mon, 12 Jun 2023 17:11:05 -0700 Message-Id: <20230613001108.3040476-40-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 24AAC40008 X-Stat-Signature: awzcwsagdai678kfj7c4scduzya1m7wa X-Rspam-User: X-HE-Tag: 1686615163-510678 X-HE-Meta: U2FsdGVkX1+JloBinNxsCmVEnTxsmsND9hbOH/J6dGWHaEuc9EByudlyORqrU5a/4z0vAcJnkUOqlKbbx1BEkC38hAMAY9s3f+YrAHQgpDIuyteJwk5jB6xPDyS/6LW/SGeADg0dYB1utYAdw/qS90fquPUoazG6z0wjyrrfJVIgNfKbl1+bjjpQSr+26/OExVF3MfUZdplFVxrK1ogufeRT+nU9yR/Y9d2yy/kMGo2IRzBXmYuL3USukBsGuCIuQtqLjohTwoHGd/kmyeutvSUs4U6DAOd9tVRq4t4+7NdQ59PfIps48bYlWDUAsvOePO5Yu6CjbGjLrkv+EoNz0uiR2mOI5bwyTNCkVOG7rAK7jgDOoJQZfOa0vR0Cnqv2u8nEfGxidIv+M+5FTkrH/LQxi9jh8XGoLtBoJLumJBUsDJXHBf+i64eh9TJ0Xc8hJyKrjxRZBObJjTciskliQJhEQHT7zNWDDTiPsBacep2pdx7gpGXXtRpSiAksWy4YC2jkugcCdnuI4qOLrhphzLTqMieLo+GtGdwImVmjtAN0/trA4inVO2zz/y5miYdQHPLZ5ctnWZa6hO7LYR8RMUI2yzepnNzBhvMN6n4OM9jnoMhRlqJk1CYOW73bYEz6QntfxKvw4/742GmTuXrnaQeXTG7KcGbBBX26k/b6H7z5GRRQWWDnZQPpFgd+d2cf8NWWSMHRqhfD7Cs/WV+Q27ZRksrZXOeJaiESVwisKybVlsu4honHRdREx4R4VkJxilrmSpYN8h8D6yNT5P/pCjrJWX3sSUOuEkCilsoeaLDEkZTE6XeCqTVTtODqqQOotDpXlvaGwR05hhHqxCKFRUCtG+NBIY3DeeWnXgoTujzx97j7xgGxaTOBSACy8TkFmzTPSNgTF5k0cs6Lq1Phakp+R5838Rqt6f9A+KyFxdtmnqYoF2VkwLmpSjzBhvj4/hfpeJ3kmBSqWC4yG7X twN1IfX3 ZAaNA3d2BJ0rkqhD6jhLaVwK/Wf0yOwNINu8f6RtMoE5f6sh5mpouMLtoFuEPgjiTiTDRBeXDIIu2Y3aZ5ZWcQDv7wDl6Kcun4QqgYet5RyTzbPZwzctBeC3IXQvzG9akvh6bSN2qTYimNva884ZWByFOkQOper4OQ4cYddLzMb0rmUabf5XM9KyNABH2EOwMaxkgqe1tnSSPnHgKQsNczSZh+174N34j8OZW4t++LrXSEBex7PggGFpotMusx8nMWuqZWBdLNOK5NxxNhuPH4mEU00WjsMmnk/Q1hT4ToEKbX9y0dzlKtmYqW23yLv0KeFjuK3hT+/25Lwj4cfQ6vc4FsB0PokJyQXZ8FFomcKH9EpZ3SBsMDhuYiQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a simple selftest for exercising some shadow stack behavior: - map_shadow_stack syscall and pivot - Faulting in shadow stack memory - Handling shadow stack violations - GUP of shadow stack memory - mprotect() of shadow stack memory - Userfaultfd on shadow stack memory - 32 bit segmentation - Guard gap test - Ptrace test Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Fix race in userfaultfd test - Add guard gap test - Add ptrace test --- tools/testing/selftests/x86/Makefile | 2 +- .../testing/selftests/x86/test_shadow_stack.c | 884 ++++++++++++++++++ 2 files changed, 885 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 598135d3162b..7e8c937627dd 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ 
test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \ - corrupt_xstate_header amx lam + corrupt_xstate_header amx lam test_shadow_stack # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c new file mode 100644 index 000000000000..5ab788dc1ce6 --- /dev/null +++ b/tools/testing/selftests/x86/test_shadow_stack.c @@ -0,0 +1,884 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program test's basic kernel shadow stack support. It enables shadow + * stack manual via the arch_prctl(), instead of relying on glibc. It's + * Makefile doesn't compile with shadow stack support, so it doesn't rely on + * any particular glibc. As a result it can't do any operations that require + * special glibc shadow stack support (longjmp(), swapcontext(), etc). Just + * stick to the basics and hope the compiler doesn't do anything strange. + */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Define the ABI defines if needed, so people can run the tests + * without building the headers. + */ +#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 451 + +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) + +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 +#define ARCH_SHSTK_UNLOCK 0x5004 +#define ARCH_SHSTK_STATUS 0x5005 + +#define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) + +#define NT_X86_SHSTK 0x204 +#endif + +#define SS_SIZE 0x200000 +#define PAGE_SIZE 0x1000 + +#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5) +int main(int argc, char *argv[]) +{ + printf("[SKIP]\tCompiler does not support CET.\n"); + return 0; +} +#else +void write_shstk(unsigned long *addr, unsigned long val) +{ + asm volatile("wrssq %[val], (%[addr])\n" + : "=m" (addr) + : [addr] "r" (addr), [val] "r" (val)); +} + +static inline unsigned long __attribute__((always_inline)) get_ssp(void) +{ + unsigned long ret = 0; + + asm volatile("xor %0, %0; rdsspq %0" : "=r" (ret)); + return ret; +} + +/* + * For use in inline enablement of shadow stack. + * + * The program can't return from the point where shadow stack gets enabled + * because there will be no address on the shadow stack. So it can't use + * syscall() for enablement, since it is a function. + * + * Based on code from nolibc.h. Keep a copy here because this can't pull in all + * of nolibc.h. 
+ */ +#define ARCH_PRCTL(arg1, arg2) \ +({ \ + long _ret; \ + register long _num asm("eax") = __NR_arch_prctl; \ + register long _arg1 asm("rdi") = (long)(arg1); \ + register long _arg2 asm("rsi") = (long)(arg2); \ + \ + asm volatile ( \ + "syscall\n" \ + : "=a"(_ret) \ + : "r"(_arg1), "r"(_arg2), \ + "0"(_num) \ + : "rcx", "r11", "memory", "cc" \ + ); \ + _ret; \ +}) + +void *create_shstk(void *addr) +{ + return (void *)syscall(__NR_map_shadow_stack, addr, SS_SIZE, SHADOW_STACK_SET_TOKEN); +} + +void *create_normal_mem(void *addr) +{ + return mmap(addr, SS_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); +} + +void free_shstk(void *shstk) +{ + munmap(shstk, SS_SIZE); +} + +int reset_shstk(void *shstk) +{ + return madvise(shstk, SS_SIZE, MADV_DONTNEED); +} + +void try_shstk(unsigned long new_ssp) +{ + unsigned long ssp; + + printf("[INFO]\tnew_ssp = %lx, *new_ssp = %lx\n", + new_ssp, *((unsigned long *)new_ssp)); + + ssp = get_ssp(); + printf("[INFO]\tchanging ssp from %lx to %lx\n", ssp, new_ssp); + + asm volatile("rstorssp (%0)\n":: "r" (new_ssp)); + asm volatile("saveprevssp"); + printf("[INFO]\tssp is now %lx\n", get_ssp()); + + /* Switch back to original shadow stack */ + ssp -= 8; + asm volatile("rstorssp (%0)\n":: "r" (ssp)); + asm volatile("saveprevssp"); +} + +int test_shstk_pivot(void) +{ + void *shstk = create_shstk(0); + + if (shstk == MAP_FAILED) { + printf("[FAIL]\tError creating shadow stack: %d\n", errno); + return 1; + } + try_shstk((unsigned long)shstk + SS_SIZE - 8); + free_shstk(shstk); + + printf("[OK]\tShadow stack pivot\n"); + return 0; +} + +int test_shstk_faults(void) +{ + unsigned long *shstk = create_shstk(0); + + /* Read shadow stack, test if it's zero to not get read optimized out */ + if (*shstk != 0) + goto err; + + /* Wrss memory that was already read. */ + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + /* Page out memory, so we can wrss it again. */ + if (reset_shstk((void *)shstk)) + goto err; + + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + printf("[OK]\tShadow stack faults\n"); + return 0; + +err: + return 1; +} + +unsigned long saved_ssp; +unsigned long saved_ssp_val; +volatile bool segv_triggered; + +void __attribute__((noinline)) violate_ss(void) +{ + saved_ssp = get_ssp(); + saved_ssp_val = *(unsigned long *)saved_ssp; + + /* Corrupt shadow stack */ + printf("[INFO]\tCorrupting shadow stack\n"); + write_shstk((void *)saved_ssp, 0); +} + +void segv_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tGenerated shadow stack violation successfully\n"); + + segv_triggered = true; + + /* Fix shadow stack */ + write_shstk((void *)saved_ssp, saved_ssp_val); +} + +int test_shstk_violation(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = segv_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* Make sure segv_triggered is set before violate_ss() */ + asm volatile("" : : : "memory"); + + violate_ss(); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow stack violation test\n"); + + return !segv_triggered; +} + +/* Gup test state */ +#define MAGIC_VAL 0x12345678 +bool is_shstk_access; +void *shstk_ptr; +int fd; + +void reset_test_shstk(void *addr) +{ + if (shstk_ptr) + free_shstk(shstk_ptr); + shstk_ptr = create_shstk(addr); +} + +void test_access_fix_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tViolation from %s\n", is_shstk_access ? 
"shstk access" : "normal write"); + + segv_triggered = true; + + /* Fix shadow stack */ + if (is_shstk_access) { + reset_test_shstk(shstk_ptr); + return; + } + + free_shstk(shstk_ptr); + create_normal_mem(shstk_ptr); +} + +bool test_shstk_access(void *ptr) +{ + is_shstk_access = true; + segv_triggered = false; + write_shstk(ptr, MAGIC_VAL); + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool test_write_access(void *ptr) +{ + is_shstk_access = false; + segv_triggered = false; + *(unsigned long *)ptr = MAGIC_VAL; + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool gup_write(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (write(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +bool gup_read(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (read(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +int test_gup(void) +{ + struct sigaction sa = {}; + int status; + pid_t pid; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + return 1; + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> write access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> write access success\n"); + + close(fd); + + /* COW/gup test */ + reset_test_shstk(0); + pid = fork(); + if (!pid) { + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + exit(1); + + if (gup_write(shstk_ptr)) { + close(fd); + exit(1); + } + close(fd); + exit(0); + } + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) { + printf("[FAIL]\tWrite in child failed\n"); + return 1; + } + if (*(unsigned long *)shstk_ptr == MAGIC_VAL) { + printf("[FAIL]\tWrite in child wrote through to shared memory\n"); + return 1; + } + + printf("[INFO]\tCow gup write -> write access success\n"); + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow gup test\n"); + + return 0; +} + +int test_mprotect(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* mprotect a shadow stack as read only */ + reset_test_shstk(0); + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* try to wrss it and fail */ + if (!test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to read-only memory succeeded\n"); + return 1; + } + + /* + * The shadow stack was reset above to resolve the fault, make the new one + * read-only. 
+ */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* then back to writable */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_WRITE | PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_WRITE) failed\n"); + return 1; + } + + /* then wrss to it and succeed */ + if (test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to mprotect() writable memory failed\n"); + return 1; + } + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tmprotect() test\n"); + + return 0; +} + +char zero[4096]; + +static void *uffd_thread(void *arg) +{ + struct uffdio_copy req; + int uffd = *(int *)arg; + struct uffd_msg msg; + int ret; + + while (1) { + ret = read(uffd, &msg, sizeof(msg)); + if (ret > 0) + break; + else if (errno == EAGAIN) + continue; + return (void *)1; + } + + req.dst = msg.arg.pagefault.address; + req.src = (__u64)zero; + req.len = 4096; + req.mode = 0; + + if (ioctl(uffd, UFFDIO_COPY, &req)) + return (void *)1; + + return (void *)0; +} + +int test_userfaultfd(void) +{ + struct uffdio_register uffdio_register; + struct uffdio_api uffdio_api; + struct sigaction sa = {}; + pthread_t thread; + void *res; + int uffd; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + printf("[SKIP]\tUserfaultfd unavailable.\n"); + return 0; + } + + reset_test_shstk(0); + + uffdio_api.api = UFFD_API; + uffdio_api.features = 0; + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) + goto err; + + uffdio_register.range.start = (__u64)shstk_ptr; + uffdio_register.range.len = 4096; + uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; + if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) + goto err; + + if (pthread_create(&thread, NULL, &uffd_thread, &uffd)) + goto err; + + reset_shstk(shstk_ptr); + test_shstk_access(shstk_ptr); + + if (pthread_join(thread, &res)) + goto err; + + if (test_shstk_access(shstk_ptr)) + goto err; + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + if (!res) + printf("[OK]\tUserfaultfd test\n"); + return !!res; +err: + free_shstk(shstk_ptr); + close(uffd); + signal(SIGSEGV, SIG_DFL); + return 1; +} + +/* Simple linked list for keeping track of mappings in test_guard_gap() */ +struct node { + struct node *next; + void *mapping; +}; + +/* + * This tests whether mmap will place other mappings in a shadow stack's guard + * gap. The steps are: + * 1. Finds an empty place by mapping and unmapping something. + * 2. Map a shadow stack in the middle of the known empty area. + * 3. Map a bunch of PAGE_SIZE mappings. These will use the search down + * direction, filling any gaps until it encounters the shadow stack's + * guard gap. + * 4. When a mapping lands below the shadow stack from step 2, then all + * of the above gaps are filled. The search down algorithm will have + * looked at the shadow stack gaps. + * 5. See if it landed in the gap. 
+ */ +int test_guard_gap(void) +{ + void *free_area, *shstk, *test_map = (void *)0xFFFFFFFFFFFFFFFF; + struct node *head = NULL, *cur; + + free_area = mmap(0, SS_SIZE * 3, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + munmap(free_area, SS_SIZE * 3); + + shstk = create_shstk(free_area + SS_SIZE); + if (shstk == MAP_FAILED) + return 1; + + while (test_map > shstk) { + test_map = mmap(0, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (test_map == MAP_FAILED) + return 1; + cur = malloc(sizeof(*cur)); + cur->mapping = test_map; + + cur->next = head; + head = cur; + } + + while (head) { + cur = head; + head = cur->next; + munmap(cur->mapping, PAGE_SIZE); + free(cur); + } + + free_shstk(shstk); + + if (shstk - test_map - PAGE_SIZE != PAGE_SIZE) + return 1; + + printf("[OK]\tGuard gap test\n"); + + return 0; +} + +/* + * Too complicated to pull it out of the 32 bit header, but also get the + * 64 bit one needed above. Just define a copy here. + */ +#define __NR_compat_sigaction 67 + +/* + * Call 32 bit signal handler to get 32 bit signals ABI. Make sure + * to push the registers that will get clobbered. + */ +int sigaction32(int signum, const struct sigaction *restrict act, + struct sigaction *restrict oldact) +{ + register long syscall_reg asm("eax") = __NR_compat_sigaction; + register long signum_reg asm("ebx") = signum; + register long act_reg asm("ecx") = (long)act; + register long oldact_reg asm("edx") = (long)oldact; + int ret = 0; + + asm volatile ("int $0x80;" + : "=a"(ret), "=m"(oldact) + : "r"(syscall_reg), "r"(signum_reg), "r"(act_reg), + "r"(oldact_reg) + : "r8", "r9", "r10", "r11" + ); + + return ret; +} + +sigjmp_buf jmp_buffer; + +void segv_gp_handler(int signum, siginfo_t *si, void *uc) +{ + segv_triggered = true; + + /* + * To work with old glibc, this can't rely on siglongjmp working with + * shadow stack enabled, so disable shadow stack before siglongjmp(). + */ + ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK); + siglongjmp(jmp_buffer, -1); +} + +/* + * Transition to 32 bit mode and check that a #GP triggers a segfault. + */ +int test_32bit(void) +{ + struct sigaction sa = {}; + struct sigaction *sa32; + + /* Create sigaction in 32 bit address range */ + sa32 = mmap(0, 4096, PROT_READ | PROT_WRITE, + MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); + sa32->sa_flags = SA_SIGINFO; + + sa.sa_sigaction = segv_gp_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + + segv_triggered = false; + + /* Make sure segv_triggered is set before triggering the #GP */ + asm volatile("" : : : "memory"); + + /* + * Set handler to somewhere in 32 bit address space + */ + sa32->sa_handler = (void *)sa32; + if (sigaction32(SIGUSR1, sa32, NULL)) + return 1; + + if (!sigsetjmp(jmp_buffer, 1)) + raise(SIGUSR1); + + if (segv_triggered) + printf("[OK]\t32 bit test\n"); + + return !segv_triggered; +} + +void segv_handler_ptrace(int signum, siginfo_t *si, void *uc) +{ + /* The SSP adjustment caused a segfault. */ + exit(0); +} + +int test_ptrace(void) +{ + unsigned long saved_ssp, ssp = 0; + struct sigaction sa= {}; + struct iovec iov; + int status; + int pid; + + iov.iov_base = &ssp; + iov.iov_len = sizeof(ssp); + + pid = fork(); + if (!pid) { + ssp = get_ssp(); + + sa.sa_sigaction = segv_handler_ptrace; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + ptrace(PTRACE_TRACEME, NULL, NULL, NULL); + /* + * The parent will tweak the SSP and return from this function + * will #CP. 
+ */ + raise(SIGTRAP); + + exit(1); + } + + while (waitpid(pid, &status, 0) != -1 && WSTOPSIG(status) != SIGTRAP); + + if (ptrace(PTRACE_GETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tFailed to PTRACE_GETREGS\n"); + goto out_kill; + } + + if (!ssp) { + printf("[INFO]\tPtrace child SSP was 0\n"); + goto out_kill; + } + + saved_ssp = ssp; + + iov.iov_len = 0; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tToo small size accepted via PTRACE_SETREGS\n"); + goto out_kill; + } + + iov.iov_len = sizeof(ssp) + 1; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tToo large size accepted via PTRACE_SETREGS\n"); + goto out_kill; + } + + ssp += 1; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tUnaligned SSP written via PTRACE_SETREGS\n"); + goto out_kill; + } + + ssp = 0xFFFFFFFFFFFF0000; + if (!ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tKernel range SSP written via PTRACE_SETREGS\n"); + goto out_kill; + } + + /* + * Tweak the SSP so the child with #CP when it resumes and returns + * from raise() + */ + ssp = saved_ssp + 8; + iov.iov_len = sizeof(ssp); + if (ptrace(PTRACE_SETREGSET, pid, NT_X86_SHSTK, &iov)) { + printf("[INFO]\tFailed to PTRACE_SETREGS\n"); + goto out_kill; + } + + if (ptrace(PTRACE_DETACH, pid, NULL, NULL)) { + printf("[INFO]\tFailed to PTRACE_DETACH\n"); + goto out_kill; + } + + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) + return 1; + + printf("[OK]\tPtrace test\n"); + return 0; + +out_kill: + kill(pid, SIGKILL); + return 1; +} + +int main(int argc, char *argv[]) +{ + int ret = 0; + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not re-enable Shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_WRSS)) { + printf("[SKIP]\tCould not enable WRSS\n"); + ret = 1; + goto out; + } + + /* Should have succeeded if here, but this is a test, so double check. */ + if (!get_ssp()) { + printf("[FAIL]\tShadow stack disabled\n"); + return 1; + } + + if (test_shstk_pivot()) { + ret = 1; + printf("[FAIL]\tShadow stack pivot\n"); + goto out; + } + + if (test_shstk_faults()) { + ret = 1; + printf("[FAIL]\tShadow stack fault test\n"); + goto out; + } + + if (test_shstk_violation()) { + ret = 1; + printf("[FAIL]\tShadow stack violation test\n"); + goto out; + } + + if (test_gup()) { + ret = 1; + printf("[FAIL]\tShadow shadow stack gup\n"); + goto out; + } + + if (test_mprotect()) { + ret = 1; + printf("[FAIL]\tShadow shadow mprotect test\n"); + goto out; + } + + if (test_userfaultfd()) { + ret = 1; + printf("[FAIL]\tUserfaultfd test\n"); + goto out; + } + + if (test_guard_gap()) { + ret = 1; + printf("[FAIL]\tGuard gap test\n"); + goto out; + } + + if (test_ptrace()) { + ret = 1; + printf("[FAIL]\tptrace test\n"); + } + + if (test_32bit()) { + ret = 1; + printf("[FAIL]\t32 bit test\n"); + goto out; + } + + return ret; + +out: + /* + * Disable shadow stack before the function returns, or there will be a + * shadow stack violation. 
+ */ + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + return ret; +} +#endif From patchwork Tue Jun 13 00:11:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13277754 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD005C88CB2 for ; Tue, 13 Jun 2023 00:13:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 66F868E0023; Mon, 12 Jun 2023 20:12:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5F71C8E0003; Mon, 12 Jun 2023 20:12:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 35D4C8E002A; Mon, 12 Jun 2023 20:12:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id F38058E0023 for ; Mon, 12 Jun 2023 20:12:46 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id D2AA51C7C69 for ; Tue, 13 Jun 2023 00:12:46 +0000 (UTC) X-FDA: 80895798732.27.090FA3A Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP id B60F51A0004 for ; Tue, 13 Jun 2023 00:12:44 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=dq1TNI4D; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686615165; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ym1IUf+WN11sfR5WNNfTjB6hlFyLrR1ZQd5U/3cGxxM=; b=HcRr4+nPVGvj/lQ8pG/Nb1h8dJDyl98iJd0HsSpxzHdwCFMqrCaeTwh/vawCgY5JVKfAD7 nnoNXw1HnN1FjbzDJ6EiiNKxpMhugOYOXtsIXfAWrJb5WkWhdI9EOYcWEq2TlXY9izr6xw P3rsRQJbugt4i/tl/CPe/eT/n2x7NmI= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=dq1TNI4D; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf19.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686615165; a=rsa-sha256; cv=none; b=dMA+PDJ5VFz3yKfKCPDhdFZDZ5ljKuPfq0zs3tZcQ/q8J3FywaeJgZBNeKlIUiV01pfnO6 q1rjL89kJPHRdPsU83whURHmPctwV+Qt+X9vAO0jFow46thGnUo494c8FMO2m4cb4Caefk 9uLUbqXu1kl3ihptTp6ctDnnHDCOl3Y= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686615164; x=1718151164; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YlC9UAklwmTYEYA/IeAnmisSABN50M2A88ovR11LHkA=; b=dq1TNI4D4/I9KBjNQOIFw0377GDC7VfWLKhFQKHwPN6pDdODQSa0kn4h HjNAhtfVBr+VDgzyHQYf2xKmAakAlyJQqj20tOs4ZBfga4lhuxQlcNTPs Vbk28uXaRaxn9PEP/WJ5wRMSr+dUCXUN4gN0Y32MMnFJoacXNKS3rKFQL 
7mmMfb72Sg4aYNC0WXlgLIIv78AYM0CHgWfhhhnGQEhww1w+rTdjXQHxy XvrO51ahFv/CD2617url1maM98ExBfa9OuKQtOtKIqUG3ipd+ESw5WbCh SaZCUyitQn0sJNLRPkBYiFTScf5oKMEEDZInZj9/o+TIP2mW4HeH0ETPq A==; X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="361557621" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="361557621" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10739"; a="835671179" X-IronPort-AV: E=Sophos;i="6.00,238,1681196400"; d="scan'208";a="835671179" Received: from almeisch-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.amr.corp.intel.com) ([10.209.42.242]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2023 17:12:42 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com, torvalds@linux-foundation.org, broonie@kernel.org Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Pengfei Xu Subject: [PATCH v9 40/42] x86: Add PTRACE interface for shadow stack Date: Mon, 12 Jun 2023 17:11:06 -0700 Message-Id: <20230613001108.3040476-41-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: B60F51A0004 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: kfu1s8j8dc97b8f6sico88m37cn7rf8b X-HE-Tag: 1686615164-146832 X-HE-Meta: U2FsdGVkX1+mWsuY13B4lm/6g9LlA8JEtkcTD/7PaNA0NUpvmwWiISgKes1vlgLfcpwgWlh1pcdUD+T8qMlXKQz5amOsG0b94DW8+bjXHpbIrnBvpk3mWVNrwA/eukHXnPGdoyeJtNqYyyf6Lf7lZQgQ9wSF2gpTa7OGj/3u7NGHXLF72nGEr94jQiyWhs6xBkmCqvvwf20hcpXoWkOYalU4lP3CitMGNDoOmpmQqvMhSDhtnrTC1/CfneLLuxa9VFGS3FkKWOxlzNvJoAOXFNRjWrLSmSLH2dmyZNlNOBIt9/BJNe+4D2aseDEleYDzUhBbBaVrV9w1P6oXzlyRfdmJrD7pGRc86CCzngyKbxgcoDXGA8TPj955xF4m4WaYqUUUM2e2C06UbFJ7Cbg6osSkCLDovv1Wx+B2UIlMI8X6bKA8HmQNVXhzqLwWZMMltXzZdxs3J47DHgK2Vfxnf1CYfxu4kiu+9lWZ0EwWlxJ3U2EQinqxZiVg9EWAVnuAZXVVJdIoN6WAc1fZmdUvJ1SOEwXyNoKdqVKkQml1vk0i3ARx5wfGSgOONYn5Fvl2MWkAslsiYUhHkbrjLPaG9GWTnp77KOCjYWensV7LqgljqDKRhIpuL1W1c8c82bGOFbmh1eVjGRZ0cPNJoeSDyUqIz90xJAJNI7fSRJfn4CqfzEhB86O8juGooQmaq7+PNOdjMtM8772L20/X2QUoHz5/2e/HzY/BD2r4UinaDxBc3VjLK/4Dz7TetyML+nh2Cp8OKmFnoVO37fE+QNR2ADO6hk/vUWFXWyO+u3Oo3RGedNgd0FAG3kWRdar7zcecPyvr2sBXkrT7bIO1ZR6kTj0903sIMfakX7cl6e1qSOZXNLW65+tQYyqWWZoEhgSeKDvKgNH9k6FpIPNuObj5kve71ioOqyo93PgE79v7x7x8kI16zayVfD7nW6xk4trKS2FVle6NWS2Mhw3Vt/x y9fM/4K7 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Some applications (like GDB) would like to tweak shadow stack state via ptrace. This allows existing functionality to continue to work for seized shadow stack applications. Provide a regset interface for manipulating the shadow stack pointer (SSP). There is already ptrace functionality for accessing xstate, but this does not include supervisor xfeatures. So there is no completely clear place to put the shadow stack state. Adding it to the user xfeatures regset would complicate that code, as it currently shares logic with signals which should not have supervisor features. Don't add a general supervisor xfeature regset like the user one, because it is better to maintain flexibility for other supervisor xfeatures to define their own interface. For example, an xfeature may decide not to expose all of its state to userspace, as is actually the case for shadow stack ptrace functionality. A lot of enum values remain to be used, so just put it in a dedicated shadow stack regset. The only downside to not having a generic supervisor xfeature regset is that apps need to be enlightened of any new supervisor xfeature exposed this way (i.e. they can't try to have generic save/restore logic). But maybe that is a good thing, because they have to think through each new xfeature instead of encountering issues when a new supervisor xfeature is added. Adding a shadow stack regset also has the effect of including the shadow stack state in a core dump, which could be useful for debugging. The shadow stack specific xstate includes the SSP, and the shadow stack and WRSS enablement status. Enabling shadow stack or WRSS in the kernel involves more than just flipping the bit. The kernel is made aware that it has to do extra things when cloning or handling signals. That logic is triggered off of separate feature enablement state kept in the task struct. So flipping on HW shadow stack enforcement without notifying the kernel to change its behavior would severely limit what an application could do without crashing, and the results would depend on kernel internal implementation details. There is also no known use for controlling this state via ptrace today. So only expose the SSP, which is something that userspace already has indirect control over. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Borislav Petkov (AMD) Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v9: - Squash "Enforce only whole copies for ssp_set()" fix that previously was in tip.
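As a debugger-side sketch (not part of the patch), reading a stopped tracee's SSP through the new NT_X86_SHSTK regset could look like the snippet below. It mirrors what the selftest's test_ptrace() does; the local NT_X86_SHSTK fallback define is only needed until installed elf.h headers carry the new note type, and the tracee is assumed to have been attached and stopped already.

#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_X86_SHSTK
#define NT_X86_SHSTK 0x204
#endif

/* Read the shadow stack pointer of an attached, stopped tracee */
static int read_tracee_ssp(pid_t pid, uint64_t *ssp)
{
    struct iovec iov = {
        .iov_base = ssp,
        .iov_len = sizeof(*ssp),    /* the regset is a single u64 */
    };

    if (ptrace(PTRACE_GETREGSET, pid, NT_X86_SHSTK, &iov))
        return -1;

    /* The kernel trims iov_len to what it actually copied */
    return iov.iov_len == sizeof(*ssp) ? 0 : -1;
}

int main(int argc, char *argv[])
{
    uint64_t ssp;

    if (argc < 2)
        return 1;

    /* Caller is expected to have attached (e.g. PTRACE_ATTACH) and waited */
    if (read_tracee_ssp((pid_t)atoi(argv[1]), &ssp)) {
        perror("PTRACE_GETREGSET(NT_X86_SHSTK)");
        return 1;
    }

    printf("tracee SSP: %#llx\n", (unsigned long long)ssp);
    return 0;
}

Writing the SSP back goes through PTRACE_SETREGSET with the same iovec; per the ssp_set() hunk below, only whole, 8-byte-aligned, non-kernel-range values are accepted.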
---
 arch/x86/include/asm/fpu/regset.h |  7 +--
 arch/x86/kernel/fpu/regset.c      | 81 +++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c          | 12 +++++
 include/uapi/linux/elf.h          |  2 +
 4 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h
index 4f928d6a367b..697b77e96025 100644
--- a/arch/x86/include/asm/fpu/regset.h
+++ b/arch/x86/include/asm/fpu/regset.h
@@ -7,11 +7,12 @@

 #include

-extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active;
+extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active,
+                                ssp_active;
 extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get,
-                                xstateregs_get;
+                                xstateregs_get, ssp_get;
 extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
-                                xstateregs_set;
+                                xstateregs_set, ssp_set;

 /*
  * xstateregs_active == regset_fpregs_active. Please refer to the comment
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6d056b68f4ed..6bc1eb2a21bd 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include

 #include "context.h"
 #include "internal.h"
@@ -174,6 +175,86 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
         return ret;
 }

+#ifdef CONFIG_X86_USER_SHADOW_STACK
+int ssp_active(struct task_struct *target, const struct user_regset *regset)
+{
+        if (target->thread.features & ARCH_SHSTK_SHSTK)
+                return regset->n;
+
+        return 0;
+}
+
+int ssp_get(struct task_struct *target, const struct user_regset *regset,
+            struct membuf to)
+{
+        struct fpu *fpu = &target->thread.fpu;
+        struct cet_user_state *cetregs;
+
+        if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+                return -ENODEV;
+
+        sync_fpstate(fpu);
+        cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
+        if (WARN_ON(!cetregs)) {
+                /*
+                 * This shouldn't ever be NULL because shadow stack was
+                 * verified to be enabled above. This means
+                 * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so
+                 * XFEATURE_CET_USER should not be in the init state.
+                 */
+                return -ENODEV;
+        }
+
+        return membuf_write(&to, (unsigned long *)&cetregs->user_ssp,
+                            sizeof(cetregs->user_ssp));
+}
+
+int ssp_set(struct task_struct *target, const struct user_regset *regset,
+            unsigned int pos, unsigned int count,
+            const void *kbuf, const void __user *ubuf)
+{
+        struct fpu *fpu = &target->thread.fpu;
+        struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
+        struct cet_user_state *cetregs;
+        unsigned long user_ssp;
+        int r;
+
+        if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
+            !ssp_active(target, regset))
+                return -ENODEV;
+
+        if (pos != 0 || count != sizeof(user_ssp))
+                return -EINVAL;
+
+        r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_ssp, 0, -1);
+        if (r)
+                return r;
+
+        /*
+         * Some kernel instructions (IRET, etc) can cause exceptions in the case
+         * of disallowed CET register values. Just prevent invalid values.
+         */
+        if (user_ssp >= TASK_SIZE_MAX || !IS_ALIGNED(user_ssp, 8))
+                return -EINVAL;
+
+        fpu_force_restore(fpu);
+
+        cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER);
+        if (WARN_ON(!cetregs)) {
+                /*
+                 * This shouldn't ever be NULL because shadow stack was
+                 * verified to be enabled above. This means
+                 * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so
+                 * XFEATURE_CET_USER should not be in the init state.
+                 */
+                return -ENODEV;
+        }
+
+        cetregs->user_ssp = user_ssp;
+        return 0;
+}
+#endif /* CONFIG_X86_USER_SHADOW_STACK */
+
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION

 /*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index dfaa270a7cc9..095f04bdabdc 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -58,6 +58,7 @@ enum x86_regset_64 {
         REGSET64_FP,
         REGSET64_IOPERM,
         REGSET64_XSTATE,
+        REGSET64_SSP,
 };

 #define REGSET_GENERAL \
@@ -1267,6 +1268,17 @@ static struct user_regset x86_64_regsets[] __ro_after_init = {
                 .active         = ioperm_active,
                 .regset_get     = ioperm_get
         },
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+        [REGSET64_SSP] = {
+                .core_note_type = NT_X86_SHSTK,
+                .n              = 1,
+                .size           = sizeof(u64),
+                .align          = sizeof(u64),
+                .active         = ssp_active,
+                .regset_get     = ssp_get,
+                .set            = ssp_set
+        },
+#endif
 };

 static const struct user_regset_view user_x86_64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ac3da855fb19..fa1ceeae2596 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -406,6 +406,8 @@ typedef struct elf64_shdr {
 #define NT_386_TLS      0x200           /* i386 TLS slots (struct user_desc) */
 #define NT_386_IOPERM   0x201           /* x86 io permission bitmap (1=deny) */
 #define NT_X86_XSTATE   0x202           /* x86 extended state using xsave */
+/* Old binutils treats 0x203 as a CET state */
+#define NT_X86_SHSTK    0x204           /* x86 SHSTK state */
 #define NT_S390_HIGH_GPRS       0x300   /* s390 upper register halves */
 #define NT_S390_TIMER   0x301           /* s390 timer register */
 #define NT_S390_TODCMP  0x302           /* s390 TOD clock comparator register */

From patchwork Tue Jun 13 00:11:07 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277756
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Mike Rapoport, Pengfei Xu
Subject: [PATCH v9 41/42] x86/shstk: Add ARCH_SHSTK_UNLOCK
Date: Mon, 12 Jun 2023 17:11:07 -0700
Message-Id: <20230613001108.3040476-42-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

From: Mike Rapoport

Userspace loaders may lock features before a CRIU restore operation has
the chance to set them to whatever state is required by the process
being restored. Provide a way for CRIU to unlock features. Add it as an
arch_prctl() like the other shadow stack operations, but restrict it to
being called via the ptrace arch_prctl() interface.

[Merged into recent API changes, added commit log and docs]

Signed-off-by: Mike Rapoport
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 Documentation/arch/x86/shstk.rst  | 4 ++++
 arch/x86/include/uapi/asm/prctl.h | 1 +
 arch/x86/kernel/process_64.c      | 1 +
 arch/x86/kernel/shstk.c           | 9 +++++++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst
index f09afa504ec0..f3553cc8c758 100644
--- a/Documentation/arch/x86/shstk.rst
+++ b/Documentation/arch/x86/shstk.rst
@@ -75,6 +75,10 @@ arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
         are ignored.
         The mask is ORed with the existing value. So any feature bits set here
         cannot be enabled or disabled afterwards.

+arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
+        Unlock features. 'features' is a mask of all features to unlock. All
+        bits set are processed, unset bits are ignored. Only works via ptrace.
+
 The return values are as follows. On success, return 0. On error, errno can
 be::

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index eedfde3b63be..3189c4a96468 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -33,6 +33,7 @@
 #define ARCH_SHSTK_ENABLE       0x5001
 #define ARCH_SHSTK_DISABLE      0x5002
 #define ARCH_SHSTK_LOCK         0x5003
+#define ARCH_SHSTK_UNLOCK       0x5004

 /* ARCH_SHSTK_ features bits */
 #define ARCH_SHSTK_SHSTK        (1ULL << 0)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 0f89aa0186d1..e6db21c470aa 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -899,6 +899,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
         case ARCH_SHSTK_ENABLE:
         case ARCH_SHSTK_DISABLE:
         case ARCH_SHSTK_LOCK:
+        case ARCH_SHSTK_UNLOCK:
                 return shstk_prctl(task, option, arg2);
         default:
                 ret = -EINVAL;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index d723cdc93474..d43b7a9c57ce 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -489,9 +489,14 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features)
                 return 0;
         }

-        /* Don't allow via ptrace */
-        if (task != current)
+        /* Only allow via ptrace */
+        if (task != current) {
+                if (option == ARCH_SHSTK_UNLOCK && IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)) {
+                        task->thread.features_locked &= ~features;
+                        return 0;
+                }
                 return -EINVAL;
+        }

         /* Do not allow to change locked features */
         if (features & task->thread.features_locked)
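For illustration, a sketch (not part of this patch) of how a
checkpoint/restore tool might issue ARCH_SHSTK_UNLOCK against a stopped,
ptrace-attached task. It assumes the long-standing x86-64 PTRACE_ARCH_PRCTL
request; the constants are copied from the uapi hunks above in case the
installed headers predate them.

#include <sys/ptrace.h>
#include <sys/types.h>

#ifndef PTRACE_ARCH_PRCTL
#define PTRACE_ARCH_PRCTL 30            /* x86-64 only, from asm/ptrace-abi.h */
#endif
#ifndef ARCH_SHSTK_UNLOCK
#define ARCH_SHSTK_UNLOCK 0x5004        /* from the prctl.h hunk above */
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif

/*
 * Clear the given feature locks in a stopped, attached tracee.  With
 * PTRACE_ARCH_PRCTL the arch_prctl() arguments are reversed: arg2 travels
 * in the ptrace 'addr' slot and the option in 'data'.  Per the shstk.c hunk
 * above, this only succeeds on kernels built with CONFIG_CHECKPOINT_RESTORE.
 */
static long shstk_unlock(pid_t pid, unsigned long features)
{
        return ptrace(PTRACE_ARCH_PRCTL, pid, (void *)features,
                      (void *)(unsigned long)ARCH_SHSTK_UNLOCK);
}

A restorer might call, say, shstk_unlock(pid, ARCH_SHSTK_SHSTK) before
replaying the restored process's own ARCH_SHSTK_ENABLE/ARCH_SHSTK_DISABLE
choices.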
From patchwork Tue Jun 13 00:11:08 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277757
From: Rick Edgecombe
Cc: rick.p.edgecombe@intel.com, Pengfei Xu
Subject: [PATCH v9 42/42] x86/shstk: Add ARCH_SHSTK_STATUS
Date: Mon, 12 Jun 2023 17:11:08 -0700
Message-Id: <20230613001108.3040476-43-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>

CRIU and GDB need to get the current shadow stack and WRSS enablement
status. This information is already available via /proc/pid/status, but
that is inconvenient for CRIU because it involves parsing text output in
an area of the code where doing so is difficult.

Provide a status arch_prctl(), ARCH_SHSTK_STATUS, for retrieving the
status. Have arg2 be a userspace address, and make the new arch_prctl()
simply copy the features out to userspace.
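For illustration, a minimal sketch (not part of this patch) of a process
querying its own enablement status with the new prctl. It assumes
SYS_arch_prctl is exposed by <sys/syscall.h> on x86-64, since glibc has
historically provided no arch_prctl() wrapper; the constants mirror the uapi
hunk below.

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_SHSTK_STATUS
#define ARCH_SHSTK_STATUS 0x5005        /* from the prctl.h hunk below */
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif

int main(void)
{
        unsigned long features = 0;

        /* The kernel put_user()s the enabled-feature mask into 'features'. */
        if (syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, &features))
                return 1;

        printf("shadow stack: %s\n",
               (features & ARCH_SHSTK_SHSTK) ? "enabled" : "disabled");
        return 0;
}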
Suggested-by: Mike Rapoport
Signed-off-by: Rick Edgecombe
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 Documentation/arch/x86/shstk.rst  | 6 ++++++
 arch/x86/include/asm/shstk.h      | 2 +-
 arch/x86/include/uapi/asm/prctl.h | 1 +
 arch/x86/kernel/process_64.c      | 1 +
 arch/x86/kernel/shstk.c           | 8 +++++++-
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst
index f3553cc8c758..60260e809baf 100644
--- a/Documentation/arch/x86/shstk.rst
+++ b/Documentation/arch/x86/shstk.rst
@@ -79,6 +79,11 @@ arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
         Unlock features. 'features' is a mask of all features to unlock. All
         bits set are processed, unset bits are ignored. Only works via ptrace.

+arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
+        Copy the currently enabled features to the address passed in addr. The
+        features are described using the bits passed into the others in
+        'features'.
+
 The return values are as follows. On success, return 0. On error, errno can
 be::

@@ -86,6 +91,7 @@ be::
         -ENOTSUPP if the feature is not supported by the hardware or
          kernel.
         -EINVAL arguments (non existing feature, etc)
+        -EFAULT if could not copy information back to userspace

 The feature's bits supported are::
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index ecb23a8ca47d..42fee8959df7 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -14,7 +14,7 @@ struct thread_shstk {
         u64     size;
 };

-long shstk_prctl(struct task_struct *task, int option, unsigned long features);
+long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
 void reset_thread_features(void);
 unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
                                        unsigned long stack_size);
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 3189c4a96468..384e2cc6ac19 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -34,6 +34,7 @@
 #define ARCH_SHSTK_DISABLE      0x5002
 #define ARCH_SHSTK_LOCK         0x5003
 #define ARCH_SHSTK_UNLOCK       0x5004
+#define ARCH_SHSTK_STATUS       0x5005

 /* ARCH_SHSTK_ features bits */
 #define ARCH_SHSTK_SHSTK        (1ULL << 0)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e6db21c470aa..33b268747bb7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -900,6 +900,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
         case ARCH_SHSTK_DISABLE:
         case ARCH_SHSTK_LOCK:
         case ARCH_SHSTK_UNLOCK:
+        case ARCH_SHSTK_STATUS:
                 return shstk_prctl(task, option, arg2);
         default:
                 ret = -EINVAL;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index d43b7a9c57ce..b26810c7cd1c 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -482,8 +482,14 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
         return alloc_shstk(addr, aligned_size, size, set_tok);
 }

-long shstk_prctl(struct task_struct *task, int option, unsigned long features)
+long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 {
+        unsigned long features = arg2;
+
+        if (option == ARCH_SHSTK_STATUS) {
+                return put_user(task->thread.features, (unsigned long __user *)arg2);
+        }
+
         if (option == ARCH_SHSTK_LOCK) {
                 task->thread.features_locked |= features;
                 return 0;