From patchwork Fri Sep 25 14:56:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799963 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 32806112C for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DF4BF2389F for ; Fri, 25 Sep 2020 14:57:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DF4BF2389F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 19CC56B0068; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 152EF6B006C; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E70236B006E; Fri, 25 Sep 2020 10:57:18 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0135.hostedemail.com [216.40.44.135]) by kanga.kvack.org (Postfix) with ESMTP id C6E1E6B0068 for ; Fri, 25 Sep 2020 10:57:18 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 8EFA3180AD806 for ; Fri, 25 Sep 2020 14:57:18 +0000 (UTC) X-FDA: 77301886956.03.drink60_160f05f27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id 69CB128A4E8 for ; Fri, 25 Sep 2020 14:57:18 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30012:30046:30051:30054:30055:30056:30062:30064:30069:30070:30075:30079:30089,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygyx78t5mu7nt18957fdeqzqwciyc548gqoywf6ooi5rj41zufsijxrimetf7.591k1trobmhbni14qe7sam6kbnd9h8k1ey5ghdyem9piqgrxxenx88sjd1axbkr.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: drink60_160f05f27168 X-Filterd-Recvd-Size: 10171 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:16 +0000 (UTC) IronPort-SDR: xS+jhONRoAi23QhHMLdLcj+h1eA5onOZ+DS7eg8Oy6stewGpaKAZEdunwyM4GnImeGfYn9H4jd YDVukBNdmo1g== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631879" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631879" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:14 -0700 IronPort-SDR: Er0JJcz2MKSigqT+baqkcjHmkETdh31Gm4Br8Y4f7TQT0ghLjTZJWwynNHKfguqa0AgEJBXoOA Y8IPA7NEUiqw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499128" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:13 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 01/26] Documentation/x86: Add CET description Date: Fri, 25 Sep 2020 07:56:24 -0700 Message-Id: <20200925145649.5438-2-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Explain no_user_shstk/no_user_ibt kernel parameters, and introduce a new document on Control-flow Enforcement Technology (CET). Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v13: - Change X86_INTEL_* to X86_*. v12: - Remove ARCH_X86_CET_MMAP_SHSTK information. v11: - Add back GLIBC tunables information. - Add ARCH_X86_CET_MMAP_SHSTK information. v10: - Change no_cet_shstk and no_cet_ibt to no_user_shstk and no_user_ibt. - Remove the opcode section, as it is already in the Intel SDM. - Remove sections related to GLIBC implementation. - Remove shadow stack memory management section, as it is already in the code comments. - Remove legacy bitmap related information, as it is not supported now. - Fix arch_ioctl() related text. - Change SHSTK, IBT to plain English. .../admin-guide/kernel-parameters.txt | 6 + Documentation/x86/index.rst | 1 + Documentation/x86/intel_cet.rst | 133 ++++++++++++++++++ 3 files changed, 140 insertions(+) create mode 100644 Documentation/x86/intel_cet.rst diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index a1068742a6df..7c7124a6a7ac 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3164,6 +3164,12 @@ noexec=on: enable non-executable mappings (default) noexec=off: disable non-executable mappings + no_user_shstk [X86-64] Disable Shadow Stack for user-mode + applications + + no_user_ibt [X86-64] Disable Indirect Branch Tracking for user-mode + applications + nosmap [X86,PPC] Disable SMAP (Supervisor Mode Access Prevention) even if it is supported by processor. diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index 265d9e9a093b..2aef972a868d 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -19,6 +19,7 @@ x86-specific Documentation tlb mtrr pat + intel_cet intel-iommu intel_txt amd-memory-encryption diff --git a/Documentation/x86/intel_cet.rst b/Documentation/x86/intel_cet.rst new file mode 100644 index 000000000000..c9ebb3d9dd00 --- /dev/null +++ b/Documentation/x86/intel_cet.rst @@ -0,0 +1,133 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================= +Control-flow Enforcement Technology (CET) +========================================= + +[1] Overview +============ + +Control-flow Enforcement Technology (CET) is an Intel processor feature +that provides protection against return/jump-oriented programming (ROP) +attacks. It can be set up to protect both applications and the kernel. +Only user-mode protection is implemented in the 64-bit kernel, including +support for running legacy 32-bit applications. + +CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is +a secondary stack allocated from memory and cannot be directly modified by +applications. When executing a CALL, the processor pushes the return +address to both the normal stack and the shadow stack. Upon function +return, the processor pops the shadow stack copy and compares it to the +normal stack copy. If the two differ, the processor raises a control- +protection fault. Indirect branch tracking verifies indirect CALL/JMP +targets are intended as marked by the compiler with 'ENDBR' opcodes. + +There are two kernel configuration options: + + X86_SHADOW_STACK_USER, and + X86_BRANCH_TRACKING_USER. + +These need to be enabled to build a CET-enabled kernel, and Binutils v2.31 +and GCC v8.1 or later are required to build a CET kernel. To build a CET- +enabled application, GLIBC v2.28 or later is also required. + +There are two command-line options for disabling CET features:: + + no_user_shstk - disables user shadow stack, and + no_user_ibt - disables user indirect branch tracking. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. + +[2] Application Enabling +======================== + +An application's CET capability is marked in its ELF header and can be +verified from the following command output, in the NT_GNU_PROPERTY_TYPE_0 +field: + + readelf -n + +If an application supports CET and is statically linked, it will run with +CET protection. If the application needs any shared libraries, the loader +checks all dependencies and enables CET when all requirements are met. + +[3] Backward Compatibility +========================== + +GLIBC provides a few tunables for backward compatibility. + +GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-IBT + Turn off SHSTK/IBT for the current shell. + +GLIBC_TUNABLES=glibc.tune.x86_shstk= + This controls how dlopen() handles SHSTK legacy libraries:: + + on - continue with SHSTK enabled; + permissive - continue with SHSTK off. + +[4] CET arch_prctl()'s +====================== + +Several arch_prctl()'s have been added for CET: + +arch_prctl(ARCH_X86_CET_STATUS, u64 *addr) + Return CET feature status. + + The parameter 'addr' is a pointer to a user buffer. + On returning to the caller, the kernel fills the following + information:: + + *addr = shadow stack/indirect branch tracking status + *(addr + 1) = shadow stack base address + *(addr + 2) = shadow stack size + +arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features) + Disable shadow stack and/or indirect branch tracking as specified in + 'features'. Return -EPERM if CET is locked. + +arch_prctl(ARCH_X86_CET_LOCK) + Lock in all CET features. They cannot be turned off afterwards. + +Note: + There is no CET-enabling arch_prctl function. By design, CET is enabled + automatically if the binary and the system can support it. + +[5] The implementation of the Shadow Stack +========================================== + +Shadow Stack size +----------------- + +A task's shadow stack is allocated from memory to a fixed size of +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to +the maximum size of the normal stack, but capped to 4 GB. However, +a compat-mode application's address space is smaller, each of its thread's +shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB). + +Signal +------ + +The main program and its signal handlers use the same shadow stack. +Because the shadow stack stores only return addresses, a large shadow +stack covers the condition that both the program stack and the signal +alternate stack run out. + +The kernel creates a restore token for the shadow stack restoring address +and verifies that token when restoring from the signal handler. + +Fork +---- + +The shadow stack's vma has VM_SHSTK flag set; its PTEs are required to be +read-only and dirty. When a shadow stack PTE is not RO and dirty, a +shadow access triggers a page fault with the shadow stack access bit set +in the page fault error code. + +When a task forks a child, its shadow stack PTEs are copied and both the +parent's and the child's shadow stack PTEs are cleared of the dirty bit. +Upon the next shadow stack access, the resulting shadow stack page fault +is handled by page copy/re-use. + +When a pthread child is created, the kernel allocates a new shadow stack +for the new thread. From patchwork Fri Sep 25 14:56:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799965 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3CD2C112C for ; Fri, 25 Sep 2020 14:57:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0DD1720715 for ; Fri, 25 Sep 2020 14:57:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0DD1720715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6CEFA6B006C; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 631DB6B006E; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 439E36B0071; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0044.hostedemail.com [216.40.44.44]) by kanga.kvack.org (Postfix) with ESMTP id 216DA6B006E for ; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id D492C5DD5 for ; Fri, 25 Sep 2020 14:57:18 +0000 (UTC) X-FDA: 77301886956.14.basin28_181135627168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id AC49918229818 for ; Fri, 25 Sep 2020 14:57:18 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30055:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrr38nb1oh9kicoiuipwqp98r3yop6kcwpah6sgdti4kgsan3ce67oam3sxss.yfraxaqdyt9djbh6kyxfqb4hn4e9qtxx9qxi6wcux5fprcoeznpfte7qdqfh1ax.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: basin28_181135627168 X-Filterd-Recvd-Size: 6456 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:17 +0000 (UTC) IronPort-SDR: TZCur9DoQabY6nWCZHgMWJkjsFKrzqCM3Laq/vsNlF9o84xGet+3f4ZX5OPE2aMKRMSjtfj5yS W2SpNEBL4q3Q== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631881" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631881" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:15 -0700 IronPort-SDR: AHkmUdFYoTL2TwSbkYScum328qgCyU2qmzdxvrHiGXotpfNZibXm4/eEfSeagHO84o3Cd+Fqzf Lq1hNNkbd08Q== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499136" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:14 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Borislav Petkov Subject: [PATCH v13 02/26] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Date: Fri, 25 Sep 2020 07:56:25 -0700 Message-Id: <20200925145649.5438-3-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add CPU feature flags for Control-flow Enforcement Technology (CET). CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking Signed-off-by: Yu-cheng Yu Reviewed-by: Borislav Petkov Reviewed-by: Kees Cook --- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/kernel/cpu/cpuid-deps.c | 2 ++ tools/arch/x86/include/asm/cpufeatures.h | 2 ++ 3 files changed, 6 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 2901d5df4366..c794e18e8a14 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -341,6 +341,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* Shadow Stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ @@ -370,6 +371,7 @@ #define X86_FEATURE_SERIALIZE (18*32+14) /* SERIALIZE instruction */ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */ +#define X86_FEATURE_IBT (18*32+20) /* Indirect Branch Tracking */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index 3cbe24ca80ab..fec83cc74b9e 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -69,6 +69,8 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC }, { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC }, { X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, + { X86_FEATURE_IBT, X86_FEATURE_XSAVES }, {} }; diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index 2901d5df4366..c794e18e8a14 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -341,6 +341,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* Shadow Stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ @@ -370,6 +371,7 @@ #define X86_FEATURE_SERIALIZE (18*32+14) /* SERIALIZE instruction */ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */ +#define X86_FEATURE_IBT (18*32+20) /* Indirect Branch Tracking */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ From patchwork Fri Sep 25 14:56:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799967 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B77592C for ; Fri, 25 Sep 2020 14:57:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BE5E62311D for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BE5E62311D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B903C6B006E; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id AECC76B0070; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 93F806B0072; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0069.hostedemail.com [216.40.44.69]) by kanga.kvack.org (Postfix) with ESMTP id 7B57E6B0070 for ; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 338545821 for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) X-FDA: 77301886998.20.bee70_3d0c51327168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 12F6C180C07AB for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30034:30045:30051:30054:30056:30064:30075:30090,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yrid1emso7t4d3xziq3gmtjb6beycj9xm6ypeqe7u78qchcpxsso9ty9fho46.ity8bkqfznxtt9makkfoyg5ot1nbc5jq1us63xrhpxnt6suc3t1e7w4kztpk9i1.w-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:28,LUA_SUMMARY:none X-HE-Tag: bee70_3d0c51327168 X-Filterd-Recvd-Size: 12564 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:17 +0000 (UTC) IronPort-SDR: KRFVt2lyVPFiSLCxDbqCzkX6hH7AI5Wp9wWTtC9LgajgrWrdo/dwaFTVc6FYVzTE/gX0/juUww aHw2przSIt1g== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631886" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631886" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:15 -0700 IronPort-SDR: i0PajnRxEGQuQU2eyLvB8/v79lhQNnzsI2etr3yVtl+ZP0vEmBAW+ivnoh5UvuXITaXq3EbPWJ yHwdUqkcfnjw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499140" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:15 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 03/26] x86/fpu/xstate: Introduce CET MSR XSAVES supervisor states Date: Fri, 25 Sep 2020 07:56:26 -0700 Message-Id: <20200925145649.5438-4-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Control-flow Enforcement Technology (CET) adds five MSRs. Introduce them and their XSAVES supervisor states: MSR_IA32_U_CET (user-mode CET settings), MSR_IA32_PL3_SSP (user-mode Shadow Stack pointer), MSR_IA32_PL0_SSP (kernel-mode Shadow Stack pointer), MSR_IA32_PL1_SSP (Privilege Level 1 Shadow Stack pointer), MSR_IA32_PL2_SSP (Privilege Level 2 Shadow Stack pointer). Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v12: - Add definition of reserved/suppress* bits for MSR_IA32_U_CET and MSR_IA32_S_CET. v11: - Drop MSR_IA32 prefix for individual bits, and use BIT_ULL(). - Drop MSR_IA32_CET_BITMAP_MASK. v6: - Remove __packed from struct cet_user_state, struct cet_kernel_state. arch/x86/include/asm/fpu/types.h | 23 +++++++++++++++-- arch/x86/include/asm/fpu/xstate.h | 5 ++-- arch/x86/include/asm/msr-index.h | 20 +++++++++++++++ arch/x86/include/uapi/asm/processor-flags.h | 2 ++ arch/x86/kernel/fpu/xstate.c | 28 ++++++++++++++++++--- tools/arch/x86/include/asm/msr-index.h | 20 +++++++++++++++ 6 files changed, 91 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index c87364ea6446..2a7037a6f960 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_RSRVD_COMP_10, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -134,6 +134,8 @@ enum xfeature { #define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM) #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE) @@ -236,6 +238,23 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + u64 user_cet; /* user control-flow settings */ + u64 user_ssp; /* user shadow stack pointer */ +}; + +/* + * State component 12 is Control-flow Enforcement kernel states + */ +struct cet_kernel_state { + u64 kernel_ssp; /* kernel shadow stack */ + u64 pl1_ssp; /* privilege level 1 shadow stack */ + u64 pl2_ssp; /* privilege level 2 shadow stack */ +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 14ab815132d4..e4408db88bca 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -35,7 +35,7 @@ XFEATURE_MASK_BNDCSR) /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -62,7 +62,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 2859ee4f39a8..f6f3a0e6c664 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -912,4 +912,24 @@ #define MSR_VM_IGNNE 0xc0010115 #define MSR_VM_HSAVE_PA 0xc0010117 +/* Control-flow Enforcement Technology MSRs */ +#define MSR_IA32_U_CET 0x6a0 /* user mode cet setting */ +#define MSR_IA32_S_CET 0x6a2 /* kernel mode cet setting */ +#define MSR_IA32_PL0_SSP 0x6a4 /* kernel shstk pointer */ +#define MSR_IA32_PL1_SSP 0x6a5 /* ring-1 shstk pointer */ +#define MSR_IA32_PL2_SSP 0x6a6 /* ring-2 shstk pointer */ +#define MSR_IA32_PL3_SSP 0x6a7 /* user shstk pointer */ +#define MSR_IA32_INT_SSP_TAB 0x6a8 /* exception shstk table */ + +/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */ +#define CET_SHSTK_EN BIT_ULL(0) +#define CET_WRSS_EN BIT_ULL(1) +#define CET_ENDBR_EN BIT_ULL(2) +#define CET_LEG_IW_EN BIT_ULL(3) +#define CET_NO_TRACK_EN BIT_ULL(4) +#define CET_SUPPRESS_DISABLE BIT_ULL(5) +#define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9)) +#define CET_SUPPRESS BIT_ULL(10) +#define CET_WAIT_ENDBR BIT_ULL(11) + #endif /* _ASM_X86_MSR_INDEX_H */ diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index bcba3c643e63..a8df907e8017 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) +#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement */ +#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT) /* * x86-64 Task Priority Register, CR8 diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 038e19c0019e..705fd9b94e31 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -38,6 +38,9 @@ static const char *xfeature_names[] = "Processor Trace (unused)" , "Protection Keys User registers", "unknown xstate feature" , + "Control-flow User registers" , + "Control-flow Kernel registers" , + "unknown xstate feature" , }; static short xsave_cpuid_features[] __initdata = { @@ -51,6 +54,9 @@ static short xsave_cpuid_features[] __initdata = { X86_FEATURE_AVX512F, X86_FEATURE_INTEL_PT, X86_FEATURE_PKU, + -1, /* Unused */ + X86_FEATURE_SHSTK, /* XFEATURE_CET_USER */ + X86_FEATURE_SHSTK, /* XFEATURE_CET_KERNEL */ }; /* @@ -318,6 +324,8 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_ZMM_Hi256); print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); + print_xstate_feature(XFEATURE_MASK_CET_USER); + print_xstate_feature(XFEATURE_MASK_CET_KERNEL); } /* @@ -592,6 +600,8 @@ static void check_xstate_against_struct(int nr) XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); + XCHECK_SZ(sz, nr, XFEATURE_CET_USER, struct cet_user_state); + XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state); /* * Make *SURE* to add any feature numbers in below if @@ -601,7 +611,8 @@ static void check_xstate_against_struct(int nr) if ((nr < XFEATURE_YMM) || (nr >= XFEATURE_MAX) || (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_10) && (nr <= XFEATURE_LBR))) { + (nr == XFEATURE_RSRVD_COMP_10) || + ((nr >= XFEATURE_RSRVD_COMP_13) && (nr <= XFEATURE_LBR))) { WARN_ONCE(1, "no structure for xstate: %d\n", nr); XSTATE_WARN_ON(1); } @@ -831,8 +842,19 @@ void __init fpu__init_system_xstate(void) * Clear XSAVE features that are disabled in the normal CPUID. */ for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) { - if (!boot_cpu_has(xsave_cpuid_features[i])) - xfeatures_mask_all &= ~BIT_ULL(i); + if (xsave_cpuid_features[i] == X86_FEATURE_SHSTK) { + /* + * X86_FEATURE_SHSTK and X86_FEATURE_IBT share + * same states, but can be enabled separately. + */ + if (!boot_cpu_has(X86_FEATURE_SHSTK) && + !boot_cpu_has(X86_FEATURE_IBT)) + xfeatures_mask_all &= ~BIT_ULL(i); + } else { + if ((xsave_cpuid_features[i] == -1) || + !boot_cpu_has(xsave_cpuid_features[i])) + xfeatures_mask_all &= ~BIT_ULL(i); + } } xfeatures_mask_all &= fpu__get_supported_xfeatures_mask(); diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h index 2859ee4f39a8..f6f3a0e6c664 100644 --- a/tools/arch/x86/include/asm/msr-index.h +++ b/tools/arch/x86/include/asm/msr-index.h @@ -912,4 +912,24 @@ #define MSR_VM_IGNNE 0xc0010115 #define MSR_VM_HSAVE_PA 0xc0010117 +/* Control-flow Enforcement Technology MSRs */ +#define MSR_IA32_U_CET 0x6a0 /* user mode cet setting */ +#define MSR_IA32_S_CET 0x6a2 /* kernel mode cet setting */ +#define MSR_IA32_PL0_SSP 0x6a4 /* kernel shstk pointer */ +#define MSR_IA32_PL1_SSP 0x6a5 /* ring-1 shstk pointer */ +#define MSR_IA32_PL2_SSP 0x6a6 /* ring-2 shstk pointer */ +#define MSR_IA32_PL3_SSP 0x6a7 /* user shstk pointer */ +#define MSR_IA32_INT_SSP_TAB 0x6a8 /* exception shstk table */ + +/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */ +#define CET_SHSTK_EN BIT_ULL(0) +#define CET_WRSS_EN BIT_ULL(1) +#define CET_ENDBR_EN BIT_ULL(2) +#define CET_LEG_IW_EN BIT_ULL(3) +#define CET_NO_TRACK_EN BIT_ULL(4) +#define CET_SUPPRESS_DISABLE BIT_ULL(5) +#define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9)) +#define CET_SUPPRESS BIT_ULL(10) +#define CET_WAIT_ENDBR BIT_ULL(11) + #endif /* _ASM_X86_MSR_INDEX_H */ From patchwork Fri Sep 25 14:56:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799969 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C29BE92C for ; Fri, 25 Sep 2020 14:57:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8875621D42 for ; Fri, 25 Sep 2020 14:57:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8875621D42 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 376DB6B0070; Fri, 25 Sep 2020 10:57:20 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2D9AD6B0071; Fri, 25 Sep 2020 10:57:20 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 08CB06B0072; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id DD4AB6B0071 for ; Fri, 25 Sep 2020 10:57:19 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 956D7181AE860 for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) X-FDA: 77301886998.02.bead12_1e045a527168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 6247B100C8A5C for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30051:30054:30056:30064:30070:30079:30083,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yg8k4o3e5a6a3y8twwou1bnwptmyczp38dtjschrno16w4dcuotb4oerj8mf5.qap8bc1o54hn8e5pp73xj9dexs5967dntgsth3hpzu9eykink5a5acqzq7kwwtz.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: bead12_1e045a527168 X-Filterd-Recvd-Size: 7961 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:18 +0000 (UTC) IronPort-SDR: xHvVLBQ0YeqAo1FrVWDKW8K++2GpYuZamMGC7fRlCSLbCk3IorhoXUnxxVxqsRSW8gGcswNgX1 jFLjZlr/BPoQ== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631891" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631891" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:16 -0700 IronPort-SDR: oAhpvDyQfu3Ob5yS4gn2h7BMsyP/9git+o+mrcL/hLp/0ifyHu9SYnAqQcn3kvvPkdCuEY220g +32rvFq05W+Q== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499144" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:15 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 04/26] x86/cet: Add control-protection fault handler Date: Fri, 25 Sep 2020 07:56:27 -0700 Message-Id: <20200925145649.5438-5-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the Shadow Stack; or an indirect JMP instruction, without the NOTRACK prefix, arrives at a non-ENDBR opcode. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v13: - Change X86_INTEL_* to X86_*. v10: - Change CONFIG_X86_64 to CONFIG_X86_INTEL_CET. v9: - Add Shadow Stack pointer to the fault printout. arch/x86/include/asm/idtentry.h | 4 ++ arch/x86/kernel/idt.c | 4 ++ arch/x86/kernel/signal_compat.c | 2 +- arch/x86/kernel/traps.c | 59 ++++++++++++++++++++++++++++++ include/uapi/asm-generic/siginfo.h | 3 +- 5 files changed, 70 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index a43366191212..6293d94b9b66 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -532,6 +532,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS, exc_stack_segment); DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP, exc_general_protection); DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC, exc_alignment_check); +#ifdef CONFIG_X86_CET +DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); +#endif + /* Raw exception entries which need extra work */ DECLARE_IDTENTRY_RAW(X86_TRAP_UD, exc_invalid_op); DECLARE_IDTENTRY_RAW(X86_TRAP_BP, exc_int3); diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index 7ecf9babf0cb..34f2a7383d5d 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -112,6 +112,10 @@ static const __initconst struct idt_data def_idts[] = { #elif defined(CONFIG_X86_32) SYSG(IA32_SYSCALL_VECTOR, entry_INT80_32), #endif + +#ifdef CONFIG_X86_CET + INTG(X86_TRAP_CP, asm_exc_control_protection), +#endif }; /* diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index 9ccbf0576cd0..c572a3de1037 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void) */ BUILD_BUG_ON(NSIGILL != 11); BUILD_BUG_ON(NSIGFPE != 15); - BUILD_BUG_ON(NSIGSEGV != 7); + BUILD_BUG_ON(NSIGSEGV != 8); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 5); BUILD_BUG_ON(NSIGCHLD != 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 81a2fb711091..97a049258e33 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -597,6 +597,65 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) cond_local_irq_disable(regs); } +#ifdef CONFIG_X86_CET +static const char * const control_protection_err[] = { + "unknown", + "near-ret", + "far-ret/iret", + "endbranch", + "rstorssp", + "setssbsy", +}; + +/* + * When a control protection exception occurs, send a signal + * to the responsible application. Currently, control + * protection is only enabled for the user mode. This + * exception should not come from the kernel mode. + */ +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + struct task_struct *tsk; + + if (notify_die(DIE_TRAP, "control protection fault", regs, + error_code, X86_TRAP_CP, SIGSEGV) == NOTIFY_STOP) + return; + cond_local_irq_enable(regs); + + if (!user_mode(regs)) + die("kernel control protection fault", regs, error_code); + + if (!static_cpu_has(X86_FEATURE_SHSTK) && + !static_cpu_has(X86_FEATURE_IBT)) + WARN_ONCE(1, "CET is disabled but got control protection fault\n"); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + printk_ratelimit()) { + unsigned int max_err; + unsigned long ssp; + + max_err = ARRAY_SIZE(control_protection_err) - 1; + if ((error_code < 0) || (error_code > max_err)) + error_code = 0; + rdmsrl(MSR_IA32_PL3_SSP, ssp); + pr_info("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + control_protection_err[error_code]); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); + } + + force_sig_fault(SIGSEGV, SEGV_CPERR, + (void __user *)uprobe_get_trap_addr(regs)); + cond_local_irq_disable(regs); +} +#endif + static bool do_int3(struct pt_regs *regs) { int res; diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index cb3d6c267181..91e10cbe3bb0 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -229,7 +229,8 @@ typedef struct siginfo { #define SEGV_ACCADI 5 /* ADI not enabled for mapped object */ #define SEGV_ADIDERR 6 /* Disrupting MCD error */ #define SEGV_ADIPERR 7 /* Precise MCD exception */ -#define NSIGSEGV 7 +#define SEGV_CPERR 8 /* Control protection fault */ +#define NSIGSEGV 8 /* * SIGBUS si_codes From patchwork Fri Sep 25 14:56:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799971 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F06492C for ; Fri, 25 Sep 2020 14:57:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 35D772395C for ; Fri, 25 Sep 2020 14:57:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 35D772395C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 994B56B0071; Fri, 25 Sep 2020 10:57:20 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 880EA6B0072; Fri, 25 Sep 2020 10:57:20 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 743AE8E0001; Fri, 25 Sep 2020 10:57:20 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0084.hostedemail.com [216.40.44.84]) by kanga.kvack.org (Postfix) with ESMTP id 5A3CB6B0072 for ; Fri, 25 Sep 2020 10:57:20 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 0C947180AD804 for ; Fri, 25 Sep 2020 14:57:20 +0000 (UTC) X-FDA: 77301887040.08.trees55_1b0e8b327168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id E461C1819E769 for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30051:30054:30055:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygf3pprz5n1w9jp63tb7oc6pzydopgd1x5kwf4hu6diuraww6t7orgn3oep93.5fdicxezkatfqq5btdh5q8hqprbehxjy7hdetn5sxw6cnct1benkdzugcpgt4uc.a-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: trees55_1b0e8b327168 X-Filterd-Recvd-Size: 5346 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:18 +0000 (UTC) IronPort-SDR: cjUGryMyc3we5rAugh7S/VBNb0eUIABFITBg9qk3UTOoSip0YOw2mJEbilJ4x1UfJIJS9YlZhf AFLZ1zGaBSPA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631895" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631895" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:17 -0700 IronPort-SDR: lsZAQDEVqBDeINw8JIqprW7NRo2/DwDG8cp4LaMK9V8s3zaPJ9zyP4q82HlkEs57fxwomMZyva wpyGpCGcEevQ== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499149" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:16 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 05/26] x86/cet/shstk: Add Kconfig option for user-mode Shadow Stack Date: Fri, 25 Sep 2020 07:56:28 -0700 Message-Id: <20200925145649.5438-6-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow Stack provides protection against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK_USER, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-shadow stack applications continue to work, but without protection. Signed-off-by: Yu-cheng Yu --- v13: - Update help text, and change default to N. - Change X86_INTEL_* to X86_*. v10: - Change SHSTK to shadow stack in the help text. - Change build-time check to config-time check. - Change ARCH_HAS_SHSTK to ARCH_HAS_SHADOW_STACK. arch/x86/Kconfig | 33 +++++++++++++++++++++++++++ scripts/as-x86_64-has-shadow-stack.sh | 4 ++++ 2 files changed, 37 insertions(+) create mode 100755 scripts/as-x86_64-has-shadow-stack.sh diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7101ac64bb20..415fcc869afc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1927,6 +1927,39 @@ config X86_INTEL_TSX_MODE_AUTO side channel attacks- equals the tsx=auto command line parameter. endchoice +config AS_HAS_SHADOW_STACK + def_bool $(success,$(srctree)/scripts/as-x86_64-has-shadow-stack.sh $(CC)) + help + Test the assembler for shadow stack instructions. + +config X86_CET + def_bool n + +config ARCH_HAS_SHADOW_STACK + def_bool n + +config X86_SHADOW_STACK_USER + prompt "Intel Shadow Stacks for user-mode" + def_bool n + depends on CPU_SUP_INTEL && X86_64 + depends on AS_HAS_SHADOW_STACK + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + select ARCH_HAS_SHADOW_STACK + help + Shadow Stacks provides protection against program stack + corruption. It's a hardware feature. This only matters + if you have the right hardware. It's a security hardening + feature and apps must be enabled to use it. You get no + protection "for free" on old userspace. The hardware can + support user and kernel, but this option is for user space + only. + Support for this feature is only known to be present on + processors released in 2020 or later. CET features are also + known to increase kernel text size by 3.7 KB. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/scripts/as-x86_64-has-shadow-stack.sh b/scripts/as-x86_64-has-shadow-stack.sh new file mode 100755 index 000000000000..fac1d363a1b8 --- /dev/null +++ b/scripts/as-x86_64-has-shadow-stack.sh @@ -0,0 +1,4 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +echo "wrussq %rax, (%rbx)" | $* -x assembler -c - From patchwork Fri Sep 25 14:56:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799977 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1FB2192C for ; Fri, 25 Sep 2020 14:57:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C5EF32371F for ; Fri, 25 Sep 2020 14:57:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C5EF32371F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 015766B0073; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BD5176B007B; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A767C6B0074; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0072.hostedemail.com [216.40.44.72]) by kanga.kvack.org (Postfix) with ESMTP id 773E86B0073 for ; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 2479F40F4 for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) X-FDA: 77301887082.18.queen51_410d30b27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id D8D7010161DBA for ; Fri, 25 Sep 2020 14:57:20 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrarfp6mij6zt63r1ih67rtywp6yp8okb56465rgu9hjo4urzzg4384a7rkrf.nnkki9sohky1cxmwagxxk6zkk5te39mu6c4tt3k4bnabkaddjiz87o15ej7kc94.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: queen51_410d30b27168 X-Filterd-Recvd-Size: 9638 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) IronPort-SDR: NeY7wkESiiKOTwR1SbSwAojJ4MTQcBVcpGDXpaUGgWVoNInW2EbmN89+eso1+b8RcJhKbmulB8 5OBxzyjYlgKw== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631897" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631897" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:17 -0700 IronPort-SDR: CWkFadNihY3rGqxDFlMGTvgJirE1P7wjdUOUal1zPIoQbZvAKqZfL4BOWBZtBUu78Mos2gdDSE QXAFQlLOW7gw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499153" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:17 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Dave Hansen Subject: [PATCH v13 06/26] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Date: Fri, 25 Sep 2020 07:56:29 -0700 Message-Id: <20200925145649.5438-7-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Before introducing _PAGE_COW for non-hardware memory management purposes in the next patch, rename _PAGE_DIRTY to _PAGE_DIRTY_HW and _PAGE_BIT_DIRTY to _PAGE_BIT_DIRTY_HW to make meanings more clear. There are no functional changes from this patch. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook Reviewed-by: Dave Hansen --- v9: - At some places _PAGE_DIRTY were not changed to _PAGE_DIRTY_HW, because they will be changed again in the next patch to _PAGE_DIRTY_BITS. However, this causes compile issues if the next patch is not yet applied. Fix it by changing all _PAGE_DIRTY to _PAGE_DRITY_HW. arch/x86/include/asm/pgtable.h | 18 +++++++++--------- arch/x86/include/asm/pgtable_types.h | 11 +++++------ arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- 4 files changed, 16 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b836138ce852..86b7acd221c1 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -124,7 +124,7 @@ extern pmdval_t early_pmd_flags; */ static inline int pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_HW; } @@ -163,7 +163,7 @@ static inline int pte_young(pte_t pte) static inline int pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_HW; } static inline int pmd_young(pmd_t pmd) @@ -173,7 +173,7 @@ static inline int pmd_young(pmd_t pmd) static inline int pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_HW; } static inline int pud_young(pud_t pud) @@ -334,7 +334,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_HW); } static inline pte_t pte_mkold(pte_t pte) @@ -354,7 +354,7 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pte_t pte_mkyoung(pte_t pte) @@ -435,7 +435,7 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_HW); } static inline pmd_t pmd_wrprotect(pmd_t pmd) @@ -445,7 +445,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -489,7 +489,7 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_HW); } static inline pud_t pud_wrprotect(pud_t pud) @@ -499,7 +499,7 @@ static inline pud_t pud_wrprotect(pud_t pud) static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 816b31c68550..192e1326b3db 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -15,7 +15,7 @@ #define _PAGE_BIT_PWT 3 /* page write through */ #define _PAGE_BIT_PCD 4 /* page cache disabled */ #define _PAGE_BIT_ACCESSED 5 /* was accessed (raised by CPU) */ -#define _PAGE_BIT_DIRTY 6 /* was written to (raised by CPU) */ +#define _PAGE_BIT_DIRTY_HW 6 /* was written to (raised by CPU) */ #define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */ #define _PAGE_BIT_PAT 7 /* on 4KB pages */ #define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ @@ -46,7 +46,7 @@ #define _PAGE_PWT (_AT(pteval_t, 1) << _PAGE_BIT_PWT) #define _PAGE_PCD (_AT(pteval_t, 1) << _PAGE_BIT_PCD) #define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED) -#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY) +#define _PAGE_DIRTY_HW (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_HW) #define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) #define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) #define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1) @@ -74,7 +74,7 @@ _PAGE_PKEY_BIT3) #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) -#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) +#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY_HW | _PAGE_ACCESSED) #else #define _PAGE_KNL_ERRATUM_MASK 0 #endif @@ -126,7 +126,7 @@ * pte_modify() does modify it. */ #define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW | \ _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ _PAGE_UFFD_WP) #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) @@ -163,7 +163,7 @@ enum page_cache_mode { #define __RW _PAGE_RW #define _USR _PAGE_USER #define ___A _PAGE_ACCESSED -#define ___D _PAGE_DIRTY +#define ___D _PAGE_DIRTY_HW #define ___G _PAGE_GLOBAL #define __NX _PAGE_NX @@ -205,7 +205,6 @@ enum page_cache_mode { #define __PAGE_KERNEL_IO __PAGE_KERNEL #define __PAGE_KERNEL_IO_NOCACHE __PAGE_KERNEL_NOCACHE - #ifndef __ASSEMBLY__ #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _ENC) diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S index a4d9a261425b..e3bb4ff95523 100644 --- a/arch/x86/kernel/relocate_kernel_64.S +++ b/arch/x86/kernel/relocate_kernel_64.S @@ -17,7 +17,7 @@ */ #define PTR(x) (x << 3) -#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY_HW) /* * control_page + KEXEC_CONTROL_CODE_MAX_SIZE diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 8646a797b7a8..8890cfa074c7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3605,7 +3605,7 @@ static int init_rmode_identity_map(struct kvm *kvm) /* Set up identity-mapping pagetable for EPT in real mode */ for (i = 0; i < PT32_ENT_PER_PAGE; i++) { tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | - _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); + _PAGE_ACCESSED | _PAGE_DIRTY_HW | _PAGE_PSE); r = kvm_write_guest_page(kvm, identity_map_pfn, &tmp, i * sizeof(tmp), sizeof(tmp)); if (r < 0) From patchwork Fri Sep 25 14:56:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799975 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18740112C for ; Fri, 25 Sep 2020 14:57:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DA6C520715 for ; Fri, 25 Sep 2020 14:57:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DA6C520715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C79E66B0072; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 975DA6B0073; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6F1F26B0074; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0174.hostedemail.com [216.40.44.174]) by kanga.kvack.org (Postfix) with ESMTP id 565196B0073 for ; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 172F353B4 for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) X-FDA: 77301887082.05.idea97_21024ee27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id E962C18033789 for ; Fri, 25 Sep 2020 14:57:20 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8zipi18r6x5u9tizucujc9gkbpocptryet97u7gcji7w3irp1jbxgib8hgoe.acws414h345ky48qq43nbmdi38ukyh99ippsneerakguxadwj7x8a4g45tt6p7b.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: idea97_21024ee27168 X-Filterd-Recvd-Size: 5139 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:19 +0000 (UTC) IronPort-SDR: 9R++mMfym4SHBOzK7xUMiWcGhIfchrUvPNlgmrb2qT8VJ4ylxe7dxvF/p/HW1L8sATsbrBkeYC 3oT4VoeWFrkw== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631900" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631900" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:18 -0700 IronPort-SDR: iDFAOWwQdyVKBEaUWiRkru4jiGsfkuNhmJ9u9RoEzee2n1Jnk3np+099qA5oV6PsFwwakY/lNt V7sEh0948tnw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499157" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:17 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Christoph Hellwig Subject: [PATCH v13 07/26] x86/mm: Remove _PAGE_DIRTY_HW from kernel RO pages Date: Fri, 25 Sep 2020 07:56:30 -0700 Message-Id: <20200925145649.5438-8-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Kernel read-only PTEs are setup as _PAGE_DIRTY_HW. Since these become shadow stack PTEs, remove the dirty bit. Signed-off-by: Yu-cheng Yu Cc: "H. Peter Anvin" Cc: Kees Cook Cc: Thomas Gleixner Cc: Dave Hansen Cc: Christoph Hellwig Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Borislav Petkov Cc: Peter Zijlstra --- arch/x86/include/asm/pgtable_types.h | 6 +++--- arch/x86/mm/pat/set_memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 192e1326b3db..5f31f1c407b9 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -193,10 +193,10 @@ enum page_cache_mode { #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index d1b2a889f035..962434fdf0d9 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1932,7 +1932,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY_HW), 0); } int set_memory_rw(unsigned long addr, int numpages) From patchwork Fri Sep 25 14:56:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799973 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E70092C for ; Fri, 25 Sep 2020 14:57:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 31B6023730 for ; Fri, 25 Sep 2020 14:57:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31B6023730 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8D44E6B0075; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8328C6B0072; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 65B476B0075; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0133.hostedemail.com [216.40.44.133]) by kanga.kvack.org (Postfix) with ESMTP id 4DEE96B0072 for ; Fri, 25 Sep 2020 10:57:21 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 1618D8249980 for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) X-FDA: 77301887082.21.dress37_210a96327168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id F2327180442C3 for ; Fri, 25 Sep 2020 14:57:20 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:4423:30045:30051:30054:30056:30064:30070:30079:30091,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y8hi6intkca87ub458jxd7re7stycgprfso47on515nc9hi5qpejgzj7xgjg8.huqyd474d4wb6qiosjixutd1quru58yrpwimmj6bnysecdo1hi1rqa8rjbzygfc.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: dress37_210a96327168 X-Filterd-Recvd-Size: 15399 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:20 +0000 (UTC) IronPort-SDR: 5UsU643EN1HdKNxA/RVbcIFZ3CALVqv+s7aNNKU3pUqex2aEFEwB/+vB9l8OY5JABFKzduzX7+ LAVq5caHCuRA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631902" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631902" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:19 -0700 IronPort-SDR: pv91liv5aQu3UoYGO+/7YAxS7vZbMjw0l8223TrNoQUUmDiqU6ivxEdudw0VxRzv/9oOKFM8dW C02QvI+xpNEQ== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499162" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:18 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 08/26] x86/mm: Introduce _PAGE_COW Date: Fri, 25 Sep 2020 07:56:31 -0700 Message-Id: <20200925145649.5438-9-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There is essentially no room left in the x86 hardware PTEs on some OSes (not Linux). That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. The reason it's lightly used is that Dirty=1 is normally set by hardware and cannot normally be set by hardware on a Write=0 PTE. Software must normally be involved to create one of these PTEs, so software can simply opt to not create them. But that leaves us with a Linux problem: we need to ensure we never create Write=0,Dirty=1 PTEs. In places where we do create them, we need to find an alternative way to represent them _without_ using the same hardware bit combination. Thus, enter _PAGE_COW. This results in the following: (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW) (b) A R/O page that has been COW'ed: (R/O + _PAGE_COW) The user page is in a R/O VMA, and get_user_pages() needs a writable copy. The page fault handler creates a copy of the page and sets the new copy's PTE as R/O and _PAGE_COW. (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW) (d) A shared shadow stack PTE: (R/O + _PAGE_COW) When a shadow stack page is being shared among processes (this happens at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the next shadow stack access causes a fault, and the page is duplicated and _PAGE_DIRTY_HW is set again. This is the COW equivalent for shadow stack pages, even though it's copy-on-access rather than copy-on-write. (e) A page where the processor observed a Write=1 PTE, started a write, set Dirty=1, but then observed a Write=0 PTE. That's possible today, but will not happen on processors that support shadow stack. Use _PAGE_COW in pte_wrprotect() and _PAGE_DIRTY_HW in pte_mkwrite(). Apply the same changes to pmd and pud. When this patch is applied, there are six free bits left in the 64-bit PTE. There are no more free bits in the 32-bit PTE (except for PAE) and shadow stack is not implemented for the 32-bit kernel. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: - Change _PAGE_BIT_DIRTY_SW to _PAGE_BIT_COW, as it is used for copy-on- write PTEs. - Update pte_write() and treat shadow stack as writable. - Change *_mkdirty_shstk() to *_mkwrite_shstk() as these make shadow stack pages writable. - Use bit test & shift to move _PAGE_BIT_DIRTY_HW to _PAGE_BIT_COW. - Change static_cpu_has() to cpu_feature_enabled(). - Revise commit log. v9: - Remove pte_move_flags() etc. and put the logic directly in pte_wrprotect()/pte_mkwrite() etc. - Change compile-time conditionals to run-time checks. - Split out pte_modify()/pmd_modify() to a new patch. - Update comments. arch/x86/include/asm/pgtable.h | 120 ++++++++++++++++++++++++--- arch/x86/include/asm/pgtable_types.h | 41 ++++++++- 2 files changed, 150 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 86b7acd221c1..ac4ed814be96 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -122,9 +122,9 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY_HW; + return pte_flags(pte) & _PAGE_DIRTY_BITS; } @@ -161,9 +161,9 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY_HW; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; } static inline int pmd_young(pmd_t pmd) @@ -171,9 +171,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY_HW; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -183,6 +183,12 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { + /* + * If _PAGE_DIRTY_HW is set, the PTE must either have + * _PAGE_RW or be a shadow stack PTE, which is logically writable. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY_HW); return pte_flags(pte) & _PAGE_RW; } @@ -334,7 +340,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY_HW); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -344,6 +350,17 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >> + _PAGE_BIT_DIRTY_HW << _PAGE_BIT_COW; + pte = pte_clear_flags(pte, _PAGE_DIRTY_HW); + } + return pte_clear_flags(pte, _PAGE_RW); } @@ -354,6 +371,18 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { + pteval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + pte = pte_clear_flags(pte, _PAGE_COW); return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } @@ -364,6 +393,13 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pte_flags(pte) & _PAGE_COW) { + pte = pte_clear_flags(pte, _PAGE_COW); + pte = pte_set_flags(pte, _PAGE_DIRTY_HW); + } + } + return pte_set_flags(pte, _PAGE_RW); } @@ -435,16 +471,41 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY_HW); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmdval_t v = native_pmd_val(pmd); + + v |= (v & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << + _PAGE_BIT_COW; + pmd = pmd_clear_flags(__pmd(v), _PAGE_DIRTY_HW); + } + return pmd_clear_flags(pmd, _PAGE_RW); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { + pmdval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + pmd = pmd_clear_flags(pmd, _PAGE_COW); return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } @@ -465,6 +526,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pmd_flags(pmd) & _PAGE_COW) { + pmd = pmd_clear_flags(pmd, _PAGE_COW); + pmd = pmd_set_flags(pmd, _PAGE_DIRTY_HW); + } + } + return pmd_set_flags(pmd, _PAGE_RW); } @@ -489,17 +557,36 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY_HW); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pudval_t v = native_pud_val(pud); + + v |= (v & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << + _PAGE_BIT_COW; + pud = pud_clear_flags(__pud(v), _PAGE_DIRTY_HW); + } + return pud_clear_flags(pud, _PAGE_RW); } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -519,6 +606,13 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pud_flags(pud) & _PAGE_COW) { + pud = pud_clear_flags(pud, _PAGE_COW); + pud = pud_set_flags(pud, _PAGE_DIRTY_HW); + } + } + return pud_set_flags(pud, _PAGE_RW); } @@ -1132,6 +1226,12 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { + /* + * If _PAGE_DIRTY_HW is set, then the PMD must either have + * _PAGE_RW or be a shadow stack PMD, which is logically writable. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY_HW); return pmd_flags(pmd) & _PAGE_RW; } diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 5f31f1c407b9..75362df8b226 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -23,7 +23,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -36,6 +37,16 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * This bit indicates a copy-on-write page, and is different from + * _PAGE_BIT_SOFT_DIRTY, which tracks which pages a task writes to. + */ +#ifdef CONFIG_X86_64 +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ +#else +#define _PAGE_BIT_COW 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +128,34 @@ #define _PAGE_DEVMAP (_AT(pteval_t, 0)) #endif +/* + * _PAGE_COW is used to separate R/O and copy-on-write PTEs created by + * software from the shadow stack PTE setting required by the hardware: + * (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW) + * (b) A R/O page that has been COW'ed: (R/O +_PAGE_COW) + * The user page is in a R/O VMA, and get_user_pages() needs a + * writable copy. The page fault handler creates a copy of the page + * and sets the new copy's PTE as R/O and _PAGE_COW. + * (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW) + * (d) A shared (copy-on-access) shadow stack PTE: (R/O + _PAGE_COW) + * When a shadow stack page is being shared among processes (this + * happens at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the + * next shadow stack access causes a fault, and the page is duplicated + * and _PAGE_DIRTY_HW is set again. This is the COW equivalent for + * shadow stack pages, even though it's copy-on-access rather than + * copy-on-write. + * (e) A page where the processor observed a Write=1 PTE, started a write, + * set Dirty=1, but then observed a Write=0 PTE. That's possible + * today, but will not happen on processors that support shadow stack. + */ +#ifdef CONFIG_X86_SHADOW_STACK_USER +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY_HW | _PAGE_COW) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* From patchwork Fri Sep 25 14:56:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799979 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E197B112C for ; Fri, 25 Sep 2020 14:57:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AFCC720715 for ; Fri, 25 Sep 2020 14:57:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AFCC720715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9207A6B0074; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 746DE6B0078; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 594356B007B; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0099.hostedemail.com [216.40.44.99]) by kanga.kvack.org (Postfix) with ESMTP id 3FD0E6B0074 for ; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id F1F66180AD802 for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) X-FDA: 77301887082.03.horse03_460895127168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id D08D128A4E8 for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30055:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrqsgas6mmjg4o3kpssrj3jq9frocuwrcxztqjdyqqkpf5ffwshn75n6ixj18.7c987apmf94enkamuo3q98h53bz6xjoyx4cb1bzumdjc95mnnqpfth16eeyiqhf.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: horse03_460895127168 X-Filterd-Recvd-Size: 4018 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) IronPort-SDR: Ok8qdpfgno6NrkaqgFgPCknbaad6LF8PCkOyub3am9amy1Os3lmGcSUsBDKXqMPI6y6eg6UZz0 T9+hFSNzWTHA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631909" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631909" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:20 -0700 IronPort-SDR: kZeyPmI4vaAxpJYs/0m8HiVw8F4xDJJkTMT2lPwMF58GzS526xi7Z+2DrmcXlLkp/FT51cYd6k t4YgQXsTBIsw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499165" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:19 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , David Airlie , Joonas Lahtinen , Jani Nikula , Daniel Vetter , Rodrigo Vivi , Zhenyu Wang , Zhi Wang Subject: [PATCH v13 09/26] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Date: Fri, 25 Sep 2020 07:56:32 -0700 Message-Id: <20200925145649.5438-10-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: After the introduction of _PAGE_COW, a modified page's PTE can have either _PAGE_DIRTY_HW or _PAGE_COW. Change _PAGE_DIRTY to _PAGE_DIRTY_BITS. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook Cc: David Airlie Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Daniel Vetter Cc: Rodrigo Vivi Cc: Zhenyu Wang Cc: Zhi Wang --- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index a3a4305eda01..dd0ab28cfe7d 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1207,7 +1207,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu, } /* Clear dirty field. */ - se->val64 &= ~_PAGE_DIRTY; + se->val64 &= ~_PAGE_DIRTY_BITS; ops->clear_pse(se); ops->clear_ips(se); From patchwork Fri Sep 25 14:56:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799981 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C4E492C for ; Fri, 25 Sep 2020 14:57:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id ED72320715 for ; Fri, 25 Sep 2020 14:57:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ED72320715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E4B346B007D; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id DD5F26B0080; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AA3256B007E; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0210.hostedemail.com [216.40.44.210]) by kanga.kvack.org (Postfix) with ESMTP id 866976B007B for ; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 4B8D653B4 for ; Fri, 25 Sep 2020 14:57:22 +0000 (UTC) X-FDA: 77301887124.27.crow32_2511fb227168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 2A5033D669 for ; Fri, 25 Sep 2020 14:57:22 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yg3rx6dri37htzfod84cf76k1abypwi5jfhgo7shmoyqz8u9k14dbf4krhe7w.hbniuxfniuqiqqzojfcwem6gkp7ch8mg1g5aj1r8pt1yc7dcace6gnnxnzyp7sp.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:30,LUA_SUMMARY:none X-HE-Tag: crow32_2511fb227168 X-Filterd-Recvd-Size: 4873 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) IronPort-SDR: 1CrLE5HwNveOaLMBOJYZdejT1b0NwsMDGQrIhgJv1RaCD/t4Odg5vbj4ESur4N1aBgeugf2pEJ VDD2VZyBqgXQ== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631910" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631910" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:20 -0700 IronPort-SDR: 0hJ+lpgFyXGa3b1qHBZastFfvAHGpg6AaQBhrXY4rj22lxwO6b1npMV7zKehfyhwQJRDOYhGW3 0TXS26FytIRw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499169" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:20 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 10/26] x86/mm: Update pte_modify for _PAGE_COW Date: Fri, 25 Sep 2020 07:56:33 -0700 Message-Id: <20200925145649.5438-11-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pte_modify() changes a PTE to 'newprot'. It doesn't use the pte_*() helpers that a previous patch fixed up, so we need a new site. Introduce fixup_dirty_pte() to set the dirty bits based on _PAGE_RW, and apply the same changes to pmd_modify(). Signed-off-by: Yu-cheng Yu --- v10: - Replace _PAGE_CHG_MASK approach with fixup functions. arch/x86/include/asm/pgtable.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index ac4ed814be96..3bdb192a904b 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -727,6 +727,21 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd) static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); +static inline pteval_t fixup_dirty_pte(pteval_t pteval) +{ + pte_t pte = __pte(pteval); + + if (pte_dirty(pte)) { + pte = pte_mkclean(pte); + + if (pte_flags(pte) & _PAGE_RW) + pte = pte_set_flags(pte, _PAGE_DIRTY_HW); + else + pte = pte_set_flags(pte, _PAGE_COW); + } + return pte_val(pte); +} + static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pteval_t val = pte_val(pte), oldval = val; @@ -737,16 +752,34 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ val &= _PAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; + val = fixup_dirty_pte(val); val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); return __pte(val); } +static inline int pmd_write(pmd_t pmd); +static inline pmdval_t fixup_dirty_pmd(pmdval_t pmdval) +{ + pmd_t pmd = __pmd(pmdval); + + if (pmd_dirty(pmd)) { + pmd = pmd_mkclean(pmd); + + if (pmd_flags(pmd) & _PAGE_RW) + pmd = pmd_set_flags(pmd, _PAGE_DIRTY_HW); + else + pmd = pmd_set_flags(pmd, _PAGE_COW); + } + return pmd_val(pmd); +} + static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { pmdval_t val = pmd_val(pmd), oldval = val; val &= _HPAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; + val = fixup_dirty_pmd(val); val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); return __pmd(val); } From patchwork Fri Sep 25 14:56:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799983 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 701D2112C for ; Fri, 25 Sep 2020 14:57:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3558620715 for ; Fri, 25 Sep 2020 14:57:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3558620715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 20BB36B0078; Fri, 25 Sep 2020 10:57:23 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 056158E0001; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E46F96B007B; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0240.hostedemail.com [216.40.44.240]) by kanga.kvack.org (Postfix) with ESMTP id AE5E16B007B for ; Fri, 25 Sep 2020 10:57:22 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 767B040F4 for ; Fri, 25 Sep 2020 14:57:22 +0000 (UTC) X-FDA: 77301887124.03.chalk41_2717ea427168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id 661A828A4EA for ; Fri, 25 Sep 2020 14:57:22 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrdgheqwc8dby387hzzmpa77cdfyc6opfa9a9jyab9nauded5a75x84r5xh7m.g7yupnka59613qzmc66fjqw6y335kb9ajt6q6quxz1hz555tckbpig5s87w7hb7.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: chalk41_2717ea427168 X-Filterd-Recvd-Size: 6754 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:21 +0000 (UTC) IronPort-SDR: hEyvS+tfdHYJvpTQulHn75iVml5AeIXl9Vl2TlN2p/ZvueUatIrLoWl073u0S1jpEov2Phzz0y C9k0UzZSleoA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631912" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631912" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:21 -0700 IronPort-SDR: 3NxO/hMGcYgj1wHc1+5KziMu9Rxvdd8dGrJ99Tq1g5Y9ZK0TB2SohUma8MoMPQdwt3N+zSxCRt BBhkKTCN1xbg== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499173" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:20 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 11/26] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY_HW to _PAGE_COW Date: Fri, 25 Sep 2020 07:56:34 -0700 Message-Id: <20200925145649.5438-12-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When shadow stack is introduced, [R/O + _PAGE_DIRTY_HW] PTE is reserved for shadow stack. Copy-on-write PTEs have [R/O + _PAGE_COW]. When a PTE goes from [R/W + _PAGE_DIRTY_HW] to [R/O + _PAGE_COW], it could become a transient shadow stack PTE in two cases: The first case is that some processors can start a write but end up seeing a read-only PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting shadow stack, therefore we don't need a TLB flush here. The second case is that when the software, without atomic, tests & replaces _PAGE_DIRTY_HW with _PAGE_COW, a transient shadow stack PTE can exist. This is prevented with cmpxchg. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights to the issue. Jann Horn provided the cmpxchg solution. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: - Replace bit shift with pte_wrprotect()/pmd_wrprotect(), which use bit test & shift. - Move READ_ONCE of old_pte into try_cmpxchg() loop. - Change static_cpu_has() to cpu_feature_enabled(). v9: - Change compile-time conditionals to runtime checks. - Fix parameters of try_cmpxchg(): change pte_t/pmd_t to pte_t.pte/pmd_t.pmd. v4: - Implement try_cmpxchg(). arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 3bdb192a904b..a00d55fda5a2 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1230,6 +1230,32 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + /* + * Some processors can start a write, but end up seeing a read-only + * PTE by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PTE which + * looks like a shadow stack PTE. + * + * However, this behavior has been improved and will not occur on + * processors supporting shadow stack. Without this guarantee, a + * transition to a non-present PTE and flush the TLB would be + * needed. + * + * When changing a writable PTE to read-only and if the PTE has + * _PAGE_DIRTY_HW set, move that bit to _PAGE_COW so that the + * PTE is not a shadow stack PTE. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte_t old_pte, new_pte; + + do { + old_pte = READ_ONCE(*ptep); + new_pte = pte_wrprotect(old_pte); + + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1286,6 +1312,32 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { + /* + * Some processors can start a write, but end up seeing a read-only + * PMD by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PMD which + * looks like a Shadow Stack PMD. + * + * However, this behavior has been improved and will not occur on + * processors supporting Shadow Stack. Without this guarantee, a + * transition to a non-present PMD and flush the TLB would be + * needed. + * + * When changing a writable PMD to read-only and if the PMD has + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_COW so that the + * PMD is not a shadow stack PMD. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmd_t old_pmd, new_pmd; + + do { + old_pmd = READ_ONCE(*pmdp); + new_pmd = pmd_wrprotect(old_pmd); + + } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd))); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } From patchwork Fri Sep 25 14:56:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799985 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECB3D92C for ; Fri, 25 Sep 2020 14:57:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B9C9F20715 for ; Fri, 25 Sep 2020 14:57:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B9C9F20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C74576B007B; Fri, 25 Sep 2020 10:57:23 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C23966B007E; Fri, 25 Sep 2020 10:57:23 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AC4E86B0080; Fri, 25 Sep 2020 10:57:23 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0076.hostedemail.com [216.40.44.76]) by kanga.kvack.org (Postfix) with ESMTP id 977106B007B for ; Fri, 25 Sep 2020 10:57:23 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 5BF1F8249980 for ; Fri, 25 Sep 2020 14:57:23 +0000 (UTC) X-FDA: 77301887166.21.cup44_4a0d39527168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 3F2A2180442C7 for ; Fri, 25 Sep 2020 14:57:23 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yfpnckwb8c6z8s11pmbpuuy1adpop5437yt7a5kri351gfay3jprppzb95szb.acp8tiadft5f6mgd1ftfwuotwgjsryyekkaiessra1i4jb96uhm1nmrwy9aneto.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: cup44_4a0d39527168 X-Filterd-Recvd-Size: 5311 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:22 +0000 (UTC) IronPort-SDR: r2RLp7VIjPtckNZAoYpRA8Bb5ZomdF7L401qB2324FdtFzsY/4vqIa/LbJb1GIoRVqHBLPmzjw fewPwJmP6YYA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631915" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631915" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:22 -0700 IronPort-SDR: Xn7tOP38aTzxS1F3DgT594Nc5/t0LVjgKiMcYDrlupByZ9U9chgs6eqiGEd6P1jD3DYPb6t+FY EwIB+RHGZh9Q== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499179" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:21 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 12/26] mm: Introduce VM_SHSTK for shadow stack memory Date: Fri, 25 Sep 2020 07:56:35 -0700 Message-Id: <20200925145649.5438-13-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A Shadow Stack PTE must be read-only and have _PAGE_DIRTY set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHSTK to track shadow stack VMAs. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v9: - Add VM_SHSTK case to arch_vma_name(). - Revise the commit log to explain why adding a new VM flag. arch/x86/mm/mmap.c | 2 ++ fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 3 files changed, 13 insertions(+) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index c90c20904a60..a22c6b6fc607 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy) const char *arch_vma_name(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHSTK) + return "[shadow stack]"; return NULL; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 5066b0251ed8..436dd37f2d4c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -663,6 +663,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_PKEY_BIT4)] = "", #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_SHADOW_STACK_USER + [ilog2(VM_SHSTK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index b2f370f0b420..a1d61731d7b4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -304,11 +304,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -324,6 +326,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_SHADOW_STACK_USER +# define VM_SHSTK VM_HIGH_ARCH_5 +#else +# define VM_SHSTK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Fri Sep 25 14:56:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799987 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3ABD792C for ; Fri, 25 Sep 2020 14:57:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 07FFE21D42 for ; Fri, 25 Sep 2020 14:57:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 07FFE21D42 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 438196B007E; Fri, 25 Sep 2020 10:57:25 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3C19F6B0080; Fri, 25 Sep 2020 10:57:25 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 287806B0081; Fri, 25 Sep 2020 10:57:25 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0190.hostedemail.com [216.40.44.190]) by kanga.kvack.org (Postfix) with ESMTP id 013ED6B007E for ; Fri, 25 Sep 2020 10:57:24 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 95088181AE867 for ; Fri, 25 Sep 2020 14:57:24 +0000 (UTC) X-FDA: 77301887208.20.net09_4f0f69227168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 4878B180C07AF for ; Fri, 25 Sep 2020 14:57:24 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30069:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrs6rztfu8eske9ech98y39jmenypd9bf64hsw76mmgnwjyiwo8y8j7cbu4sd.gxpm91jod81ppowoyqkbuyt3puxfeda4qny6xryb8tr91p1y5t5gwzr7p6idfzn.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: net09_4f0f69227168 X-Filterd-Recvd-Size: 6048 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:23 +0000 (UTC) IronPort-SDR: lUceU+15ZgNLHOw4ERDC0kVWPNCijKOH7Jj87dEieWKY1WcFzvPAMx6nByMzp1/u/9wA6QSZMi oCHdDHB8vjMw== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631917" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631917" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:22 -0700 IronPort-SDR: r1BMx5Qu3bfGkLVfz4QblRFjZhcbtWh8XVeTX73hcYo5zqxypGMr/4S4Ga9RbSSkPZd7lbeAB5 KusDdOKr0zzA== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499183" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:22 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 13/26] x86/mm: Shadow Stack page fault error checking Date: Fri, 25 Sep 2020 07:56:36 -0700 Message-Id: <20200925145649.5438-14-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack accesses are those that are performed by the CPU where it expects to encounter a shadow stack mapping. These accesses are performed implicitly by CALL/RET at the site of the shadow stack pointer. These accesses are made explicitly by shadow stack management instructions like WRUSSQ. Shadow stacks accesses to shadow-stack mapping can see faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. In handling a shadow stack page fault, verify it occurs within a shadow stack mapping. It is always an error otherwise. For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect copy-on-write. Because clearing _PAGE_DIRTY_HW (vs. _PAGE_RW) is used to trigger the fault, shadow stack read fault and shadow stack write fault are not differentiated and both are handled as a write access. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: -Revise commit log. arch/x86/include/asm/traps.h | 2 ++ arch/x86/mm/fault.c | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 714b1a30e7b0..28b493c53d70 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -50,6 +50,7 @@ void __noreturn handle_stack_overflow(const char *message, * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault */ enum x86_pf_error_code { X86_PF_PROT = 1 << 0, @@ -58,5 +59,6 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, }; #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6e3e8a124903..2390399c157f 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1110,6 +1110,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Verify a shadow stack access is within a shadow stack VMA. + * It is always an error otherwise. Normal data access to a + * shadow stack area is checked in the case followed. + */ + if (error_code & X86_PF_SHSTK) { + if (!(vma->vm_flags & VM_SHSTK)) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ if (unlikely(!(vma->vm_flags & VM_WRITE))) @@ -1275,6 +1286,14 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * Clearing _PAGE_DIRTY_HW is used to detect shadow stack access. + * This method cannot distinguish shadow stack read vs. write. + * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect + * copy-on-write. + */ + if (hw_error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (hw_error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (hw_error_code & X86_PF_INSTR) From patchwork Fri Sep 25 14:56:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799989 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3F813112C for ; Fri, 25 Sep 2020 14:57:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 09EC92344C for ; Fri, 25 Sep 2020 14:57:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 09EC92344C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6AA076B0080; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5E63C6B0085; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 371E16B0082; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0223.hostedemail.com [216.40.44.223]) by kanga.kvack.org (Postfix) with ESMTP id 1489F6B0080 for ; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id BB3508249980 for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) X-FDA: 77301887250.15.show81_3e0048027168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin15.hostedemail.com (Postfix) with ESMTP id 8507D1814B0C1 for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yg1rhx81weij1446bw5pp9esk7dopp3hwu8jkm9fyy1jdnffm8x4hs1mt3hd7.wfke86crtg1fjikd9jihc4rgotgbu51wa1dqthxiyrspibjfiqzhx4h77zcan4u.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: show81_3e0048027168 X-Filterd-Recvd-Size: 6344 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:24 +0000 (UTC) IronPort-SDR: eBnJuHfEmKZcpfIx0D+pIzBqnQeD5YWlV5Di1llcIxixiIHyEa9zNw9jXHgr7roVy3ZsrM7iv5 2Ju/0+JmjguA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631919" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631919" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:23 -0700 IronPort-SDR: twPg8fnhEtOjAeOmEL6Ik09OLMi9hy018ZosHDFgSurT+Sw0K62QINMdGZn8y9qqprwo7Swm23 SbPvEQbjHiuA== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499191" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:22 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 14/26] x86/mm: Update maybe_mkwrite() for shadow stack Date: Fri, 25 Sep 2020 07:56:37 -0700 Message-Id: <20200925145649.5438-15-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack memory is writable, but its VMA has VM_SHSTK instead of VM_WRITE. Update maybe_mkwrite() to include the shadow stack. Signed-off-by: Yu-cheng Yu --- arch/x86/Kconfig | 4 ++++ arch/x86/mm/pgtable.c | 18 ++++++++++++++++++ include/linux/mm.h | 2 ++ include/linux/pgtable.h | 24 ++++++++++++++++++++++++ mm/huge_memory.c | 2 ++ 5 files changed, 50 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 415fcc869afc..7578327226e3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1935,6 +1935,9 @@ config AS_HAS_SHADOW_STACK config X86_CET def_bool n +config ARCH_MAYBE_MKWRITE + def_bool n + config ARCH_HAS_SHADOW_STACK def_bool n @@ -1945,6 +1948,7 @@ config X86_SHADOW_STACK_USER depends on AS_HAS_SHADOW_STACK select ARCH_USES_HIGH_VMA_FLAGS select X86_CET + select ARCH_MAYBE_MKWRITE select ARCH_HAS_SHADOW_STACK help Shadow Stacks provides protection against program stack diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index dfd82f51ba66..a9666b64bc05 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -610,6 +610,24 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, } #endif +#ifdef CONFIG_ARCH_MAYBE_MKWRITE +pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_SHSTK)) + pte = pte_mkwrite_shstk(pte); + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_SHSTK)) + pmd = pmd_mkwrite_shstk(pmd); + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_ARCH_MAYBE_MKWRITE */ + /** * reserve_top_address - reserves a hole in the top of kernel address space * @reserve - size of hole to reserve diff --git a/include/linux/mm.h b/include/linux/mm.h index a1d61731d7b4..db76ced54f9a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -969,6 +969,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pte = pte_mkwrite(pte); + else + pte = arch_maybe_mkwrite(pte, vma); return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e8cbc2e795d5..a665fc8c0eaf 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1356,6 +1356,30 @@ static inline bool arch_has_pfn_modify_check(void) } #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */ +#ifdef CONFIG_MMU +#ifdef CONFIG_ARCH_MAYBE_MKWRITE +pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#else /* !CONFIG_ARCH_MAYBE_MKWRITE */ +static inline pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#endif /* CONFIG_ARCH_MAYBE_MKWRITE */ +#endif /* CONFIG_MMU */ + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/huge_memory.c b/mm/huge_memory.c index faadc449cca5..aff9eb39f048 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -464,6 +464,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pmd = pmd_mkwrite(pmd); + else + pmd = arch_maybe_pmd_mkwrite(pmd, vma); return pmd; } From patchwork Fri Sep 25 14:56:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799993 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 63148112C for ; Fri, 25 Sep 2020 14:58:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 22A8E20715 for ; Fri, 25 Sep 2020 14:58:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 22A8E20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CC25A6B0082; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BDE958E0001; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 98A7E6B0087; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0067.hostedemail.com [216.40.44.67]) by kanga.kvack.org (Postfix) with ESMTP id 61E2F6B0082 for ; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2A41C181AE860 for ; Fri, 25 Sep 2020 14:57:26 +0000 (UTC) X-FDA: 77301887292.02.route87_510fb5b27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id E3394101A6B0A for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yftxy3cgj3emexwnqix9bc97jh8ocf7ug9yofx4wnokgxmhcptck6q8s3dhng.szwrd4n5iui6aqzcsw8qi5c4mu7yqbnaoowj43cxwiaehtqdxk79hiuc48gzjmn.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: route87_510fb5b27168 X-Filterd-Recvd-Size: 5337 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:24 +0000 (UTC) IronPort-SDR: 3EAojNfLpOeQhVt+YeBc1m2kbzGeY34TzAQrRmnkItvaJkLmeHGiPfqvESA9+oQ2l9MYvDCiZQ +xvUHD4ncybQ== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631920" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631920" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:24 -0700 IronPort-SDR: 8mfZHM0cguQkVHLYG3qxnyH0e1lzwkB9fxCdsUeNxmOxzE6TTw0NJqFWbi4ZbQ/8RTJ9DW/hry zbRnu82JAN/A== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499197" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:23 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 15/26] mm: Fixup places that call pte_mkwrite() directly Date: Fri, 25 Sep 2020 07:56:38 -0700 Message-Id: <20200925145649.5438-16-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A shadow stack page is made writable by pte_mkwrite_shstk(), which sets _PAGE_DIRTY_HW. There are a few places that call pte_mkwrite() directly and miss the maybe_mkwrite() fixup in the previous patch. Fix them with maybe_mkwrite(): - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly and call pte_mkwrite(), which is the same as maybe_mkwrite(). Change them to maybe_mkwrite(). - In do_numa_page(), if the numa entry 'was-writable', then pte_mkwrite() is called directly. Fix it by doing maybe_mkwrite(). - In change_pte_range(), pte_mkwrite() is called directly. Replace it with maybe_mkwrite(). Signed-off-by: Yu-cheng Yu --- mm/memory.c | 5 ++--- mm/migrate.c | 3 +-- mm/mprotect.c | 2 +- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 469af373ae76..c7c9d2d1118a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3374,8 +3374,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4029,7 +4028,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (was_writable) - pte = pte_mkwrite(pte); + pte = maybe_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); diff --git a/mm/migrate.c b/mm/migrate.c index aecb1433cf3c..3356e1888c0e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2905,8 +2905,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } } else { entry = mk_pte(page, vma->vm_page_prot); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index ce8b8a5eacbb..a8edbcb3af99 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || !(vma->vm_flags & VM_SOFTDIRTY))) { - ptent = pte_mkwrite(ptent); + ptent = maybe_mkwrite(ptent, vma); } ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); pages++; From patchwork Fri Sep 25 14:56:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799991 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0DA4C112C for ; Fri, 25 Sep 2020 14:58:02 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CBFB220715 for ; Fri, 25 Sep 2020 14:58:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CBFB220715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A1E9E6B0081; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 963006B0088; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 808668E0001; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id 598E86B0081 for ; Fri, 25 Sep 2020 10:57:26 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 20A054DCD for ; Fri, 25 Sep 2020 14:57:26 +0000 (UTC) X-FDA: 77301887292.19.game40_000f0f427168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin19.hostedemail.com (Postfix) with ESMTP id D85981ACEBC for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yrfwztsenx84qi3qej84tq6befnych91ne5baonnfy7rpum4gm9jhhxgua4h3.iqdfeufm19tme9rddinuf5cnn484sowk1pixz4hifo4a4hbow6shxakzbbn4etd.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: game40_000f0f427168 X-Filterd-Recvd-Size: 5963 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) IronPort-SDR: xu/gw+mjBv0ZlQb+LQrqQuZcjFMgaVAEnXfO5J6TrlX3lDsGY3GWk/m2zlwunc62phwz/2T2zQ GxNBv1PcWW7Q== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631926" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631926" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:24 -0700 IronPort-SDR: Rhq31SWCqEbWpGXoWOZJUDAE2t0HyQQElpwsz2fb4mQqS9x9l6PGOYb9pJL9spIwm6aB3X+Yqo ddocbUFzxLMQ== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499204" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:24 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 16/26] mm: Add guard pages around a shadow stack. Date: Fri, 25 Sep 2020 07:56:39 -0700 Message-Id: <20200925145649.5438-17-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the first and the last elements in the range, effectively touches those memory areas. The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and 255 * 4 = 1020 bytes by INCSSPD. Both ranges are far from PAGE_SIZE. Thus, putting a gap page on both ends of a shadow stack prevents INCSSP, CALL, and RET from going beyond. Signed-off-by: Yu-cheng Yu --- v10: - Define ARCH_SHADOW_STACK_GUARD_GAP. arch/x86/include/asm/processor.h | 10 ++++++++++ include/linux/mm.h | 24 ++++++++++++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 97143d87994c..01acbd63cad8 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -840,6 +840,16 @@ static inline void spin_lock_prefetch(const void *x) #define STACK_TOP TASK_SIZE_LOW #define STACK_TOP_MAX TASK_SIZE_MAX +/* + * Shadow stack pointer is moved by CALL, JMP, and INCSSP(Q/D). INCSSPQ + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and + * touches the first and the last element in the range, which triggers a + * page fault if the range is not in a shadow stack. Because of this, + * creating 4-KB guard pages around a shadow stack prevents these + * instructions from going beyond. + */ +#define ARCH_SHADOW_STACK_GUARD_GAP PAGE_SIZE + #define INIT_THREAD { \ .addr_limit = KERNEL_DS, \ } diff --git a/include/linux/mm.h b/include/linux/mm.h index db76ced54f9a..e09d13699bbe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2615,6 +2615,10 @@ extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf); int __must_check write_one_page(struct page *page); void task_dirty_inc(struct task_struct *tsk); +#ifndef ARCH_SHADOW_STACK_GUARD_GAP +#define ARCH_SHADOW_STACK_GUARD_GAP 0 +#endif + extern unsigned long stack_guard_gap; /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */ extern int expand_stack(struct vm_area_struct *vma, unsigned long address); @@ -2647,9 +2651,15 @@ static inline struct vm_area_struct * find_vma_intersection(struct mm_struct * m static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start; + unsigned long gap = 0; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; + if (vma->vm_flags & VM_GROWSDOWN) + gap = stack_guard_gap; + else if (vma->vm_flags & VM_SHSTK) + gap = ARCH_SHADOW_STACK_GUARD_GAP; + + if (gap != 0) { + vm_start -= gap; if (vm_start > vma->vm_start) vm_start = 0; } @@ -2659,9 +2669,15 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end; + unsigned long gap = 0; + + if (vma->vm_flags & VM_GROWSUP) + gap = stack_guard_gap; + else if (vma->vm_flags & VM_SHSTK) + gap = ARCH_SHADOW_STACK_GUARD_GAP; - if (vma->vm_flags & VM_GROWSUP) { - vm_end += stack_guard_gap; + if (gap != 0) { + vm_end += gap; if (vm_end < vma->vm_end) vm_end = -PAGE_SIZE; } From patchwork Fri Sep 25 14:56:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799995 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A126112C for ; Fri, 25 Sep 2020 14:58:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0864B20715 for ; Fri, 25 Sep 2020 14:58:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0864B20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 71E9F6B0085; Fri, 25 Sep 2020 10:57:27 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 67F846B0087; Fri, 25 Sep 2020 10:57:27 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4FC5F8E0001; Fri, 25 Sep 2020 10:57:27 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0109.hostedemail.com [216.40.44.109]) by kanga.kvack.org (Postfix) with ESMTP id 347926B0085 for ; Fri, 25 Sep 2020 10:57:27 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E92E78249980 for ; Fri, 25 Sep 2020 14:57:26 +0000 (UTC) X-FDA: 77301887292.05.care30_4305ac827168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id C88F01803032D for ; Fri, 25 Sep 2020 14:57:26 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30001:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04ygzfdczjyzg8phuxbkni4gpiy4myp8uigy6srgcr9eyneuu41jibnyjta4g9m.3ibmyy6951pr3upb1ywi1kohwyy9j8x3cxodx5qb1p5katchqy16p5fr6e8cimr.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: care30_4305ac827168 X-Filterd-Recvd-Size: 4935 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:25 +0000 (UTC) IronPort-SDR: YG7PjVqZPhHOO7e9zjc9jmp0fbZJ2JKCvSdkGto0fsSAcGJ4P8fzda8Yi7NtNRp+rma/ZKXPXW 7Ht7T1JjajBQ== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631929" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631929" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:25 -0700 IronPort-SDR: T9jdrA8dWCLUdB2us7B18YbC0J+yfvBPFdDUJw1B0Zcs2YOrLjsspmxad3zGXmTOHm+NOWBQSw nVGOgoI39x0A== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499210" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:24 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 17/26] mm/mmap: Add shadow stack pages to memory accounting Date: Fri, 25 Sep 2020 07:56:40 -0700 Message-Id: <20200925145649.5438-18-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Account shadow stack pages to stack memory. Signed-off-by: Yu-cheng Yu --- v10: - Use arch_shadow_stack_mapping() to make meaning clear. v8: - Change shadow stake pages from data_vm to stack_vm. arch/x86/mm/pgtable.c | 7 +++++++ include/linux/pgtable.h | 11 +++++++++++ mm/mmap.c | 5 +++++ 3 files changed, 23 insertions(+) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index a9666b64bc05..68e98f70298b 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -893,3 +893,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK +bool arch_shadow_stack_mapping(vm_flags_t vm_flags) +{ + return (vm_flags & VM_SHSTK); +} +#endif diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a665fc8c0eaf..dc3511316718 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1380,6 +1380,17 @@ static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma #endif /* CONFIG_ARCH_MAYBE_MKWRITE */ #endif /* CONFIG_MMU */ +#ifdef CONFIG_MMU +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK +bool arch_shadow_stack_mapping(vm_flags_t vm_flags); +#else +static inline bool arch_shadow_stack_mapping(vm_flags_t vm_flags) +{ + return false; +} +#endif /* CONFIG_ARCH_HAS_SHADOW_STACK */ +#endif /* CONFIG_MMU */ + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/mmap.c b/mm/mmap.c index 40248d84ad5f..574b3f273462 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1682,6 +1682,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags) if (file && is_file_hugepages(file)) return 0; + if (arch_shadow_stack_mapping(vm_flags)) + return 1; + return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE; } @@ -3352,6 +3355,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages) mm->stack_vm += npages; else if (is_data_mapping(flags)) mm->data_vm += npages; + else if (arch_shadow_stack_mapping(flags)) + mm->stack_vm += npages; } static vm_fault_t special_mapping_fault(struct vm_fault *vmf); From patchwork Fri Sep 25 14:56:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799997 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E2D9392C for ; Fri, 25 Sep 2020 14:58:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A3E8B20715 for ; Fri, 25 Sep 2020 14:58:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A3E8B20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5232A6B0087; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 4D1548E0001; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 371F26B0089; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0055.hostedemail.com [216.40.44.55]) by kanga.kvack.org (Postfix) with ESMTP id 175F76B0087 for ; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id C9BB3582B for ; Fri, 25 Sep 2020 14:57:27 +0000 (UTC) X-FDA: 77301887334.14.tree48_460637f27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id A4E0118229818 for ; Fri, 25 Sep 2020 14:57:27 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30029:30054:30056:30064:30069:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8groeij7k5ig5iy6fj3iugkaufopb3usdy4nenrewcwy3j45hc53kux175m5.oozzi19aniwh4n7qway5dirwpzz8t4dgm83q9fqz8kghk44k1bewjwbqgp7rpf9.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: tree48_460637f27168 X-Filterd-Recvd-Size: 5718 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:27 +0000 (UTC) IronPort-SDR: eE4NzDH1TrWhoI6ogc2bGaReo8OsoFh5v8AIYoNkuZQskqCvqEyyvlypmq78e0LZQoXChXtAt9 RKOMPCKIXiSA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631930" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631930" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:26 -0700 IronPort-SDR: mu5voJLUF53gAbUC7pCUFtdtkawh3ajdwWBti+hbmKSnJ6nkMEYg2FS1tGJyigWlGvMdxA9P3x x2w5tK01ummw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499213" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:25 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 18/26] mm: Update can_follow_write_pte() for shadow stack Date: Fri, 25 Sep 2020 07:56:41 -0700 Message-Id: <20200925145649.5438-19-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Can_follow_write_pte() ensures a read-only page is COWed by checking the FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid. Like a writable data page, a shadow stack page is writable, and becomes read-only during copy-on-write, but it is always dirty. Thus, in the can_follow_write_pte() check, it belongs to the writable page case and should be excluded from the read-only page pte_dirty() check. Apply the same changes to can_follow_write_pmd(). Signed-off-by: Yu-cheng Yu --- v10: - Reverse name changes to can_follow_write_*(). mm/gup.c | 8 +++++--- mm/huge_memory.c | 8 +++++--- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index e5739a1974d5..bbe4d32269e3 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -384,10 +384,12 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, * FOLL_FORCE can write to even unwritable pte's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, + struct vm_area_struct *vma) { return pte_write(pte) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte) && + !arch_shadow_stack_mapping(vma->vm_flags)); } static struct page *follow_page_pte(struct vm_area_struct *vma, @@ -430,7 +432,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } if ((flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; - if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { + if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) { pte_unmap_unlock(ptep, ptl); return NULL; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index aff9eb39f048..ce8c06a4a813 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1296,10 +1296,12 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) * FOLL_FORCE can write to even unwritable pmd's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags) +static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags, + struct vm_area_struct *vma) { return pmd_write(pmd) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd)); + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd) && + !arch_shadow_stack_mapping(vma->vm_flags)); } struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, @@ -1312,7 +1314,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags)) + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma)) goto out; /* Avoid dumping huge zero page */ From patchwork Fri Sep 25 14:56:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11799999 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8AD80112C for ; Fri, 25 Sep 2020 14:58:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4AF2020715 for ; Fri, 25 Sep 2020 14:58:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4AF2020715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F0A686B0088; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E94916B0089; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D37176B008A; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0229.hostedemail.com [216.40.44.229]) by kanga.kvack.org (Postfix) with ESMTP id AA9AB6B0088 for ; Fri, 25 Sep 2020 10:57:28 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 76A04181AE860 for ; Fri, 25 Sep 2020 14:57:28 +0000 (UTC) X-FDA: 77301887376.26.wind82_0a0d0ce27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 56E9F1804B654 for ; Fri, 25 Sep 2020 14:57:28 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:4423:30003:30051:30054:30056:30064:30070:30090,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yg4ymsacxzsbniesn5tsos5oi53opohz6srun6znd3yn7k37cw8e9s1cz5g33.3mrc3qpdnh9z1f3nwi8w7mi1mbrfodp3b5ofrgbibukthd9tng9j99cce8hezup.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:26,LUA_SUMMARY:none X-HE-Tag: wind82_0a0d0ce27168 X-Filterd-Recvd-Size: 12381 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:27 +0000 (UTC) IronPort-SDR: sjLrdZjYKTeC4R6Gp9ZJyEkzxMX8VGT3rd8wEG2VYMMymBsxzlQ9B+JaYlt0qljfeQxcytiv01 kBULzZVWqdkg== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631932" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631932" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:27 -0700 IronPort-SDR: ZATIUGVPkvsDQ3H+ieDr8zSeF5Uj19WioKk/jVh4/FaYS5OcSqVFTs4V5vm+NYiBAP0BP63UHs iYvSNTOT3sOw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499218" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:26 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Peter Collingbourne , Andrew Morton Subject: [PATCH v13 19/26] mm: Re-introduce do_mmap_pgoff() Date: Fri, 25 Sep 2020 07:56:42 -0700 Message-Id: <20200925145649.5438-20-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHSTK to do_mmap(). Re-introduce the vm_flags and do_mmap_pgoff(). Signed-off-by: Yu-cheng Yu Cc: Peter Collingbourne Cc: Andrew Morton Cc: Oleg Nesterov Cc: linux-mm@kvack.org Signed-off-by: Yu-cheng Yu Reviewed-by: Peter Collingbourne --- fs/aio.c | 6 +++--- fs/hugetlbfs/inode.c | 2 +- include/linux/fs.h | 2 +- include/linux/mm.h | 12 +++++++++++- ipc/shm.c | 2 +- mm/mmap.c | 16 ++++++++-------- mm/nommu.c | 6 +++--- mm/shmem.c | 2 +- mm/util.c | 4 ++-- 9 files changed, 31 insertions(+), 21 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index d5ec30385566..22d19a4ad586 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -525,9 +525,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) return -EINTR; } - ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, - PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, + PROT_READ | PROT_WRITE, + MAP_SHARED, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index b5c109703daa..f936bcf02cce 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -140,7 +140,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) * already been checked by prepare_hugepage_range. If you add * any error returns here, do so after setting VM_HUGETLB, so * is_vm_hugetlb_page tests below unmap_region go the right - * way when do_mmap unwinds (may be important on powerpc + * way when do_mmap_pgoff unwinds (may be important on powerpc * and ia64). */ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND; diff --git a/include/linux/fs.h b/include/linux/fs.h index 7519ae003a08..f7df4558f72c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -538,7 +538,7 @@ static inline int mapping_mapped(struct address_space *mapping) /* * Might pages of this file have been modified in userspace? - * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap + * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap_pgoff * marks vma as VM_SHARED if it is shared, and the file was opened for * writing i.e. vma may be mprotected writable even if now readonly. * diff --git a/include/linux/mm.h b/include/linux/mm.h index e09d13699bbe..9b6a0f22cd89 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2560,13 +2560,23 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf, bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); extern int do_madvise(unsigned long start, size_t len_in, int behavior); +static inline unsigned long +do_mmap_pgoff(struct file *file, unsigned long addr, + unsigned long len, unsigned long prot, unsigned long flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) +{ + return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate, uf); +} + #ifdef CONFIG_MMU extern int __mm_populate(unsigned long addr, unsigned long len, int ignore_errors); diff --git a/ipc/shm.c b/ipc/shm.c index e25c7c6106bc..3131c1de6bba 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1556,7 +1556,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 574b3f273462..81d4a00092da 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1030,7 +1030,7 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1, * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. * * We don't check here for the merged mmap wrapping around the end of pagecache - * indices (16TB on ia32) because do_mmap() does not permit mmap's which + * indices (16TB on ia32) because do_mmap_pgoff() does not permit mmap's which * wrap, nor mmaps which cover the final page at index -1UL. */ static int @@ -1365,11 +1365,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; *populate = 0; @@ -1431,7 +1431,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2233,7 +2233,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, /* * mmap_region() will call shmem_zero_setup() to create a file, * so use shmem's get_unmapped_area in case it can be huge. - * do_mmap() will clear pgoff, so match alignment. + * do_mmap_pgoff() will clear pgoff, so match alignment. */ pgoff = 0; get_area = shmem_get_unmapped_area; @@ -3006,7 +3006,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, } file = get_file(vma->vm_file); - ret = do_mmap(vma->vm_file, start, size, + ret = do_mmap_pgoff(vma->vm_file, start, size, prot, flags, pgoff, &populate, NULL); fput(file); out: @@ -3226,7 +3226,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma) * By setting it to reflect the virtual start address of the * vma, merges and splits can happen in a seamless way, just * using the existing file pgoff checks and manipulations. - * Similarly in do_mmap and in do_brk. + * Similarly in do_mmap_pgoff and in do_brk. */ if (vma_is_anonymous(vma)) { BUG_ON(vma->anon_vma); diff --git a/mm/nommu.c b/mm/nommu.c index 75a327149af1..71a4ea828f06 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1078,6 +1078,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1085,7 +1086,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; @@ -1104,7 +1104,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL); @@ -1763,7 +1763,7 @@ EXPORT_SYMBOL_GPL(access_process_vm); * * Check the shared mappings on an inode on behalf of a shrinking truncate to * make sure that any outstanding VMAs aren't broken and then shrink the - * vm_regions that extend beyond so that do_mmap() doesn't + * vm_regions that extend beyond so that do_mmap_pgoff() doesn't * automatically grant mappings that are too large. */ int nommu_shrink_inode_mappings(struct inode *inode, size_t size, diff --git a/mm/shmem.c b/mm/shmem.c index 8e2b35ba93ad..54464c1e7414 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -4248,7 +4248,7 @@ EXPORT_SYMBOL_GPL(shmem_file_setup_with_mnt); /** * shmem_zero_setup - setup a shared anonymous mapping - * @vma: the vma to be mmapped is prepared by do_mmap + * @vma: the vma to be mmapped is prepared by do_mmap_pgoff */ int shmem_zero_setup(struct vm_area_struct *vma) { diff --git a/mm/util.c b/mm/util.c index 5ef378a2a038..8d6280c05238 100644 --- a/mm/util.c +++ b/mm/util.c @@ -503,8 +503,8 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, - &uf); + ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, + &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) From patchwork Fri Sep 25 14:56:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800003 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A0D692C for ; Fri, 25 Sep 2020 14:58:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5A34C20715 for ; Fri, 25 Sep 2020 14:58:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5A34C20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 019B96B008A; Fri, 25 Sep 2020 10:57:31 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EE6DB6B008C; Fri, 25 Sep 2020 10:57:30 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DAA7B6B0092; Fri, 25 Sep 2020 10:57:30 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0158.hostedemail.com [216.40.44.158]) by kanga.kvack.org (Postfix) with ESMTP id C15686B008A for ; Fri, 25 Sep 2020 10:57:30 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 770774DCD for ; Fri, 25 Sep 2020 14:57:30 +0000 (UTC) X-FDA: 77301887460.17.vase68_4907d3127168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 9AD34180D0181 for ; Fri, 25 Sep 2020 14:57:29 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30036:30045:30046:30051:30054:30055:30056:30062:30064:30067:30070:30090,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yf9fdtqq8i6odruyzg7sm5ofdx5op5rhpk5xgziry1ykh96zna4cnnbon5dfm.464zcku3cxrk38mk5eo79srps6z3ayefukn4hba38j1miib9xsuniudtjc8bhkp.h-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:22,LUA_SUMMARY:none X-HE-Tag: vase68_4907d3127168 X-Filterd-Recvd-Size: 13470 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:28 +0000 (UTC) IronPort-SDR: I/lF/m+tZC7mN1Q9XeKt8rlTA9U0W+qrrkMCA78DLDbjHgxVpXeGPWsf/EzN8fL0dhEQpRu25o bSpcePsuMoNg== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631934" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631934" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:27 -0700 IronPort-SDR: msXtuzfBON2qJ46oSbqaZi9uDzsmd+g3tRKjUSeiTEEQ4L10o/iYoI4E7idZ+k/cdBmKjmE/Ry a31lMwjAVr0A== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499224" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:27 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 20/26] x86/cet/shstk: User-mode shadow stack support Date: Fri, 25 Sep 2020 07:56:43 -0700 Message-Id: <20200925145649.5438-21-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch adds basic shadow stack enabling/disabling routines. A task's shadow stack is allocated from memory with VM_SHSTK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Signed-off-by: Yu-cheng Yu --- v12: - Split cet_disable_free_shstk() into disable and free parts, because only the disable shadow stack arch_prctl() needs the disable part. Other cases need only the free part. In an exit syscall, there is no need to write CET MSRs. v11: - Modify alloc_shstk() to take flags and pass to do_mmap(). This is to be used by an arch_prctl() introduced later. v10: - Change no_cet_shstk to no_user_shstk. - Limit shadow stack size to 4 GB, and round_up to PAGE_SIZE. - Replace checking shstk_enabled with shstk_size being zero. - WARN_ON_ONCE() when vm_munmap() fails. v9: - Change cpu_feature_enabled() to static_cpu_has(). - Merge cet_disable_shstk to cet_disable_free_shstk. - Remove the empty slot at the top of the shadow stack, as it is not needed. - Move do_mmap_locked() to alloc_shstk(), which is a static function. v6: - Create a function do_mmap_locked() for shadow stack allocation. v2: - Change noshstk to no_cet_shstk. arch/x86/include/asm/cet.h | 28 ++++ arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/processor.h | 5 + arch/x86/kernel/Makefile | 2 + arch/x86/kernel/cet.c | 147 ++++++++++++++++++ arch/x86/kernel/cpu/common.c | 28 ++++ arch/x86/kernel/process.c | 1 + .../arch/x86/include/asm/disabled-features.h | 8 +- 8 files changed, 225 insertions(+), 2 deletions(-) create mode 100644 arch/x86/include/asm/cet.h create mode 100644 arch/x86/kernel/cet.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h new file mode 100644 index 000000000000..5750fbcbb952 --- /dev/null +++ b/arch/x86/include/asm/cet.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_CET_H +#define _ASM_X86_CET_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; +/* + * Per-thread CET status + */ +struct cet_status { + unsigned long shstk_base; + unsigned long shstk_size; +}; + +#ifdef CONFIG_X86_CET +int cet_setup_shstk(void); +void cet_disable_shstk(void); +void cet_free_shstk(struct task_struct *p); +#else +static inline void cet_disable_shstk(void) {} +static inline void cet_free_shstk(struct task_struct *p) {} +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_CET_H */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 4ea8584682f9..edac76ed75e7 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -56,6 +56,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_X86_SHADOW_STACK_USER +#define DISABLE_SHSTK 0 +#else +#define DISABLE_SHSTK (1<<(X86_FEATURE_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -75,7 +81,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 01acbd63cad8..139cb99c7076 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -27,6 +27,7 @@ struct vm86; #include #include #include +#include #include #include @@ -542,6 +543,10 @@ struct thread_struct { unsigned int sig_on_uaccess_err:1; +#ifdef CONFIG_X86_CET + struct cet_status cet; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index e77261db2391..1fb85595afa7 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,8 @@ obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o +obj-$(CONFIG_X86_CET) += cet.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c new file mode 100644 index 000000000000..f8b0a077594f --- /dev/null +++ b/arch/x86/kernel/cet.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * cet.c - Control-flow Enforcement (CET) + * + * Copyright (c) 2019, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static void start_update_msrs(void) +{ + fpregs_lock(); + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + __fpregs_load_activate(); +} + +static void end_update_msrs(void) +{ + fpregs_unlock(); +} + +static unsigned long cet_get_shstk_addr(void) +{ + struct fpu *fpu = ¤t->thread.fpu; + unsigned long ssp = 0; + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + rdmsrl(MSR_IA32_PL3_SSP, ssp); + } else { + struct cet_user_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER); + if (p) + ssp = p->user_ssp; + } + + fpregs_unlock(); + return ssp; +} + +static unsigned long alloc_shstk(unsigned long size, int flags) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, populate; + + /* VM_SHSTK requires MAP_ANONYMOUS, MAP_PRIVATE */ + flags |= MAP_ANONYMOUS | MAP_PRIVATE; + + mmap_write_lock(mm); + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHSTK, 0, + &populate, NULL); + mmap_write_unlock(mm); + + if (populate) + mm_populate(addr, populate); + + return addr; +} + +int cet_setup_shstk(void) +{ + unsigned long addr, size; + struct cet_status *cet = ¤t->thread.cet; + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + size = round_up(min(rlimit(RLIMIT_STACK), 1UL << 32), PAGE_SIZE); + addr = alloc_shstk(size, 0); + + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + cet->shstk_base = addr; + cet->shstk_size = size; + + start_update_msrs(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + end_update_msrs(); + return 0; +} + +void cet_disable_shstk(void) +{ + struct cet_status *cet = ¤t->thread.cet; + u64 msr_val; + + if (!static_cpu_has(X86_FEATURE_SHSTK) || + !cet->shstk_size || !cet->shstk_base) + return; + + start_update_msrs(); + rdmsrl(MSR_IA32_U_CET, msr_val); + wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_SHSTK_EN); + wrmsrl(MSR_IA32_PL3_SSP, 0); + end_update_msrs(); + + cet_free_shstk(current); +} + +void cet_free_shstk(struct task_struct *tsk) +{ + struct cet_status *cet = &tsk->thread.cet; + + if (!static_cpu_has(X86_FEATURE_SHSTK) || + !cet->shstk_size || !cet->shstk_base) + return; + + if (!tsk->mm || (tsk->mm != current->mm)) + return; + + while (1) { + int r; + + r = vm_munmap(cet->shstk_base, cet->shstk_size); + + /* + * Retry if mmap_lock is not available. + */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + WARN_ON_ONCE(r); + break; + } + + cet->shstk_base = 0; + cet->shstk_size = 0; +} diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index c5d6f17d9b9d..084480f975aa 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include "cpu.h" @@ -509,6 +510,32 @@ static __init int setup_disable_pku(char *arg) __setup("nopku", setup_disable_pku); #endif /* CONFIG_X86_64 */ +static __always_inline void setup_cet(struct cpuinfo_x86 *c) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) && + !cpu_feature_enabled(X86_FEATURE_IBT)) + return; + + cr4_set_bits(X86_CR4_CET); +} + +#ifdef CONFIG_X86_SHADOW_STACK_USER +static __init int setup_disable_shstk(char *s) +{ + /* require an exact match without trailing characters */ + if (s[0] != '\0') + return 0; + + if (!boot_cpu_has(X86_FEATURE_SHSTK)) + return 1; + + setup_clear_cpu_cap(X86_FEATURE_SHSTK); + pr_info("x86: 'no_user_shstk' specified, disabling user Shadow Stack\n"); + return 1; +} +__setup("no_user_shstk", setup_disable_shstk); +#endif + /* * Some CPU features depend on higher CPUID levels, which may not always * be available due to CPUID level capping or broken virtualization @@ -1544,6 +1571,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) x86_init_rdrand(c); setup_pku(c); + setup_cet(c); /* * Clear/Set all flags overridden by options, need do it diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ba4593a913fa..ff3b44d6740b 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "process.h" diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h index 4ea8584682f9..edac76ed75e7 100644 --- a/tools/arch/x86/include/asm/disabled-features.h +++ b/tools/arch/x86/include/asm/disabled-features.h @@ -56,6 +56,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_X86_SHADOW_STACK_USER +#define DISABLE_SHSTK 0 +#else +#define DISABLE_SHSTK (1<<(X86_FEATURE_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -75,7 +81,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) From patchwork Fri Sep 25 14:56:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800001 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 79B4C92C for ; Fri, 25 Sep 2020 14:58:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 26E2D20715 for ; Fri, 25 Sep 2020 14:58:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 26E2D20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 59ADB6B0089; Fri, 25 Sep 2020 10:57:30 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 486E66B008A; Fri, 25 Sep 2020 10:57:30 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2D7CD6B008C; Fri, 25 Sep 2020 10:57:30 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0182.hostedemail.com [216.40.44.182]) by kanga.kvack.org (Postfix) with ESMTP id F1F016B0089 for ; Fri, 25 Sep 2020 10:57:29 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id AF749180AD802 for ; Fri, 25 Sep 2020 14:57:29 +0000 (UTC) X-FDA: 77301887418.15.leaf84_0c0770827168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin15.hostedemail.com (Postfix) with ESMTP id 88D3A1814B0C7 for ; Fri, 25 Sep 2020 14:57:29 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30012:30045:30046:30051:30054:30056:30064:30069:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygeuhinuwxx4hkffxa9r4ndg4cuocxwafh5wzghexaco6pofpucpg67pjsg7f.xzfmdppqcmaq1cf96bubffzrakjiodn5rkesncens4hy61gk8wxfbu3j7858x65.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:22,LUA_SUMMARY:none X-HE-Tag: leaf84_0c0770827168 X-Filterd-Recvd-Size: 18803 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:28 +0000 (UTC) IronPort-SDR: q6wAVH3NDzn32zOeDAhejMSqZcMNrN/95yWfxy1PmQ97nWjJ5Q7i4o4KbBpbpotHWX9OZnN3jH x31x8ggRKvlA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631938" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631938" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:28 -0700 IronPort-SDR: eseqoOG3AgJOxzWo1BKSawx7BfN0V1HAw175u0yVdfSH40WWj2HVSLoj5ROef7JLaU5y01Xv9C /Smn6xARoRkw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499229" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:27 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 21/26] x86/cet/shstk: Handle signals for shadow stack Date: Fri, 25 Sep 2020 07:56:44 -0700 Message-Id: <20200925145649.5438-22-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: To deliver a signal, create a shadow stack restore token and put a restore token and the signal restorer address on the shadow stack. For sigreturn, verify the token and restore the shadow stack pointer. Introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. It is used to construct the user signal stack as described above. Introduce a signal context extension struct 'sc_ext', which is used to save shadow stack restore token address and WAIT_ENDBR status. WAIT_ENDBR will be introduced later in the Indirect Branch Tracking (IBT) series, but add that into sc_ext now to keep the struct stable in case the IBT series is applied later. Signed-off-by: Yu-cheng Yu --- v13: - Check restore token pointing to a valid address. v10: - Combine with WRUSS instruction patch, since it is used only here. - Revise signal restore code to the latest supervisor states handling. Move shadow stack restore token checking out of the fast path. v9: - Update CET MSR access according to XSAVES supervisor state changes. - Add 'wait_endbr' to struct 'sc_ext'. - Update and simplify signal frame allocation, setup, and restoration. - Update commit log text. v2: - Move CET status from sigcontext to a separate struct sc_ext, which is located above the fpstate on the signal frame. - Add a restore token for sigreturn address. arch/x86/ia32/ia32_signal.c | 17 +++ arch/x86/include/asm/cet.h | 8 ++ arch/x86/include/asm/fpu/internal.h | 10 ++ arch/x86/include/asm/special_insns.h | 32 ++++++ arch/x86/include/uapi/asm/sigcontext.h | 9 ++ arch/x86/kernel/cet.c | 152 +++++++++++++++++++++++++ arch/x86/kernel/fpu/signal.c | 100 ++++++++++++++++ arch/x86/kernel/signal.c | 10 ++ 8 files changed, 338 insertions(+) diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index 81cf22398cd1..cec9cf0a00cf 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -35,6 +35,7 @@ #include #include #include +#include static inline void reload_segments(struct sigcontext_32 *sc) { @@ -205,6 +206,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, void __user **fpstate) { unsigned long sp, fx_aligned, math_size; + void __user *restorer = NULL; /* Default to using normal stack */ sp = regs->sp; @@ -218,8 +220,23 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, ksig->ka.sa.sa_restorer) sp = (unsigned long) ksig->ka.sa.sa_restorer; + if (ksig->ka.sa.sa_flags & SA_RESTORER) { + restorer = ksig->ka.sa.sa_restorer; + } else if (current->mm->context.vdso) { + if (ksig->ka.sa.sa_flags & SA_SIGINFO) + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_rt_sigreturn; + else + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_sigreturn; + } + sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size); *fpstate = (struct _fpstate_32 __user *) sp; + + if (save_cet_to_sigframe(1, *fpstate, (unsigned long)restorer)) + return (void __user *) -1L; + if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned, math_size) < 0) return (void __user *) -1L; diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 5750fbcbb952..73435856ce54 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -6,6 +6,8 @@ #include struct task_struct; +struct sc_ext; + /* * Per-thread CET status */ @@ -18,9 +20,15 @@ struct cet_status { int cet_setup_shstk(void); void cet_disable_shstk(void); void cet_free_shstk(struct task_struct *p); +int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); +void cet_restore_signal(struct sc_ext *sc); +int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else static inline void cet_disable_shstk(void) {} static inline void cet_free_shstk(struct task_struct *p) {} +static inline void cet_restore_signal(struct sc_ext *sc) { return; } +static inline int cet_setup_signal(bool ia32, unsigned long rstor, + struct sc_ext *sc) { return -EINVAL; } #endif #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 0a460f2a3f90..ec900a7a4786 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -442,6 +442,16 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate) __copy_kernel_to_fpregs(fpstate, -1); } +#ifdef CONFIG_X86_CET +extern int save_cet_to_sigframe(int ia32, void __user *fp, + unsigned long restorer); +#else +static inline int save_cet_to_sigframe(int ia32, void __user *fp, + unsigned long restorer) +{ + return 0; +} +#endif extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size); /* diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 59a3e13204c3..ee86c19da532 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -232,6 +232,38 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_CET +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32) +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + asm_volatile_goto("1: wrussd %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#else +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + WARN_ONCE(1, "%s used but not supported.\n", __func__); + return -EFAULT; +} +#endif + +static inline int write_user_shstk_64(unsigned long addr, unsigned long val) +{ + asm_volatile_goto("1: wrussq %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#endif /* CONFIG_X86_CET */ + #define nop() asm volatile ("nop") #endif /* __KERNEL__ */ diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h index 844d60eb1882..cf2d55db3be4 100644 --- a/arch/x86/include/uapi/asm/sigcontext.h +++ b/arch/x86/include/uapi/asm/sigcontext.h @@ -196,6 +196,15 @@ struct _xstate { /* New processor state extensions go here: */ }; +/* + * Located at the end of sigcontext->fpstate, aligned to 8. + */ +struct sc_ext { + unsigned long total_size; + unsigned long ssp; + unsigned long wait_endbr; +}; + /* * The 32-bit signal frame: */ diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index f8b0a077594f..51ddd17aee8f 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include static void start_update_msrs(void) { @@ -72,6 +74,80 @@ static unsigned long alloc_shstk(unsigned long size, int flags) return addr; } +#define TOKEN_MODE_MASK 3UL +#define TOKEN_MODE_64 1UL +#define IS_TOKEN_64(token) ((token & TOKEN_MODE_MASK) == TOKEN_MODE_64) +#define IS_TOKEN_32(token) ((token & TOKEN_MODE_MASK) == 0) + +/* + * Verify the restore token at the address of 'ssp' is + * valid and then set shadow stack pointer according to the + * token. + */ +int cet_verify_rstor_token(bool ia32, unsigned long ssp, + unsigned long *new_ssp) +{ + unsigned long token; + + *new_ssp = 0; + + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + if (get_user(token, (unsigned long __user *)ssp)) + return -EFAULT; + + /* Is 64-bit mode flag correct? */ + if (!ia32 && !IS_TOKEN_64(token)) + return -EINVAL; + else if (ia32 && !IS_TOKEN_32(token)) + return -EINVAL; + + token &= ~TOKEN_MODE_MASK; + + /* + * Restore address properly aligned? + */ + if ((!ia32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4)) + return -EINVAL; + + /* + * Token was placed properly? + */ + if (((ALIGN_DOWN(token, 8) - 8) != ssp) || (token >= TASK_SIZE_MAX)) + return -EINVAL; + + *new_ssp = token; + return 0; +} + +/* + * Create a restore token on the shadow stack. + * A token is always 8-byte and aligned to 8. + */ +static int create_rstor_token(bool ia32, unsigned long ssp, + unsigned long *new_ssp) +{ + unsigned long addr; + + *new_ssp = 0; + + if ((!ia32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4)) + return -EINVAL; + + addr = ALIGN_DOWN(ssp, 8) - 8; + + /* Is the token for 64-bit? */ + if (!ia32) + ssp |= TOKEN_MODE_64; + + if (write_user_shstk_64(addr, ssp)) + return -EFAULT; + + *new_ssp = addr; + return 0; +} + int cet_setup_shstk(void) { unsigned long addr, size; @@ -145,3 +221,79 @@ void cet_free_shstk(struct task_struct *tsk) cet->shstk_base = 0; cet->shstk_size = 0; } + +/* + * Called from __fpu__restore_sig() and XSAVES buffer is protected by + * set_thread_flag(TIF_NEED_FPU_LOAD) in the slow path. + */ +void cet_restore_signal(struct sc_ext *sc_ext) +{ + struct cet_user_state *cet_user_state; + struct cet_status *cet = ¤t->thread.cet; + u64 msr_val = 0; + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return; + + cet_user_state = get_xsave_addr(¤t->thread.fpu.state.xsave, + XFEATURE_CET_USER); + if (!cet_user_state) + return; + + if (cet->shstk_size) { + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + cet_user_state->user_ssp = sc_ext->ssp; + else + wrmsrl(MSR_IA32_PL3_SSP, sc_ext->ssp); + + msr_val |= CET_SHSTK_EN; + } + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + cet_user_state->user_cet = msr_val; + else + wrmsrl(MSR_IA32_U_CET, msr_val); +} + +/* + * Setup the shadow stack for the signal handler: first, + * create a restore token to keep track of the current ssp, + * and then the return address of the signal handler. + */ +int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext) +{ + struct cet_status *cet = ¤t->thread.cet; + unsigned long ssp = 0, new_ssp = 0; + int err; + + if (cet->shstk_size) { + if (!rstor_addr) + return -EINVAL; + + ssp = cet_get_shstk_addr(); + err = create_rstor_token(ia32, ssp, &new_ssp); + if (err) + return err; + + if (ia32) { + ssp = new_ssp - sizeof(u32); + err = write_user_shstk_32(ssp, (unsigned int)rstor_addr); + } else { + ssp = new_ssp - sizeof(u64); + err = write_user_shstk_64(ssp, rstor_addr); + } + + if (err) + return err; + + sc_ext->ssp = new_ssp; + } + + if (ssp) { + start_update_msrs(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + end_update_msrs(); + } + + return 0; +} diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index a4ec65317a7f..c0c2141cb4b3 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -52,6 +52,74 @@ static inline int check_for_xstate(struct fxregs_state __user *buf, return 0; } +#ifdef CONFIG_X86_CET +int save_cet_to_sigframe(int ia32, void __user *fp, unsigned long restorer) +{ + int err = 0; + + if (!current->thread.cet.shstk_size) + return 0; + + if (fp) { + struct sc_ext ext = {0, 0, 0}; + + err = cet_setup_signal(ia32, restorer, &ext); + if (!err) { + void __user *p = fp; + + ext.total_size = sizeof(ext); + + if (ia32) + p += sizeof(struct fregs_state); + + p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE; + p = (void __user *)ALIGN((unsigned long)p, 8); + + if (copy_to_user(p, &ext, sizeof(ext))) + return -EFAULT; + } + } + + return err; +} + +static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext) +{ + int err = 0; + + memset(ext, 0, sizeof(*ext)); + + if (!current->thread.cet.shstk_size) + return 0; + + if (fp) { + void __user *p = fp; + + if (ia32) + p += sizeof(struct fregs_state); + + p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE; + p = (void __user *)ALIGN((unsigned long)p, 8); + + if (copy_from_user(ext, p, sizeof(*ext))) + return -EFAULT; + + if (ext->total_size != sizeof(*ext)) + return -EFAULT; + + if (current->thread.cet.shstk_size) + err = cet_verify_rstor_token(ia32, ext->ssp, &ext->ssp); + } + + return err; +} +#else +static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext) +{ + return 0; +} +#endif + /* * Signal frame handlers. */ @@ -295,6 +363,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) struct task_struct *tsk = current; struct fpu *fpu = &tsk->thread.fpu; struct user_i387_ia32_struct env; + struct sc_ext sc_ext; u64 user_xfeatures = 0; int fx_only = 0; int ret = 0; @@ -335,6 +404,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) if ((unsigned long)buf_fx % 64) fx_only = 1; + ret = get_cet_from_sigframe(ia32_fxstate, buf, &sc_ext); + if (ret) + return ret; + if (!ia32_fxstate) { /* * Attempt to restore the FPU registers directly from user @@ -349,6 +422,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) pagefault_enable(); if (!ret) { + cet_restore_signal(&sc_ext); + /* * Restore supervisor states: previous context switch * etc has done XSAVES and saved the supervisor states @@ -423,6 +498,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) if (unlikely(init_bv)) copy_kernel_to_xregs(&init_fpstate.xsave, init_bv); + cet_restore_signal(&sc_ext); + /* * Restore previously saved supervisor xstates along with * copied-in user xstates. @@ -491,12 +568,35 @@ int fpu__restore_sig(void __user *buf, int ia32_frame) return __fpu__restore_sig(buf, buf_fx, size); } +#ifdef CONFIG_X86_CET +static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp) +{ + struct cet_status *cet = ¤t->thread.cet; + + /* + * sigcontext_ext is at: fpu + fpu_user_xstate_size + + * FP_XSTATE_MAGIC2_SIZE, then aligned to 8. + */ + if (cet->shstk_size) + sp -= (sizeof(struct sc_ext) + 8); + + return sp; +} +#else +static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp) +{ + return sp; +} +#endif + unsigned long fpu__alloc_mathframe(unsigned long sp, int ia32_frame, unsigned long *buf_fx, unsigned long *size) { unsigned long frame_size = xstate_sigframe_size(); + sp = fpu__alloc_sigcontext_ext(sp); + *buf_fx = sp = round_down(sp - frame_size, 64); if (ia32_frame && use_fxsr()) { frame_size += sizeof(struct fregs_state); diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index be0d7d4152ec..f39335ed4f7e 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -46,6 +46,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 /* @@ -239,6 +240,9 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, unsigned long buf_fx = 0; int onsigstack = on_sig_stack(sp); int ret; +#ifdef CONFIG_X86_64 + void __user *restorer = NULL; +#endif /* redzone */ if (IS_ENABLED(CONFIG_X86_64)) @@ -270,6 +274,12 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, if (onsigstack && !likely(on_sig_stack(sp))) return (void __user *)-1L; +#ifdef CONFIG_X86_64 + if (ka->sa.sa_flags & SA_RESTORER) + restorer = ka->sa.sa_restorer; + ret = save_cet_to_sigframe(0, *fpstate, (unsigned long)restorer); +#endif + /* save i387 and extended state */ ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size); if (ret < 0) From patchwork Fri Sep 25 14:56:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800005 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5413592C for ; Fri, 25 Sep 2020 14:58:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1DBDF20715 for ; Fri, 25 Sep 2020 14:58:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1DBDF20715 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6327E6B008C; Fri, 25 Sep 2020 10:57:31 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5BAB76B0092; Fri, 25 Sep 2020 10:57:31 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3E83C8E0001; Fri, 25 Sep 2020 10:57:31 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0228.hostedemail.com [216.40.44.228]) by kanga.kvack.org (Postfix) with ESMTP id 1B45A6B0092 for ; Fri, 25 Sep 2020 10:57:31 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id D19911E06 for ; Fri, 25 Sep 2020 14:57:30 +0000 (UTC) X-FDA: 77301887460.21.can28_110d2a427168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 92AB1180442C3 for ; Fri, 25 Sep 2020 14:57:30 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yfby65kimquxgqaw8cm3gq3g1tyopptqj71fci5rndinhfazixr8uujj5bsj8.hqyfbrfqyi9epxcm4q687oxwe7zqnq4arow7hbr9j4tead6epgwue647bzgjggd.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: can28_110d2a427168 X-Filterd-Recvd-Size: 3714 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:29 +0000 (UTC) IronPort-SDR: YPdU9bVgHzvqxLg9FyO3/D+gEWsVfMxfOY0VDGCGWKASEJXRvCviMTjCv7thw7zc2NkIj9+3DU Led+U1dd/AJA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631942" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631942" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:29 -0700 IronPort-SDR: wyuahIEijPJ91wODD6m5UQ83vVG6YLKB3SlUDyIKUy/rt2rVVc24L31uwZxt3e/oViHVhdXOaz PvWBULul6jgw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499239" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:28 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 22/26] binfmt_elf: Define GNU_PROPERTY_X86_FEATURE_1_AND properties Date: Fri, 25 Sep 2020 07:56:45 -0700 Message-Id: <20200925145649.5438-23-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ELF file's .note.gnu.property indicates architecture features of the file.. Introduce feature definitions for Shadow Stack and Indirect Branch Tracking. Signed-off-by: Yu-cheng Yu --- include/uapi/linux/elf.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 22220945a5fd..ca5875f384f6 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -454,4 +454,13 @@ typedef struct elf64_note { /* Bits for GNU_PROPERTY_AARCH64_FEATURE_1_BTI */ #define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +/* .note.gnu.property types for x86: */ +#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002 + +/* Bits for GNU_PROPERTY_X86_FEATURE_1_AND */ +#define GNU_PROPERTY_X86_FEATURE_1_IBT 0x00000001 +#define GNU_PROPERTY_X86_FEATURE_1_SHSTK 0x00000002 +#define GNU_PROPERTY_X86_FEATURE_1_INVAL ~(GNU_PROPERTY_X86_FEATURE_1_IBT | \ + GNU_PROPERTY_X86_FEATURE_1_SHSTK) + #endif /* _UAPI_LINUX_ELF_H */ From patchwork Fri Sep 25 14:56:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800013 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E771892C for ; Fri, 25 Sep 2020 14:58:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B166321D42 for ; Fri, 25 Sep 2020 14:58:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B166321D42 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8C9D66B0096; Fri, 25 Sep 2020 10:57:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 880016B0098; Fri, 25 Sep 2020 10:57:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6A40D6B0099; Fri, 25 Sep 2020 10:57:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0206.hostedemail.com [216.40.44.206]) by kanga.kvack.org (Postfix) with ESMTP id 50ED46B0096 for ; Fri, 25 Sep 2020 10:57:34 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 115FB181AE860 for ; Fri, 25 Sep 2020 14:57:34 +0000 (UTC) X-FDA: 77301887628.12.step17_500984927168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 3B2E218021E90 for ; Fri, 25 Sep 2020 14:57:31 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30045:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yr8y9r4atune1oh19ftgi7fqnhcopyxe81airppzycpssqx86g9qyjw89z3fc.suga1gn7mn68cokn8ye5t8hp8g3nxkgqamidis5knqu8ew73pdaqbrixs4mtrg8.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: step17_500984927168 X-Filterd-Recvd-Size: 7745 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:30 +0000 (UTC) IronPort-SDR: nlBeW0uHz2YUsdlEpH51z2xUq2U2xw5j71FpSk8DcUfaVcHk6vUKAYVFV0YQdkprZUKLepjbAG 4Gc3MXsZkxUQ== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631945" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631945" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:29 -0700 IronPort-SDR: bSQkNwg/M3GuYYQd9O7ebA2i89GkMfG1wRgW73KN6WwqXTz/kO1pCB8EOW7i/8OCAhDFwN+19c 8xtXIWfRFu4A== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499245" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:29 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Mark Brown , Catalin Marinas Subject: [PATCH v13 23/26] ELF: Introduce arch_setup_elf_property() Date: Fri, 25 Sep 2020 07:56:46 -0700 Message-Id: <20200925145649.5438-24-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ELF file's .note.gnu.property indicates arch features supported by the file. These features are extracted by arch_parse_elf_property() and stored in 'arch_elf_state'. Introduce arch_setup_elf_property() for enabling such features. The first use-case of this function is shadow stack. ARM64 is the other arch that has ARCH_USER_GNU_PROPERTY and arch_parse_elf_ property(). Add arch_setup_elf_property() for it. Signed-off-by: Yu-cheng Yu Cc: Mark Brown Cc: Catalin Marinas Cc: Dave Martin --- v11: - Combine three patches of arch_setup_elf_property() into one. - Add empty arch_setup_elf_property() for arm64. v9: - Change cpu_feature_enabled() to static_cpu_has(). arch/arm64/include/asm/elf.h | 5 +++++ arch/x86/Kconfig | 2 ++ arch/x86/include/asm/elf.h | 13 +++++++++++++ arch/x86/kernel/process_64.c | 32 ++++++++++++++++++++++++++++++++ fs/binfmt_elf.c | 4 ++++ include/linux/elf.h | 6 ++++++ 6 files changed, 62 insertions(+) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 8d1c8dcb87fd..d37bc7915935 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -281,6 +281,11 @@ static inline int arch_parse_elf_property(u32 type, const void *data, return 0; } +static inline int arch_setup_elf_property(struct arch_elf_state *arch) +{ + return 0; +} + static inline int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *f, bool is_interp, struct arch_elf_state *state) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7578327226e3..4b28a0ce4594 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1950,6 +1950,8 @@ config X86_SHADOW_STACK_USER select X86_CET select ARCH_MAYBE_MKWRITE select ARCH_HAS_SHADOW_STACK + select ARCH_USE_GNU_PROPERTY + select ARCH_BINFMT_ELF_STATE help Shadow Stacks provides protection against program stack corruption. It's a hardware feature. This only matters diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index b9a5d488f1a5..0e1be2a13359 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -385,6 +385,19 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages +#ifdef CONFIG_ARCH_BINFMT_ELF_STATE +struct arch_elf_state { + unsigned int gnu_property; +}; + +#define INIT_ARCH_ELF_STATE { \ + .gnu_property = 0, \ +} + +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0) +#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0) +#endif + /* Do not change the values. See get_align_mask() */ enum align_flags { ALIGN_VA_32 = BIT(0), diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 9afefe325acb..8725e67bcd44 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -837,3 +837,35 @@ unsigned long KSTK_ESP(struct task_struct *task) { return task_pt_regs(task)->sp; } + +#ifdef CONFIG_ARCH_USE_GNU_PROPERTY +int arch_parse_elf_property(u32 type, const void *data, size_t datasz, + bool compat, struct arch_elf_state *state) +{ + if (type != GNU_PROPERTY_X86_FEATURE_1_AND) + return 0; + + if (datasz != sizeof(unsigned int)) + return -ENOEXEC; + + state->gnu_property = *(unsigned int *)data; + return 0; +} + +int arch_setup_elf_property(struct arch_elf_state *state) +{ + int r = 0; + + if (!IS_ENABLED(CONFIG_X86_CET)) + return r; + + memset(¤t->thread.cet, 0, sizeof(struct cet_status)); + + if (static_cpu_has(X86_FEATURE_SHSTK)) { + if (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_SHSTK) + r = cet_setup_shstk(); + } + + return r; +} +#endif diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 13d053982dd7..2b4cfc256895 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1217,6 +1217,10 @@ static int load_elf_binary(struct linux_binprm *bprm) set_binfmt(&elf_format); + retval = arch_setup_elf_property(&arch_state); + if (retval < 0) + goto out; + #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES retval = arch_setup_additional_pages(bprm, !!interpreter); if (retval < 0) diff --git a/include/linux/elf.h b/include/linux/elf.h index 5d5b0321da0b..4827695ca415 100644 --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -82,9 +82,15 @@ static inline int arch_parse_elf_property(u32 type, const void *data, { return 0; } + +static inline int arch_setup_elf_property(struct arch_elf_state *arch) +{ + return 0; +} #else extern int arch_parse_elf_property(u32 type, const void *data, size_t datasz, bool compat, struct arch_elf_state *arch); +extern int arch_setup_elf_property(struct arch_elf_state *arch); #endif #ifdef CONFIG_ARCH_HAVE_ELF_PROT From patchwork Fri Sep 25 14:56:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800007 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EAC2592C for ; Fri, 25 Sep 2020 14:58:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B01FF2311D for ; Fri, 25 Sep 2020 14:58:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B01FF2311D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4D4FD6B0092; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 461536B0093; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2853E6B0095; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0149.hostedemail.com [216.40.44.149]) by kanga.kvack.org (Postfix) with ESMTP id 0E24A6B0092 for ; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id CC53A8249980 for ; Fri, 25 Sep 2020 14:57:31 +0000 (UTC) X-FDA: 77301887502.18.hand51_18176bd27168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id A49CE101AC99A for ; Fri, 25 Sep 2020 14:57:31 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yf13ujtru94f6ujqadn7q1bahmyocdysrsg3gua7ds56hmhtqsxiqa3saoah3.i11t6hnnzb7cr74so336cau6sxf8h8mny11fn8jgd5mknqr686iaese86c68q19.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: hand51_18176bd27168 X-Filterd-Recvd-Size: 7392 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:30 +0000 (UTC) IronPort-SDR: h8NGXHM0dhXwMNtPO/6yq3bTF5Rezs8vtaghTXLDTF/GdLwfP7Tio/I1N37f8NIC+Z8K/TBBGu AR00p81SMWyg== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631950" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631950" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:30 -0700 IronPort-SDR: MR86CdIhQuuL8dldgPZoImheQXatUVenuFAE0lvdaMHi1mvUBOAYCUhMxdCZC9QRFUlYKgMsxi RqWyu8YvGGjg== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499254" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:29 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 24/26] x86/cet/shstk: Handle thread shadow stack Date: Fri, 25 Sep 2020 07:56:47 -0700 Message-Id: <20200925145649.5438-25-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The kernel allocates (and frees on thread exit) a new shadow stack for a pthread child. It is possible for the kernel to complete the clone syscall and set the child's shadow stack pointer to NULL and let the child thread allocate a shadow stack for itself. There are two issues in this approach: It is not compatible with existing code that does inline syscall and it cannot handle signals before the child can successfully allocate a shadow stack. A 64-bit shadow stack has a size of min(RLIMIT_STACK, 4 GB). A compat-mode thread shadow stack has a size of 1/4 min(RLIMIT_STACK, 4 GB). This allows more threads to run in a 32-bit address space. Signed-off-by: Yu-cheng Yu --- v10: - Limit shadow stack size to 4 GB. arch/x86/include/asm/cet.h | 3 ++ arch/x86/include/asm/mmu_context.h | 3 ++ arch/x86/kernel/cet.c | 44 ++++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 7 +++++ 4 files changed, 57 insertions(+) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 73435856ce54..ec4b5e62d0ce 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -18,12 +18,15 @@ struct cet_status { #ifdef CONFIG_X86_CET int cet_setup_shstk(void); +int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags); void cet_disable_shstk(void); void cet_free_shstk(struct task_struct *p); int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); void cet_restore_signal(struct sc_ext *sc); int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else +static inline int cet_setup_thread_shstk(struct task_struct *p, + unsigned long clone_flags) { return 0; } static inline void cet_disable_shstk(void) {} static inline void cet_free_shstk(struct task_struct *p) {} static inline void cet_restore_signal(struct sc_ext *sc) { return; } diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index d98016b83755..ceb593e405e1 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -11,6 +11,7 @@ #include #include +#include #include extern atomic64_t last_mm_ctx_id; @@ -142,6 +143,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + cet_free_shstk(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index 51ddd17aee8f..b285c726bb88 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -172,6 +172,50 @@ int cet_setup_shstk(void) return 0; } +int cet_setup_thread_shstk(struct task_struct *tsk, unsigned long clone_flags) +{ + unsigned long addr, size; + struct cet_user_state *state; + struct cet_status *cet = &tsk->thread.cet; + + if (!cet->shstk_size) + return 0; + + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + state = get_xsave_addr(&tsk->thread.fpu.state.xsave, + XFEATURE_CET_USER); + + if (!state) + return -EINVAL; + + /* Cap shadow stack size to 4 GB */ + size = min(rlimit(RLIMIT_STACK), 1UL << 32); + + /* + * Compat-mode pthreads share a limited address space. + * If each function call takes an average of four slots + * stack space, we need 1/4 of stack size for shadow stack. + */ + if (in_compat_syscall()) + size /= 4; + size = round_up(size, PAGE_SIZE); + addr = alloc_shstk(size, 0); + + if (IS_ERR_VALUE(addr)) { + cet->shstk_base = 0; + cet->shstk_size = 0; + return PTR_ERR((void *)addr); + } + + fpu__prepare_write(&tsk->thread.fpu); + state->user_ssp = (u64)(addr + size); + cet->shstk_base = addr; + cet->shstk_size = size; + return 0; +} + void cet_disable_shstk(void) { struct cet_status *cet = ¤t->thread.cet; diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ff3b44d6740b..67632ba893b7 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -110,6 +110,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + cet_free_shstk(tsk); fpu__drop(fpu); } @@ -182,6 +183,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg, if (clone_flags & CLONE_SETTLS) ret = set_new_tls(p, tls); +#ifdef CONFIG_X86_64 + /* Allocate a new shadow stack for pthread */ + if (!ret) + ret = cet_setup_thread_shstk(p, clone_flags); +#endif + if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); From patchwork Fri Sep 25 14:56:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800009 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E0A4112C for ; Fri, 25 Sep 2020 14:58:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 560DD23600 for ; Fri, 25 Sep 2020 14:58:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 560DD23600 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C79FA6B0093; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C006A8E0001; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A25E16B0096; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0139.hostedemail.com [216.40.44.139]) by kanga.kvack.org (Postfix) with ESMTP id 8C7956B0093 for ; Fri, 25 Sep 2020 10:57:32 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 552474DB3 for ; Fri, 25 Sep 2020 14:57:32 +0000 (UTC) X-FDA: 77301887544.25.bed14_5505bf927168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 304B71804E3A1 for ; Fri, 25 Sep 2020 14:57:32 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30046:30051:30054:30056:30064:30069:30070:30075,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8f5wbrn6a6y7fi7i7awwicr37dopsj7uft1a78bhiy471bg3f8381eww7ap5.t8kaoguu6qdoz1psgwz5rjmx79wzumr16ktrgt9u744m8hhofa5f1iffqtkqjkc.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: bed14_5505bf927168 X-Filterd-Recvd-Size: 8936 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:31 +0000 (UTC) IronPort-SDR: ZTd71+8wCj5jJ1YwCz1k+cgc+0GtefyN5Zw6hu39N9m9S5Tqo3/gCiqqb20Kc4uOT6SqMu4Ws5 zz76PIkUaxbA== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631953" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631953" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:31 -0700 IronPort-SDR: L7HFsBcj4pDbbE+PSkFJM62Z8GSbhLHCG6CyQXs6ixSVVHSLKDU1PlhQubn6Q+ExIXMEowCuQ9 jdQyAixhZq6g== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499266" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:30 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 25/26] x86/cet/shstk: Add arch_prctl functions for shadow stack Date: Fri, 25 Sep 2020 07:56:48 -0700 Message-Id: <20200925145649.5438-26-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: arch_prctl(ARCH_X86_CET_STATUS, u64 *args) Get CET feature status. The parameter 'args' is a pointer to a user buffer. The kernel returns the following information: *args = shadow stack/IBT status *(args + 1) = shadow stack base address *(args + 2) = shadow stack size arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features) Disable CET features specified in 'features'. Return -EPERM if CET is locked. arch_prctl(ARCH_X86_CET_LOCK) Lock in CET features. Also change do_arch_prctl_common()'s parameter 'cpuid_enabled' to 'arg2', as it is now also passed to prctl_cet(). Signed-off-by: Yu-cheng Yu --- v12: - Remove ARCH_X86_CET_MMAP_SHSTK. v11: - Check input for invalid features. - Fix prctl_cet() return values. - Change ARCH_X86_CET_ALLOC_SHSTK to ARCH_X86_CET_MMAP_SHSTK to take MAP_32BIT, MAP_POPULATE as inputs. v10: - Verify CET is enabled before handling arch_prctl. - Change input parameters from unsigned long to u64, to make it clear they are 64-bit. arch/x86/include/asm/cet.h | 3 ++ arch/x86/include/uapi/asm/prctl.h | 4 ++ arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/cet_prctl.c | 68 +++++++++++++++++++++++++ arch/x86/kernel/process.c | 6 +-- tools/arch/x86/include/uapi/asm/prctl.h | 4 ++ 6 files changed, 83 insertions(+), 4 deletions(-) create mode 100644 arch/x86/kernel/cet_prctl.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index ec4b5e62d0ce..16870e5bc8eb 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -14,9 +14,11 @@ struct sc_ext; struct cet_status { unsigned long shstk_base; unsigned long shstk_size; + unsigned int locked:1; }; #ifdef CONFIG_X86_CET +int prctl_cet(int option, u64 arg2); int cet_setup_shstk(void); int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags); void cet_disable_shstk(void); @@ -25,6 +27,7 @@ int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp) void cet_restore_signal(struct sc_ext *sc); int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else +static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; } static inline int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags) { return 0; } static inline void cet_disable_shstk(void) {} diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 5a6aac9fa41f..9245bf629120 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -14,4 +14,8 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +#define ARCH_X86_CET_STATUS 0x3001 +#define ARCH_X86_CET_DISABLE 0x3002 +#define ARCH_X86_CET_LOCK 0x3003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 1fb85595afa7..321ef52e4470 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,7 +145,7 @@ obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o -obj-$(CONFIG_X86_CET) += cet.o +obj-$(CONFIG_X86_CET) += cet.o cet_prctl.o ### # 64 bit specific files diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c new file mode 100644 index 000000000000..bd5ad11763e4 --- /dev/null +++ b/arch/x86/kernel/cet_prctl.c @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* See Documentation/x86/intel_cet.rst. */ + +static int copy_status_to_user(struct cet_status *cet, u64 arg2) +{ + u64 buf[3] = {0, 0, 0}; + + if (cet->shstk_size) { + buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK; + buf[1] = (u64)cet->shstk_base; + buf[2] = (u64)cet->shstk_size; + } + + return copy_to_user((u64 __user *)arg2, buf, sizeof(buf)); +} + +int prctl_cet(int option, u64 arg2) +{ + struct cet_status *cet; + unsigned int features; + + /* + * GLIBC's ENOTSUPP == EOPNOTSUPP == 95, and it does not recognize + * the kernel's ENOTSUPP (524). So return EOPNOTSUPP here. + */ + if (!IS_ENABLED(CONFIG_X86_CET)) + return -EOPNOTSUPP; + + cet = ¤t->thread.cet; + + if (option == ARCH_X86_CET_STATUS) + return copy_status_to_user(cet, arg2); + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + switch (option) { + case ARCH_X86_CET_DISABLE: + if (cet->locked) + return -EPERM; + + features = (unsigned int)arg2; + + if (features & GNU_PROPERTY_X86_FEATURE_1_INVAL) + return -EINVAL; + if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK) + cet_disable_shstk(); + return 0; + + case ARCH_X86_CET_LOCK: + cet->locked = 1; + return 0; + + default: + return -ENOSYS; + } +} diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 67632ba893b7..33cb6da22ef0 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -977,14 +977,14 @@ unsigned long get_wchan(struct task_struct *p) } long do_arch_prctl_common(struct task_struct *task, int option, - unsigned long cpuid_enabled) + unsigned long arg2) { switch (option) { case ARCH_GET_CPUID: return get_cpuid_mode(); case ARCH_SET_CPUID: - return set_cpuid_mode(task, cpuid_enabled); + return set_cpuid_mode(task, arg2); } - return -EINVAL; + return prctl_cet(option, arg2); } diff --git a/tools/arch/x86/include/uapi/asm/prctl.h b/tools/arch/x86/include/uapi/asm/prctl.h index 5a6aac9fa41f..9245bf629120 100644 --- a/tools/arch/x86/include/uapi/asm/prctl.h +++ b/tools/arch/x86/include/uapi/asm/prctl.h @@ -14,4 +14,8 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +#define ARCH_X86_CET_STATUS 0x3001 +#define ARCH_X86_CET_DISABLE 0x3002 +#define ARCH_X86_CET_LOCK 0x3003 + #endif /* _ASM_X86_PRCTL_H */ From patchwork Fri Sep 25 14:56:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11800011 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3AEDD112C for ; Fri, 25 Sep 2020 14:58:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E44672311D for ; Fri, 25 Sep 2020 14:58:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E44672311D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5BCFB6B0095; Fri, 25 Sep 2020 10:57:33 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 595086B0096; Fri, 25 Sep 2020 10:57:33 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 45AF66B0098; Fri, 25 Sep 2020 10:57:33 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0002.hostedemail.com [216.40.44.2]) by kanga.kvack.org (Postfix) with ESMTP id 2AC946B0095 for ; Fri, 25 Sep 2020 10:57:33 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E2D70180AD807 for ; Fri, 25 Sep 2020 14:57:32 +0000 (UTC) X-FDA: 77301887544.19.magic12_1a0b9e327168 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin19.hostedemail.com (Postfix) with ESMTP id BD01B1ACEA2 for ; Fri, 25 Sep 2020 14:57:32 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30036:30054:30056:30062:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y837gksysfzprznh7asaniqn6tqypytehwhirty54y5fwe549q5enq9r3yqfk.kcute5ry3fk99kttb1mo84sz1zdzzrq4f5kd6orn1gkwznb3dsaybry689siw8q.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: magic12_1a0b9e327168 X-Filterd-Recvd-Size: 10828 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Fri, 25 Sep 2020 14:57:32 +0000 (UTC) IronPort-SDR: /GmILFPLg2z89GUrUSPDjZXmpWGokQmaoXmoL/2ibJR6AF5Aw1h5GfbsXw4RChB9pXXgoaSzLq w+19ResNpu/g== X-IronPort-AV: E=McAfee;i="6000,8403,9755"; a="161631956" X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="161631956" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:31 -0700 IronPort-SDR: 1Hb9j2LyV9QwAOiAqYpd5eKTmtyJUtzDG4aN/F1NWOJroAq0GcWjvgA6Xm99qHKtgu0W8dCXc/ a4ykrY2aCpTw== X-IronPort-AV: E=Sophos;i="5.77,302,1596524400"; d="scan'208";a="487499277" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2020 07:57:31 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v13 26/26] mm: Introduce PROT_SHSTK for shadow stack Date: Fri, 25 Sep 2020 07:56:49 -0700 Message-Id: <20200925145649.5438-27-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200925145649.5438-1-yu-cheng.yu@intel.com> References: <20200925145649.5438-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There are three possible options to create a shadow stack allocation API: an arch_prctl, a new syscall, or adding PROT_SHSTK to mmap()/mprotect(). Each has its advantages and compromises. An arch_prctl() is the least intrusive. However, the existing x86 arch_prctl() takes only two parameters. Multiple parameters must be passed in a memory buffer. There is a proposal to pass more parameters in registers [1], but no active discussion on that. A new syscall minimizes compatibility issues and offers an extensible frame work to other architectures, but this will likely result in some overlap of mmap()/mprotect(). The introduction of PROT_SHSTK to mmap()/mprotect() takes advantage of existing APIs. The x86-specific PROT_SHSTK is translated to VM_SHSTK and a shadow stack mapping is created without reinventing the wheel. There are potential pitfalls though. The most obvious one would be using this as a bypass to shadow stack protection. However, the attacker would have to get to the syscall first. Since arch_calc_vm_prot_bits() is modified, I have moved arch_vm_get_page _prot() and arch_calc_vm_prot_bits() to x86/include/asm/mman.h. This will be more consistent with other architectures. [1] https://lore.kernel.org/lkml/20200828121624.108243-1-hjl.tools@gmail.com/ Signed-off-by: Yu-cheng Yu --- v13: - Add VM_SHSTK to VM_ARCH_CLEAR. arch/x86/include/asm/mman.h | 81 ++++++++++++++++++++++++++++++++ arch/x86/include/uapi/asm/mman.h | 28 ++--------- include/linux/mm.h | 1 + include/linux/mman.h | 8 ++++ mm/mmap.c | 8 +++- mm/mprotect.c | 4 ++ 6 files changed, 105 insertions(+), 25 deletions(-) create mode 100644 arch/x86/include/asm/mman.h diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h new file mode 100644 index 000000000000..5cd6040d5c10 --- /dev/null +++ b/arch/x86/include/asm/mman.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MMAN_H +#define _ASM_X86_MMAN_H + +#include +#include +#include + +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS +/* + * Take the 4 protection key bits out of the vma->vm_flags + * value and turn them in to the bits that we can put in + * to a pte. + * + * Only override these if Protection Keys are available + * (which is only on 64-bit). + */ +#define arch_vm_get_page_prot(vm_flags) __pgprot( \ + ((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) | \ + ((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) | \ + ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ + ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) + +#define pkey_vm_prot_bits(prot, key) ( \ + ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ + ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ + ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ + ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) +#else +#define pkey_vm_prot_bits(prot, key) (0) +#endif + +static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, + unsigned long pkey) +{ + unsigned long vm_prot_bits = pkey_vm_prot_bits(prot, pkey); + + if (!(prot & PROT_WRITE) && (prot & PROT_SHSTK)) + vm_prot_bits |= VM_SHSTK; + + return vm_prot_bits; +} +#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) + +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) +{ + unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM; + + if (IS_ENABLED(CONFIG_X86_SHADOW_STACK_USER) && + static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) { + + supported |= PROT_SHSTK; + + /* + * A shadow stack mapping is indirectly writable by only + * the CALL and WRUSS instructions, but not other write + * instructions). PROT_SHSTK and PROT_WRITE are mutually + * exclusive. + */ + supported &= ~PROT_WRITE; + } + + return (prot & ~supported) == 0; +} +#define arch_validate_prot arch_validate_prot + +static inline bool arch_vma_can_mprot(struct vm_area_struct *vma, + unsigned long prot) +{ + bool can_mprot; + + /* + * Function call stack should not be backed by a file or shared. + */ + can_mprot = !(prot & PROT_SHSTK) || + !(vma->vm_file || (vma->vm_flags & VM_SHARED)); + return can_mprot; +} +#define arch_vma_can_mprot arch_vma_can_mprot + +#endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index d4a8d0424bfb..39bb7db344a6 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -1,31 +1,11 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#ifndef _ASM_X86_MMAN_H -#define _ASM_X86_MMAN_H +#ifndef _UAPI_ASM_X86_MMAN_H +#define _UAPI_ASM_X86_MMAN_H #define MAP_32BIT 0x40 /* only give out 32bit addresses */ -#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS -/* - * Take the 4 protection key bits out of the vma->vm_flags - * value and turn them in to the bits that we can put in - * to a pte. - * - * Only override these if Protection Keys are available - * (which is only on 64-bit). - */ -#define arch_vm_get_page_prot(vm_flags) __pgprot( \ - ((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) | \ - ((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) | \ - ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ - ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) - -#define arch_calc_vm_prot_bits(prot, key) ( \ - ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ - ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ - ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ - ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) -#endif +#define PROT_SHSTK 0x10 /* shadow stack pages */ #include -#endif /* _ASM_X86_MMAN_H */ +#endif /* _UAPI_ASM_X86_MMAN_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 9b6a0f22cd89..da4f7d3a14b7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -334,6 +334,7 @@ extern unsigned int kobjsize(const void *objp); #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ +# define VM_ARCH_CLEAR VM_SHSTK #elif defined(CONFIG_PPC) # define VM_SAO VM_ARCH_1 /* Strong Access Ordering (powerpc) */ #elif defined(CONFIG_PARISC) diff --git a/include/linux/mman.h b/include/linux/mman.h index 6f34c33075f9..4d776adb0fdf 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -103,6 +103,14 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) #define arch_validate_prot arch_validate_prot #endif +#ifndef arch_vma_can_mprot +/* + * Allow architectures to check if the vma can support the new + * protection. + */ +#define arch_vma_can_mprot(vma, prot) true +#endif + /* * Optimisation macro. It is equivalent to: * (x & bit1) ? bit2 : 0 diff --git a/mm/mmap.c b/mm/mmap.c index 81d4a00092da..4c403dfccff0 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1445,6 +1445,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct inode *inode = file_inode(file); unsigned long flags_mask; + /* + * Call stack cannot be backed by a file. + */ + if (vm_flags & VM_SHSTK) + return -EINVAL; + if (!file_mmap_ok(file, inode, pgoff, len)) return -EOVERFLOW; @@ -1509,7 +1515,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, } else { switch (flags & MAP_TYPE) { case MAP_SHARED: - if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP)) + if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP|VM_SHSTK)) return -EINVAL; /* * Ignore pgoff. diff --git a/mm/mprotect.c b/mm/mprotect.c index a8edbcb3af99..cf73b59a36da 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -553,6 +553,10 @@ static int do_mprotect_pkey(unsigned long start, size_t len, error = -ENOMEM; if (!vma) goto out; + if (!arch_vma_can_mprot(vma, prot)) { + error = -EINVAL; + goto out; + } prev = vma->vm_prev; if (unlikely(grows & PROT_GROWSDOWN)) { if (vma->vm_start >= end)