From patchwork Tue Nov 10 16:21:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894763 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DFD471391 for ; Tue, 10 Nov 2020 16:23:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8DEEC221FB for ; Tue, 10 Nov 2020 16:23:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DEEC221FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C21316B0071; Tue, 10 Nov 2020 11:22:55 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B57156B0073; Tue, 10 Nov 2020 11:22:55 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 910816B0078; Tue, 10 Nov 2020 11:22:55 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0042.hostedemail.com [216.40.44.42]) by kanga.kvack.org (Postfix) with ESMTP id 3901B6B0073 for ; Tue, 10 Nov 2020 11:22:55 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id C9D4F1EE6 for ; Tue, 10 Nov 2020 16:22:54 +0000 (UTC) X-FDA: 77469027468.25.stop25_5a1845c272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 9F00D1804E3A0 for ; Tue, 10 Nov 2020 16:22:54 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30012:30046:30051:30054:30055:30056:30062:30064:30069:30070:30074:30075:30079:30089,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yfmonzhxfewjzdws9tw8k59491gypq8znokr5wqooi5rj41zt74z34ucaggu5.aipr3qarnqw5mggkqe7sam6kbnrd743mshwjon4qn37m5po5dcuxnrsjd1axbkr.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:71,LUA_SUMMARY:none X-HE-Tag: stop25_5a1845c272f6 X-Filterd-Recvd-Size: 9628 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) IronPort-SDR: PPyBvo/kKVZ5uMF6SRIhfeAnwLiBXmCLxdhPOR7/c++tBfb3wTqTG0QItctSUKTEciCjv2+I9i gUC9axB6iR7g== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630830" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630830" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:48 -0800 IronPort-SDR: +oLTv+RDZMDNroCRhaVzneIvZYmXuZSt9kOywnfST2JPHbVBA9iF3X2nJEX/iSbY+xLeRb9uML U7ZK0amOHTqA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572798" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:48 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 01/26] Documentation/x86: Add CET description Date: Tue, 10 Nov 2020 08:21:46 -0800 Message-Id: <20201110162211.9207-2-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Explain no_user_shstk/no_user_ibt kernel parameters, and introduce a new document on Control-flow Enforcement Technology (CET). Signed-off-by: Yu-cheng Yu --- .../admin-guide/kernel-parameters.txt | 6 + Documentation/x86/index.rst | 1 + Documentation/x86/intel_cet.rst | 138 ++++++++++++++++++ 3 files changed, 145 insertions(+) create mode 100644 Documentation/x86/intel_cet.rst diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 526d65d8573a..0ca8fb4d4d1e 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3193,6 +3193,12 @@ noexec=on: enable non-executable mappings (default) noexec=off: disable non-executable mappings + no_user_shstk [X86-64] Disable Shadow Stack for user-mode + applications + + no_user_ibt [X86-64] Disable Indirect Branch Tracking for user-mode + applications + nosmap [X86,PPC] Disable SMAP (Supervisor Mode Access Prevention) even if it is supported by processor. diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index b224d12c880b..e88dcea4300b 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -21,6 +21,7 @@ x86-specific Documentation tlb mtrr pat + intel_cet intel-iommu intel_txt amd-memory-encryption diff --git a/Documentation/x86/intel_cet.rst b/Documentation/x86/intel_cet.rst new file mode 100644 index 000000000000..4a81e7c9b29a --- /dev/null +++ b/Documentation/x86/intel_cet.rst @@ -0,0 +1,138 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================= +Control-flow Enforcement Technology (CET) +========================================= + +[1] Overview +============ + +Control-flow Enforcement Technology (CET) is an Intel processor feature +that provides protection against return/jump-oriented programming (ROP) +attacks. It can be set up to protect both applications and the kernel. +Only user-mode protection is implemented in the 64-bit kernel, including +support for running legacy 32-bit applications. + +CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is +a secondary stack allocated from memory and cannot be directly modified by +applications. When executing a CALL instruction, the processor pushes the +return address to both the normal stack and the shadow stack. Upon +function return, the processor pops the shadow stack copy and compares it +to the normal stack copy. If the two differ, the processor raises a +control-protection fault. Indirect branch tracking verifies indirect +CALL/JMP targets are intended as marked by the compiler with 'ENDBR' +opcodes. + +There are two kernel configuration options: + + X86_SHADOW_STACK_USER, and + X86_BRANCH_TRACKING_USER. + +These need to be enabled to build a CET-enabled kernel, and Binutils v2.31 +and GCC v8.1 or later are required to build a CET kernel. To build a CET- +enabled application, GLIBC v2.28 or later is also required. + +There are two command-line options for disabling CET features:: + + no_user_shstk - disables user shadow stack, and + no_user_ibt - disables user indirect branch tracking. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. + +[2] Application Enabling +======================== + +An application's CET capability is marked in its ELF header and can be +verified from the following command output, in the NT_GNU_PROPERTY_TYPE_0 +field: + + readelf -n | grep SHSTK + properties: x86 feature: IBT, SHSTK + +If an application supports CET and is statically linked, it will run with +CET protection. If the application needs any shared libraries, the loader +checks all dependencies and enables CET when all requirements are met. + +[3] Backward Compatibility +========================== + +GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment +variable: + +GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-IBT + Turn off SHSTK/IBT. + +GLIBC_TUNABLES=glibc.tune.x86_shstk= + This controls how dlopen() handles SHSTK legacy libraries:: + + on - continue with SHSTK enabled; + permissive - continue with SHSTK off. + +Details can be found in the GLIBC manual pages. + +[4] CET arch_prctl()'s +====================== + +Several arch_prctl()'s have been added for CET: + +arch_prctl(ARCH_X86_CET_STATUS, u64 *addr) + Return CET feature status. + + The parameter 'addr' is a pointer to a user buffer. + On returning to the caller, the kernel fills the following + information:: + + *addr = shadow stack/indirect branch tracking status + *(addr + 1) = shadow stack base address + *(addr + 2) = shadow stack size + +arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features) + Disable shadow stack and/or indirect branch tracking as specified in + 'features'. Return -EPERM if CET is locked. + +arch_prctl(ARCH_X86_CET_LOCK) + Lock in all CET features. They cannot be turned off afterwards. + +Note: + There is no CET-enabling arch_prctl function. By design, CET is enabled + automatically if the binary and the system can support it. + +[5] The implementation of the Shadow Stack +========================================== + +Shadow Stack size +----------------- + +A task's shadow stack is allocated from memory to a fixed size of +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to +the maximum size of the normal stack, but capped to 4 GB. However, +a compat-mode application's address space is smaller, each of its thread's +shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB). + +Signal +------ + +The main program and its signal handlers use the same shadow stack. +Because the shadow stack stores only return addresses, a large shadow +stack covers the condition that both the program stack and the signal +alternate stack run out. + +The kernel creates a restore token for the shadow stack restoring address +and verifies that token when restoring from the signal handler. + +Fork +---- + +The shadow stack's vma has VM_SHSTK flag set; its PTEs are required to be +read-only and dirty. When a shadow stack PTE is not RO and dirty, a +shadow access triggers a page fault with the shadow stack access bit set +in the page fault error code. + +When a task forks a child, its shadow stack PTEs are copied and both the +parent's and the child's shadow stack PTEs are cleared of the dirty bit. +Upon the next shadow stack access, the resulting shadow stack page fault +is handled by page copy/re-use. + +When a pthread child is created, the kernel allocates a new shadow stack +for the new thread. From patchwork Tue Nov 10 16:21:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894761 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE1251391 for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 703A421D46 for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 703A421D46 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 641976B0072; Tue, 10 Nov 2020 11:22:55 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 628DC6B0075; Tue, 10 Nov 2020 11:22:55 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 452546B0074; Tue, 10 Nov 2020 11:22:55 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0237.hostedemail.com [216.40.44.237]) by kanga.kvack.org (Postfix) with ESMTP id EFD466B0071 for ; Tue, 10 Nov 2020 11:22:54 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id C7212362A for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) X-FDA: 77469027426.11.wish01_3300be6272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 92246180F8B82 for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30055:30056:30064,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yry3fa8uktqiqywn5hxb34g7qfwypor3jcwts86wdagtjgdango5hrzep3pq4.be7ffum88pq3dai91eqfndmo5re8e5x9uhddqeec5t47anh6hrjur99a4h1a7k8.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:72,LUA_SUMMARY:none X-HE-Tag: wish01_3300be6272f6 X-Filterd-Recvd-Size: 4986 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:52 +0000 (UTC) IronPort-SDR: UNEPLD+dzuPKvNhda9hj5d2+nc3keBILmlkyxVqs4ZowaGTzs5x7vBJGG6gUFSbX3GP1FsEi6L VFTLdg39KWmA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630833" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630833" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 IronPort-SDR: RM6xXfN8oX3Dr4XV78qkjqOmoSj2pAboBFck1WoyyHk5TCpHDAdBoEP5uTKUNj1t5caH+27bwl pksp/Ww6S/oQ== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572804" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:48 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 02/26] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Date: Tue, 10 Nov 2020 08:21:47 -0800 Message-Id: <20201110162211.9207-3-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add CPU feature flags for Control-flow Enforcement Technology (CET). CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/kernel/cpu/cpuid-deps.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index dad350d42ecf..c9f6d62da463 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -343,6 +343,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* Shadow Stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ @@ -374,6 +375,7 @@ #define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */ +#define X86_FEATURE_IBT (18*32+20) /* Indirect Branch Tracking */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index d502241995a3..9a3971e2f98f 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL }, { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES }, { X86_FEATURE_PER_THREAD_MBA, X86_FEATURE_MBA }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, + { X86_FEATURE_IBT, X86_FEATURE_XSAVES }, {} }; From patchwork Tue Nov 10 16:21:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894759 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 51FBC697 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 01C4C221FF for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 01C4C221FF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 650036B006C; Tue, 10 Nov 2020 11:22:54 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5FD5A6B0071; Tue, 10 Nov 2020 11:22:54 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3DF7A6B0073; Tue, 10 Nov 2020 11:22:54 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0154.hostedemail.com [216.40.44.154]) by kanga.kvack.org (Postfix) with ESMTP id 040FB6B006C for ; Tue, 10 Nov 2020 11:22:53 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 95C568249980 for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) X-FDA: 77469027426.08.floor62_510d240272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 6FCDD1819E766 for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30034:30045:30051:30054:30056:30064:30075:30090,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yrxp8tr381nzyhqtro4eomxoandyc9aofpe1rxcfo8karwi1477sixckd6nmb.5ixruiqgqkz47x6coofxj6qtrqq31acagq478qgzzno8b1uoj5gpjmmzq1wa7qk.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: floor62_510d240272f6 X-Filterd-Recvd-Size: 10828 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:52 +0000 (UTC) IronPort-SDR: fwLTZpnDUmW87oL6bFlp3aDJYFzjHr3QprHAG7I90z/vNGXtkwv2I7oafQoI2tx8YTZsVgcJc1 DaaKpPGS72OQ== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630835" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630835" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 IronPort-SDR: GBG9rTYQIG+vFbty6DyJ3gTkDV/BL1d4ijIjZi0OZnUwjpmfieQDI72hsqKVH4UFBhguR+2wzI dw45XWj54Lmg== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572810" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 03/26] x86/fpu/xstate: Introduce CET MSR XSAVES supervisor states Date: Tue, 10 Nov 2020 08:21:48 -0800 Message-Id: <20201110162211.9207-4-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Control-flow Enforcement Technology (CET) adds five MSRs. Introduce them and their XSAVES supervisor states: MSR_IA32_U_CET (user-mode CET settings), MSR_IA32_PL3_SSP (user-mode Shadow Stack pointer), MSR_IA32_PL0_SSP (kernel-mode Shadow Stack pointer), MSR_IA32_PL1_SSP (Privilege Level 1 Shadow Stack pointer), MSR_IA32_PL2_SSP (Privilege Level 2 Shadow Stack pointer). Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/include/asm/fpu/types.h | 23 +++++++++++++++++-- arch/x86/include/asm/fpu/xstate.h | 6 +++-- arch/x86/include/asm/msr-index.h | 20 +++++++++++++++++ arch/x86/include/uapi/asm/processor-flags.h | 2 ++ arch/x86/kernel/fpu/xstate.c | 25 ++++++++++++++++++--- 5 files changed, 69 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index f5a38a5f3ae1..035eb0ec665e 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -135,6 +135,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE) @@ -237,6 +239,23 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + u64 user_cet; /* user control-flow settings */ + u64 user_ssp; /* user shadow stack pointer */ +}; + +/* + * State component 12 is Control-flow Enforcement kernel states + */ +struct cet_kernel_state { + u64 kernel_ssp; /* kernel shadow stack */ + u64 pl1_ssp; /* privilege level 1 shadow stack */ + u64 pl2_ssp; /* privilege level 2 shadow stack */ +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 47a92232d595..582f3575e0bd 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -35,7 +35,8 @@ XFEATURE_MASK_BNDCSR) /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -62,7 +63,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 972a34d93505..6f05ab2a1fa4 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -922,4 +922,24 @@ #define MSR_VM_IGNNE 0xc0010115 #define MSR_VM_HSAVE_PA 0xc0010117 +/* Control-flow Enforcement Technology MSRs */ +#define MSR_IA32_U_CET 0x6a0 /* user mode cet setting */ +#define MSR_IA32_S_CET 0x6a2 /* kernel mode cet setting */ +#define MSR_IA32_PL0_SSP 0x6a4 /* kernel shstk pointer */ +#define MSR_IA32_PL1_SSP 0x6a5 /* ring-1 shstk pointer */ +#define MSR_IA32_PL2_SSP 0x6a6 /* ring-2 shstk pointer */ +#define MSR_IA32_PL3_SSP 0x6a7 /* user shstk pointer */ +#define MSR_IA32_INT_SSP_TAB 0x6a8 /* exception shstk table */ + +/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */ +#define CET_SHSTK_EN BIT_ULL(0) +#define CET_WRSS_EN BIT_ULL(1) +#define CET_ENDBR_EN BIT_ULL(2) +#define CET_LEG_IW_EN BIT_ULL(3) +#define CET_NO_TRACK_EN BIT_ULL(4) +#define CET_SUPPRESS_DISABLE BIT_ULL(5) +#define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9)) +#define CET_SUPPRESS BIT_ULL(10) +#define CET_WAIT_ENDBR BIT_ULL(11) + #endif /* _ASM_X86_MSR_INDEX_H */ diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index bcba3c643e63..a8df907e8017 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) +#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement */ +#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT) /* * x86-64 Task Priority Register, CR8 diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 5d8047441a0a..9a4307227c1f 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -38,6 +38,8 @@ static const char *xfeature_names[] = "Processor Trace (unused)" , "Protection Keys User registers", "PASID state", + "Control-flow User registers" , + "Control-flow Kernel registers" , "unknown xstate feature" , }; @@ -53,6 +55,8 @@ static short xsave_cpuid_features[] __initdata = { X86_FEATURE_INTEL_PT, X86_FEATURE_PKU, X86_FEATURE_ENQCMD, + X86_FEATURE_SHSTK, /* XFEATURE_CET_USER */ + X86_FEATURE_SHSTK, /* XFEATURE_CET_KERNEL */ }; /* @@ -321,6 +325,8 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); + print_xstate_feature(XFEATURE_MASK_CET_KERNEL); } /* @@ -596,6 +602,8 @@ static void check_xstate_against_struct(int nr) XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); + XCHECK_SZ(sz, nr, XFEATURE_CET_USER, struct cet_user_state); + XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state); /* * Make *SURE* to add any feature numbers in below if @@ -605,7 +613,7 @@ static void check_xstate_against_struct(int nr) if ((nr < XFEATURE_YMM) || (nr >= XFEATURE_MAX) || (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) { + ((nr >= XFEATURE_RSRVD_COMP_13) && (nr <= XFEATURE_LBR))) { WARN_ONCE(1, "no structure for xstate: %d\n", nr); XSTATE_WARN_ON(1); } @@ -835,8 +843,19 @@ void __init fpu__init_system_xstate(void) * Clear XSAVE features that are disabled in the normal CPUID. */ for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) { - if (!boot_cpu_has(xsave_cpuid_features[i])) - xfeatures_mask_all &= ~BIT_ULL(i); + if (xsave_cpuid_features[i] == X86_FEATURE_SHSTK) { + /* + * X86_FEATURE_SHSTK and X86_FEATURE_IBT share + * same states, but can be enabled separately. + */ + if (!boot_cpu_has(X86_FEATURE_SHSTK) && + !boot_cpu_has(X86_FEATURE_IBT)) + xfeatures_mask_all &= ~BIT_ULL(i); + } else { + if ((xsave_cpuid_features[i] == -1) || + !boot_cpu_has(xsave_cpuid_features[i])) + xfeatures_mask_all &= ~BIT_ULL(i); + } } xfeatures_mask_all &= fpu__get_supported_xfeatures_mask(); From patchwork Tue Nov 10 16:21:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1E2B2697 for ; Tue, 10 Nov 2020 16:23:04 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D0B06221FF for ; Tue, 10 Nov 2020 16:23:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D0B06221FF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 010F76B0073; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EB79E6B007D; Tue, 10 Nov 2020 11:22:55 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AE4876B0074; Tue, 10 Nov 2020 11:22:55 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0061.hostedemail.com [216.40.44.61]) by kanga.kvack.org (Postfix) with ESMTP id 56B8B6B0071 for ; Tue, 10 Nov 2020 11:22:55 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 02758180ACF8B for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-FDA: 77469027510.08.run56_54130a8272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id D5BAD1819E766 for ; Tue, 10 Nov 2020 16:22:54 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30051:30054:30056:30064:30079:30083,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yfgnqwsjyczed5o7dmge7xyg79pyp3snzaxyqawrno16w4dcdu1nzmo9r8mf5.qap8bc1o54hn8e5c1hi18ba8pbxy5amckdqg5dgsmhnska4t8a56jbmsoyc8owz.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:71,LUA_SUMMARY:none X-HE-Tag: run56_54130a8272f6 X-Filterd-Recvd-Size: 7813 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) IronPort-SDR: LYk2kwvxnozDQGDBTZDpN1AS1rTO/YWZ9QgmC6crgQVnzWCPV8rRAty5OuyC/nCKSSYgxPcvx5 VYndQ9gKHtZg== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630838" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630838" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 IronPort-SDR: xtT7hdzmkedOKWsj7ZEsHnYAdqmoSa99WFTFeYJTfhO9qKMWlvIQ/VV5yALsnpnCeBadIYt/xL XD6e5unn8hfw== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572813" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 04/26] x86/cet: Add control-protection fault handler Date: Tue, 10 Nov 2020 08:21:49 -0800 Message-Id: <20201110162211.9207-5-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the Shadow Stack; or an indirect JMP instruction, without the NOTRACK prefix, arrives at a non-ENDBR opcode. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/include/asm/idtentry.h | 4 ++ arch/x86/kernel/idt.c | 4 ++ arch/x86/kernel/signal_compat.c | 2 +- arch/x86/kernel/traps.c | 59 ++++++++++++++++++++++++++++++ include/uapi/asm-generic/siginfo.h | 3 +- 5 files changed, 70 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index b2442eb0ac2f..f519b8ce0273 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -577,6 +577,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS, exc_stack_segment); DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP, exc_general_protection); DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC, exc_alignment_check); +#ifdef CONFIG_X86_CET +DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); +#endif + /* Raw exception entries which need extra work */ DECLARE_IDTENTRY_RAW(X86_TRAP_UD, exc_invalid_op); DECLARE_IDTENTRY_RAW(X86_TRAP_BP, exc_int3); diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index ee1a283f8e96..e8166d9bbb10 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -105,6 +105,10 @@ static const __initconst struct idt_data def_idts[] = { #elif defined(CONFIG_X86_32) SYSG(IA32_SYSCALL_VECTOR, entry_INT80_32), #endif + +#ifdef CONFIG_X86_CET + INTG(X86_TRAP_CP, asm_exc_control_protection), +#endif }; /* diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index a7f3e12cfbdb..c44d4bebea07 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void) */ BUILD_BUG_ON(NSIGILL != 11); BUILD_BUG_ON(NSIGFPE != 15); - BUILD_BUG_ON(NSIGSEGV != 9); + BUILD_BUG_ON(NSIGSEGV != 10); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 5); BUILD_BUG_ON(NSIGCHLD != 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index e19df6cde35d..6c21c1e92605 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -598,6 +598,65 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) cond_local_irq_disable(regs); } +#ifdef CONFIG_X86_CET +static const char * const control_protection_err[] = { + "unknown", + "near-ret", + "far-ret/iret", + "endbranch", + "rstorssp", + "setssbsy", +}; + +/* + * When a control protection exception occurs, send a signal + * to the responsible application. Currently, control + * protection is only enabled for the user mode. This + * exception should not come from the kernel mode. + */ +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + struct task_struct *tsk; + + if (notify_die(DIE_TRAP, "control protection fault", regs, + error_code, X86_TRAP_CP, SIGSEGV) == NOTIFY_STOP) + return; + cond_local_irq_enable(regs); + + if (!user_mode(regs)) + die("kernel control protection fault", regs, error_code); + + if (!static_cpu_has(X86_FEATURE_SHSTK) && + !static_cpu_has(X86_FEATURE_IBT)) + WARN_ONCE(1, "CET is disabled but got control protection fault\n"); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + printk_ratelimit()) { + unsigned int max_err; + unsigned long ssp; + + max_err = ARRAY_SIZE(control_protection_err) - 1; + if ((error_code < 0) || (error_code > max_err)) + error_code = 0; + rdmsrl(MSR_IA32_PL3_SSP, ssp); + pr_info("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + control_protection_err[error_code]); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); + } + + force_sig_fault(SIGSEGV, SEGV_CPERR, + (void __user *)uprobe_get_trap_addr(regs)); + cond_local_irq_disable(regs); +} +#endif + static bool do_int3(struct pt_regs *regs) { int res; diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index 7aacf9389010..96b9647d14ae 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -231,7 +231,8 @@ typedef struct siginfo { #define SEGV_ADIPERR 7 /* Precise MCD exception */ #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */ #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */ -#define NSIGSEGV 9 +#define SEGV_CPERR 10 /* Control protection fault */ +#define NSIGSEGV 10 /* * SIGBUS si_codes From patchwork Tue Nov 10 16:21:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894771 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ABFC01391 for ; Tue, 10 Nov 2020 16:23:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 747D820780 for ; Tue, 10 Nov 2020 16:23:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 747D820780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D56646B0078; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BD34A6B0083; Tue, 10 Nov 2020 11:22:56 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 874E56B0080; Tue, 10 Nov 2020 11:22:56 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0049.hostedemail.com [216.40.44.49]) by kanga.kvack.org (Postfix) with ESMTP id 3AEE26B0075 for ; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 4D2FE362A for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-FDA: 77469027510.15.fall68_3a0df45272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin15.hostedemail.com (Postfix) with ESMTP id 191451814B0C7 for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30051:30054:30055:30056:30064:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yg9kar9ecaap1tp6nziguies3rcyc4g9yby3ewucn5hg6py76t7orgn3oj4j4.e61s8nazkatfqq5be6ckb39dr5sbgmyiojaetn5sxw6cnct1benkdz6jggqnc66.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: fall68_3a0df45272f6 X-Filterd-Recvd-Size: 5103 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:53 +0000 (UTC) IronPort-SDR: Y8xsJmwIq5XXD/N7yISIpbCZTK1J6cfKK5B3TlbnknAaPTU2yUrmghzj8DLqspUbCHbX27pp7b akUSLsOyRisA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630842" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630842" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 IronPort-SDR: jlwf48Tx1kzec6Z120/eeqXhrL4CUeg1OnqAtvRdpdLvsTb+eG7RJjSzQpbMYu+PHhcEX17hla Yw5MeNvNIbkw== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572816" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 05/26] x86/cet/shstk: Add Kconfig option for user-mode Shadow Stack Date: Tue, 10 Nov 2020 08:21:50 -0800 Message-Id: <20201110162211.9207-6-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow Stack provides protection against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK_USER, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-shadow stack applications continue to work, but without protection. Signed-off-by: Yu-cheng Yu --- arch/x86/Kconfig | 33 +++++++++++++++++++++++++++ scripts/as-x86_64-has-shadow-stack.sh | 4 ++++ 2 files changed, 37 insertions(+) create mode 100755 scripts/as-x86_64-has-shadow-stack.sh diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f6946b81f74a..a51d2a3de166 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1930,6 +1930,39 @@ config X86_INTEL_TSX_MODE_AUTO side channel attacks- equals the tsx=auto command line parameter. endchoice +config AS_HAS_SHADOW_STACK + def_bool $(success,$(srctree)/scripts/as-x86_64-has-shadow-stack.sh $(CC)) + help + Test the assembler for shadow stack instructions. + +config X86_CET + def_bool n + +config ARCH_HAS_SHADOW_STACK + def_bool n + +config X86_SHADOW_STACK_USER + prompt "Intel Shadow Stacks for user-mode" + def_bool n + depends on CPU_SUP_INTEL && X86_64 + depends on AS_HAS_SHADOW_STACK + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + select ARCH_HAS_SHADOW_STACK + help + Shadow Stacks provides protection against program stack + corruption. It's a hardware feature. This only matters + if you have the right hardware. It's a security hardening + feature and apps must be enabled to use it. You get no + protection "for free" on old userspace. The hardware can + support user and kernel, but this option is for user space + only. + Support for this feature is only known to be present on + processors released in 2020 or later. CET features are also + known to increase kernel text size by 3.7 KB. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/scripts/as-x86_64-has-shadow-stack.sh b/scripts/as-x86_64-has-shadow-stack.sh new file mode 100755 index 000000000000..fac1d363a1b8 --- /dev/null +++ b/scripts/as-x86_64-has-shadow-stack.sh @@ -0,0 +1,4 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +echo "wrussq %rax, (%rbx)" | $* -x assembler -c - From patchwork Tue Nov 10 16:21:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 314E41391 for ; Tue, 10 Nov 2020 16:23:20 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E62DE20797 for ; Tue, 10 Nov 2020 16:23:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E62DE20797 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DA3A66B008C; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D2A0F6B0080; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 821A86B0081; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id 1A2256B0081 for ; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id B5492181AEF10 for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-FDA: 77469027552.22.order62_61053c7272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 86F5618038E6B for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yr6tsc8emo9pfkdyjzcoieci84yypz5a6tw3ejhgu9hjo4ur8ic58tbeomwsm.ysfyizro68dk9fgrs67ti77f1h8atuu9qpsrphfycqsg1addjiz87o15ej7kc94.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:68,LUA_SUMMARY:none X-HE-Tag: order62_61053c7272f6 X-Filterd-Recvd-Size: 9125 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:54 +0000 (UTC) IronPort-SDR: ql/GqQOjBJxHitCox/wFXvH9k1UE7z60CxC5rZPc6bkXrhRAP2jI1ZwwACtM8xEV4sBAi/oZyR o78Sz+NV2Iog== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630845" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630845" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:50 -0800 IronPort-SDR: CxsQpz9weykGltuQmK8NPcAMv8sl4KKTdOKH6yZF45f9oQE0JJJlw8mrRgOCXoyPCfnRRGtPW1 OW09+lY2Hr/g== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572822" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:49 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Dave Hansen Subject: [PATCH v15 06/26] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Date: Tue, 10 Nov 2020 08:21:51 -0800 Message-Id: <20201110162211.9207-7-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Before introducing _PAGE_COW for non-hardware memory management purposes in the next patch, rename _PAGE_DIRTY to _PAGE_DIRTY_HW and _PAGE_BIT_DIRTY to _PAGE_BIT_DIRTY_HW to make meanings more clear. There are no functional changes from this patch. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook Reviewed-by: Dave Hansen --- arch/x86/include/asm/pgtable.h | 18 +++++++++--------- arch/x86/include/asm/pgtable_types.h | 10 +++++----- arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- 4 files changed, 16 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a02c67291cfc..b23697658b28 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -123,7 +123,7 @@ extern pmdval_t early_pmd_flags; */ static inline int pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_HW; } @@ -162,7 +162,7 @@ static inline int pte_young(pte_t pte) static inline int pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_HW; } static inline int pmd_young(pmd_t pmd) @@ -172,7 +172,7 @@ static inline int pmd_young(pmd_t pmd) static inline int pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_HW; } static inline int pud_young(pud_t pud) @@ -333,7 +333,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_HW); } static inline pte_t pte_mkold(pte_t pte) @@ -353,7 +353,7 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pte_t pte_mkyoung(pte_t pte) @@ -434,7 +434,7 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_HW); } static inline pmd_t pmd_wrprotect(pmd_t pmd) @@ -444,7 +444,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -488,7 +488,7 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_HW); } static inline pud_t pud_wrprotect(pud_t pud) @@ -498,7 +498,7 @@ static inline pud_t pud_wrprotect(pud_t pud) static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 816b31c68550..810eb1567050 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -15,7 +15,7 @@ #define _PAGE_BIT_PWT 3 /* page write through */ #define _PAGE_BIT_PCD 4 /* page cache disabled */ #define _PAGE_BIT_ACCESSED 5 /* was accessed (raised by CPU) */ -#define _PAGE_BIT_DIRTY 6 /* was written to (raised by CPU) */ +#define _PAGE_BIT_DIRTY_HW 6 /* was written to (raised by CPU) */ #define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */ #define _PAGE_BIT_PAT 7 /* on 4KB pages */ #define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ @@ -46,7 +46,7 @@ #define _PAGE_PWT (_AT(pteval_t, 1) << _PAGE_BIT_PWT) #define _PAGE_PCD (_AT(pteval_t, 1) << _PAGE_BIT_PCD) #define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED) -#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY) +#define _PAGE_DIRTY_HW (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_HW) #define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) #define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) #define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1) @@ -74,7 +74,7 @@ _PAGE_PKEY_BIT3) #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) -#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) +#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY_HW | _PAGE_ACCESSED) #else #define _PAGE_KNL_ERRATUM_MASK 0 #endif @@ -126,7 +126,7 @@ * pte_modify() does modify it. */ #define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW | \ _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ _PAGE_UFFD_WP) #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) @@ -163,7 +163,7 @@ enum page_cache_mode { #define __RW _PAGE_RW #define _USR _PAGE_USER #define ___A _PAGE_ACCESSED -#define ___D _PAGE_DIRTY +#define ___D _PAGE_DIRTY_HW #define ___G _PAGE_GLOBAL #define __NX _PAGE_NX diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S index a4d9a261425b..e3bb4ff95523 100644 --- a/arch/x86/kernel/relocate_kernel_64.S +++ b/arch/x86/kernel/relocate_kernel_64.S @@ -17,7 +17,7 @@ */ #define PTR(x) (x << 3) -#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY_HW) /* * control_page + KEXEC_CONTROL_CODE_MAX_SIZE diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 47b8357b9751..3962284843ee 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3574,7 +3574,7 @@ static int init_rmode_identity_map(struct kvm *kvm) /* Set up identity-mapping pagetable for EPT in real mode */ for (i = 0; i < PT32_ENT_PER_PAGE; i++) { tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | - _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); + _PAGE_ACCESSED | _PAGE_DIRTY_HW | _PAGE_PSE); r = kvm_write_guest_page(kvm, identity_map_pfn, &tmp, i * sizeof(tmp), sizeof(tmp)); if (r < 0) From patchwork Tue Nov 10 16:21:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0AE12697 for ; Tue, 10 Nov 2020 16:23:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BD9A920780 for ; Tue, 10 Nov 2020 16:23:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD9A920780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A75146B0075; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9AE976B0078; Tue, 10 Nov 2020 11:22:56 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 75F8A6B0082; Tue, 10 Nov 2020 11:22:56 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0185.hostedemail.com [216.40.44.185]) by kanga.kvack.org (Postfix) with ESMTP id 3C9796B0078 for ; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id DEF1C180AD811 for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-FDA: 77469027510.07.size85_570bdd8272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id C3CEA1803FD61 for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yg1rqssodkfx8s1bqc7r5smf9hcycptryet97u7gcji7w3irp1jbxgib8hgoe.acws414h345ky48qq43nbmdi38ukyh99ippsneerakguxadwj7x8a4g45tt6p7b.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: size85_570bdd8272f6 X-Filterd-Recvd-Size: 5140 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:54 +0000 (UTC) IronPort-SDR: 9QMXra/tgKXNwQ2tMWmBY65DaBKZ51XJ0soA1utdMcK7KsZsAmWYgFUSrU4ckAw9DKLTy3Ce6M 6omgE3iMQfuQ== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630847" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630847" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:50 -0800 IronPort-SDR: AmzUa4WBNZk2GkthNU4ezuSQAO+7yQHYreTaLxWHMzAL851bBBiKiT6B5qCpXEfPSJYVm4txOd 7qrjQFquKUdg== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572826" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:50 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Christoph Hellwig Subject: [PATCH v15 07/26] x86/mm: Remove _PAGE_DIRTY_HW from kernel RO pages Date: Tue, 10 Nov 2020 08:21:52 -0800 Message-Id: <20201110162211.9207-8-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Kernel read-only PTEs are setup as _PAGE_DIRTY_HW. Since these become shadow stack PTEs, remove the dirty bit. Signed-off-by: Yu-cheng Yu Cc: "H. Peter Anvin" Cc: Kees Cook Cc: Thomas Gleixner Cc: Dave Hansen Cc: Christoph Hellwig Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Borislav Petkov Cc: Peter Zijlstra --- arch/x86/include/asm/pgtable_types.h | 6 +++--- arch/x86/mm/pat/set_memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 810eb1567050..7462a574fc93 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -193,10 +193,10 @@ enum page_cache_mode { #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 40baa90e74f4..207bbf796f5f 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1932,7 +1932,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY_HW), 0); } int set_memory_rw(unsigned long addr, int numpages) From patchwork Tue Nov 10 16:21:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5190697 for ; Tue, 10 Nov 2020 16:23:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6F57C21527 for ; Tue, 10 Nov 2020 16:23:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6F57C21527 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 825A66B0083; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 789686B0087; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 31BC76B0080; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0006.hostedemail.com [216.40.44.6]) by kanga.kvack.org (Postfix) with ESMTP id D152B6B0080 for ; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 75D711EF1 for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-FDA: 77469027552.21.camp31_3d16fde272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 43C7A180442C0 for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:4423:30045:30051:30054:30056:30064:30079:30091,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrnyt8d1tsyj5yapt5u4da58fo3ocgprfso47on515nc9hi5ph1gg3y9rqcd5.8bbsofp4d4wb6qiopp75yxh1wmru58yrpwimmj6bnysecdo1hi1rqa8rjbzygfc.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: camp31_3d16fde272f6 X-Filterd-Recvd-Size: 14720 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) IronPort-SDR: n7DgATaHs/Xl7z6VUxZYGZbnqWPVsIg6JwdP15PfKXHTiORJs8qOoBSFCjEE5voCRdOpuuRnhC om7APGbPqE0w== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630850" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630850" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:50 -0800 IronPort-SDR: rETqqlD08OhLyTvmXiTuNTkerpVlIjEirRfsDsX6JcO1gqeP/rUd1DvCklsfwVuNhtoVI3aWc+ gskznMtJU1Lw== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572830" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:50 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 08/26] x86/mm: Introduce _PAGE_COW Date: Tue, 10 Nov 2020 08:21:53 -0800 Message-Id: <20201110162211.9207-9-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There is essentially no room left in the x86 hardware PTEs on some OSes (not Linux). That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. The reason it's lightly used is that Dirty=1 is normally set by hardware and cannot normally be set by hardware on a Write=0 PTE. Software must normally be involved to create one of these PTEs, so software can simply opt to not create them. But that leaves us with a Linux problem: we need to ensure we never create Write=0,Dirty=1 PTEs. In places where we do create them, we need to find an alternative way to represent them _without_ using the same hardware bit combination. Thus, enter _PAGE_COW. This results in the following: (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW) (b) A R/O page that has been COW'ed: (R/O + _PAGE_COW) The user page is in a R/O VMA, and get_user_pages() needs a writable copy. The page fault handler creates a copy of the page and sets the new copy's PTE as R/O and _PAGE_COW. (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW) (d) A shared shadow stack PTE: (R/O + _PAGE_COW) When a shadow stack page is being shared among processes (this happens at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the next shadow stack access causes a fault, and the page is duplicated and _PAGE_DIRTY_HW is set again. This is the COW equivalent for shadow stack pages, even though it's copy-on-access rather than copy-on-write. (e) A page where the processor observed a Write=1 PTE, started a write, set Dirty=1, but then observed a Write=0 PTE. That's possible today, but will not happen on processors that support shadow stack. Use _PAGE_COW in pte_wrprotect() and _PAGE_DIRTY_HW in pte_mkwrite(). Apply the same changes to pmd and pud. When this patch is applied, there are six free bits left in the 64-bit PTE. There are no more free bits in the 32-bit PTE (except for PAE) and shadow stack is not implemented for the 32-bit kernel. Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/pgtable.h | 120 ++++++++++++++++++++++++--- arch/x86/include/asm/pgtable_types.h | 41 ++++++++- 2 files changed, 150 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b23697658b28..c88c7ccf0318 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -121,9 +121,9 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY_HW; + return pte_flags(pte) & _PAGE_DIRTY_BITS; } @@ -160,9 +160,9 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY_HW; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; } static inline int pmd_young(pmd_t pmd) @@ -170,9 +170,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY_HW; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -182,6 +182,12 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { + /* + * If _PAGE_DIRTY_HW is set, the PTE must either have + * _PAGE_RW or be a shadow stack PTE, which is logically writable. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY_HW); return pte_flags(pte) & _PAGE_RW; } @@ -333,7 +339,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY_HW); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -343,6 +349,17 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >> + _PAGE_BIT_DIRTY_HW << _PAGE_BIT_COW; + pte = pte_clear_flags(pte, _PAGE_DIRTY_HW); + } + return pte_clear_flags(pte, _PAGE_RW); } @@ -353,6 +370,18 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { + pteval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + pte = pte_clear_flags(pte, _PAGE_COW); return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } @@ -363,6 +392,13 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pte_flags(pte) & _PAGE_COW) { + pte = pte_clear_flags(pte, _PAGE_COW); + pte = pte_set_flags(pte, _PAGE_DIRTY_HW); + } + } + return pte_set_flags(pte, _PAGE_RW); } @@ -434,16 +470,41 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY_HW); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmdval_t v = native_pmd_val(pmd); + + v |= (v & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << + _PAGE_BIT_COW; + pmd = pmd_clear_flags(__pmd(v), _PAGE_DIRTY_HW); + } + return pmd_clear_flags(pmd, _PAGE_RW); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { + pmdval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + pmd = pmd_clear_flags(pmd, _PAGE_COW); return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } @@ -464,6 +525,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pmd_flags(pmd) & _PAGE_COW) { + pmd = pmd_clear_flags(pmd, _PAGE_COW); + pmd = pmd_set_flags(pmd, _PAGE_DIRTY_HW); + } + } + return pmd_set_flags(pmd, _PAGE_RW); } @@ -488,17 +556,36 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY_HW); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pudval_t v = native_pud_val(pud); + + v |= (v & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << + _PAGE_BIT_COW; + pud = pud_clear_flags(__pud(v), _PAGE_DIRTY_HW); + } + return pud_clear_flags(pud, _PAGE_RW); } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -518,6 +605,13 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pud_flags(pud) & _PAGE_COW) { + pud = pud_clear_flags(pud, _PAGE_COW); + pud = pud_set_flags(pud, _PAGE_DIRTY_HW); + } + } + return pud_set_flags(pud, _PAGE_RW); } @@ -1131,6 +1225,12 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { + /* + * If _PAGE_DIRTY_HW is set, then the PMD must either have + * _PAGE_RW or be a shadow stack PMD, which is logically writable. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY_HW); return pmd_flags(pmd) & _PAGE_RW; } diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 7462a574fc93..5f764d8d9bae 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -23,7 +23,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -36,6 +37,16 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * This bit indicates a copy-on-write page, and is different from + * _PAGE_BIT_SOFT_DIRTY, which tracks which pages a task writes to. + */ +#ifdef CONFIG_X86_64 +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ +#else +#define _PAGE_BIT_COW 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +128,34 @@ #define _PAGE_DEVMAP (_AT(pteval_t, 0)) #endif +/* + * _PAGE_COW is used to separate R/O and copy-on-write PTEs created by + * software from the shadow stack PTE setting required by the hardware: + * (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW) + * (b) A R/O page that has been COW'ed: (R/O +_PAGE_COW) + * The user page is in a R/O VMA, and get_user_pages() needs a + * writable copy. The page fault handler creates a copy of the page + * and sets the new copy's PTE as R/O and _PAGE_COW. + * (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW) + * (d) A shared (copy-on-access) shadow stack PTE: (R/O + _PAGE_COW) + * When a shadow stack page is being shared among processes (this + * happens at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the + * next shadow stack access causes a fault, and the page is duplicated + * and _PAGE_DIRTY_HW is set again. This is the COW equivalent for + * shadow stack pages, even though it's copy-on-access rather than + * copy-on-write. + * (e) A page where the processor observed a Write=1 PTE, started a write, + * set Dirty=1, but then observed a Write=0 PTE. That's possible + * today, but will not happen on processors that support shadow stack. + */ +#ifdef CONFIG_X86_SHADOW_STACK_USER +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY_HW | _PAGE_COW) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* From patchwork Tue Nov 10 16:21:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894779 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C35701391 for ; Tue, 10 Nov 2020 16:23:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 873F020780 for ; Tue, 10 Nov 2020 16:23:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 873F020780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1F2FE6B0087; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EB1766B0088; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CBD226B008A; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0075.hostedemail.com [216.40.44.75]) by kanga.kvack.org (Postfix) with ESMTP id 7F0D56B0080 for ; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 1F8BF181AEF1A for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.25.flame05_5d0aa42272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id D03D61804E3A0 for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30055:30056:30064:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04ygphaq1tdb5ix3wgpqd1sinwu9socuwrcxztqjdyqqkpf5ffwshn75n6ixj18.7c987apmf94enkamuo3q98h53bz6xjoyx4cb1bzumdjc95mnnqpfth16eeyiqhf.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: flame05_5d0aa42272f6 X-Filterd-Recvd-Size: 4019 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) IronPort-SDR: IgUy0b61EROZrUqSkY6l03Jcpzc4HW5743FJm7ysUe6bbb3/UZvf80T96QltLp5OD7fGd0n+42 7wSKyxRmUTXQ== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630852" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630852" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 IronPort-SDR: e7fTjv/Jxil5wQmnrfq2HtuR4k/YO23v1yAaGUCvBqXWw80XCtp1WUKuolfnSi/vVnXxxNxA4C Gg+R4PmQdoKw== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572835" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:50 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , David Airlie , Joonas Lahtinen , Jani Nikula , Daniel Vetter , Rodrigo Vivi , Zhenyu Wang , Zhi Wang Subject: [PATCH v15 09/26] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Date: Tue, 10 Nov 2020 08:21:54 -0800 Message-Id: <20201110162211.9207-10-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: After the introduction of _PAGE_COW, a modified page's PTE can have either _PAGE_DIRTY_HW or _PAGE_COW. Change _PAGE_DIRTY to _PAGE_DIRTY_BITS. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook Cc: David Airlie Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Daniel Vetter Cc: Rodrigo Vivi Cc: Zhenyu Wang Cc: Zhi Wang --- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index a3a4305eda01..dd0ab28cfe7d 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1207,7 +1207,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu, } /* Clear dirty field. */ - se->val64 &= ~_PAGE_DIRTY; + se->val64 &= ~_PAGE_DIRTY_BITS; ops->clear_pse(se); ops->clear_ips(se); From patchwork Tue Nov 10 16:21:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 491431391 for ; Tue, 10 Nov 2020 16:23:25 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 07EBA2151B for ; Tue, 10 Nov 2020 16:23:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 07EBA2151B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 57D0D6B0085; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0FB4E6B0081; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D9EE76B0087; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0152.hostedemail.com [216.40.44.152]) by kanga.kvack.org (Postfix) with ESMTP id A0E556B0088 for ; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 41DC28249980 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.05.magic16_420473b272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id 238F718021B99 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yfjjo153uq94xwmhu1kggjpf9ssypwi5jfhgo7shmoyqz8u96da4ic17see7w.hbniuxfniuqiqqzozycn75pgirx3z3pfroq4sk61hc411xxapbftiknxnzyp7sp.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: magic16_420473b272f6 X-Filterd-Recvd-Size: 4893 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: ti4oTzEf7saFVUv9Pmc4wtUgiDXjgHsiivEJJ+IscyTtyq8NYY2usNSF7bDz+UcoHJwGd6veTe sGORwY4WmiyA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630855" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630855" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 IronPort-SDR: P47CftBifCQMypsR9qiuGvoBMGSdElEEhyZfZr+iNHAZG6UsASk3QI6ez/kmZFTyeH6ycDKMp2 wZ6q5kQ6Rb3A== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572842" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 10/26] x86/mm: Update pte_modify for _PAGE_COW Date: Tue, 10 Nov 2020 08:21:55 -0800 Message-Id: <20201110162211.9207-11-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pte_modify() changes a PTE to 'newprot'. It doesn't use the pte_*() helpers that a previous patch fixed up, so we need a new site. Introduce fixup_dirty_pte() to set the dirty bits based on _PAGE_RW, and apply the same changes to pmd_modify(). Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/pgtable.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index c88c7ccf0318..07a08e763bce 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -726,6 +726,21 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd) static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); +static inline pteval_t fixup_dirty_pte(pteval_t pteval) +{ + pte_t pte = __pte(pteval); + + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && pte_dirty(pte)) { + pte = pte_mkclean(pte); + + if (pte_flags(pte) & _PAGE_RW) + pte = pte_set_flags(pte, _PAGE_DIRTY_HW); + else + pte = pte_set_flags(pte, _PAGE_COW); + } + return pte_val(pte); +} + static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pteval_t val = pte_val(pte), oldval = val; @@ -736,16 +751,34 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ val &= _PAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; + val = fixup_dirty_pte(val); val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); return __pte(val); } +static inline int pmd_write(pmd_t pmd); +static inline pmdval_t fixup_dirty_pmd(pmdval_t pmdval) +{ + pmd_t pmd = __pmd(pmdval); + + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && pmd_dirty(pmd)) { + pmd = pmd_mkclean(pmd); + + if (pmd_flags(pmd) & _PAGE_RW) + pmd = pmd_set_flags(pmd, _PAGE_DIRTY_HW); + else + pmd = pmd_set_flags(pmd, _PAGE_COW); + } + return pmd_val(pmd); +} + static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { pmdval_t val = pmd_val(pmd), oldval = val; val &= _HPAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; + val = fixup_dirty_pmd(val); val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); return __pmd(val); } From patchwork Tue Nov 10 16:21:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C20C697 for ; Tue, 10 Nov 2020 16:23:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 33D7320780 for ; Tue, 10 Nov 2020 16:23:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 33D7320780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E11106B0092; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id CE1D16B0099; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 981456B0095; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0066.hostedemail.com [216.40.44.66]) by kanga.kvack.org (Postfix) with ESMTP id 1E3956B0080 for ; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id AAA1B82499A8 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.13.fifth28_0813a55272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 81B4318140B69 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8zikzi4xb69brspz977i1mszcyoc6opfa9a9jyab9naudedwff7ciofbgyrx.61sm184a59613qzmc66fjqw6yagxbi78i5oq6quxz1hz5uq1tsiebqrob7ujeq4.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: fifth28_0813a55272f6 X-Filterd-Recvd-Size: 6371 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: zfTbVCxD+JicNpEH+05xk1yitX+HPnhG5A8SLlpc3LoUUp8egAT9Dn0WeIphI3zcQyNJQajs2E 8+R+xWduqKzw== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630857" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630857" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 IronPort-SDR: btnedAJXepRgTCIznTycPfe/5Ka1MFwIT0ZAGB0RSyB+Fqksri5RnOESYNHQ/Z3lZSvCu+co0n TatqdnoiAZcA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572845" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 11/26] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY_HW to _PAGE_COW Date: Tue, 10 Nov 2020 08:21:56 -0800 Message-Id: <20201110162211.9207-12-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When shadow stack is introduced, [R/O + _PAGE_DIRTY_HW] PTE is reserved for shadow stack. Copy-on-write PTEs have [R/O + _PAGE_COW]. When a PTE goes from [R/W + _PAGE_DIRTY_HW] to [R/O + _PAGE_COW], it could become a transient shadow stack PTE in two cases: The first case is that some processors can start a write but end up seeing a read-only PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting shadow stack, therefore we don't need a TLB flush here. The second case is that when the software, without atomic, tests & replaces _PAGE_DIRTY_HW with _PAGE_COW, a transient shadow stack PTE can exist. This is prevented with cmpxchg. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights to the issue. Jann Horn provided the cmpxchg solution. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 07a08e763bce..5fd4d6b60383 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1229,6 +1229,32 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + /* + * Some processors can start a write, but end up seeing a read-only + * PTE by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PTE which + * looks like a shadow stack PTE. + * + * However, this behavior has been improved and will not occur on + * processors supporting shadow stack. Without this guarantee, a + * transition to a non-present PTE and flush the TLB would be + * needed. + * + * When changing a writable PTE to read-only and if the PTE has + * _PAGE_DIRTY_HW set, move that bit to _PAGE_COW so that the + * PTE is not a shadow stack PTE. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte_t old_pte, new_pte; + + do { + old_pte = READ_ONCE(*ptep); + new_pte = pte_wrprotect(old_pte); + + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1285,6 +1311,32 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { + /* + * Some processors can start a write, but end up seeing a read-only + * PMD by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PMD which + * looks like a Shadow Stack PMD. + * + * However, this behavior has been improved and will not occur on + * processors supporting Shadow Stack. Without this guarantee, a + * transition to a non-present PMD and flush the TLB would be + * needed. + * + * When changing a writable PMD to read-only and if the PMD has + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_COW so that the + * PMD is not a shadow stack PMD. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmd_t old_pmd, new_pmd; + + do { + old_pmd = READ_ONCE(*pmdp); + new_pmd = pmd_wrprotect(old_pmd); + + } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd))); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } From patchwork Tue Nov 10 16:21:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894809 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3EA3A697 for ; Tue, 10 Nov 2020 16:24:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0459A20780 for ; Tue, 10 Nov 2020 16:24:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0459A20780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 478AE6B00AC; Tue, 10 Nov 2020 11:23:42 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 404676B00AD; Tue, 10 Nov 2020 11:23:42 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1E4756B00AE; Tue, 10 Nov 2020 11:23:42 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id D13726B00AC for ; Tue, 10 Nov 2020 11:23:41 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 795CF8249980 for ; Tue, 10 Nov 2020 16:23:41 +0000 (UTC) X-FDA: 77469029442.18.screw37_610e378272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id 28BAD100ED0FE for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y87osurw99xsusk74twaecggicfyp5437yt7a5kwbzcebikcpkczr8stzx6wb.raae6ta795y5frffekxgzb36rsjsryyekkaiessra1i4jb96uhm1nmrwy9aneto.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: screw37_610e378272f6 X-Filterd-Recvd-Size: 5202 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: SqeI3cuV/5i620CBHFbZY/qg0vMxwI8FWV1Exkb2xQryXdj2nUN/1e1XWTcG8AY4O3rfOfGKzz zZwKbroBnMYg== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630859" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630859" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 IronPort-SDR: X4ZGIXYr72OoqjMRdcb2KyYOrHFdaxGQUCrbrs3fyfXGApp6XDRiaLux66jcXM+tUX9weRoVLv roxMnFNfg1gg== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572853" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 12/26] mm: Introduce VM_SHSTK for shadow stack memory Date: Tue, 10 Nov 2020 08:21:57 -0800 Message-Id: <20201110162211.9207-13-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A Shadow Stack PTE must be read-only and have _PAGE_DIRTY set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHSTK to track shadow stack VMAs. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/mm/mmap.c | 2 ++ fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 3 files changed, 13 insertions(+) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index c90c20904a60..a22c6b6fc607 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy) const char *arch_vma_name(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHSTK) + return "[shadow stack]"; return NULL; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 217aa2705d5d..c72143cdbb5d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_PKEY_BIT4)] = "", #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_SHADOW_STACK_USER + [ilog2(VM_SHSTK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index db6ae4d3fb4e..c7f527bd21fb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -304,11 +304,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -324,6 +326,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_SHADOW_STACK_USER +# define VM_SHSTK VM_HIGH_ARCH_5 +#else +# define VM_SHSTK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Tue Nov 10 16:21:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894795 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 18C00697 for ; Tue, 10 Nov 2020 16:23:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CCBEA20780 for ; Tue, 10 Nov 2020 16:23:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CCBEA20780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1E2546B0098; Tue, 10 Nov 2020 11:23:00 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id CB86E6B0099; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 90CB86B0096; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0249.hostedemail.com [216.40.44.249]) by kanga.kvack.org (Postfix) with ESMTP id DE9ED6B008A for ; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 6A5418249980 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-FDA: 77469027636.24.fuel72_470747f272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id 4B4811A4A5 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30069:30070,0,RBL:134.134.136.31:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygmacc8pj69a1yya36eoaxarac1opbpnn7dj7af6mmgnwjyigmjddtcxd47sw.e5aze8jod81ppowth1ets9az1dxfeda4qny6xryb8tr91p1y5t5gwzr7p6idfzn.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: fuel72_470747f272f6 X-Filterd-Recvd-Size: 5978 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) IronPort-SDR: 8OEjmbretpfaVe32Vv4AfqhLRtv7T7aST7dXI+SyxIf3j4bBegyaD77/GwZxfd0N7OO36ck7jh OG6CmYyj2+JA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="231630863" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="231630863" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 IronPort-SDR: jxit4pkfdtFSwf8JL80U7g+rEwtGKm686zoXKqxe6qVpbqHAYNZ6YLXWAhV0WY770ppuoY+7pf M4hhZE4xcfPQ== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572858" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:51 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 13/26] x86/mm: Shadow Stack page fault error checking Date: Tue, 10 Nov 2020 08:21:58 -0800 Message-Id: <20201110162211.9207-14-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack accesses are those that are performed by the CPU where it expects to encounter a shadow stack mapping. These accesses are performed implicitly by CALL/RET at the site of the shadow stack pointer. These accesses are made explicitly by shadow stack management instructions like WRUSSQ. Shadow stacks accesses to shadow-stack mapping can see faults in normal, valid operation just like regular accesses to regular mappings. Shadow stacks need some of the same features like delayed allocation, swap and copy-on-write. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. In handling a shadow stack page fault, verify it occurs within a shadow stack mapping. It is always an error otherwise. For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect copy-on-write. Because clearing _PAGE_DIRTY_HW (vs. _PAGE_RW) is used to trigger the fault, shadow stack read fault and shadow stack write fault are not differentiated and both are handled as a write access. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 305bc1214aef..205766c438b3 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault */ enum x86_pf_error_code { X86_PF_PROT = 1 << 0, @@ -19,6 +20,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, }; #endif /* _ASM_X86_TRAP_PF_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 82bf37a5c9ec..941f55ee7c75 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1110,6 +1110,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Verify a shadow stack access is within a shadow stack VMA. + * It is always an error otherwise. Normal data access to a + * shadow stack area is checked in the case followed. + */ + if (error_code & X86_PF_SHSTK) { + if (!(vma->vm_flags & VM_SHSTK)) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ if (unlikely(!(vma->vm_flags & VM_WRITE))) @@ -1275,6 +1286,14 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * Clearing _PAGE_DIRTY_HW is used to detect shadow stack access. + * This method cannot distinguish shadow stack read vs. write. + * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect + * copy-on-write. + */ + if (hw_error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (hw_error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (hw_error_code & X86_PF_INSTR) From patchwork Tue Nov 10 16:21:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894767 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC9911391 for ; Tue, 10 Nov 2020 16:23:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7D15020E65 for ; Tue, 10 Nov 2020 16:23:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7D15020E65 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 656456B007D; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 594B26B0081; Tue, 10 Nov 2020 11:22:56 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 28BAD6B0080; Tue, 10 Nov 2020 11:22:56 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0213.hostedemail.com [216.40.44.213]) by kanga.kvack.org (Postfix) with ESMTP id BADA96B0075 for ; Tue, 10 Nov 2020 11:22:55 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 53946181AEF10 for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-FDA: 77469027510.27.dress74_3b12d25272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 269C73D669 for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yg19e1gnxobwwdrfegwnetqdbbmypp3hwu8jkm9fyy1jdnffm8x4hs1mt3hd7.wfke86crtg1fjikd9jihc4rgotgbu51wa1dqthxiyrspibjfiqzhx4h77zcan4u.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: dress74_3b12d25272f6 X-Filterd-Recvd-Size: 6345 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:54 +0000 (UTC) IronPort-SDR: csDrVzaKnysM+VQC5rb6kA/xLR3xDSYSix1PxJso+KHS6qIZOETKk4oGH2lOFI6UWyPSFv5W8k M+ITNgCtyXYA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110951" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110951" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 IronPort-SDR: Mi5hXwSV3BGmEBv/CJDdY3u7Z4kpDzp15me3v0NZxUtCxx7PgN9zZ8wTWTqz4n5gdlXaY6Tipd G0lfDFEV0ICQ== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572864" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 14/26] x86/mm: Update maybe_mkwrite() for shadow stack Date: Tue, 10 Nov 2020 08:21:59 -0800 Message-Id: <20201110162211.9207-15-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack memory is writable, but its VMA has VM_SHSTK instead of VM_WRITE. Update maybe_mkwrite() to include the shadow stack. Signed-off-by: Yu-cheng Yu --- arch/x86/Kconfig | 4 ++++ arch/x86/mm/pgtable.c | 18 ++++++++++++++++++ include/linux/mm.h | 2 ++ include/linux/pgtable.h | 24 ++++++++++++++++++++++++ mm/huge_memory.c | 2 ++ 5 files changed, 50 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a51d2a3de166..960993862b96 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1938,6 +1938,9 @@ config AS_HAS_SHADOW_STACK config X86_CET def_bool n +config ARCH_MAYBE_MKWRITE + def_bool n + config ARCH_HAS_SHADOW_STACK def_bool n @@ -1948,6 +1951,7 @@ config X86_SHADOW_STACK_USER depends on AS_HAS_SHADOW_STACK select ARCH_USES_HIGH_VMA_FLAGS select X86_CET + select ARCH_MAYBE_MKWRITE select ARCH_HAS_SHADOW_STACK help Shadow Stacks provides protection against program stack diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index dfd82f51ba66..a9666b64bc05 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -610,6 +610,24 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, } #endif +#ifdef CONFIG_ARCH_MAYBE_MKWRITE +pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_SHSTK)) + pte = pte_mkwrite_shstk(pte); + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_SHSTK)) + pmd = pmd_mkwrite_shstk(pmd); + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_ARCH_MAYBE_MKWRITE */ + /** * reserve_top_address - reserves a hole in the top of kernel address space * @reserve - size of hole to reserve diff --git a/include/linux/mm.h b/include/linux/mm.h index c7f527bd21fb..09f07d07a8ff 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -977,6 +977,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pte = pte_mkwrite(pte); + else + pte = arch_maybe_mkwrite(pte, vma); return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 71125a4676c4..ea66461610ae 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1384,6 +1384,30 @@ static inline bool arch_has_pfn_modify_check(void) } #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */ +#ifdef CONFIG_MMU +#ifdef CONFIG_ARCH_MAYBE_MKWRITE +pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#else /* !CONFIG_ARCH_MAYBE_MKWRITE */ +static inline pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#endif /* CONFIG_ARCH_MAYBE_MKWRITE */ +#endif /* CONFIG_MMU */ + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9474dbc150ed..ecd23777b8bf 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -464,6 +464,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pmd = pmd_mkwrite(pmd); + else + pmd = arch_maybe_pmd_mkwrite(pmd, vma); return pmd; } From patchwork Tue Nov 10 16:22:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894773 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3CC661391 for ; Tue, 10 Nov 2020 16:23:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F01EA207D3 for ; Tue, 10 Nov 2020 16:23:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F01EA207D3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3214E6B0082; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2A6A16B0087; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0D3416B0085; Tue, 10 Nov 2020 11:22:57 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0095.hostedemail.com [216.40.44.95]) by kanga.kvack.org (Postfix) with ESMTP id B46976B0082 for ; Tue, 10 Nov 2020 11:22:56 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5A0C21EE6 for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-FDA: 77469027552.06.cord79_3b00617272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id 3B33810044ECA for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrcdfj3oc4sfqs53k5khi7jeo7nocf7ug9yofx4wnokgxmhcptck6q8s3dhng.szwrd4n5iui6aqzcsw8qi5c4mu7yqbnaoowj43cxwiaehtqdxk79hiuc48gzjmn.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: cord79_3b00617272f6 X-Filterd-Recvd-Size: 5338 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) IronPort-SDR: QOvSGMzK7mciW17U0gqCNluULjaYbZdX7wSDSToWk8glXatV7xYmwPUCnbktc6XcvWryKSmgI/ vwhtgiFygMAA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110953" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110953" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 IronPort-SDR: pw8p6GBAo7TxACxZtiALAzVuq3NLgbp5meI3qAIiIwG7freH03vnEpq/UZ8g1UzuVDbcs3lJdC 6qivn6w+XLZg== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572871" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 15/26] mm: Fixup places that call pte_mkwrite() directly Date: Tue, 10 Nov 2020 08:22:00 -0800 Message-Id: <20201110162211.9207-16-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A shadow stack page is made writable by pte_mkwrite_shstk(), which sets _PAGE_DIRTY_HW. There are a few places that call pte_mkwrite() directly and miss the maybe_mkwrite() fixup in the previous patch. Fix them with maybe_mkwrite(): - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly and call pte_mkwrite(), which is the same as maybe_mkwrite(). Change them to maybe_mkwrite(). - In do_numa_page(), if the numa entry 'was-writable', then pte_mkwrite() is called directly. Fix it by doing maybe_mkwrite(). - In change_pte_range(), pte_mkwrite() is called directly. Replace it with maybe_mkwrite(). Signed-off-by: Yu-cheng Yu --- mm/memory.c | 5 ++--- mm/migrate.c | 3 +-- mm/mprotect.c | 2 +- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index c48f8df6e502..65c56a5de418 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3536,8 +3536,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4192,7 +4191,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (was_writable) - pte = pte_mkwrite(pte); + pte = maybe_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); diff --git a/mm/migrate.c b/mm/migrate.c index 5ca5842df5db..f27ec0436fce 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2914,8 +2914,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } } else { entry = mk_pte(page, vma->vm_page_prot); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index 56c02beb6041..7235b2409422 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || !(vma->vm_flags & VM_SOFTDIRTY))) { - ptent = pte_mkwrite(ptent); + ptent = maybe_mkwrite(ptent, vma); } ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); pages++; From patchwork Tue Nov 10 16:22:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894783 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD50E1391 for ; Tue, 10 Nov 2020 16:23:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A078D20780 for ; Tue, 10 Nov 2020 16:23:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A078D20780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A26076B0080; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7411C6B0098; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4AB806B0093; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0083.hostedemail.com [216.40.44.83]) by kanga.kvack.org (Postfix) with ESMTP id D9A786B0085 for ; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 80AAD180AD830 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.23.week95_3f0f5dc272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 8FB0E37608 for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yf9cr7h86ek5xsqdnj87fay8rguoppjt9i3wnoa8wnhiye89fskfqybcuc4h3.iqdfeu8rztr8wk4fdinuf5cnn317yywxejie7c3u6tu5quwexc78rq1j55cmqdi.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:140,LUA_SUMMARY:none X-HE-Tag: week95_3f0f5dc272f6 X-Filterd-Recvd-Size: 5976 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:55 +0000 (UTC) IronPort-SDR: QRusISvu26IpSpKo0cQpiMcZ0s13cLy6+PuVNx92OzL0cUcDhLUPPWl9PeyJAWrIFo874aygLE GWQ3JAUR0A6A== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110960" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110960" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:53 -0800 IronPort-SDR: X1Xilg3tsvrfiVzhCGv1/87nT21DwI9nZ80X0r+y/t1qqzKIAdFKQtIQk9ovtE9kBvUACfTNny Z33ygY3x1iAA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572878" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:52 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 16/26] mm: Add guard pages around a shadow stack. Date: Tue, 10 Nov 2020 08:22:01 -0800 Message-Id: <20201110162211.9207-17-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the first and the last elements in the range, effectively touches those memory areas. The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and 255 * 4 = 1020 bytes by INCSSPD. Both ranges are far from PAGE_SIZE. Thus, putting a gap page on both ends of a shadow stack prevents INCSSP, CALL, and RET from going beyond. Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/page_64_types.h | 10 ++++++++++ include/linux/mm.h | 24 ++++++++++++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h index 3f49dac03617..0fbee6dcd3ca 100644 --- a/arch/x86/include/asm/page_64_types.h +++ b/arch/x86/include/asm/page_64_types.h @@ -97,6 +97,16 @@ #define STACK_TOP TASK_SIZE_LOW #define STACK_TOP_MAX TASK_SIZE_MAX +/* + * Shadow stack pointer is moved by CALL, JMP, and INCSSP(Q/D). INCSSPQ + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and + * touches the first and the last element in the range, which triggers a + * page fault if the range is not in a shadow stack. Because of this, + * creating 4-KB guard pages around a shadow stack prevents these + * instructions from going beyond. + */ +#define ARCH_SHADOW_STACK_GUARD_GAP PAGE_SIZE + /* * Maximum kernel image size is limited to 1 GiB, due to the fixmap living * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). diff --git a/include/linux/mm.h b/include/linux/mm.h index 09f07d07a8ff..80cda65bb1ae 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2639,6 +2639,10 @@ extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf); int __must_check write_one_page(struct page *page); void task_dirty_inc(struct task_struct *tsk); +#ifndef ARCH_SHADOW_STACK_GUARD_GAP +#define ARCH_SHADOW_STACK_GUARD_GAP 0 +#endif + extern unsigned long stack_guard_gap; /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */ extern int expand_stack(struct vm_area_struct *vma, unsigned long address); @@ -2671,9 +2675,15 @@ static inline struct vm_area_struct * find_vma_intersection(struct mm_struct * m static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start; + unsigned long gap = 0; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; + if (vma->vm_flags & VM_GROWSDOWN) + gap = stack_guard_gap; + else if (vma->vm_flags & VM_SHSTK) + gap = ARCH_SHADOW_STACK_GUARD_GAP; + + if (gap != 0) { + vm_start -= gap; if (vm_start > vma->vm_start) vm_start = 0; } @@ -2683,9 +2693,15 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end; + unsigned long gap = 0; + + if (vma->vm_flags & VM_GROWSUP) + gap = stack_guard_gap; + else if (vma->vm_flags & VM_SHSTK) + gap = ARCH_SHADOW_STACK_GUARD_GAP; - if (vma->vm_flags & VM_GROWSUP) { - vm_end += stack_guard_gap; + if (gap != 0) { + vm_end += gap; if (vm_end < vma->vm_end) vm_end = -PAGE_SIZE; } From patchwork Tue Nov 10 16:22:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894789 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 898441391 for ; Tue, 10 Nov 2020 16:23:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 549D520780 for ; Tue, 10 Nov 2020 16:23:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 549D520780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6290B6B0081; Tue, 10 Nov 2020 11:22:59 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5DB3C6B0095; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2CC46B0081; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0189.hostedemail.com [216.40.44.189]) by kanga.kvack.org (Postfix) with ESMTP id 525C66B0092 for ; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id EC17B180AD83A for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.12.owl83_3d0f74e272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 70FC118013E37 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30001:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8iiygkbxsneby7r5fj1o76r9miopdsg699htsi9rywb8u3u7ti8e3ohdsuu7.dswtfb45dqasgo34iu66nbxzsg17b3u5u14kpx5u5zkeatchqy16p5yjf76x65r.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:113,LUA_SUMMARY:none X-HE-Tag: owl83_3d0f74e272f6 X-Filterd-Recvd-Size: 4808 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: Z8EmmJudeaJs+SdtE2TohxKTkx1upryxSQETUaUkTJtNeioXBrbmWdFtHDE08I4/fu/1SUqKB/ XLi0tQgH1hdg== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110967" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110967" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:53 -0800 IronPort-SDR: cVUhgk+9LQvhWAAhB85xFpm94tMT/bs1j3LxdeggDuELIae/Ks6bVaRHoPcr5eN1yyw0IPyfJ9 wto+uxFz8nZg== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572885" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:53 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 17/26] mm/mmap: Add shadow stack pages to memory accounting Date: Tue, 10 Nov 2020 08:22:02 -0800 Message-Id: <20201110162211.9207-18-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Account shadow stack pages to stack memory. Signed-off-by: Yu-cheng Yu --- arch/x86/mm/pgtable.c | 7 +++++++ include/linux/pgtable.h | 11 +++++++++++ mm/mmap.c | 5 +++++ 3 files changed, 23 insertions(+) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index a9666b64bc05..68e98f70298b 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -893,3 +893,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK +bool arch_shadow_stack_mapping(vm_flags_t vm_flags) +{ + return (vm_flags & VM_SHSTK); +} +#endif diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index ea66461610ae..13cb5fe3c2be 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1408,6 +1408,17 @@ static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma #endif /* CONFIG_ARCH_MAYBE_MKWRITE */ #endif /* CONFIG_MMU */ +#ifdef CONFIG_MMU +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK +bool arch_shadow_stack_mapping(vm_flags_t vm_flags); +#else +static inline bool arch_shadow_stack_mapping(vm_flags_t vm_flags) +{ + return false; +} +#endif /* CONFIG_ARCH_HAS_SHADOW_STACK */ +#endif /* CONFIG_MMU */ + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/mmap.c b/mm/mmap.c index d91ecb00d38c..2290a67f4d3f 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1720,6 +1720,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags) if (file && is_file_hugepages(file)) return 0; + if (arch_shadow_stack_mapping(vm_flags)) + return 1; + return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE; } @@ -3391,6 +3394,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages) mm->stack_vm += npages; else if (is_data_mapping(flags)) mm->data_vm += npages; + else if (arch_shadow_stack_mapping(flags)) + mm->stack_vm += npages; } static vm_fault_t special_mapping_fault(struct vm_fault *vmf); From patchwork Tue Nov 10 16:22:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894787 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E5EB697 for ; Tue, 10 Nov 2020 16:23:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C3C6F20870 for ; Tue, 10 Nov 2020 16:23:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C3C6F20870 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 322476B0088; Tue, 10 Nov 2020 11:22:59 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 22DA96B0098; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 90C376B0093; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0155.hostedemail.com [216.40.44.155]) by kanga.kvack.org (Postfix) with ESMTP id EF9926B008A for ; Tue, 10 Nov 2020 11:22:57 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 8DE44824999B for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-FDA: 77469027594.20.mind71_410a079272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 5C3B0180C07A3 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30069:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yf45bqp8zkdcbk4s7te7fa7gxhfocot9km5wqj4rewcwy3j4unrzfjipnp5m5.oozzi19aniwh4n7qjncraxfz6mbyf36zs5uq9fqz8kghk44k1bewjwbqgp7rpf9.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:108,LUA_SUMMARY:none X-HE-Tag: mind71_410a079272f6 X-Filterd-Recvd-Size: 5662 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: c0WySu8DIcFKcvIxySoquXW1hlS6GMN2DI4nZPkeJZemohi8wIP82ladnJXtei+x4Td7nmjKaH XcbD6jozrPVQ== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110971" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110971" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:53 -0800 IronPort-SDR: cZBmSsR12TqP6Evvb16FQUgh0rcKw1PIrm5zb8RNQVUlZx5SW7G40xmJNSeZDd7S9kiIXIzWSg JHuQsXvw4U/w== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572893" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:53 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 18/26] mm: Update can_follow_write_pte() for shadow stack Date: Tue, 10 Nov 2020 08:22:03 -0800 Message-Id: <20201110162211.9207-19-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Can_follow_write_pte() ensures a read-only page is COWed by checking the FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid. Like a writable data page, a shadow stack page is writable, and becomes read-only during copy-on-write, but it is always dirty. Thus, in the can_follow_write_pte() check, it belongs to the writable page case and should be excluded from the read-only page pte_dirty() check. Apply the same changes to can_follow_write_pmd(). Signed-off-by: Yu-cheng Yu --- mm/gup.c | 8 +++++--- mm/huge_memory.c | 8 +++++--- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 102877ed77a4..cae6e7eec0a4 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -391,10 +391,12 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, * FOLL_FORCE can write to even unwritable pte's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, + struct vm_area_struct *vma) { return pte_write(pte) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte) && + !arch_shadow_stack_mapping(vma->vm_flags)); } static struct page *follow_page_pte(struct vm_area_struct *vma, @@ -437,7 +439,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } if ((flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; - if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { + if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) { pte_unmap_unlock(ptep, ptl); return NULL; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ecd23777b8bf..c6301c6d93b7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1324,10 +1324,12 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) * FOLL_FORCE can write to even unwritable pmd's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags) +static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags, + struct vm_area_struct *vma) { return pmd_write(pmd) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd)); + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd) && + !arch_shadow_stack_mapping(vma->vm_flags)); } struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, @@ -1340,7 +1342,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags)) + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma)) goto out; /* Avoid dumping huge zero page */ From patchwork Tue Nov 10 16:22:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894791 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 105E51391 for ; Tue, 10 Nov 2020 16:23:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C508420780 for ; Tue, 10 Nov 2020 16:23:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C508420780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A23E26B008A; Tue, 10 Nov 2020 11:22:59 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8C4466B0098; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 000C46B009A; Tue, 10 Nov 2020 11:22:58 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0180.hostedemail.com [216.40.44.180]) by kanga.kvack.org (Postfix) with ESMTP id 628396B0088 for ; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 060C51EE6 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-FDA: 77469027636.06.son56_4417dd7272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id D403010038004 for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:4423:30036:30051:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yry5x5oeg9yifnu4ht9ddmx4mzfocxedzf6b3bachckdxxmcmkjwgarf71fq8.s9hg71aceys7xzdd4i4wsa48tsi78y9nsunr5wx7ukoyrrdn4ugqecdn7adb9ha.w-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:140,LUA_SUMMARY:none X-HE-Tag: son56_4417dd7272f6 X-Filterd-Recvd-Size: 8202 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:56 +0000 (UTC) IronPort-SDR: adV3Smms5dxuoPR13eOukImnrcPoBW6S9ceDWSKAoDY+FMTXSWjcskYTZjmaK2raF4tdft4Whx qRu9Yu6iJ36w== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110975" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110975" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:54 -0800 IronPort-SDR: Z00bkfKti0l1mIp1NdWgXMKUyKC8JtfDpcC6vHYslQF+xdBetCAVgU8szDipHG/w36TNyWCvlD Msw9BOVSYYLQ== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572901" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:53 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Peter Collingbourne , Andrew Morton Subject: [PATCH v15 19/26] mm: Re-introduce vm_flags to do_mmap() Date: Tue, 10 Nov 2020 08:22:04 -0800 Message-Id: <20201110162211.9207-20-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHSTK to do_mmap(). Re-introduce vm_flags to do_mmap(), but without the old wrapper do_mmap_pgoff(). Instead, make all callers of the wrapper pass a zero vm_flags to do_mmap(). Signed-off-by: Yu-cheng Yu Reviewed-by: Peter Collingbourne Cc: Andrew Morton Cc: Oleg Nesterov Cc: linux-mm@kvack.org --- fs/aio.c | 2 +- include/linux/mm.h | 3 ++- ipc/shm.c | 2 +- mm/mmap.c | 10 +++++----- mm/nommu.c | 4 ++-- mm/util.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index c45c20d87538..641640288b50 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -527,7 +527,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 80cda65bb1ae..ef1bd7c7e88b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2584,7 +2584,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf, bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, diff --git a/ipc/shm.c b/ipc/shm.c index e25c7c6106bc..91474258933d 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1556,7 +1556,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 2290a67f4d3f..c4938e4b789b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1403,11 +1403,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; *populate = 0; @@ -1469,7 +1469,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -3051,7 +3051,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, 0, pgoff, &populate, NULL); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index 0faf39b32cdb..a03c72f0c3f8 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1071,6 +1071,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1078,7 +1079,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; @@ -1097,7 +1097,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL); diff --git a/mm/util.c b/mm/util.c index 4ddb6e186dd5..6fd9a272b7f9 100644 --- a/mm/util.c +++ b/mm/util.c @@ -504,7 +504,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Tue Nov 10 16:22:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894793 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 90945697 for ; Tue, 10 Nov 2020 16:23:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 46DC520797 for ; Tue, 10 Nov 2020 16:23:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 46DC520797 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D255A6B0095; Tue, 10 Nov 2020 11:22:59 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B83406B009A; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 540BF6B0099; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0151.hostedemail.com [216.40.44.151]) by kanga.kvack.org (Postfix) with ESMTP id BD74C6B0096 for ; Tue, 10 Nov 2020 11:22:58 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 671C0181AEF10 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-FDA: 77469027636.08.lead98_4413569272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 4C5751819E766 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30045:30046:30051:30054:30055:30056:30062:30064:30067:30070:30090,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8rw3t5wdq8aptmskfh6sp86bwhyp3e73u7ukg7ry1ykh96zkmewjcdm9zdfm.464zcku3cxrk38mm1x1k6yrbeq7swanxwn94hba38j1mi8663f3xtkutjc8bhkp.h-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: lead98_4413569272f6 X-Filterd-Recvd-Size: 11412 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) IronPort-SDR: wJNedO3hBSbokfy0sKhG3UoyzxO/RuY3IUmGWkVs4PVa/A0vZkIb1XfH9L3pHg4b3ZQZ5HSOk+ kkRF0uVUESwA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110980" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110980" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:54 -0800 IronPort-SDR: p8fQqqYozSDnTxMFjAdHCMh0tw8haGOCp/RWCo6Go/IK23rwqrFoa9Vazc4TzsVAa5SkML1tPg 75U3jxoTYdzA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572907" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:54 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 20/26] x86/cet/shstk: User-mode shadow stack support Date: Tue, 10 Nov 2020 08:22:05 -0800 Message-Id: <20201110162211.9207-21-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch adds basic shadow stack enabling/disabling routines. A task's shadow stack is allocated from memory with VM_SHSTK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/cet.h | 28 +++++ arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/processor.h | 5 + arch/x86/kernel/Makefile | 2 + arch/x86/kernel/cet.c | 147 +++++++++++++++++++++++ arch/x86/kernel/cpu/common.c | 28 +++++ arch/x86/kernel/process.c | 1 + 7 files changed, 218 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/cet.h create mode 100644 arch/x86/kernel/cet.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h new file mode 100644 index 000000000000..5750fbcbb952 --- /dev/null +++ b/arch/x86/include/asm/cet.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_CET_H +#define _ASM_X86_CET_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; +/* + * Per-thread CET status + */ +struct cet_status { + unsigned long shstk_base; + unsigned long shstk_size; +}; + +#ifdef CONFIG_X86_CET +int cet_setup_shstk(void); +void cet_disable_shstk(void); +void cet_free_shstk(struct task_struct *p); +#else +static inline void cet_disable_shstk(void) {} +static inline void cet_free_shstk(struct task_struct *p) {} +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_CET_H */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 5861d34f9771..6a9fb7f9bc01 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -62,6 +62,12 @@ # define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31)) #endif +#ifdef CONFIG_X86_SHADOW_STACK_USER +#define DISABLE_SHSTK 0 +#else +#define DISABLE_SHSTK (1 << (X86_FEATURE_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -82,7 +88,7 @@ #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ - DISABLE_ENQCMD) + DISABLE_ENQCMD|DISABLE_SHSTK) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 82a08b585818..2e0d9286f6cf 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -27,6 +27,7 @@ struct vm86; #include #include #include +#include #include #include @@ -536,6 +537,10 @@ struct thread_struct { unsigned int sig_on_uaccess_err:1; +#ifdef CONFIG_X86_CET + struct cet_status cet; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 68608bd892c0..4a89d0f3792e 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -151,6 +151,8 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += sev-es.o +obj-$(CONFIG_X86_CET) += cet.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c new file mode 100644 index 000000000000..f8b0a077594f --- /dev/null +++ b/arch/x86/kernel/cet.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * cet.c - Control-flow Enforcement (CET) + * + * Copyright (c) 2019, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static void start_update_msrs(void) +{ + fpregs_lock(); + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + __fpregs_load_activate(); +} + +static void end_update_msrs(void) +{ + fpregs_unlock(); +} + +static unsigned long cet_get_shstk_addr(void) +{ + struct fpu *fpu = ¤t->thread.fpu; + unsigned long ssp = 0; + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + rdmsrl(MSR_IA32_PL3_SSP, ssp); + } else { + struct cet_user_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER); + if (p) + ssp = p->user_ssp; + } + + fpregs_unlock(); + return ssp; +} + +static unsigned long alloc_shstk(unsigned long size, int flags) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, populate; + + /* VM_SHSTK requires MAP_ANONYMOUS, MAP_PRIVATE */ + flags |= MAP_ANONYMOUS | MAP_PRIVATE; + + mmap_write_lock(mm); + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHSTK, 0, + &populate, NULL); + mmap_write_unlock(mm); + + if (populate) + mm_populate(addr, populate); + + return addr; +} + +int cet_setup_shstk(void) +{ + unsigned long addr, size; + struct cet_status *cet = ¤t->thread.cet; + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + size = round_up(min(rlimit(RLIMIT_STACK), 1UL << 32), PAGE_SIZE); + addr = alloc_shstk(size, 0); + + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + cet->shstk_base = addr; + cet->shstk_size = size; + + start_update_msrs(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + end_update_msrs(); + return 0; +} + +void cet_disable_shstk(void) +{ + struct cet_status *cet = ¤t->thread.cet; + u64 msr_val; + + if (!static_cpu_has(X86_FEATURE_SHSTK) || + !cet->shstk_size || !cet->shstk_base) + return; + + start_update_msrs(); + rdmsrl(MSR_IA32_U_CET, msr_val); + wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_SHSTK_EN); + wrmsrl(MSR_IA32_PL3_SSP, 0); + end_update_msrs(); + + cet_free_shstk(current); +} + +void cet_free_shstk(struct task_struct *tsk) +{ + struct cet_status *cet = &tsk->thread.cet; + + if (!static_cpu_has(X86_FEATURE_SHSTK) || + !cet->shstk_size || !cet->shstk_base) + return; + + if (!tsk->mm || (tsk->mm != current->mm)) + return; + + while (1) { + int r; + + r = vm_munmap(cet->shstk_base, cet->shstk_size); + + /* + * Retry if mmap_lock is not available. + */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + WARN_ON_ONCE(r); + break; + } + + cet->shstk_base = 0; + cet->shstk_size = 0; +} diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 35ad8480c464..3d38ae02d9d3 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -57,6 +57,7 @@ #include #include #include +#include #include #include "cpu.h" @@ -510,6 +511,32 @@ static __init int setup_disable_pku(char *arg) __setup("nopku", setup_disable_pku); #endif /* CONFIG_X86_64 */ +static __always_inline void setup_cet(struct cpuinfo_x86 *c) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) && + !cpu_feature_enabled(X86_FEATURE_IBT)) + return; + + cr4_set_bits(X86_CR4_CET); +} + +#ifdef CONFIG_X86_SHADOW_STACK_USER +static __init int setup_disable_shstk(char *s) +{ + /* require an exact match without trailing characters */ + if (s[0] != '\0') + return 0; + + if (!boot_cpu_has(X86_FEATURE_SHSTK)) + return 1; + + setup_clear_cpu_cap(X86_FEATURE_SHSTK); + pr_info("x86: 'no_user_shstk' specified, disabling user Shadow Stack\n"); + return 1; +} +__setup("no_user_shstk", setup_disable_shstk); +#endif + /* * Some CPU features depend on higher CPUID levels, which may not always * be available due to CPUID level capping or broken virtualization @@ -1591,6 +1618,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) x86_init_rdrand(c); setup_pku(c); + setup_cet(c); /* * Clear/Set all flags overridden by options, need do it diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ba4593a913fa..ff3b44d6740b 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "process.h" From patchwork Tue Nov 10 16:22:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894797 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 208D81391 for ; Tue, 10 Nov 2020 16:23:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C08D1207D3 for ; Tue, 10 Nov 2020 16:23:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C08D1207D3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5E5BD6B0099; Tue, 10 Nov 2020 11:23:00 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3BB856B009D; Tue, 10 Nov 2020 11:23:00 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F9466B009B; Tue, 10 Nov 2020 11:22:59 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0090.hostedemail.com [216.40.44.90]) by kanga.kvack.org (Postfix) with ESMTP id AC6D66B0093 for ; Tue, 10 Nov 2020 11:22:59 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0B748362A for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) X-FDA: 77469027678.22.sugar16_4216334272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id DC9A318038E68 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30045:30046:30051:30054:30056:30064:30069:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrhk9ccdbgeox9bfn85abebehw5yc3uhmkt1bncexaco6pofpucpg67pjsg7f.xzfmdppqcmaq1cf96bubffzrakjiodn5rkeujj8i6u6us1gk8wxfbu3j7858x65.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: sugar16_4216334272f6 X-Filterd-Recvd-Size: 18151 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) IronPort-SDR: JAfse+HWtQ52W9/TrELEAUBIIScz8t95lAmXTrPetl8UQJjRTh1NcnEmeKPzqlCdr+AEpHyTqw uWK0WT0hDMDQ== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110985" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110985" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:54 -0800 IronPort-SDR: 5FKg+uTXySgz5XPCiB6gKsyI4Gsnx+jzWosoB5DMS8O0V98DpC4m+4RhpYyRXeohkWPJUA6jM6 V2hRj4vngz+g== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572915" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:54 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 21/26] x86/cet/shstk: Handle signals for shadow stack Date: Tue, 10 Nov 2020 08:22:06 -0800 Message-Id: <20201110162211.9207-22-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: To deliver a signal, create a shadow stack restore token and put a restore token and the signal restorer address on the shadow stack. For sigreturn, verify the token and restore the shadow stack pointer. Introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. It is used to construct the user signal stack as described above. Introduce a signal context extension struct 'sc_ext', which is used to save shadow stack restore token address and WAIT_ENDBR status. WAIT_ENDBR will be introduced later in the Indirect Branch Tracking (IBT) series, but add that into sc_ext now to keep the struct stable in case the IBT series is applied later. Signed-off-by: Yu-cheng Yu --- arch/x86/ia32/ia32_signal.c | 17 +++ arch/x86/include/asm/cet.h | 8 ++ arch/x86/include/asm/fpu/internal.h | 10 ++ arch/x86/include/asm/special_insns.h | 32 ++++++ arch/x86/include/uapi/asm/sigcontext.h | 9 ++ arch/x86/kernel/cet.c | 152 +++++++++++++++++++++++++ arch/x86/kernel/fpu/signal.c | 100 ++++++++++++++++ arch/x86/kernel/signal.c | 10 ++ 8 files changed, 338 insertions(+) diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index 81cf22398cd1..cec9cf0a00cf 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -35,6 +35,7 @@ #include #include #include +#include static inline void reload_segments(struct sigcontext_32 *sc) { @@ -205,6 +206,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, void __user **fpstate) { unsigned long sp, fx_aligned, math_size; + void __user *restorer = NULL; /* Default to using normal stack */ sp = regs->sp; @@ -218,8 +220,23 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, ksig->ka.sa.sa_restorer) sp = (unsigned long) ksig->ka.sa.sa_restorer; + if (ksig->ka.sa.sa_flags & SA_RESTORER) { + restorer = ksig->ka.sa.sa_restorer; + } else if (current->mm->context.vdso) { + if (ksig->ka.sa.sa_flags & SA_SIGINFO) + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_rt_sigreturn; + else + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_sigreturn; + } + sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size); *fpstate = (struct _fpstate_32 __user *) sp; + + if (save_cet_to_sigframe(1, *fpstate, (unsigned long)restorer)) + return (void __user *) -1L; + if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned, math_size) < 0) return (void __user *) -1L; diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 5750fbcbb952..73435856ce54 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -6,6 +6,8 @@ #include struct task_struct; +struct sc_ext; + /* * Per-thread CET status */ @@ -18,9 +20,15 @@ struct cet_status { int cet_setup_shstk(void); void cet_disable_shstk(void); void cet_free_shstk(struct task_struct *p); +int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); +void cet_restore_signal(struct sc_ext *sc); +int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else static inline void cet_disable_shstk(void) {} static inline void cet_free_shstk(struct task_struct *p) {} +static inline void cet_restore_signal(struct sc_ext *sc) { return; } +static inline int cet_setup_signal(bool ia32, unsigned long rstor, + struct sc_ext *sc) { return -EINVAL; } #endif #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 8d33ad80704f..c1dedec2281b 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -443,6 +443,16 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate) __copy_kernel_to_fpregs(fpstate, -1); } +#ifdef CONFIG_X86_CET +extern int save_cet_to_sigframe(int ia32, void __user *fp, + unsigned long restorer); +#else +static inline int save_cet_to_sigframe(int ia32, void __user *fp, + unsigned long restorer) +{ + return 0; +} +#endif extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size); /* diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index cc177b4431ae..d979d0deb3ae 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -234,6 +234,38 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_CET +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32) +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + asm_volatile_goto("1: wrussd %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#else +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + WARN_ONCE(1, "%s used but not supported.\n", __func__); + return -EFAULT; +} +#endif + +static inline int write_user_shstk_64(unsigned long addr, unsigned long val) +{ + asm_volatile_goto("1: wrussq %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#endif /* CONFIG_X86_CET */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h index 844d60eb1882..cf2d55db3be4 100644 --- a/arch/x86/include/uapi/asm/sigcontext.h +++ b/arch/x86/include/uapi/asm/sigcontext.h @@ -196,6 +196,15 @@ struct _xstate { /* New processor state extensions go here: */ }; +/* + * Located at the end of sigcontext->fpstate, aligned to 8. + */ +struct sc_ext { + unsigned long total_size; + unsigned long ssp; + unsigned long wait_endbr; +}; + /* * The 32-bit signal frame: */ diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index f8b0a077594f..728d9baceb74 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include static void start_update_msrs(void) { @@ -72,6 +74,80 @@ static unsigned long alloc_shstk(unsigned long size, int flags) return addr; } +#define TOKEN_MODE_MASK 3UL +#define TOKEN_MODE_64 1UL +#define IS_TOKEN_64(token) (((token) & TOKEN_MODE_MASK) == TOKEN_MODE_64) +#define IS_TOKEN_32(token) (((token) & TOKEN_MODE_MASK) == 0) + +/* + * Verify the restore token at the address of 'ssp' is + * valid and then set shadow stack pointer according to the + * token. + */ +int cet_verify_rstor_token(bool ia32, unsigned long ssp, + unsigned long *new_ssp) +{ + unsigned long token; + + *new_ssp = 0; + + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + if (get_user(token, (unsigned long __user *)ssp)) + return -EFAULT; + + /* Is 64-bit mode flag correct? */ + if (!ia32 && !IS_TOKEN_64(token)) + return -EINVAL; + else if (ia32 && !IS_TOKEN_32(token)) + return -EINVAL; + + token &= ~TOKEN_MODE_MASK; + + /* + * Restore address properly aligned? + */ + if ((!ia32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4)) + return -EINVAL; + + /* + * Token was placed properly? + */ + if (((ALIGN_DOWN(token, 8) - 8) != ssp) || (token >= TASK_SIZE_MAX)) + return -EINVAL; + + *new_ssp = token; + return 0; +} + +/* + * Create a restore token on the shadow stack. + * A token is always 8-byte and aligned to 8. + */ +static int create_rstor_token(bool ia32, unsigned long ssp, + unsigned long *new_ssp) +{ + unsigned long addr; + + *new_ssp = 0; + + if ((!ia32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4)) + return -EINVAL; + + addr = ALIGN_DOWN(ssp, 8) - 8; + + /* Is the token for 64-bit? */ + if (!ia32) + ssp |= TOKEN_MODE_64; + + if (write_user_shstk_64(addr, ssp)) + return -EFAULT; + + *new_ssp = addr; + return 0; +} + int cet_setup_shstk(void) { unsigned long addr, size; @@ -145,3 +221,79 @@ void cet_free_shstk(struct task_struct *tsk) cet->shstk_base = 0; cet->shstk_size = 0; } + +/* + * Called from __fpu__restore_sig() and XSAVES buffer is protected by + * set_thread_flag(TIF_NEED_FPU_LOAD) in the slow path. + */ +void cet_restore_signal(struct sc_ext *sc_ext) +{ + struct cet_user_state *cet_user_state; + struct cet_status *cet = ¤t->thread.cet; + u64 msr_val = 0; + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return; + + cet_user_state = get_xsave_addr(¤t->thread.fpu.state.xsave, + XFEATURE_CET_USER); + if (!cet_user_state) + return; + + if (cet->shstk_size) { + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + cet_user_state->user_ssp = sc_ext->ssp; + else + wrmsrl(MSR_IA32_PL3_SSP, sc_ext->ssp); + + msr_val |= CET_SHSTK_EN; + } + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + cet_user_state->user_cet = msr_val; + else + wrmsrl(MSR_IA32_U_CET, msr_val); +} + +/* + * Setup the shadow stack for the signal handler: first, + * create a restore token to keep track of the current ssp, + * and then the return address of the signal handler. + */ +int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext) +{ + struct cet_status *cet = ¤t->thread.cet; + unsigned long ssp = 0, new_ssp = 0; + int err; + + if (cet->shstk_size) { + if (!rstor_addr) + return -EINVAL; + + ssp = cet_get_shstk_addr(); + err = create_rstor_token(ia32, ssp, &new_ssp); + if (err) + return err; + + if (ia32) { + ssp = new_ssp - sizeof(u32); + err = write_user_shstk_32(ssp, (unsigned int)rstor_addr); + } else { + ssp = new_ssp - sizeof(u64); + err = write_user_shstk_64(ssp, rstor_addr); + } + + if (err) + return err; + + sc_ext->ssp = new_ssp; + } + + if (ssp) { + start_update_msrs(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + end_update_msrs(); + } + + return 0; +} diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index a4ec65317a7f..c0c2141cb4b3 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -52,6 +52,74 @@ static inline int check_for_xstate(struct fxregs_state __user *buf, return 0; } +#ifdef CONFIG_X86_CET +int save_cet_to_sigframe(int ia32, void __user *fp, unsigned long restorer) +{ + int err = 0; + + if (!current->thread.cet.shstk_size) + return 0; + + if (fp) { + struct sc_ext ext = {0, 0, 0}; + + err = cet_setup_signal(ia32, restorer, &ext); + if (!err) { + void __user *p = fp; + + ext.total_size = sizeof(ext); + + if (ia32) + p += sizeof(struct fregs_state); + + p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE; + p = (void __user *)ALIGN((unsigned long)p, 8); + + if (copy_to_user(p, &ext, sizeof(ext))) + return -EFAULT; + } + } + + return err; +} + +static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext) +{ + int err = 0; + + memset(ext, 0, sizeof(*ext)); + + if (!current->thread.cet.shstk_size) + return 0; + + if (fp) { + void __user *p = fp; + + if (ia32) + p += sizeof(struct fregs_state); + + p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE; + p = (void __user *)ALIGN((unsigned long)p, 8); + + if (copy_from_user(ext, p, sizeof(*ext))) + return -EFAULT; + + if (ext->total_size != sizeof(*ext)) + return -EFAULT; + + if (current->thread.cet.shstk_size) + err = cet_verify_rstor_token(ia32, ext->ssp, &ext->ssp); + } + + return err; +} +#else +static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext) +{ + return 0; +} +#endif + /* * Signal frame handlers. */ @@ -295,6 +363,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) struct task_struct *tsk = current; struct fpu *fpu = &tsk->thread.fpu; struct user_i387_ia32_struct env; + struct sc_ext sc_ext; u64 user_xfeatures = 0; int fx_only = 0; int ret = 0; @@ -335,6 +404,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) if ((unsigned long)buf_fx % 64) fx_only = 1; + ret = get_cet_from_sigframe(ia32_fxstate, buf, &sc_ext); + if (ret) + return ret; + if (!ia32_fxstate) { /* * Attempt to restore the FPU registers directly from user @@ -349,6 +422,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) pagefault_enable(); if (!ret) { + cet_restore_signal(&sc_ext); + /* * Restore supervisor states: previous context switch * etc has done XSAVES and saved the supervisor states @@ -423,6 +498,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) if (unlikely(init_bv)) copy_kernel_to_xregs(&init_fpstate.xsave, init_bv); + cet_restore_signal(&sc_ext); + /* * Restore previously saved supervisor xstates along with * copied-in user xstates. @@ -491,12 +568,35 @@ int fpu__restore_sig(void __user *buf, int ia32_frame) return __fpu__restore_sig(buf, buf_fx, size); } +#ifdef CONFIG_X86_CET +static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp) +{ + struct cet_status *cet = ¤t->thread.cet; + + /* + * sigcontext_ext is at: fpu + fpu_user_xstate_size + + * FP_XSTATE_MAGIC2_SIZE, then aligned to 8. + */ + if (cet->shstk_size) + sp -= (sizeof(struct sc_ext) + 8); + + return sp; +} +#else +static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp) +{ + return sp; +} +#endif + unsigned long fpu__alloc_mathframe(unsigned long sp, int ia32_frame, unsigned long *buf_fx, unsigned long *size) { unsigned long frame_size = xstate_sigframe_size(); + sp = fpu__alloc_sigcontext_ext(sp); + *buf_fx = sp = round_down(sp - frame_size, 64); if (ia32_frame && use_fxsr()) { frame_size += sizeof(struct fregs_state); diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index be0d7d4152ec..f39335ed4f7e 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -46,6 +46,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 /* @@ -239,6 +240,9 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, unsigned long buf_fx = 0; int onsigstack = on_sig_stack(sp); int ret; +#ifdef CONFIG_X86_64 + void __user *restorer = NULL; +#endif /* redzone */ if (IS_ENABLED(CONFIG_X86_64)) @@ -270,6 +274,12 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, if (onsigstack && !likely(on_sig_stack(sp))) return (void __user *)-1L; +#ifdef CONFIG_X86_64 + if (ka->sa.sa_flags & SA_RESTORER) + restorer = ka->sa.sa_restorer; + ret = save_cet_to_sigframe(0, *fpstate, (unsigned long)restorer); +#endif + /* save i387 and extended state */ ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size); if (ret < 0) From patchwork Tue Nov 10 16:22:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894807 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E7FAA697 for ; Tue, 10 Nov 2020 16:23:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AE3CE20780 for ; Tue, 10 Nov 2020 16:23:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AE3CE20780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F03696B005D; Tue, 10 Nov 2020 11:23:08 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E8B606B0068; Tue, 10 Nov 2020 11:23:08 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2A676B00A0; Tue, 10 Nov 2020 11:23:08 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id A0F5E6B005D for ; Tue, 10 Nov 2020 11:23:08 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 4659F180AD801 for ; Tue, 10 Nov 2020 16:23:08 +0000 (UTC) X-FDA: 77469028056.14.cow65_480622f272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id C7DAC180144A8 for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrghkpguhyxwbbxbyscqrphsgonypptqj71fci5rndinhfazixr8uujj5bsj8.hqyfbrfqyi9epxcm4q687oxwe7zqnq4arow7hbr9j4tead6epgwue647bzgjggd.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: cow65_480622f272f6 X-Filterd-Recvd-Size: 3715 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) IronPort-SDR: Pu55RcX3AfIgtp0v0Kw7ynrzOoGAfC+ypDn4LsGmpCTQSAngoe/OM9sTXO5SjFPilu+FCp0C37 eExUGBMxhzsw== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110991" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110991" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:55 -0800 IronPort-SDR: ga0SEOKii7twZfjgSVG4mFl/SJ6VWqYrLWmR0p/dLB0wy6M+YNft9yX4Wxeq2GsamSdy7mghon eFNBP3oEmS9Q== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572926" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:54 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 22/26] binfmt_elf: Define GNU_PROPERTY_X86_FEATURE_1_AND properties Date: Tue, 10 Nov 2020 08:22:07 -0800 Message-Id: <20201110162211.9207-23-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ELF file's .note.gnu.property indicates architecture features of the file.. Introduce feature definitions for Shadow Stack and Indirect Branch Tracking. Signed-off-by: Yu-cheng Yu --- include/uapi/linux/elf.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 30f68b42eeb5..7f0e46780a35 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -455,4 +455,13 @@ typedef struct elf64_note { /* Bits for GNU_PROPERTY_AARCH64_FEATURE_1_BTI */ #define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +/* .note.gnu.property types for x86: */ +#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002 + +/* Bits for GNU_PROPERTY_X86_FEATURE_1_AND */ +#define GNU_PROPERTY_X86_FEATURE_1_IBT 0x00000001 +#define GNU_PROPERTY_X86_FEATURE_1_SHSTK 0x00000002 +#define GNU_PROPERTY_X86_FEATURE_1_INVAL ~(GNU_PROPERTY_X86_FEATURE_1_IBT | \ + GNU_PROPERTY_X86_FEATURE_1_SHSTK) + #endif /* _UAPI_LINUX_ELF_H */ From patchwork Tue Nov 10 16:22:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894805 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9201A697 for ; Tue, 10 Nov 2020 16:23:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4EF9620780 for ; Tue, 10 Nov 2020 16:23:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4EF9620780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DF3E06B009D; Tue, 10 Nov 2020 11:23:02 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D7DB56B009E; Tue, 10 Nov 2020 11:23:02 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BABF06B009F; Tue, 10 Nov 2020 11:23:02 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0201.hostedemail.com [216.40.44.201]) by kanga.kvack.org (Postfix) with ESMTP id 6D1746B009D for ; Tue, 10 Nov 2020 11:23:02 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id C14FD362A for ; Tue, 10 Nov 2020 16:23:00 +0000 (UTC) X-FDA: 77469027720.22.pen35_4a0b047272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 375C618038E6B for ; Tue, 10 Nov 2020 16:23:00 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30045:30054:30056:30064:30070,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8h8fxuqqkwcnwg8iddm7a1famoopyxe81airppzycpssqx86g9qyjw89iacx.6hripu17mn68cokn8ye5t8hp8bhu4ccuc1fdis5knqu8ew73pdaqbrixs4mtrg8.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:71,LUA_SUMMARY:none X-HE-Tag: pen35_4a0b047272f6 X-Filterd-Recvd-Size: 7564 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) IronPort-SDR: m2HmAOC6F6/GgS6vPMoZM5x+Ad8kFhnBwFAmPxh0JXJrNpzvL8H9jsMJ9K2yFp8J/V3cnzmoPL axsrdYbXJ1MQ== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="170110995" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="170110995" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:55 -0800 IronPort-SDR: RO1Tkj4Pv4RRSxTqHagL3KexBLy7/1p2b1BpS9iaAunkmPTqVduTP50mpaLy+paUAM4N/vHHPN jBHknhTY8mBQ== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572931" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:55 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu , Mark Brown , Catalin Marinas Subject: [PATCH v15 23/26] ELF: Introduce arch_setup_elf_property() Date: Tue, 10 Nov 2020 08:22:08 -0800 Message-Id: <20201110162211.9207-24-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ELF file's .note.gnu.property indicates arch features supported by the file. These features are extracted by arch_parse_elf_property() and stored in 'arch_elf_state'. Introduce arch_setup_elf_property() for enabling such features. The first use-case of this function is shadow stack. ARM64 is the other arch that has ARCH_USER_GNU_PROPERTY and arch_parse_elf_ property(). Add arch_setup_elf_property() for it. Signed-off-by: Yu-cheng Yu Cc: Mark Brown Cc: Catalin Marinas Cc: Dave Martin --- arch/arm64/include/asm/elf.h | 5 +++++ arch/x86/Kconfig | 2 ++ arch/x86/include/asm/elf.h | 13 +++++++++++++ arch/x86/kernel/process_64.c | 32 ++++++++++++++++++++++++++++++++ fs/binfmt_elf.c | 4 ++++ include/linux/elf.h | 6 ++++++ 6 files changed, 62 insertions(+) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 8d1c8dcb87fd..d37bc7915935 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -281,6 +281,11 @@ static inline int arch_parse_elf_property(u32 type, const void *data, return 0; } +static inline int arch_setup_elf_property(struct arch_elf_state *arch) +{ + return 0; +} + static inline int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *f, bool is_interp, struct arch_elf_state *state) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 960993862b96..18fd8cb549ad 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1953,6 +1953,8 @@ config X86_SHADOW_STACK_USER select X86_CET select ARCH_MAYBE_MKWRITE select ARCH_HAS_SHADOW_STACK + select ARCH_USE_GNU_PROPERTY + select ARCH_BINFMT_ELF_STATE help Shadow Stacks provides protection against program stack corruption. It's a hardware feature. This only matters diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index b9a5d488f1a5..0e1be2a13359 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -385,6 +385,19 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages +#ifdef CONFIG_ARCH_BINFMT_ELF_STATE +struct arch_elf_state { + unsigned int gnu_property; +}; + +#define INIT_ARCH_ELF_STATE { \ + .gnu_property = 0, \ +} + +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0) +#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0) +#endif + /* Do not change the values. See get_align_mask() */ enum align_flags { ALIGN_VA_32 = BIT(0), diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index df342bedea88..7c4687a0f001 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -837,3 +837,35 @@ unsigned long KSTK_ESP(struct task_struct *task) { return task_pt_regs(task)->sp; } + +#ifdef CONFIG_ARCH_USE_GNU_PROPERTY +int arch_parse_elf_property(u32 type, const void *data, size_t datasz, + bool compat, struct arch_elf_state *state) +{ + if (type != GNU_PROPERTY_X86_FEATURE_1_AND) + return 0; + + if (datasz != sizeof(unsigned int)) + return -ENOEXEC; + + state->gnu_property = *(unsigned int *)data; + return 0; +} + +int arch_setup_elf_property(struct arch_elf_state *state) +{ + int r = 0; + + if (!IS_ENABLED(CONFIG_X86_CET)) + return r; + + memset(¤t->thread.cet, 0, sizeof(struct cet_status)); + + if (static_cpu_has(X86_FEATURE_SHSTK)) { + if (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_SHSTK) + r = cet_setup_shstk(); + } + + return r; +} +#endif diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index fa50e8936f5f..1ae32cc0f61b 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1245,6 +1245,10 @@ static int load_elf_binary(struct linux_binprm *bprm) set_binfmt(&elf_format); + retval = arch_setup_elf_property(&arch_state); + if (retval < 0) + goto out; + #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES retval = arch_setup_additional_pages(bprm, !!interpreter); if (retval < 0) diff --git a/include/linux/elf.h b/include/linux/elf.h index 5d5b0321da0b..4827695ca415 100644 --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -82,9 +82,15 @@ static inline int arch_parse_elf_property(u32 type, const void *data, { return 0; } + +static inline int arch_setup_elf_property(struct arch_elf_state *arch) +{ + return 0; +} #else extern int arch_parse_elf_property(u32 type, const void *data, size_t datasz, bool compat, struct arch_elf_state *arch); +extern int arch_setup_elf_property(struct arch_elf_state *arch); #endif #ifdef CONFIG_ARCH_HAVE_ELF_PROT From patchwork Tue Nov 10 16:22:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894801 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C37021391 for ; Tue, 10 Nov 2020 16:23:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 860E420780 for ; Tue, 10 Nov 2020 16:23:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 860E420780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D3A996B0096; Tue, 10 Nov 2020 11:23:00 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B035D6B009D; Tue, 10 Nov 2020 11:23:00 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 83DAC6B009C; Tue, 10 Nov 2020 11:23:00 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0002.hostedemail.com [216.40.44.2]) by kanga.kvack.org (Postfix) with ESMTP id 1DFC66B0093 for ; Tue, 10 Nov 2020 11:23:00 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id BB1C58249980 for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) X-FDA: 77469027678.07.wound59_270cc68272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 9E5B51803F9A3 for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30012:30045:30054:30056:30064,0,RBL:192.55.52.115:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04y8j4w4e1dygxa3ij4kz9ktizf6iocdysrsg3gua7ds56hmhte9dfsus1reah3.i11t6hg16hzm9yano336cau6sxf8h8mny11fn8jgd5mknqr686iaese86c68q19.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: wound59_270cc68272f6 X-Filterd-Recvd-Size: 7348 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:58 +0000 (UTC) IronPort-SDR: koo4KV8RMaLVNwPqi2wdL4E2YWeyygJyaGYNm83j15nlIwCG5Ak5Wjv/6m5xP0GKvNTH3sGFUs V+xfi/R9Mn6Q== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="169218851" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="169218851" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:55 -0800 IronPort-SDR: 0hNjc9eLInaecPPyC2pk/KIFpEm6Jz90WsXwLyBvY/2bQioIoMcQmcfOE6t7hs2VnY7NiM9D17 79mCD92wxgfw== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572935" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:55 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 24/26] x86/cet/shstk: Handle thread shadow stack Date: Tue, 10 Nov 2020 08:22:09 -0800 Message-Id: <20201110162211.9207-25-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The kernel allocates (and frees on thread exit) a new shadow stack for a pthread child. It is possible for the kernel to complete the clone syscall and set the child's shadow stack pointer to NULL and let the child thread allocate a shadow stack for itself. There are two issues in this approach: It is not compatible with existing code that does inline syscall and it cannot handle signals before the child can successfully allocate a shadow stack. A 64-bit shadow stack has a size of min(RLIMIT_STACK, 4 GB). A compat-mode thread shadow stack has a size of 1/4 min(RLIMIT_STACK, 4 GB). This allows more threads to run in a 32-bit address space. Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/cet.h | 3 ++ arch/x86/include/asm/mmu_context.h | 3 ++ arch/x86/kernel/cet.c | 44 ++++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 7 +++++ 4 files changed, 57 insertions(+) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 73435856ce54..ec4b5e62d0ce 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -18,12 +18,15 @@ struct cet_status { #ifdef CONFIG_X86_CET int cet_setup_shstk(void); +int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags); void cet_disable_shstk(void); void cet_free_shstk(struct task_struct *p); int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); void cet_restore_signal(struct sc_ext *sc); int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else +static inline int cet_setup_thread_shstk(struct task_struct *p, + unsigned long clone_flags) { return 0; } static inline void cet_disable_shstk(void) {} static inline void cet_free_shstk(struct task_struct *p) {} static inline void cet_restore_signal(struct sc_ext *sc) { return; } diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index d98016b83755..ceb593e405e1 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -11,6 +11,7 @@ #include #include +#include #include extern atomic64_t last_mm_ctx_id; @@ -142,6 +143,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + cet_free_shstk(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index 728d9baceb74..d57f3a433af9 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -172,6 +172,50 @@ int cet_setup_shstk(void) return 0; } +int cet_setup_thread_shstk(struct task_struct *tsk, unsigned long clone_flags) +{ + unsigned long addr, size; + struct cet_user_state *state; + struct cet_status *cet = &tsk->thread.cet; + + if (!cet->shstk_size) + return 0; + + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + state = get_xsave_addr(&tsk->thread.fpu.state.xsave, + XFEATURE_CET_USER); + + if (!state) + return -EINVAL; + + /* Cap shadow stack size to 4 GB */ + size = min(rlimit(RLIMIT_STACK), 1UL << 32); + + /* + * Compat-mode pthreads share a limited address space. + * If each function call takes an average of four slots + * stack space, we need 1/4 of stack size for shadow stack. + */ + if (in_compat_syscall()) + size /= 4; + size = round_up(size, PAGE_SIZE); + addr = alloc_shstk(size, 0); + + if (IS_ERR_VALUE(addr)) { + cet->shstk_base = 0; + cet->shstk_size = 0; + return PTR_ERR((void *)addr); + } + + fpu__prepare_write(&tsk->thread.fpu); + state->user_ssp = (u64)(addr + size); + cet->shstk_base = addr; + cet->shstk_size = size; + return 0; +} + void cet_disable_shstk(void) { struct cet_status *cet = ¤t->thread.cet; diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ff3b44d6740b..67632ba893b7 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -110,6 +110,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + cet_free_shstk(tsk); fpu__drop(fpu); } @@ -182,6 +183,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg, if (clone_flags & CLONE_SETTLS) ret = set_new_tls(p, tls); +#ifdef CONFIG_X86_64 + /* Allocate a new shadow stack for pthread */ + if (!ret) + ret = cet_setup_thread_shstk(p, clone_flags); +#endif + if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); From patchwork Tue Nov 10 16:22:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894799 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F32C4697 for ; Tue, 10 Nov 2020 16:23:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B94CD20780 for ; Tue, 10 Nov 2020 16:23:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B94CD20780 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 995FB6B009A; Tue, 10 Nov 2020 11:23:00 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8EC136B0096; Tue, 10 Nov 2020 11:23:00 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 402016B0096; Tue, 10 Nov 2020 11:23:00 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id DA8686B0096 for ; Tue, 10 Nov 2020 11:22:59 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 759B9181AEF10 for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) X-FDA: 77469027678.21.debt07_160ad4c272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 2FB04180442C3 for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30046:30051:30054:30056:30064:30070:30075,0,RBL:134.134.136.100:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y897t3dcz7tf7hhnrdk11jz3qhhopc4funyruf1butwaotw7cee3w1g1wjap5.t8kaoguu6qdoz1pzjwgx1na8zqb4n7cter6k85yfry86ixdosz79oj5ap44gz39.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: debt07_160ad4c272f6 X-Filterd-Recvd-Size: 8012 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:57 +0000 (UTC) IronPort-SDR: QcM6wSAtgIJRuZidShN0KXjIvL0f+HWbLmRDHwi5O6TQNLSI2uXGw2qd7AWYdmQ1GsQrtHnWjZ l+V6rDbETLkw== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="234168877" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="234168877" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:56 -0800 IronPort-SDR: On53lC7w9+nB/NsAa8Yw99CUBO0EJGFQ6m8GklvnarS6AoMAE7ggdJwSYpCqf5M8u7/gi5pa3K J59362OYvNwA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572940" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:55 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 25/26] x86/cet/shstk: Add arch_prctl functions for shadow stack Date: Tue, 10 Nov 2020 08:22:10 -0800 Message-Id: <20201110162211.9207-26-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: arch_prctl(ARCH_X86_CET_STATUS, u64 *args) Get CET feature status. The parameter 'args' is a pointer to a user buffer. The kernel returns the following information: *args = shadow stack/IBT status *(args + 1) = shadow stack base address *(args + 2) = shadow stack size arch_prctl(ARCH_X86_CET_DISABLE, unsigned int features) Disable CET features specified in 'features'. Return -EPERM if CET is locked. arch_prctl(ARCH_X86_CET_LOCK) Lock in CET features. Also change do_arch_prctl_common()'s parameter 'cpuid_enabled' to 'arg2', as it is now also passed to prctl_cet(). Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/cet.h | 3 ++ arch/x86/include/uapi/asm/prctl.h | 4 ++ arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/cet_prctl.c | 68 +++++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 6 +-- 5 files changed, 79 insertions(+), 4 deletions(-) create mode 100644 arch/x86/kernel/cet_prctl.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index ec4b5e62d0ce..16870e5bc8eb 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -14,9 +14,11 @@ struct sc_ext; struct cet_status { unsigned long shstk_base; unsigned long shstk_size; + unsigned int locked:1; }; #ifdef CONFIG_X86_CET +int prctl_cet(int option, u64 arg2); int cet_setup_shstk(void); int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags); void cet_disable_shstk(void); @@ -25,6 +27,7 @@ int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp) void cet_restore_signal(struct sc_ext *sc); int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else +static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; } static inline int cet_setup_thread_shstk(struct task_struct *p, unsigned long clone_flags) { return 0; } static inline void cet_disable_shstk(void) {} diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 5a6aac9fa41f..9245bf629120 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -14,4 +14,8 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +#define ARCH_X86_CET_STATUS 0x3001 +#define ARCH_X86_CET_DISABLE 0x3002 +#define ARCH_X86_CET_LOCK 0x3003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4a89d0f3792e..5d04c4f21485 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -151,7 +151,7 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += sev-es.o -obj-$(CONFIG_X86_CET) += cet.o +obj-$(CONFIG_X86_CET) += cet.o cet_prctl.o ### # 64 bit specific files diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c new file mode 100644 index 000000000000..bd5ad11763e4 --- /dev/null +++ b/arch/x86/kernel/cet_prctl.c @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* See Documentation/x86/intel_cet.rst. */ + +static int copy_status_to_user(struct cet_status *cet, u64 arg2) +{ + u64 buf[3] = {0, 0, 0}; + + if (cet->shstk_size) { + buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK; + buf[1] = (u64)cet->shstk_base; + buf[2] = (u64)cet->shstk_size; + } + + return copy_to_user((u64 __user *)arg2, buf, sizeof(buf)); +} + +int prctl_cet(int option, u64 arg2) +{ + struct cet_status *cet; + unsigned int features; + + /* + * GLIBC's ENOTSUPP == EOPNOTSUPP == 95, and it does not recognize + * the kernel's ENOTSUPP (524). So return EOPNOTSUPP here. + */ + if (!IS_ENABLED(CONFIG_X86_CET)) + return -EOPNOTSUPP; + + cet = ¤t->thread.cet; + + if (option == ARCH_X86_CET_STATUS) + return copy_status_to_user(cet, arg2); + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + switch (option) { + case ARCH_X86_CET_DISABLE: + if (cet->locked) + return -EPERM; + + features = (unsigned int)arg2; + + if (features & GNU_PROPERTY_X86_FEATURE_1_INVAL) + return -EINVAL; + if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK) + cet_disable_shstk(); + return 0; + + case ARCH_X86_CET_LOCK: + cet->locked = 1; + return 0; + + default: + return -ENOSYS; + } +} diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 67632ba893b7..33cb6da22ef0 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -977,14 +977,14 @@ unsigned long get_wchan(struct task_struct *p) } long do_arch_prctl_common(struct task_struct *task, int option, - unsigned long cpuid_enabled) + unsigned long arg2) { switch (option) { case ARCH_GET_CPUID: return get_cpuid_mode(); case ARCH_SET_CPUID: - return set_cpuid_mode(task, cpuid_enabled); + return set_cpuid_mode(task, arg2); } - return -EINVAL; + return prctl_cet(option, arg2); } From patchwork Tue Nov 10 16:22:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11894803 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F4D7697 for ; Tue, 10 Nov 2020 16:23:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1C11F20797 for ; Tue, 10 Nov 2020 16:23:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1C11F20797 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6CB156B009C; Tue, 10 Nov 2020 11:23:02 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6501F6B009E; Tue, 10 Nov 2020 11:23:02 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 36BCE6B009D; Tue, 10 Nov 2020 11:23:02 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0199.hostedemail.com [216.40.44.199]) by kanga.kvack.org (Postfix) with ESMTP id 03E5E6B009B for ; Tue, 10 Nov 2020 11:23:01 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 9E469180AD811 for ; Tue, 10 Nov 2020 16:23:01 +0000 (UTC) X-FDA: 77469027762.26.slip30_1e109e6272f6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 735B71804B660 for ; Tue, 10 Nov 2020 16:23:01 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30036:30054:30056:30062:30064,0,RBL:134.134.136.100:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y89ee6echhtgxnneh5iynsew3n5opytehwhirtbbm37c1h841na4nyxhgsbbg.g44j3nxducqfrrm7gdt8n9krbg13ziqu7y8d6orn1gkwipwn7gjqcadnwm9ee8w.n-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:397,LUA_SUMMARY:none X-HE-Tag: slip30_1e109e6272f6 X-Filterd-Recvd-Size: 9579 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Nov 2020 16:22:59 +0000 (UTC) IronPort-SDR: s2FaxtVJSTZM2JVblSp80uo4mnCUPzNWVp3b/mYoWW1RgwEVnEId0/D6piFxrOcXW01ZXf09b9 eNgKMxA9PJSA== X-IronPort-AV: E=McAfee;i="6000,8403,9801"; a="234168878" X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="234168878" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:56 -0800 IronPort-SDR: O4DP9VBWsHs9OYuBGdMV743IPlwqTcFJueU4f0RYdXLLm5KW2GMMk3xzXPcc1AagNj3YiJFz/R 2hilZ6W0c4iA== X-IronPort-AV: E=Sophos;i="5.77,466,1596524400"; d="scan'208";a="365572944" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Nov 2020 08:22:56 -0800 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang , Pengfei Xu Cc: Yu-cheng Yu Subject: [PATCH v15 26/26] mm: Introduce PROT_SHSTK for shadow stack Date: Tue, 10 Nov 2020 08:22:11 -0800 Message-Id: <20201110162211.9207-27-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201110162211.9207-1-yu-cheng.yu@intel.com> References: <20201110162211.9207-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There are three possible options to create a shadow stack allocation API: an arch_prctl, a new syscall, or adding PROT_SHSTK to mmap()/mprotect(). Each has its advantages and compromises. An arch_prctl() is the least intrusive. However, the existing x86 arch_prctl() takes only two parameters. Multiple parameters must be passed in a memory buffer. There is a proposal to pass more parameters in registers [1], but no active discussion on that. A new syscall minimizes compatibility issues and offers an extensible frame work to other architectures, but this will likely result in some overlap of mmap()/mprotect(). The introduction of PROT_SHSTK to mmap()/mprotect() takes advantage of existing APIs. The x86-specific PROT_SHSTK is translated to VM_SHSTK and a shadow stack mapping is created without reinventing the wheel. There are potential pitfalls though. The most obvious one would be using this as a bypass to shadow stack protection. However, the attacker would have to get to the syscall first. Since arch_calc_vm_prot_bits() is modified, I have moved arch_vm_get_page _prot() and arch_calc_vm_prot_bits() to x86/include/asm/mman.h. This will be more consistent with other architectures. [1] https://lore.kernel.org/lkml/20200828121624.108243-1-hjl.tools@gmail.com/ Signed-off-by: Yu-cheng Yu --- arch/x86/include/asm/mman.h | 83 ++++++++++++++++++++++++++++++++ arch/x86/include/uapi/asm/mman.h | 28 ++--------- include/linux/mm.h | 1 + mm/mmap.c | 8 ++- 4 files changed, 95 insertions(+), 25 deletions(-) create mode 100644 arch/x86/include/asm/mman.h diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h new file mode 100644 index 000000000000..0dcaef6f889a --- /dev/null +++ b/arch/x86/include/asm/mman.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MMAN_H +#define _ASM_X86_MMAN_H + +#include +#include + +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS +/* + * Take the 4 protection key bits out of the vma->vm_flags + * value and turn them in to the bits that we can put in + * to a pte. + * + * Only override these if Protection Keys are available + * (which is only on 64-bit). + */ +#define arch_vm_get_page_prot(vm_flags) __pgprot( \ + ((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) | \ + ((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) | \ + ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ + ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) + +#define pkey_vm_prot_bits(prot, key) ( \ + ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ + ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ + ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ + ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) +#else +#define pkey_vm_prot_bits(prot, key) (0) +#endif + +static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, + unsigned long pkey) +{ + unsigned long vm_prot_bits = pkey_vm_prot_bits(prot, pkey); + + if (!(prot & PROT_WRITE) && (prot & PROT_SHSTK)) + vm_prot_bits |= VM_SHSTK; + + return vm_prot_bits; +} +#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) + +#ifdef CONFIG_X86_SHADOW_STACK_USER +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr) +{ + unsigned long valid = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM; + + if (prot & ~(valid | PROT_SHSTK)) + return false; + + if (prot & PROT_SHSTK) { + struct vm_area_struct *vma; + + if (!current->thread.cet.shstk_size) + return false; + + /* + * A shadow stack mapping is indirectly writable by only + * the CALL and WRUSS instructions, but not other write + * instructions). PROT_SHSTK and PROT_WRITE are mutually + * exclusive. + */ + if (prot & PROT_WRITE) + return false; + + vma = find_vma(current->mm, addr); + if (!vma) + return false; + + /* + * Shadow stack cannot be backed by a file or shared. + */ + if (vma->vm_file || (vma->vm_flags & VM_SHARED)) + return false; + } + + return true; +} +#define arch_validate_prot arch_validate_prot +#endif + +#endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index d4a8d0424bfb..39bb7db344a6 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -1,31 +1,11 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#ifndef _ASM_X86_MMAN_H -#define _ASM_X86_MMAN_H +#ifndef _UAPI_ASM_X86_MMAN_H +#define _UAPI_ASM_X86_MMAN_H #define MAP_32BIT 0x40 /* only give out 32bit addresses */ -#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS -/* - * Take the 4 protection key bits out of the vma->vm_flags - * value and turn them in to the bits that we can put in - * to a pte. - * - * Only override these if Protection Keys are available - * (which is only on 64-bit). - */ -#define arch_vm_get_page_prot(vm_flags) __pgprot( \ - ((vm_flags) & VM_PKEY_BIT0 ? _PAGE_PKEY_BIT0 : 0) | \ - ((vm_flags) & VM_PKEY_BIT1 ? _PAGE_PKEY_BIT1 : 0) | \ - ((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \ - ((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0)) - -#define arch_calc_vm_prot_bits(prot, key) ( \ - ((key) & 0x1 ? VM_PKEY_BIT0 : 0) | \ - ((key) & 0x2 ? VM_PKEY_BIT1 : 0) | \ - ((key) & 0x4 ? VM_PKEY_BIT2 : 0) | \ - ((key) & 0x8 ? VM_PKEY_BIT3 : 0)) -#endif +#define PROT_SHSTK 0x10 /* shadow stack pages */ #include -#endif /* _ASM_X86_MMAN_H */ +#endif /* _UAPI_ASM_X86_MMAN_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index ef1bd7c7e88b..4c5ff13ed332 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -334,6 +334,7 @@ extern unsigned int kobjsize(const void *objp); #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ +# define VM_ARCH_CLEAR VM_SHSTK #elif defined(CONFIG_PPC) # define VM_SAO VM_ARCH_1 /* Strong Access Ordering (powerpc) */ #elif defined(CONFIG_PARISC) diff --git a/mm/mmap.c b/mm/mmap.c index c4938e4b789b..da7e0c6689ee 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1483,6 +1483,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr, struct inode *inode = file_inode(file); unsigned long flags_mask; + /* + * Call stack cannot be backed by a file. + */ + if (vm_flags & VM_SHSTK) + return -EINVAL; + if (!file_mmap_ok(file, inode, pgoff, len)) return -EOVERFLOW; @@ -1547,7 +1553,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, } else { switch (flags & MAP_TYPE) { case MAP_SHARED: - if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP)) + if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP|VM_SHSTK)) return -EINVAL; /* * Ignore pgoff.