From patchwork Tue Aug 25 00:25:16 2020
X-Patchwork-Submitter: Yu-cheng Yu
X-Patchwork-Id: 11734513
From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
 Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v11 01/25] Documentation/x86: Add CET description
Date: Mon, 24 Aug 2020 17:25:16 -0700
Message-Id: <20200825002540.3351-2-yu-cheng.yu@intel.com>
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

Explain no_user_shstk/no_user_ibt kernel parameters, and introduce a new
document on Control-flow Enforcement Technology (CET).

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
---
v11:
- Add back GLIBC tunables information.
- Add ARCH_X86_CET_MMAP_SHSTK information.

v10:
- Change no_cet_shstk and no_cet_ibt to no_user_shstk and no_user_ibt.
- Remove the opcode section, as it is already in the Intel SDM.
- Remove sections related to GLIBC implementation.
- Remove shadow stack memory management section, as it is already in the
  code comments.
- Remove legacy bitmap related information, as it is not supported now.
- Fix arch_ioctl() related text.
- Change SHSTK, IBT to plain English.
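
Reviewer note (illustrative, not part of the patch): the "Shadow Stack size"
rule in the new intel_cet.rst amounts to a simple clamp. The following is a
minimal user-space sketch of that arithmetic only, not the kernel
implementation. The helper name shstk_size() and its compat flag are
hypothetical, and RLIMIT_STACK is passed in as a plain byte count.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHSTK_SIZE_CAP	(UINT64_C(4) << 30)	/* 4 GB cap stated in the document */

/*
 * Hypothetical helper: shadow stack size for a given RLIMIT_STACK value
 * (in bytes).  Compat-mode tasks start from 1/4 of the limit; both cases
 * are capped at 4 GB.
 */
static inline uint64_t shstk_size(uint64_t rlimit_stack, bool compat)
{
	uint64_t size = compat ? rlimit_stack / 4 : rlimit_stack;

	return size < SHSTK_SIZE_CAP ? size : SHSTK_SIZE_CAP;
}
```

Under this sketch, an 8 MB stack limit yields an 8 MB shadow stack for a
64-bit task and 2 MB per thread for a compat-mode task, while an unlimited
stack limit is capped to 4 GB in both cases.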
 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/intel_cet.rst               | 143 ++++++++++++++++++
 3 files changed, 150 insertions(+)
 create mode 100644 Documentation/x86/intel_cet.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdc1f33fd3d1..c85373c120a3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3167,6 +3167,12 @@
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings

+	no_user_shstk	[X86-64] Disable Shadow Stack for user-mode
+			applications
+
+	no_user_ibt	[X86-64] Disable Indirect Branch Tracking for user-mode
+			applications
+
 	nosmap		[X86,PPC]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..2aef972a868d 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,6 +19,7 @@ x86-specific Documentation
    tlb
    mtrr
    pat
+   intel_cet
    intel-iommu
    intel_txt
    amd-memory-encryption
diff --git a/Documentation/x86/intel_cet.rst b/Documentation/x86/intel_cet.rst
new file mode 100644
index 000000000000..2deda249bc2c
--- /dev/null
+++ b/Documentation/x86/intel_cet.rst
@@ -0,0 +1,143 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Control-flow Enforcement Technology (CET)
+=========================================
+
+[1] Overview
+============
+
+Control-flow Enforcement Technology (CET) is an Intel processor feature
+that provides protection against return/jump-oriented programming (ROP)
+attacks.  It can be set up to protect both applications and the kernel.
+Only user-mode protection is implemented in the 64-bit kernel, including
+support for running legacy 32-bit applications.
+
+CET introduces Shadow Stack and Indirect Branch Tracking.  Shadow stack is
+a secondary stack allocated from memory and cannot be directly modified by
+applications.  When executing a CALL, the processor pushes the return
+address to both the normal stack and the shadow stack.  Upon function
+return, the processor pops the shadow stack copy and compares it to the
+normal stack copy.  If the two differ, the processor raises a control-
+protection fault.  Indirect branch tracking verifies indirect CALL/JMP
+targets are intended as marked by the compiler with 'ENDBR' opcodes.
+
+There are two kernel configuration options:
+
+    X86_INTEL_SHADOW_STACK_USER, and
+    X86_INTEL_BRANCH_TRACKING_USER.
+
+These need to be enabled to build a CET-enabled kernel, and Binutils v2.31
+and GCC v8.1 or later are required to build a CET kernel.  To build a CET-
+enabled application, GLIBC v2.28 or later is also required.
+
+There are two command-line options for disabling CET features::
+
+    no_user_shstk - disables user shadow stack, and
+    no_user_ibt   - disables user indirect branch tracking.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET.
+
+[2] Application Enabling
+========================
+
+An application's CET capability is marked in its ELF header and can be
+verified from the following command output, in the NT_GNU_PROPERTY_TYPE_0
+field:
+
+    readelf -n
+
+If an application supports CET and is statically linked, it will run with
+CET protection.  If the application needs any shared libraries, the loader
+checks all dependencies and enables CET when all requirements are met.
+
+[3] Backward Compatibility
+==========================
+
+GLIBC provides a few tunables for backward compatibility.
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-IBT
+    Turn off SHSTK/IBT for the current shell.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=
+    This controls how dlopen() handles SHSTK legacy libraries::
+
+        on         - continue with SHSTK enabled;
+        permissive - continue with SHSTK off.
+
+[4] CET arch_prctl()'s
+======================
+
+Several arch_prctl()'s have been added for CET:
+
+arch_prctl(ARCH_X86_CET_STATUS, u64 *addr)
+    Return CET feature status.
+
+    The parameter 'addr' is a pointer to a user buffer.
+    On returning to the caller, the kernel fills the following
+    information::
+
+        *addr       = shadow stack/indirect branch tracking status
+        *(addr + 1) = shadow stack base address
+        *(addr + 2) = shadow stack size
+
+arch_prctl(ARCH_X86_CET_DISABLE, u64 features)
+    Disable shadow stack and/or indirect branch tracking as specified in
+    'features'.  Return -EPERM if CET is locked.
+
+arch_prctl(ARCH_X86_CET_LOCK)
+    Lock in all CET features.  They cannot be turned off afterwards.
+
+arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args)
+    Allocate a new shadow stack and put a restore token at top.
+
+    The parameter 'args' is a pointer to a user buffer::
+
+        *args       = desired size
+        *(args + 1) = MAP_32BIT or MAP_POPULATE
+
+    On returning, *args is the allocated shadow stack address.
+
+Note:
+  There is no CET-enabling arch_prctl function.  By design, CET is enabled
+  automatically if the binary and the system can support it.
+
+[5] The implementation of the Shadow Stack
+==========================================
+
+Shadow Stack size
+-----------------
+
+A task's shadow stack is allocated from memory to a fixed size of
+MIN(RLIMIT_STACK, 4 GB).  In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB.  However,
+a compat-mode application's address space is smaller, each of its thread's
+shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack.
+Because the shadow stack stores only return addresses, a large shadow
+stack covers the condition that both the program stack and the signal
+alternate stack run out.
+
+The kernel creates a restore token for the shadow stack restoring address
+and verifies that token when restoring from the signal handler.
+
+Fork
+----
+
+The shadow stack's vma has VM_SHSTK flag set; its PTEs are required to be
+read-only and dirty.  When a shadow stack PTE is not RO and dirty, a
+shadow access triggers a page fault with the shadow stack access bit set
+in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread.

From patchwork Tue Aug 25 00:25:17 2020
X-Patchwork-Submitter: Yu-cheng Yu
X-Patchwork-Id: 11734511
From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
 Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu, Borislav Petkov
Subject: [PATCH v11 02/25] x86/cpufeatures: Add CET CPU feature flags for
 Control-flow Enforcement Technology (CET)
Date: Mon, 24 Aug 2020 17:25:17 -0700
Message-Id: <20200825002540.3351-3-yu-cheng.yu@intel.com>
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

Add CPU feature flags for Control-flow Enforcement Technology (CET).

CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking

Signed-off-by: Yu-cheng Yu
Reviewed-by: Borislav Petkov
Reviewed-by: Kees Cook
---
 arch/x86/include/asm/cpufeatures.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 2901d5df4366..c794e18e8a14 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -341,6 +341,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
@@ -370,6 +371,7 @@
 #define X86_FEATURE_SERIALIZE		(18*32+14) /* SERIALIZE instruction */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_IBT			(18*32+20) /* Indirect Branch Tracking */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..fec83cc74b9e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_IBT,			X86_FEATURE_XSAVES    },
 	{}
 };

From patchwork Tue Aug 25 00:25:18 2020
X-Patchwork-Submitter: Yu-cheng Yu
X-Patchwork-Id: 11734515
From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
 Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v11 03/25] x86/fpu/xstate: Introduce CET MSR XSAVES
 supervisor states
Date: Mon, 24 Aug 2020 17:25:18 -0700
Message-Id: <20200825002540.3351-4-yu-cheng.yu@intel.com>
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

Control-flow Enforcement Technology (CET) adds five MSRs.  Introduce them
and their XSAVES supervisor states:

    MSR_IA32_U_CET (user-mode CET settings),
    MSR_IA32_PL3_SSP (user-mode Shadow Stack pointer),
    MSR_IA32_PL0_SSP (kernel-mode Shadow Stack pointer),
    MSR_IA32_PL1_SSP (Privilege Level 1 Shadow Stack pointer),
    MSR_IA32_PL2_SSP (Privilege Level 2 Shadow Stack pointer).

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
---
v11:
- Drop MSR_IA32 prefix for individual bits, and use BIT_ULL().
- Drop MSR_IA32_CET_BITMAP_MASK.

v6:
- Remove __packed from struct cet_user_state, struct cet_kernel_state.
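
Reviewer note (illustrative, not part of the patch): because
X86_FEATURE_SHSTK and X86_FEATURE_IBT share the same xstate components,
this patch clears the CET xfeature bits only when neither CPU feature is
present. A stand-alone sketch of that rule is shown here. The helper
clear_cet_xfeatures() is hypothetical and takes the two boot_cpu_has()
results as plain booleans; the component numbers 11 and 12 come from the
struct comments added to fpu/types.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_CET_USER	11	/* state component 11 */
#define XFEATURE_CET_KERNEL	12	/* state component 12 */
#define CET_XFEATURES	((UINT64_C(1) << XFEATURE_CET_USER) | \
			 (UINT64_C(1) << XFEATURE_CET_KERNEL))

/*
 * Hypothetical helper: drop both CET xstate components from 'mask' only
 * if the CPU supports neither shadow stack nor indirect branch tracking.
 */
static inline uint64_t clear_cet_xfeatures(uint64_t mask, bool has_shstk,
					   bool has_ibt)
{
	if (!has_shstk && !has_ibt)
		mask &= ~CET_XFEATURES;

	return mask;
}
```

With either feature present the mask is returned unchanged, which mirrors
the "can be enabled separately" comment in fpu__init_system_xstate().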
 arch/x86/include/asm/fpu/types.h            | 23 +++++++++++++++--
 arch/x86/include/asm/fpu/xstate.h           |  5 ++--
 arch/x86/include/asm/msr-index.h            | 17 +++++++++++++
 arch/x86/include/uapi/asm/processor-flags.h |  2 ++
 arch/x86/kernel/fpu/xstate.c                | 28 ++++++++++++++++++---
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c87364ea6446..2a7037a6f960 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_RSRVD_COMP_10,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -134,6 +134,8 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM		(1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)

 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
@@ -236,6 +238,23 @@ struct pkru_state {
 	u32				pad;
 } __packed;

+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	u64 user_cet;			/* user control-flow settings */
+	u64 user_ssp;			/* user shadow stack pointer */
+};
+
+/*
+ * State component 12 is Control-flow Enforcement kernel states
+ */
+struct cet_kernel_state {
+	u64 kernel_ssp;			/* kernel shadow stack */
+	u64 pl1_ssp;			/* privilege level 1 shadow stack */
+	u64 pl2_ssp;			/* privilege level 2 shadow stack */
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index 14ab815132d4..e4408db88bca 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -35,7 +35,7 @@ XFEATURE_MASK_BNDCSR) /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -62,7 +62,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 2859ee4f39a8..25bd727f6fa1 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -912,4 +912,21 @@ #define MSR_VM_IGNNE 0xc0010115 #define MSR_VM_HSAVE_PA 0xc0010117 +/* Control-flow Enforcement Technology MSRs */ +#define MSR_IA32_U_CET 0x6a0 /* user mode cet setting */ +#define MSR_IA32_S_CET 0x6a2 /* kernel mode cet setting */ +#define MSR_IA32_PL0_SSP 0x6a4 /* kernel shstk pointer */ +#define MSR_IA32_PL1_SSP 0x6a5 /* ring-1 shstk pointer */ +#define MSR_IA32_PL2_SSP 0x6a6 /* ring-2 shstk pointer */ +#define MSR_IA32_PL3_SSP 0x6a7 /* user shstk pointer */ +#define MSR_IA32_INT_SSP_TAB 0x6a8 /* exception shstk table */ + +/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */ +#define CET_SHSTK_EN BIT_ULL(0) +#define CET_WRSS_EN BIT_ULL(1) +#define CET_ENDBR_EN BIT_ULL(2) +#define CET_LEG_IW_EN BIT_ULL(3) +#define CET_NO_TRACK_EN BIT_ULL(4) +#define CET_WAIT_ENDBR BIT_ULL(11) + #endif /* 
_ASM_X86_MSR_INDEX_H */ diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index bcba3c643e63..a8df907e8017 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) +#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement */ +#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT) /* * x86-64 Task Priority Register, CR8 diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 038e19c0019e..705fd9b94e31 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -38,6 +38,9 @@ static const char *xfeature_names[] = "Processor Trace (unused)" , "Protection Keys User registers", "unknown xstate feature" , + "Control-flow User registers" , + "Control-flow Kernel registers" , + "unknown xstate feature" , }; static short xsave_cpuid_features[] __initdata = { @@ -51,6 +54,9 @@ static short xsave_cpuid_features[] __initdata = { X86_FEATURE_AVX512F, X86_FEATURE_INTEL_PT, X86_FEATURE_PKU, + -1, /* Unused */ + X86_FEATURE_SHSTK, /* XFEATURE_CET_USER */ + X86_FEATURE_SHSTK, /* XFEATURE_CET_KERNEL */ }; /* @@ -318,6 +324,8 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_ZMM_Hi256); print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); + print_xstate_feature(XFEATURE_MASK_CET_USER); + print_xstate_feature(XFEATURE_MASK_CET_KERNEL); } /* @@ -592,6 +600,8 @@ static void check_xstate_against_struct(int nr) XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); + XCHECK_SZ(sz, nr, XFEATURE_CET_USER, struct cet_user_state); + XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct 
cet_kernel_state); /* * Make *SURE* to add any feature numbers in below if @@ -601,7 +611,8 @@ static void check_xstate_against_struct(int nr) if ((nr < XFEATURE_YMM) || (nr >= XFEATURE_MAX) || (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_10) && (nr <= XFEATURE_LBR))) { + (nr == XFEATURE_RSRVD_COMP_10) || + ((nr >= XFEATURE_RSRVD_COMP_13) && (nr <= XFEATURE_LBR))) { WARN_ONCE(1, "no structure for xstate: %d\n", nr); XSTATE_WARN_ON(1); } @@ -831,8 +842,19 @@ void __init fpu__init_system_xstate(void) * Clear XSAVE features that are disabled in the normal CPUID. */ for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) { - if (!boot_cpu_has(xsave_cpuid_features[i])) - xfeatures_mask_all &= ~BIT_ULL(i); + if (xsave_cpuid_features[i] == X86_FEATURE_SHSTK) { + /* + * X86_FEATURE_SHSTK and X86_FEATURE_IBT share + * same states, but can be enabled separately. + */ + if (!boot_cpu_has(X86_FEATURE_SHSTK) && + !boot_cpu_has(X86_FEATURE_IBT)) + xfeatures_mask_all &= ~BIT_ULL(i); + } else { + if ((xsave_cpuid_features[i] == -1) || + !boot_cpu_has(xsave_cpuid_features[i])) + xfeatures_mask_all &= ~BIT_ULL(i); + } } xfeatures_mask_all &= fpu__get_supported_xfeatures_mask(); From patchwork Tue Aug 25 00:25:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734517 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C069109B for ; Tue, 25 Aug 2020 00:29:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CA11520838 for ; Tue, 25 Aug 2020 00:29:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CA11520838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass 
smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CEE8C6B0033; Mon, 24 Aug 2020 20:29:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A70DC8D0003; Mon, 24 Aug 2020 20:29:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8EC706B0037; Mon, 24 Aug 2020 20:29:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0019.hostedemail.com [216.40.44.19]) by kanga.kvack.org (Postfix) with ESMTP id 5F0298D0003 for ; Mon, 24 Aug 2020 20:29:34 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 1AF791EF1 for ; Tue, 25 Aug 2020 00:29:34 +0000 (UTC) X-FDA: 77187207468.18.skin98_3d00bf827057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id E16DF100EC661 for ; Tue, 25 Aug 2020 00:29:33 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30051:30054:30056:30064:30070:30079:30083,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yrk9fikecsfsyqerd3yirkxwb8hyczp38dtjschrno16w4dcuotb4oerj8mf5.qap8bc1o54hn8e5pp73xj9dexi4apm6opbxth3hpzu9eykink5a5acqzq7kwwtz.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: skin98_3d00bf827057 X-Filterd-Recvd-Size: 7902 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf38.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:33 +0000 (UTC) IronPort-SDR: QRG5B+ICzXXwa3e7Fv4W0P9RSbukpB9nD4ErD7iYtfj4JRYxz8BIZcl+qRhISGq4BG6WztBcgH jmalHkNArm6A== X-IronPort-AV: 
E=McAfee;i="6000,8403,9723"; a="157061708" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061708" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:31 -0700 IronPort-SDR: favj2E9b8nA5uKGkkmwaKujg3pgnHR5nwgizYOPuiislUvm2DprYYa5y56zQEwkc2DRAZjgOoZ PZQTE+TskdaQ== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134939" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:31 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. 
Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 04/25] x86/cet: Add control-protection fault handler Date: Mon, 24 Aug 2020 17:25:19 -0700 Message-Id: <20200825002540.3351-5-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: E16DF100EC661 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the Shadow Stack; or an indirect JMP instruction, without the NOTRACK prefix, arrives at a non-ENDBR opcode. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: - Change CONFIG_X86_64 to CONFIG_X86_INTEL_CET. v9: - Add Shadow Stack pointer to the fault printout. 
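For illustration only (not part of this patch): a user-space program could tell a control-protection fault apart from other SIGSEGV causes by checking si_code in an SA_SIGINFO handler. The sketch below assumes the SEGV_CPERR value added by this patch (8) when the installed headers do not define it; the numeric value may differ in other kernel versions, so prefer the header definition when it exists. The helper names is_control_protection_fault() and install_segv_handler() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Assumption: fall back to the value this patch adds to
 * include/uapi/asm-generic/siginfo.h when the installed headers
 * do not define SEGV_CPERR. Prefer the header definition if present.
 */
#ifndef SEGV_CPERR
#define SEGV_CPERR 8
#endif

/* Hypothetical helper: nonzero if info describes a control-protection fault. */
int is_control_protection_fault(const siginfo_t *info)
{
	return info->si_signo == SIGSEGV && info->si_code == SEGV_CPERR;
}

/* Hypothetical handler: report the cause using async-signal-safe calls only. */
static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
	static const char cp_msg[] = "SIGSEGV: control-protection fault\n";
	static const char other_msg[] = "SIGSEGV: other cause\n";

	(void)sig;
	(void)ucontext;

	if (is_control_protection_fault(info))
		(void)write(STDERR_FILENO, cp_msg, sizeof(cp_msg) - 1);
	else
		(void)write(STDERR_FILENO, other_msg, sizeof(other_msg) - 1);
	_exit(1);
}

/* Hypothetical helper: install the handler; returns 0 on success. */
int install_segv_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

A program would call install_segv_handler() once at startup. The handler exits instead of returning, since returning would re-execute the faulting control transfer.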
 arch/x86/include/asm/idtentry.h    |  4 ++
 arch/x86/kernel/idt.c              |  4 ++
 arch/x86/kernel/signal_compat.c    |  2 +-
 arch/x86/kernel/traps.c            | 59 ++++++++++++++++++++++++++++++
 include/uapi/asm-generic/siginfo.h |  3 +-
 5 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index a43366191212..47ffdb658867 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -532,6 +532,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS, exc_stack_segment);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP, exc_general_protection);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC, exc_alignment_check);
 
+#ifdef CONFIG_X86_INTEL_CET
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
+#endif
+
 /* Raw exception entries which need extra work */
 DECLARE_IDTENTRY_RAW(X86_TRAP_UD, exc_invalid_op);
 DECLARE_IDTENTRY_RAW(X86_TRAP_BP, exc_int3);
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 7ecf9babf0cb..395c34fd201a 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -112,6 +112,10 @@ static const __initconst struct idt_data def_idts[] = {
 #elif defined(CONFIG_X86_32)
 	SYSG(IA32_SYSCALL_VECTOR, entry_INT80_32),
 #endif
+
+#ifdef CONFIG_X86_INTEL_CET
+	INTG(X86_TRAP_CP, asm_exc_control_protection),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf0576cd0..c572a3de1037 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL != 11);
 	BUILD_BUG_ON(NSIGFPE != 15);
-	BUILD_BUG_ON(NSIGSEGV != 7);
+	BUILD_BUG_ON(NSIGSEGV != 8);
 	BUILD_BUG_ON(NSIGBUS != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1f66d2d1e998..dca6fdc829f2 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -597,6 +597,65 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#ifdef CONFIG_X86_INTEL_CET
+static const char * const control_protection_err[] = {
+	"unknown",
+	"near-ret",
+	"far-ret/iret",
+	"endbranch",
+	"rstorssp",
+	"setssbsy",
+};
+
+/*
+ * When a control protection exception occurs, send a signal
+ * to the responsible application. Currently, control
+ * protection is only enabled for the user mode. This
+ * exception should not come from the kernel mode.
+ */
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	struct task_struct *tsk;
+
+	if (notify_die(DIE_TRAP, "control protection fault", regs,
+		       error_code, X86_TRAP_CP, SIGSEGV) == NOTIFY_STOP)
+		return;
+	cond_local_irq_enable(regs);
+
+	if (!user_mode(regs))
+		die("kernel control protection fault", regs, error_code);
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK) &&
+	    !static_cpu_has(X86_FEATURE_IBT))
+		WARN_ONCE(1, "CET is disabled but got control protection fault\n");
+
+	tsk = current;
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_CP;
+
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+	    printk_ratelimit()) {
+		unsigned int max_err;
+		unsigned long ssp;
+
+		max_err = ARRAY_SIZE(control_protection_err) - 1;
+		if ((error_code < 0) || (error_code > max_err))
+			error_code = 0;
+		rdmsrl(MSR_IA32_PL3_SSP, ssp);
+		pr_info("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
+			tsk->comm, task_pid_nr(tsk),
+			regs->ip, regs->sp, ssp, error_code,
+			control_protection_err[error_code]);
+		print_vma_addr(KERN_CONT " in ", regs->ip);
+		pr_cont("\n");
+	}
+
+	force_sig_fault(SIGSEGV, SEGV_CPERR,
+			(void __user *)uprobe_get_trap_addr(regs));
+	cond_local_irq_disable(regs);
+}
+#endif
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c267181..91e10cbe3bb0 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -229,7 +229,8 @@ typedef struct siginfo { #define SEGV_ACCADI 5 /* ADI not enabled for mapped object */ #define SEGV_ADIDERR 6 /* Disrupting MCD error */ #define SEGV_ADIPERR 7 /* Precise MCD exception */ -#define NSIGSEGV 7 +#define SEGV_CPERR 8 /* Control protection fault */ +#define NSIGSEGV 8 /* * SIGBUS si_codes From patchwork Tue Aug 25 00:25:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734519 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 54B66109B for ; Tue, 25 Aug 2020 00:29:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2231322B4D for ; Tue, 25 Aug 2020 00:29:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2231322B4D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4A5306B0036; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 459D28E0010; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1BE688D0003; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0074.hostedemail.com [216.40.44.74]) by kanga.kvack.org (Postfix) with ESMTP id E5D7A6B0036 for ; Mon, 24 Aug 2020 20:29:34 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id AD5A0181AEF1D for ; Tue, 25 
Aug 2020 00:29:34 +0000 (UTC) X-FDA: 77187207468.17.park01_360d43027057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 80411180D0180 for ; Tue, 25 Aug 2020 00:29:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30051:30054:30055:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y8fw6z6zd3fx61gt63qinhw5643yp8ypyd8pmuyt8oc118f6qyjznwo3f5p6y.9yb7381i63s6n4ngwhakikxeoj6s1d6prbxha1s1y8i3gwif4pemm1bgiiyhmxc.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:32,LUA_SUMMARY:none X-HE-Tag: park01_360d43027057 X-Filterd-Recvd-Size: 5071 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:33 +0000 (UTC) IronPort-SDR: JxVvGYzZc2vophdBcF0jTFwOSYiQfazcyo05popRBB60ZTaeE2VOvFZi6tbvVVHkwxRx+phMCT 5wJ+wNPwzO0g== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061712" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061712" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:32 -0700 IronPort-SDR: KliBP1uD8mOzaFgxrsU0TS+5QwIrClOBRiOwdtSTMI9ok6R25EdFTX1tksV12Lbs2lQf7T5r50 3flkW69Zw/wA== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134945" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:31 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 05/25] x86/cet/shstk: Add Kconfig option for user-mode Shadow Stack Date: Mon, 24 Aug 2020 17:25:20 -0700 Message-Id: <20200825002540.3351-6-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 80411180D0180 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow Stack provides protection against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_INTEL_SHADOW_STACK_USER, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-shadow stack applications continue to work, but without protection. Signed-off-by: Yu-cheng Yu --- v10: - Change SHSTK to shadow stack in the help text. - Change build-time check to config-time check. - Change ARCH_HAS_SHSTK to ARCH_HAS_SHADOW_STACK. 
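For illustration only (not part of this patch): the first of the three conditions above, processor support, can be probed from user space with CPUID. The sketch below assumes that shadow stack support is reported in CPUID.(EAX=7,ECX=0):ECX bit 7, and uses the compiler-provided <cpuid.h> helper on x86 builds. The function names are hypothetical. It says nothing about the other two conditions (kernel configuration and how the application was built).

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Assumption: CPUID.(EAX=7,ECX=0):ECX bit 7 reports shadow stack support. */
#define CPUID7_ECX_CET_SS (UINT32_C(1) << 7)

/* Hypothetical helper: decode the shadow stack bit from a leaf-7 ECX value. */
bool cpuid7_ecx_has_shstk(uint32_t ecx)
{
	return (ecx & CPUID7_ECX_CET_SS) != 0;
}

/* Hypothetical helper: query the running CPU; false on non-x86 builds. */
bool cpu_has_shstk(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Returns 0 when leaf 7 is not available on this processor. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return cpuid7_ecx_has_shstk(ecx);
#else
	return false;
#endif
}
```

Splitting the bit test from the CPUID query keeps the decoding logic checkable on any machine, including ones without the feature.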
 arch/x86/Kconfig                      | 30 +++++++++++++++++++++++++++
 scripts/as-x86_64-has-shadow-stack.sh |  4 ++++
 2 files changed, 34 insertions(+)
 create mode 100755 scripts/as-x86_64-has-shadow-stack.sh

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac64bb20..4844649ee884 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1927,6 +1927,36 @@ config X86_INTEL_TSX_MODE_AUTO
 	  side channel attacks- equals the tsx=auto command line parameter.
 endchoice
 
+config AS_HAS_SHADOW_STACK
+	def_bool $(success,$(srctree)/scripts/as-x86_64-has-shadow-stack.sh $(CC))
+	help
+	  Test the assembler for shadow stack instructions.
+
+config X86_INTEL_CET
+	def_bool n
+
+config ARCH_HAS_SHADOW_STACK
+	def_bool n
+
+config X86_INTEL_SHADOW_STACK_USER
+	prompt "Intel Shadow Stacks for user-mode"
+	def_bool n
+	depends on CPU_SUP_INTEL && X86_64
+	depends on AS_HAS_SHADOW_STACK
+	select ARCH_USES_HIGH_VMA_FLAGS
+	select X86_INTEL_CET
+	select ARCH_HAS_SHADOW_STACK
+	help
+	  Shadow Stacks provides protection against program stack
+	  corruption. It's a hardware feature. This only matters
+	  if you have the right hardware. It's a security hardening
+	  feature and apps must be enabled to use it. You get no
+	  protection "for free" on old userspace. The hardware can
+	  support user and kernel, but this option is for user space
+	  only.
+
+	  If unsure, say y.
+ config EFI bool "EFI runtime service support" depends on ACPI diff --git a/scripts/as-x86_64-has-shadow-stack.sh b/scripts/as-x86_64-has-shadow-stack.sh new file mode 100755 index 000000000000..fac1d363a1b8 --- /dev/null +++ b/scripts/as-x86_64-has-shadow-stack.sh @@ -0,0 +1,4 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +echo "wrussq %rax, (%rbx)" | $* -x assembler -c - From patchwork Tue Aug 25 00:25:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734537 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 077E214F6 for ; Tue, 25 Aug 2020 00:30:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C5E0E207D8 for ; Tue, 25 Aug 2020 00:30:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C5E0E207D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4F8B48E001A; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3D942900008; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EB5AE8E001A; Mon, 24 Aug 2020 20:29:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0220.hostedemail.com [216.40.44.220]) by kanga.kvack.org (Postfix) with ESMTP id A5B618E000A for ; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by 
forelay01.hostedemail.com (Postfix) with ESMTP id 60EE1180AD815 for ; Tue, 25 Aug 2020 00:29:35 +0000 (UTC) X-FDA: 77187207510.16.loss22_1c06c8727057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 319C6100E6903 for ; Tue, 25 Aug 2020 00:29:35 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygitqixp5r61x1qs6ipztsuno9nyp8okb56465rgu9hjo4urzzg4384a7rkrf.nnkki9sohky1cxmwagxxk6zkk5te39mu6c4tt3k4bnabkaddjiz87o15ej7kc94.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: loss22_1c06c8727057 X-Filterd-Recvd-Size: 9601 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf11.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:34 +0000 (UTC) IronPort-SDR: QGJmd2qXK9KvZ5chCsTQDB5X+kSrIbMlSJ3Rws7e/rc4LB43T5ejopN2DMir1b4/kerLaql+Rf BDWSdQkfqrKg== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061715" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061715" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:33 -0700 IronPort-SDR: OzUGVynLgsI5smQmPFBssZ+4O6PDVKyYId3qPiTZFSr3BYXkiOUQRwekTrEauQSpYVCyMMrT40 6dhAfL7DzMYw== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134951" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:32 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. 
Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu, Dave Hansen
Subject: [PATCH v11 06/25] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW
Date: Mon, 24 Aug 2020 17:25:21 -0700
Message-Id: <20200825002540.3351-7-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>
MIME-Version: 1.0
X-Rspamd-Queue-Id: 319C6100E6903
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam05
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Before introducing _PAGE_COW for non-hardware memory management
purposes in the next patch, rename _PAGE_DIRTY to _PAGE_DIRTY_HW and
_PAGE_BIT_DIRTY to _PAGE_BIT_DIRTY_HW to make their meanings clearer.
This patch makes no functional changes.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
Reviewed-by: Dave Hansen
---
v9:
- In some places _PAGE_DIRTY was not changed to _PAGE_DIRTY_HW, because
  those places will be changed again in the next patch to
  _PAGE_DIRTY_BITS. However, this causes compile issues if the next
  patch is not yet applied. Fix it by changing all _PAGE_DIRTY to
  _PAGE_DIRTY_HW.

arch/x86/include/asm/pgtable.h | 18 +++++++++--------- arch/x86/include/asm/pgtable_types.h | 11 +++++------ arch/x86/kernel/relocate_kernel_64.S | 2 +- arch/x86/kvm/vmx/vmx.c | 2 +- 4 files changed, 16 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b836138ce852..86b7acd221c1 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -124,7 +124,7 @@ extern pmdval_t early_pmd_flags; */ static inline int pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_HW; } @@ -163,7 +163,7 @@ static inline int pte_young(pte_t pte) static inline int pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_HW; } static inline int pmd_young(pmd_t pmd) @@ -173,7 +173,7 @@ static inline int pmd_young(pmd_t pmd) static inline int pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_HW; } static inline int pud_young(pud_t pud) @@ -334,7 +334,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_HW); } static inline pte_t pte_mkold(pte_t pte) @@ -354,7 +354,7 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pte_t pte_mkyoung(pte_t pte) @@ -435,7 +435,7 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_HW); } static inline pmd_t pmd_wrprotect(pmd_t pmd) @@ -445,7 +445,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return 
pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -489,7 +489,7 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_HW); } static inline pud_t pud_wrprotect(pud_t pud) @@ -499,7 +499,7 @@ static inline pud_t pud_wrprotect(pud_t pud) static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 816b31c68550..192e1326b3db 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -15,7 +15,7 @@ #define _PAGE_BIT_PWT 3 /* page write through */ #define _PAGE_BIT_PCD 4 /* page cache disabled */ #define _PAGE_BIT_ACCESSED 5 /* was accessed (raised by CPU) */ -#define _PAGE_BIT_DIRTY 6 /* was written to (raised by CPU) */ +#define _PAGE_BIT_DIRTY_HW 6 /* was written to (raised by CPU) */ #define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */ #define _PAGE_BIT_PAT 7 /* on 4KB pages */ #define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ @@ -46,7 +46,7 @@ #define _PAGE_PWT (_AT(pteval_t, 1) << _PAGE_BIT_PWT) #define _PAGE_PCD (_AT(pteval_t, 1) << _PAGE_BIT_PCD) #define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED) -#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY) +#define _PAGE_DIRTY_HW (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY_HW) #define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) #define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) #define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1) @@ -74,7 +74,7 @@ _PAGE_PKEY_BIT3) #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) -#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) +#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY_HW | 
_PAGE_ACCESSED) #else #define _PAGE_KNL_ERRATUM_MASK 0 #endif @@ -126,7 +126,7 @@ * pte_modify() does modify it. */ #define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_HW | \ _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ _PAGE_UFFD_WP) #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) @@ -163,7 +163,7 @@ enum page_cache_mode { #define __RW _PAGE_RW #define _USR _PAGE_USER #define ___A _PAGE_ACCESSED -#define ___D _PAGE_DIRTY +#define ___D _PAGE_DIRTY_HW #define ___G _PAGE_GLOBAL #define __NX _PAGE_NX @@ -205,7 +205,6 @@ enum page_cache_mode { #define __PAGE_KERNEL_IO __PAGE_KERNEL #define __PAGE_KERNEL_IO_NOCACHE __PAGE_KERNEL_NOCACHE - #ifndef __ASSEMBLY__ #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _ENC) diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S index a4d9a261425b..e3bb4ff95523 100644 --- a/arch/x86/kernel/relocate_kernel_64.S +++ b/arch/x86/kernel/relocate_kernel_64.S @@ -17,7 +17,7 @@ */ #define PTR(x) (x << 3) -#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) +#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY_HW) /* * control_page + KEXEC_CONTROL_CODE_MAX_SIZE diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 46ba2e03a892..f5ec6dadca4f 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3605,7 +3605,7 @@ static int init_rmode_identity_map(struct kvm *kvm) /* Set up identity-mapping pagetable for EPT in real mode */ for (i = 0; i < PT32_ENT_PER_PAGE; i++) { tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | - _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); + _PAGE_ACCESSED | _PAGE_DIRTY_HW | _PAGE_PSE); r = kvm_write_guest_page(kvm, identity_map_pfn, &tmp, i * sizeof(tmp), sizeof(tmp)); if (r < 0) From patchwork Tue Aug 25 00:25:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734521 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 67953913 for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3EAD7207D3 for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3EAD7207D3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9D2B26B0037; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9605D8D0003; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 763836B0055; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0121.hostedemail.com [216.40.44.121]) by kanga.kvack.org (Postfix) with ESMTP id 4214C8E000A for ; Mon, 24 Aug 2020 20:29:35 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0AB73362B for ; Tue, 25 Aug 2020 00:29:35 +0000 (UTC) X-FDA: 77187207510.26.spark21_3f0ffe227057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id C98111804B660 for ; Tue, 25 Aug 2020 00:29:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-62.18.0.100 
64.95.201.95;04yf4gsj86ykeexa56tc3ttw3ybpcocptryet97u7gcji7w3irp1jbxgib8hgoe.acws414h345ky48qq43nbmdi38ukyh99ippsneerakguxadwj7x8a4g45tt6p7b.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: spark21_3f0ffe227057 X-Filterd-Recvd-Size: 5102 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf38.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:34 +0000 (UTC) IronPort-SDR: agf6O2SLzTX4NHI4QZu3Vo122hBvxkfrUt4XeeeGXocRZhVUj74CARvvEKJnYTTbWxoC+ovzbk 3RmMu9HAEfig== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061717" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061717" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:33 -0700 IronPort-SDR: Q7oZ8rBW1MW97Jr/V+HreD129oTSM6KcACxLjj5IssxvHysecFa/1GhiXc2TqOytbM3l6DonCJ xU/ts3UqVwHA== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134957" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:33 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. 
Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu, Christoph Hellwig
Subject: [PATCH v11 07/25] x86/mm: Remove _PAGE_DIRTY_HW from kernel RO pages
Date: Mon, 24 Aug 2020 17:25:22 -0700
Message-Id: <20200825002540.3351-8-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>
MIME-Version: 1.0
X-Rspamd-Queue-Id: C98111804B660
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam03
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Kernel read-only PTEs are set up with _PAGE_DIRTY_HW. Since read-only
PTEs with the dirty bit set become shadow stack PTEs, remove the dirty
bit.

Signed-off-by: Yu-cheng Yu
Cc: "H. Peter Anvin"
Cc: Kees Cook
Cc: Thomas Gleixner
Cc: Dave Hansen
Cc: Christoph Hellwig
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Peter Zijlstra
---
 arch/x86/include/asm/pgtable_types.h | 6 +++---
 arch/x86/mm/pat/set_memory.c         | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 192e1326b3db..5f31f1c407b9 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -193,10 +193,10 @@ enum page_cache_mode {
 #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC)
 #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0)
 #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC)
-#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G)
-#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G)
+#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G)
+#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G)
 #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC)
-#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G)
+#define 
__PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index d1b2a889f035..962434fdf0d9 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1932,7 +1932,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY_HW), 0); } int set_memory_rw(unsigned long addr, int numpages) From patchwork Tue Aug 25 00:25:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734523 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E5D73913 for ; Tue, 25 Aug 2020 00:29:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A1B3120838 for ; Tue, 25 Aug 2020 00:29:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1B3120838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0382A8D0003; Mon, 24 Aug 2020 20:29:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F03558E0010; Mon, 24 Aug 2020 20:29:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) 
id D31068D0008; Mon, 24 Aug 2020 20:29:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0074.hostedemail.com [216.40.44.74]) by kanga.kvack.org (Postfix) with ESMTP id B2F6B8D0003 for ; Mon, 24 Aug 2020 20:29:36 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 849C41EF1 for ; Tue, 25 Aug 2020 00:29:36 +0000 (UTC) X-FDA: 77187207552.06.ship68_381545427057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id 4BD121004E3ED for ; Tue, 25 Aug 2020 00:29:36 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:4423:30045:30051:30054:30056:30064:30070:30079:30091,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygy3sdpyigut16otg9n7cna7k9kycgprfso47on515nc9hi5qpejgzj7xgjg8.huqyd474d4wb6qiosjixutd1quru58yrpwimmj6bnysecdo1hi1rqartemoyni5.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: ship68_381545427057 X-Filterd-Recvd-Size: 15368 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:34 +0000 (UTC) IronPort-SDR: ohGcqZpRWzOkAKl+X304skD0sEeM7mbX6lki0ps3V/fXcxU85NV8yyWMFHpzLqq4o1ygL/ZmOY E7WZlo4zHz9A== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061720" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061720" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:34 -0700 IronPort-SDR: 
0kWHlDfFyPtUR/rbb+b2yoh6sUJwJTyzkxHnm82FGJfa4RRzOH7XnOKCZN2o1Cw+Y5+HyoV/Vt Cy/+iz4t358A== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134961" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:33 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 08/25] x86/mm: Introduce _PAGE_COW Date: Mon, 24 Aug 2020 17:25:23 -0700 Message-Id: <20200825002540.3351-9-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 4BD121004E3ED X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There is essentially no room left in the x86 hardware PTEs on some OSes (not Linux). That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. The reason it's lightly used is that Dirty=1 is normally set by hardware and cannot normally be set by hardware on a Write=0 PTE. 
Software must normally be involved to create one of these PTEs, so software can simply opt to not create them. But that leaves us with a Linux problem: we need to ensure we never create Write=0,Dirty=1 PTEs. In places where we do create them, we need to find an alternative way to represent them _without_ using the same hardware bit combination. Thus, enter _PAGE_COW. This results in the following: (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW) (b) A R/O page that has been COW'ed: (R/O + _PAGE_COW) The user page is in a R/O VMA, and get_user_pages() needs a writable copy. The page fault handler creates a copy of the page and sets the new copy's PTE as R/O and _PAGE_COW. (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW) (d) A shared shadow stack PTE: (R/O + _PAGE_COW) When a shadow stack page is being shared among processes (this happens at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the next shadow stack access causes a fault, and the page is duplicated and _PAGE_DIRTY_HW is set again. This is the COW equivalent for shadow stack pages, even though it's copy-on-access rather than copy-on-write. (e) A page where the processor observed a Write=1 PTE, started a write, set Dirty=1, but then observed a Write=0 PTE. That's possible today, but will not happen on processors that support shadow stack. Use _PAGE_COW in pte_wrprotect() and _PAGE_DIRTY_HW in pte_mkwrite(). Apply the same changes to pmd and pud. When this patch is applied, there are six free bits left in the 64-bit PTE. There are no more free bits in the 32-bit PTE (except for PAE) and shadow stack is not implemented for the 32-bit kernel. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: - Change _PAGE_BIT_DIRTY_SW to _PAGE_BIT_COW, as it is used for copy-on- write PTEs. - Update pte_write() and treat shadow stack as writable. - Change *_mkdirty_shstk() to *_mkwrite_shstk() as these make shadow stack pages writable. - Use bit test & shift to move _PAGE_BIT_DIRTY_HW to _PAGE_BIT_COW. 
- Change static_cpu_has() to cpu_feature_enabled(). - Revise commit log. v9: - Remove pte_move_flags() etc. and put the logic directly in pte_wrprotect()/pte_mkwrite() etc. - Change compile-time conditionals to run-time checks. - Split out pte_modify()/pmd_modify() to a new patch. - Update comments. arch/x86/include/asm/pgtable.h | 120 ++++++++++++++++++++++++--- arch/x86/include/asm/pgtable_types.h | 41 ++++++++- 2 files changed, 150 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 86b7acd221c1..ac4ed814be96 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -122,9 +122,9 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY_HW; + return pte_flags(pte) & _PAGE_DIRTY_BITS; } @@ -161,9 +161,9 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY_HW; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; } static inline int pmd_young(pmd_t pmd) @@ -171,9 +171,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY_HW; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -183,6 +183,12 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { + /* + * If _PAGE_DIRTY_HW is set, the PTE must either have + * _PAGE_RW or be a shadow stack PTE, which is logically writable. 
+ */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY_HW); return pte_flags(pte) & _PAGE_RW; } @@ -334,7 +340,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY_HW); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -344,6 +350,17 @@ static inline pte_t pte_mkold(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte.pte |= (pte.pte & _PAGE_DIRTY_HW) >> + _PAGE_BIT_DIRTY_HW << _PAGE_BIT_COW; + pte = pte_clear_flags(pte, _PAGE_DIRTY_HW); + } + return pte_clear_flags(pte, _PAGE_RW); } @@ -354,6 +371,18 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { + pteval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte)) + dirty = _PAGE_COW; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + pte = pte_clear_flags(pte, _PAGE_COW); return pte_set_flags(pte, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } @@ -364,6 +393,13 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkwrite(pte_t pte) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pte_flags(pte) & _PAGE_COW) { + pte = pte_clear_flags(pte, _PAGE_COW); + pte = pte_set_flags(pte, _PAGE_DIRTY_HW); + } + } + return pte_set_flags(pte, _PAGE_RW); } @@ -435,16 +471,41 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY_HW); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_wrprotect(pmd_t pmd) { + /* + * Blindly clearing 
_PAGE_RW might accidentally create + * a shadow stack PMD (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmdval_t v = native_pmd_val(pmd); + + v |= (v & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << + _PAGE_BIT_COW; + pmd = pmd_clear_flags(__pmd(v), _PAGE_DIRTY_HW); + } + return pmd_clear_flags(pmd, _PAGE_RW); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { + pmdval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pmd_flags(pmd) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + pmd = pmd_clear_flags(pmd, _PAGE_COW); return pmd_set_flags(pmd, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); } @@ -465,6 +526,13 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pmd_flags(pmd) & _PAGE_COW) { + pmd = pmd_clear_flags(pmd, _PAGE_COW); + pmd = pmd_set_flags(pmd, _PAGE_DIRTY_HW); + } + } + return pmd_set_flags(pmd, _PAGE_RW); } @@ -489,17 +557,36 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY_HW); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0,Dirty=1). Move the hardware + * dirty value to the software bit. 
+ */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pudval_t v = native_pud_val(pud); + + v |= (v & _PAGE_DIRTY_HW) >> _PAGE_BIT_DIRTY_HW << + _PAGE_BIT_COW; + pud = pud_clear_flags(__pud(v), _PAGE_DIRTY_HW); + } + return pud_clear_flags(pud, _PAGE_RW); } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY_HW | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY_HW; + + /* Avoid creating (HW)Dirty=1,Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !(pud_flags(pud) & _PAGE_RW)) + dirty = _PAGE_COW; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -519,6 +606,13 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + if (pud_flags(pud) & _PAGE_COW) { + pud = pud_clear_flags(pud, _PAGE_COW); + pud = pud_set_flags(pud, _PAGE_DIRTY_HW); + } + } + return pud_set_flags(pud, _PAGE_RW); } @@ -1132,6 +1226,12 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { + /* + * If _PAGE_DIRTY_HW is set, then the PMD must either have + * _PAGE_RW or be a shadow stack PMD, which is logically writable. 
+ */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) + return pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY_HW); return pmd_flags(pmd) & _PAGE_RW; } diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 5f31f1c407b9..b57483567b8b 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -23,7 +23,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -36,6 +37,16 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * This bit indicates a copy-on-write page, and is different from + * _PAGE_BIT_SOFT_DIRTY, which tracks which pages a task writes to. + */ +#ifdef CONFIG_X86_64 +#define _PAGE_BIT_COW _PAGE_BIT_SOFTW5 /* copy-on-write */ +#else +#define _PAGE_BIT_COW 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +128,34 @@ #define _PAGE_DEVMAP (_AT(pteval_t, 0)) #endif +/* + * _PAGE_COW is used to separate R/O and copy-on-write PTEs created by + * software from the shadow stack PTE setting required by the hardware: + * (a) A modified, copy-on-write (COW) page: (R/O + _PAGE_COW) + * (b) A R/O page that has been COW'ed: (R/O +_PAGE_COW) + * The user page is in a R/O VMA, and get_user_pages() needs a + * writable copy. The page fault handler creates a copy of the page + * and sets the new copy's PTE as R/O and _PAGE_COW. 
+ * (c) A shadow stack PTE: (R/O + _PAGE_DIRTY_HW) + * (d) A shared (copy-on-access) shadow stack PTE: (R/O + _PAGE_COW) + * When a shadow stack page is being shared among processes (this + * happens at fork()), its PTE is cleared of _PAGE_DIRTY_HW, so the + * next shadow stack access causes a fault, and the page is duplicated + * and _PAGE_DIRTY_HW is set again. This is the COW equivalent for + * shadow stack pages, even though it's copy-on-access rather than + * copy-on-write. + * (e) A page where the processor observed a Write=1 PTE, started a write, + * set Dirty=1, but then observed a Write=0 PTE. That's possible + * today, but will not happen on processors that support shadow stack. + */ +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER +#define _PAGE_COW (_AT(pteval_t, 1) << _PAGE_BIT_COW) +#else +#define _PAGE_COW (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY_HW | _PAGE_COW) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* From patchwork Tue Aug 25 00:25:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734525 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3856C109B for ; Tue, 25 Aug 2020 00:29:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 104B320838 for ; Tue, 25 Aug 2020 00:29:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 104B320838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D7DE28E0015; Mon, 24 Aug 2020 20:29:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 
CE0C18E0010; Mon, 24 Aug 2020 20:29:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B0C848E0015; Mon, 24 Aug 2020 20:29:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0142.hostedemail.com [216.40.44.142]) by kanga.kvack.org (Postfix) with ESMTP id 990EB8E0010 for ; Mon, 24 Aug 2020 20:29:37 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5A1D71EF1 for ; Tue, 25 Aug 2020 00:29:37 +0000 (UTC) X-FDA: 77187207594.13.cap78_160ae8c27057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 2DDE118140B60 for ; Tue, 25 Aug 2020 00:29:37 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30055:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yfk8p5dnfc13karzgaaxqgrzqsqycuwrcxztqjdyqqkpf5ffwshn75n6ixj18.7c987apmf94enkamuo3q98h53bz6xjoyx4cb1bzumdjc95mnnqpfth16eeyiqhf.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:27,LUA_SUMMARY:none X-HE-Tag: cap78_160ae8c27057 X-Filterd-Recvd-Size: 3981 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf20.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:36 +0000 (UTC) IronPort-SDR: G8knVMXlhTdiBFVdzS6jZ229NS4ZUziS/yQ/gpCGCAaX+7NGS/UkFg1wttDimLfrekJYYJcQmU wKYKNlD39ZXA== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061724" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061724" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by 
orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:35 -0700 IronPort-SDR: t/f0Fy8XKSj8Tdpvkh/FWYuRZOS2AZNlWGcFsgCZ90WgclJzmKC4FVplytaqWqn9m/VhoxDeVw 06O1b+ZRfWNQ== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134968" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:34 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu , David Airlie , Joonas Lahtinen , Jani Nikula , Daniel Vetter , Rodrigo Vivi , Zhenyu Wang , Zhi Wang Subject: [PATCH v11 09/25] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS Date: Mon, 24 Aug 2020 17:25:24 -0700 Message-Id: <20200825002540.3351-10-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 2DDE118140B60 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: After the introduction of _PAGE_COW, a modified page's PTE can have either _PAGE_DIRTY_HW or _PAGE_COW. Change _PAGE_DIRTY to _PAGE_DIRTY_BITS. 
Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook Cc: David Airlie Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Daniel Vetter Cc: Rodrigo Vivi Cc: Zhenyu Wang Cc: Zhi Wang --- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index 210016192ce7..c01f4880c794 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1207,7 +1207,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu, } /* Clear dirty field. */ - se->val64 &= ~_PAGE_DIRTY; + se->val64 &= ~_PAGE_DIRTY_BITS; ops->clear_pse(se); ops->clear_ips(se); From patchwork Tue Aug 25 00:25:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734527 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DFF0109B for ; Tue, 25 Aug 2020 00:29:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5560620838 for ; Tue, 25 Aug 2020 00:29:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5560620838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 433658E0010; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 394518E0018; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 20D138E0010; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: 
from forelay.hostedemail.com (smtprelay0076.hostedemail.com [216.40.44.76]) by kanga.kvack.org (Postfix) with ESMTP id E75AA8E0018 for ; Mon, 24 Aug 2020 20:29:37 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id A5E1882499B9 for ; Tue, 25 Aug 2020 00:29:37 +0000 (UTC) X-FDA: 77187207594.07.toys71_3e0191e27057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 6C2E21803F9AA for ; Tue, 25 Aug 2020 00:29:37 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yfxr19bi9m3ewzux9ycp54oxp8iypwi5jfhgo7shmoyqz8u9k14dbf4krhe7w.hbniuxfniuqiqqzojfcwem6gkp7ch8mg1g5aj1r8pt1yc7dcace6gnnxnzyp7sp.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: toys71_3e0191e27057 X-Filterd-Recvd-Size: 4836 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:36 +0000 (UTC) IronPort-SDR: JCVRdc3STuBJaSW0H40KWZM0tpeQiagyOpvNEnXWJndUNferz87dEtT8ar59x6KQGyNI5dTjIE sutzmDFoOJBQ== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061726" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061726" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:36 -0700 IronPort-SDR: V0kQ26yYMCa8m5gx20e6HHqkIYTDZvkUIo+1sNBujf9XL0hAzOw6GidInRZ9Od4TCe4opIyAYC +JXc0WLWMFmw== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134973" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) 
by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:35 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 10/25] x86/mm: Update pte_modify for _PAGE_COW Date: Mon, 24 Aug 2020 17:25:25 -0700 Message-Id: <20200825002540.3351-11-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 6C2E21803F9AA X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: pte_modify() changes a PTE to 'newprot'. It does not go through the pte_*() helpers that a previous patch fixed up, so it needs its own fixup. Introduce fixup_dirty_pte() to set the dirty bits based on _PAGE_RW, and apply the same changes to pmd_modify(). Signed-off-by: Yu-cheng Yu --- v10: - Replace _PAGE_CHG_MASK approach with fixup functions. 
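For reviewers who want to check the dirty-bit handling outside the kernel, here is a minimal userspace sketch of the fixup described above. It is not the kernel code: it assumes shadow stack is enabled, and model_fixup_dirty(), model_modify(), looks_like_shstk() and MODEL_CHG_MASK are hypothetical names used only for illustration. The bit positions follow the x86 PTE layout (RW is bit 1, Dirty is bit 6) and the _PAGE_BIT_SOFTW5 value (58) chosen in patch 08.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Stand-ins for the kernel flag macros. */
#define PAGE_RW         ((pteval_t)1 << 1)   /* _PAGE_RW */
#define PAGE_DIRTY_HW   ((pteval_t)1 << 6)   /* _PAGE_DIRTY_HW */
#define PAGE_COW        ((pteval_t)1 << 58)  /* _PAGE_COW */
#define PAGE_DIRTY_BITS (PAGE_DIRTY_HW | PAGE_COW)

/*
 * Hypothetical, simplified change mask: bits that survive a protection
 * change (a PFN field and the dirty state). The real _PAGE_CHG_MASK
 * covers more bits.
 */
#define MODEL_CHG_MASK  (((pteval_t)0x000ffffffffff000) | PAGE_DIRTY_BITS)

/* The encoding reserved for shadow stack: Write=0, Dirty(HW)=1. */
static int looks_like_shstk(pteval_t v)
{
	return !(v & PAGE_RW) && (v & PAGE_DIRTY_HW);
}

/*
 * Model of fixup_dirty_pte(): if the entry is dirty in either form,
 * store the dirty state in the bit that matches the write permission.
 */
static pteval_t model_fixup_dirty(pteval_t v)
{
	if (v & PAGE_DIRTY_BITS) {
		v &= ~PAGE_DIRTY_BITS;
		v |= (v & PAGE_RW) ? PAGE_DIRTY_HW : PAGE_COW;
	}
	return v;
}

/* Model of the pte_modify() flow, without the PROT_NONE handling. */
static pteval_t model_modify(pteval_t v, pteval_t newprot)
{
	v &= MODEL_CHG_MASK;             /* keep PFN and dirty state */
	v |= newprot & ~MODEL_CHG_MASK;  /* take protections from newprot */
	return model_fixup_dirty(v);
}
```

In this model, changing a writable dirty entry to read-only leaves it with PAGE_COW rather than PAGE_DIRTY_HW, so it never matches the shadow stack encoding; changing it back to writable restores PAGE_DIRTY_HW.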
arch/x86/include/asm/pgtable.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index ac4ed814be96..3bdb192a904b 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -727,6 +727,21 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd) static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); +static inline pteval_t fixup_dirty_pte(pteval_t pteval) +{ + pte_t pte = __pte(pteval); + + if (pte_dirty(pte)) { + pte = pte_mkclean(pte); + + if (pte_flags(pte) & _PAGE_RW) + pte = pte_set_flags(pte, _PAGE_DIRTY_HW); + else + pte = pte_set_flags(pte, _PAGE_COW); + } + return pte_val(pte); +} + static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pteval_t val = pte_val(pte), oldval = val; @@ -737,16 +752,34 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ val &= _PAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; + val = fixup_dirty_pte(val); val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); return __pte(val); } +static inline int pmd_write(pmd_t pmd); +static inline pmdval_t fixup_dirty_pmd(pmdval_t pmdval) +{ + pmd_t pmd = __pmd(pmdval); + + if (pmd_dirty(pmd)) { + pmd = pmd_mkclean(pmd); + + if (pmd_flags(pmd) & _PAGE_RW) + pmd = pmd_set_flags(pmd, _PAGE_DIRTY_HW); + else + pmd = pmd_set_flags(pmd, _PAGE_COW); + } + return pmd_val(pmd); +} + static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { pmdval_t val = pmd_val(pmd), oldval = val; val &= _HPAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; + val = fixup_dirty_pmd(val); val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); return __pmd(val); } From patchwork Tue Aug 25 00:25:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734529 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE60F913 for ; Tue, 25 Aug 2020 00:29:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9628520838 for ; Tue, 25 Aug 2020 00:29:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9628520838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0AFCC8E0018; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F02988E001A; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BF60D8E0019; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0133.hostedemail.com [216.40.44.133]) by kanga.kvack.org (Postfix) with ESMTP id A2BB98E0018 for ; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 5E5CC362B for ; Tue, 25 Aug 2020 00:29:38 +0000 (UTC) X-FDA: 77187207636.10.class13_180a1d527057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id 2459D16A040 for ; Tue, 25 Aug 2020 00:29:38 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 
62.18.0.100;04yfww1rbw1bh7w3owsku613174szyc6opfa9a9jyab9nauded5a75x84r5xh7m.g7yupnka59613qzmc66fjqw6y335kb9ajt6q6quxz1hz555tckbpig5s87w7hb7.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: class13_180a1d527057 X-Filterd-Recvd-Size: 6717 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf20.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:37 +0000 (UTC) IronPort-SDR: p+W3zrkAUJOfR8nqzazS0i62+A0vjnGrGi68nps8hIiOFV5m9vnpfTzDIPnWREm/LZvS/4hF8R 5srOc30IX9pw== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061728" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061728" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:36 -0700 IronPort-SDR: KPIzFsb0zuWPPZY90PJXbLA/AJeM/792ake0CEuQMgRw2TA1CNGRELRobu4+D1A/xEW+IX5I6u +Q4nBiehXRQQ== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134979" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:36 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. 
Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 11/25] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY_HW to _PAGE_COW Date: Mon, 24 Aug 2020 17:25:26 -0700 Message-Id: <20200825002540.3351-12-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 2459D16A040 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Once shadow stack is introduced, the [R/O + _PAGE_DIRTY_HW] PTE encoding is reserved for shadow stack. Copy-on-write PTEs use [R/O + _PAGE_COW]. When a PTE goes from [R/W + _PAGE_DIRTY_HW] to [R/O + _PAGE_COW], it could become a transient shadow stack PTE in two cases: The first case is that some processors can start a write but end up seeing a read-only PTE by the time they get to the Dirty bit, creating a transient shadow stack PTE. However, this will not occur on processors supporting shadow stack, so no TLB flush is needed here. The second case is that if software tests and replaces _PAGE_DIRTY_HW with _PAGE_COW non-atomically, a transient shadow stack PTE can exist. This is prevented with cmpxchg. Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many insights into the issue. Jann Horn provided the cmpxchg solution. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: - Replace bit shift with pte_wrprotect()/pmd_wrprotect(), which use bit test & shift. - Move READ_ONCE of old_pte into try_cmpxchg() loop. - Change static_cpu_has() to cpu_feature_enabled(). v9: - Change compile-time conditionals to runtime checks. - Fix parameters of try_cmpxchg(): change pte_t/pmd_t to pte_t.pte/pmd_t.pmd. 
v4: - Implement try_cmpxchg(). arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 3bdb192a904b..a00d55fda5a2 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1230,6 +1230,32 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + /* + * Some processors can start a write, but end up seeing a read-only + * PTE by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PTE which + * looks like a shadow stack PTE. + * + * However, this behavior has been improved and will not occur on + * processors supporting shadow stack. Without this guarantee, a + * transition to a non-present PTE and flush the TLB would be + * needed. + * + * When changing a writable PTE to read-only and if the PTE has + * _PAGE_DIRTY_HW set, move that bit to _PAGE_COW so that the + * PTE is not a shadow stack PTE. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pte_t old_pte, new_pte; + + do { + old_pte = READ_ONCE(*ptep); + new_pte = pte_wrprotect(old_pte); + + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1286,6 +1312,32 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { + /* + * Some processors can start a write, but end up seeing a read-only + * PMD by the time they get to the Dirty bit. In this case, they + * will set the Dirty bit, leaving a read-only, Dirty PMD which + * looks like a Shadow Stack PMD. + * + * However, this behavior has been improved and will not occur on + * processors supporting Shadow Stack. 
Without this guarantee, a + * transition to a non-present PMD and flush the TLB would be + * needed. + * + * When changing a writable PMD to read-only and if the PMD has + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_COW so that the + * PMD is not a shadow stack PMD. + */ + if (cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pmd_t old_pmd, new_pmd; + + do { + old_pmd = READ_ONCE(*pmdp); + new_pmd = pmd_wrprotect(old_pmd); + + } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd))); + + return; + } clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } From patchwork Tue Aug 25 00:25:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734531 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 524AB913 for ; Tue, 25 Aug 2020 00:29:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 28CD120838 for ; Tue, 25 Aug 2020 00:29:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 28CD120838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3C6EB8E0019; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 31EC18E001B; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0AC348E0019; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0090.hostedemail.com [216.40.44.90]) by 
kanga.kvack.org (Postfix) with ESMTP id D88518E0018 for ; Mon, 24 Aug 2020 20:29:38 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 9F9DA180AD817 for ; Tue, 25 Aug 2020 00:29:38 +0000 (UTC) X-FDA: 77187207636.22.scale65_400c98527057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 7141F18038E60 for ; Tue, 25 Aug 2020 00:29:38 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064:30070:30079,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygjaz4ftf4e79yw1qghnedpfdeyyc334iwak6zkri351gfay3jprppzb95mz1.fmxoi1edft5f6mgdpibmeujeio4fjhotc6ajqzaajhriib96uhm1nmecsknbwip.n-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: scale65_400c98527057 X-Filterd-Recvd-Size: 5263 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:37 +0000 (UTC) IronPort-SDR: yu7bgfkyeRG4BSfqU7Yn4ieERbFnOUp6PQSNKXWcG2vtX6BF9s4DYDkRXax/14m7WT7pmYtDnw TxFSPdvmsEaw== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061731" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061731" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:37 -0700 IronPort-SDR: d1EfFxe85+4cInBbr4i5psOpTmlLClZXL+buEQDddkYhHxWKLs6TCRKmt3rsr7l6jq94lgfeji oaoxmeTPeBiA== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134984" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:36 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 12/25] mm: Introduce VM_SHSTK for shadow stack memory Date: Mon, 24 Aug 2020 17:25:27 -0700 Message-Id: <20200825002540.3351-13-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 7141F18038E60 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A shadow stack PTE must be read-only and have _PAGE_DIRTY_HW set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHSTK to track shadow stack VMAs, so that the two cases can be told apart by looking at the VMA. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v9: - Add VM_SHSTK case to arch_vma_name(). - Revise the commit log to explain why a new VM flag is added.
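Note (illustration only, not part of the patch): a minimal userspace sketch of how a VMA flag can mark shadow stack mappings, and why a VM_NONE fallback makes the check always false when the feature is configured out. All demo_* and DEMO_* names are hypothetical stand-ins. Only bit 37 (VM_HIGH_ARCH_BIT_5) and the "[shadow stack]" string are taken from this patch; the other values are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch standing in for CONFIG_X86_INTEL_SHADOW_STACK_USER. */
#ifndef DEMO_SHSTK_ENABLED
#define DEMO_SHSTK_ENABLED 1
#endif

#define DEMO_VM_NONE  UINT64_C(0)
#define DEMO_VM_WRITE (UINT64_C(1) << 1)   /* illustrative value */

#if DEMO_SHSTK_ENABLED
# define DEMO_VM_SHSTK (UINT64_C(1) << 37) /* bit 37, like VM_HIGH_ARCH_BIT_5 */
#else
# define DEMO_VM_SHSTK DEMO_VM_NONE        /* any flag test is always false */
#endif

struct demo_vma {
	uint64_t vm_flags;
};

/* Follows the arch_vma_name() hunk: name the mapping only if it is a shadow stack. */
static const char *demo_vma_name(const struct demo_vma *vma)
{
	if (vma->vm_flags & DEMO_VM_SHSTK)
		return "[shadow stack]";
	return NULL;
}

/* Small self-check of the intended behaviour. */
static void demo_check(void)
{
	struct demo_vma shstk = { .vm_flags = DEMO_VM_SHSTK };
	struct demo_vma data  = { .vm_flags = DEMO_VM_WRITE };

	assert(demo_vma_name(&data) == NULL);
#if DEMO_SHSTK_ENABLED
	assert(demo_vma_name(&shstk) != NULL);
#else
	assert(demo_vma_name(&shstk) == NULL);
#endif
}
```

Building with -DDEMO_SHSTK_ENABLED=0 models the VM_NONE case: the same source compiles, and no VMA is ever reported as a shadow stack.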
arch/x86/mm/mmap.c | 2 ++ fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 3 files changed, 13 insertions(+) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index c90c20904a60..a22c6b6fc607 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy) const char *arch_vma_name(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHSTK) + return "[shadow stack]"; return NULL; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 5066b0251ed8..682ea6f95fa4 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -663,6 +663,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_PKEY_BIT4)] = "", #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER + [ilog2(VM_SHSTK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index 1983e08f5906..62f5f496a6d1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -299,11 +299,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void *objp); # define VM_MAPPED_COPY VM_ARCH_1 /* T if mapped copy of data (nommu mmap) */ #endif +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER +# define VM_SHSTK VM_HIGH_ARCH_5 +#else +# define VM_SHSTK VM_NONE 
+#endif + #ifndef VM_GROWSUP # define VM_GROWSUP VM_NONE #endif From patchwork Tue Aug 25 00:25:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734533 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D0853913 for ; Tue, 25 Aug 2020 00:30:00 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9D20A207D8 for ; Tue, 25 Aug 2020 00:30:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D20A207D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A9B18900005; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9AD818E001A; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7FE8E8E001B; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0062.hostedemail.com [216.40.44.62]) by kanga.kvack.org (Postfix) with ESMTP id 583F08E001A for ; Mon, 24 Aug 2020 20:29:39 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2297F181AEF1D for ; Tue, 25 Aug 2020 00:29:39 +0000 (UTC) X-FDA: 77187207678.16.stamp24_1b1389d27057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id EC46D100E690B for ; Tue, 25 Aug 2020 00:29:38 +0000 
(UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30069:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04yrchwanhjggw3aq96yfepcwmgwkopd9bf64hsw76mmgnwjyiwo8y8j7cbu4sd.gxpm91jod81ppowoyqkbuyt3puxfeda4qny6xryb8tr91p1y5t5gwzr7p6idfzn.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: stamp24_1b1389d27057 X-Filterd-Recvd-Size: 6011 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf20.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:38 +0000 (UTC) IronPort-SDR: din5q8mfepaaAtDB+WGUg+kb3iP2xWEFqLIcz2yPyIY5hBRFj8pqxfEMo45Zh9xkx0ku84b6lY tqWPMLOT9PcQ== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061733" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061733" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:38 -0700 IronPort-SDR: gyDL0bD4h4WCbMcpj25tslUEs127x3suqMjrmQ0E2yteo22B/UsPr9y0EVQVTIfUjN0EIkSfP0 6V6TH39xotQw== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134987" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:37 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 13/25] x86/mm: Shadow Stack page fault error checking Date: Mon, 24 Aug 2020 17:25:28 -0700 Message-Id: <20200825002540.3351-14-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: EC46D100E690B X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack accesses are those that are performed by the CPU where it expects to encounter a shadow stack mapping. They are performed implicitly by CALL/RET at the shadow stack pointer, and explicitly by shadow stack management instructions like WRUSSQ. Shadow stack accesses to shadow stack mappings can fault in normal, valid operation, just like regular accesses to regular mappings. Shadow stacks need some of the same features, such as delayed allocation, swap and copy-on-write. Shadow stack accesses can also result in errors, such as when a shadow stack overflows, or if a shadow stack access occurs to a non-shadow-stack mapping. In handling a shadow stack page fault, verify it occurs within a shadow stack mapping. It is always an error otherwise. For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect copy-on-write. Because clearing _PAGE_DIRTY_HW (vs. _PAGE_RW) is used to trigger the fault, shadow stack read faults and shadow stack write faults are not differentiated and both are handled as a write access. Signed-off-by: Yu-cheng Yu Reviewed-by: Kees Cook --- v10: - Revise commit log.
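Note (illustration only, not part of the patch): the checks described above can be sketched in userspace C roughly as follows. The demo_* and DEMO_* names are hypothetical. The SHSTK error-code bit (1 << 6) comes from this patch; the remaining values, and the reduction of the non-write path to "no error", are simplifying assumptions rather than the kernel's full access_error() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the page-fault error code bits; SHSTK is bit 6 as in this patch. */
enum demo_pf_error_code {
	DEMO_PF_PROT  = 1 << 0,
	DEMO_PF_WRITE = 1 << 1,
	DEMO_PF_SHSTK = 1 << 6,
};

#define DEMO_VM_WRITE         (UINT64_C(1) << 1)  /* illustrative value */
#define DEMO_VM_SHSTK         (UINT64_C(1) << 37) /* illustrative value */
#define DEMO_FAULT_FLAG_WRITE 0x01u               /* illustrative value */

/* Returns true when the faulting access is not permitted for this VMA. */
static bool demo_access_error(unsigned long error_code, uint64_t vm_flags)
{
	/* A shadow stack access is valid only inside a shadow stack VMA. */
	if (error_code & DEMO_PF_SHSTK)
		return !(vm_flags & DEMO_VM_SHSTK);

	/* A normal write needs VM_WRITE, which a shadow stack VMA lacks. */
	if (error_code & DEMO_PF_WRITE)
		return !(vm_flags & DEMO_VM_WRITE);

	/* Read checks are omitted in this sketch. */
	return false;
}

/* Shadow stack reads and writes are both treated as write faults. */
static unsigned int demo_fault_flags(unsigned long hw_error_code)
{
	unsigned int flags = 0;

	if (hw_error_code & DEMO_PF_SHSTK)
		flags |= DEMO_FAULT_FLAG_WRITE;
	if (hw_error_code & DEMO_PF_WRITE)
		flags |= DEMO_FAULT_FLAG_WRITE;
	return flags;
}

/* Small self-check of the intended behaviour. */
static void demo_check(void)
{
	/* Shadow stack access outside a shadow stack VMA: always an error. */
	assert(demo_access_error(DEMO_PF_SHSTK, DEMO_VM_WRITE));
	/* Shadow stack access inside a shadow stack VMA: handled as a fault. */
	assert(!demo_access_error(DEMO_PF_SHSTK, DEMO_VM_SHSTK));
	/* Normal write to a shadow stack VMA: rejected by the write check. */
	assert(demo_access_error(DEMO_PF_WRITE, DEMO_VM_SHSTK));
}
```

The ordering matters in this sketch: the shadow stack test returns before the write test, so a valid shadow stack access is not rejected merely because the VMA has no VM_WRITE.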
arch/x86/include/asm/traps.h | 2 ++ arch/x86/mm/fault.c | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 714b1a30e7b0..28b493c53d70 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -50,6 +50,7 @@ void __noreturn handle_stack_overflow(const char *message, * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault */ enum x86_pf_error_code { X86_PF_PROT = 1 << 0, @@ -58,5 +59,6 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, }; #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 35f1498e9832..db4018d122ca 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1063,6 +1063,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Verify a shadow stack access is within a shadow stack VMA. + * It is always an error otherwise. Normal data access to a + * shadow stack area is checked in the case that follows. + */ + if (error_code & X86_PF_SHSTK) { + if (!(vma->vm_flags & VM_SHSTK)) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ if (unlikely(!(vma->vm_flags & VM_WRITE))) @@ -1197,6 +1208,14 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * Clearing _PAGE_DIRTY_HW is used to detect shadow stack access. + * This method cannot distinguish shadow stack read vs. write. + * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect + * copy-on-write.
+ */ + if (hw_error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (hw_error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (hw_error_code & X86_PF_INSTR) From patchwork Tue Aug 25 00:25:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734535 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 47A39739 for ; Tue, 25 Aug 2020 00:30:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 14A8A207D8 for ; Tue, 25 Aug 2020 00:30:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14A8A207D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 16F5D900006; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0FDDB8E000A; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DCC338E001B; Mon, 24 Aug 2020 20:29:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0069.hostedemail.com [216.40.44.69]) by kanga.kvack.org (Postfix) with ESMTP id B84348E001A for ; Mon, 24 Aug 2020 20:29:40 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 79E7182499B9 for ; Tue, 25 Aug 2020 00:29:40 +0000 (UTC) X-FDA: 77187207720.12.hen11_421826327057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by 
smtpin12.hostedemail.com (Postfix) with ESMTP id 497F018013040 for ; Tue, 25 Aug 2020 00:29:40 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yr8qssq74mq557i1f6fqxh4bdzhypp3hwu8jkm9fyy1jdnffm8x4hs1mt3hd7.wfke86crtg1fjikd9jihc4rgow1ib5hqgaeqthxiyrspibjfiqzhx41bxiyzg31.y-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: hen11_421826327057 X-Filterd-Recvd-Size: 6325 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:38 +0000 (UTC) IronPort-SDR: r7+mO8OR86ExsYrmMR4NeD53R8xpVvgIZv4p2CM2Ly1wms3pHe6ITLBDQ+gOmUj2WHFB7x0knp 5rcIIMuuk7Hg== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061737" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061737" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:38 -0700 IronPort-SDR: PwY2YUAEDmzoxJ//oekfj9daSHLGf7UAvu8e/8o+YAr661TgSKDpAG5bU5j2OfK/JnONoP0W34 g/e+ae08qNbA== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134992" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:38 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 14/25] x86/mm: Update maybe_mkwrite() for shadow stack Date: Mon, 24 Aug 2020 17:25:29 -0700 Message-Id: <20200825002540.3351-15-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 497F018013040 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shadow stack memory is writable, but its VMA has VM_SHSTK instead of VM_WRITE. Update maybe_mkwrite() to include the shadow stack. Signed-off-by: Yu-cheng Yu --- arch/x86/Kconfig | 4 ++++ arch/x86/mm/pgtable.c | 18 ++++++++++++++++++ include/linux/mm.h | 2 ++ include/linux/pgtable.h | 24 ++++++++++++++++++++++++ mm/huge_memory.c | 2 ++ 5 files changed, 50 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4844649ee884..e93be385cd04 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1935,6 +1935,9 @@ config AS_HAS_SHADOW_STACK config X86_INTEL_CET def_bool n +config ARCH_MAYBE_MKWRITE + def_bool n + config ARCH_HAS_SHADOW_STACK def_bool n @@ -1945,6 +1948,7 @@ config X86_INTEL_SHADOW_STACK_USER depends on AS_HAS_SHADOW_STACK select ARCH_USES_HIGH_VMA_FLAGS select X86_INTEL_CET + select ARCH_MAYBE_MKWRITE select ARCH_HAS_SHADOW_STACK help Shadow Stacks provides protection against program stack diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index dfd82f51ba66..a9666b64bc05 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -610,6 +610,24 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, } #endif 
+#ifdef CONFIG_ARCH_MAYBE_MKWRITE +pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_SHSTK)) + pte = pte_mkwrite_shstk(pte); + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_SHSTK)) + pmd = pmd_mkwrite_shstk(pmd); + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_ARCH_MAYBE_MKWRITE */ + /** * reserve_top_address - reserves a hole in the top of kernel address space * @reserve - size of hole to reserve diff --git a/include/linux/mm.h b/include/linux/mm.h index 62f5f496a6d1..9c2efefb9281 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -962,6 +962,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pte = pte_mkwrite(pte); + else + pte = arch_maybe_mkwrite(pte, vma); return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a124c21e3204..4445d009f5ec 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1354,6 +1354,30 @@ static inline bool arch_has_pfn_modify_check(void) } #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */ +#ifdef CONFIG_MMU +#ifdef CONFIG_ARCH_MAYBE_MKWRITE +pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#else /* !CONFIG_ARCH_MAYBE_MKWRITE */ +static inline pte_t arch_maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#endif /* CONFIG_ARCH_MAYBE_MKWRITE */ +#endif /* CONFIG_MMU */ + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 
2ccff8472cd4..a580b5fb6e1a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -464,6 +464,8 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pmd = pmd_mkwrite(pmd); + else + pmd = arch_maybe_pmd_mkwrite(pmd, vma); return pmd; } From patchwork Tue Aug 25 00:25:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734539 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B151739 for ; Tue, 25 Aug 2020 00:30:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7237C207D3 for ; Tue, 25 Aug 2020 00:30:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7237C207D3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8C8C28E000A; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7829C8E001B; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4F5FF8E000A; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 2A1FE900007 for ; Mon, 24 Aug 2020 20:29:41 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id E5E82362B for ; Tue, 25 Aug 2020 00:29:40 +0000 (UTC) X-FDA: 
77187207720.17.bikes16_1f0635127057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id A4DB5180D0181 for ; Tue, 25 Aug 2020 00:29:40 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30054:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95;04ygw78hrmwci6p7r3rq7gpi4f6q8ocf7ug9yofx4wnokgxmhcptck6q8s3dhng.szwrd4n5iui6aqzcsw8qi5c4mu7yqbnaoowj43cxwiaehtqdxk79hiuc48gzjmn.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: bikes16_1f0635127057 X-Filterd-Recvd-Size: 5300 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf20.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:39 +0000 (UTC) IronPort-SDR: RT6o4aF6N2DDVbyZOgDAMJnmwHUjaSrgDpk6jDTEXHNp4Jz1Digqp41MZvJ7VmWCNWa+XT4818 MNGqMDz9+xVg== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061739" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061739" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:39 -0700 IronPort-SDR: pKdpaWPdChMO9DCUjRoLpWHeOE4/ojNhhjVFXDckLxnoVaJOF3VhulhubdhWfJAvgK2WnZ3tPo va9eDuBxHaVQ== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474134996" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:38 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 15/25] mm: Fixup places that call pte_mkwrite() directly Date: Mon, 24 Aug 2020 17:25:30 -0700 Message-Id: <20200825002540.3351-16-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: A4DB5180D0181 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A shadow stack page is made writable by pte_mkwrite_shstk(), which sets _PAGE_DIRTY_HW. There are a few places that call pte_mkwrite() directly and miss the maybe_mkwrite() fixup in the previous patch. Fix them with maybe_mkwrite(): - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly and call pte_mkwrite(), which is the same as maybe_mkwrite(). Change them to maybe_mkwrite(). - In do_numa_page(), if the numa entry 'was-writable', then pte_mkwrite() is called directly. Fix it by doing maybe_mkwrite(). - In change_pte_range(), pte_mkwrite() is called directly. Replace it with maybe_mkwrite(). 
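Note (illustration only, not part of the patch): the dispatch that the converted call sites rely on can be sketched in userspace C as below. The demo_* and DEMO_* names and all bit values are hypothetical. In particular, demo_pte_mkwrite_shstk() only models "sets the Dirty bit" as stated above and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VM_WRITE   (UINT64_C(1) << 1)  /* illustrative value */
#define DEMO_VM_SHSTK   (UINT64_C(1) << 37) /* illustrative value */

#define DEMO_PAGE_RW    (UINT64_C(1) << 1)  /* illustrative value */
#define DEMO_PAGE_DIRTY (UINT64_C(1) << 6)  /* illustrative value */

typedef struct { uint64_t pte; } demo_pte_t;

struct demo_vma {
	uint64_t vm_flags;
};

/* Ordinary writable mapping: set the RW bit. */
static demo_pte_t demo_pte_mkwrite(demo_pte_t pte)
{
	pte.pte |= DEMO_PAGE_RW;
	return pte;
}

/* Shadow stack mapping: "writable" means Dirty is set while RW stays clear. */
static demo_pte_t demo_pte_mkwrite_shstk(demo_pte_t pte)
{
	pte.pte |= DEMO_PAGE_DIRTY;
	return pte;
}

/* Arch hook: only shadow stack VMAs get the shadow stack variant. */
static demo_pte_t demo_arch_maybe_mkwrite(demo_pte_t pte, const struct demo_vma *vma)
{
	if (vma->vm_flags & DEMO_VM_SHSTK)
		pte = demo_pte_mkwrite_shstk(pte);
	return pte;
}

/* Generic helper: VM_WRITE wins, otherwise defer to the arch hook. */
static demo_pte_t demo_maybe_mkwrite(demo_pte_t pte, const struct demo_vma *vma)
{
	if (vma->vm_flags & DEMO_VM_WRITE)
		pte = demo_pte_mkwrite(pte);
	else
		pte = demo_arch_maybe_mkwrite(pte, vma);
	return pte;
}

/* Small self-check of the intended behaviour. */
static void demo_check(void)
{
	struct demo_vma shstk = { .vm_flags = DEMO_VM_SHSTK };
	demo_pte_t pte = { .pte = 0 };

	pte = demo_maybe_mkwrite(pte, &shstk);
	assert(pte.pte & DEMO_PAGE_DIRTY);
	assert(!(pte.pte & DEMO_PAGE_RW));
}
```

A call site that tests VM_WRITE itself and then calls the plain mkwrite helper would skip the else branch, leaving a shadow stack PTE unchanged; routing through the generic helper avoids that.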
Signed-off-by: Yu-cheng Yu --- mm/memory.c | 5 ++--- mm/migrate.c | 3 +-- mm/mprotect.c | 2 +- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3a7779d9891d..db34853dd206 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3387,8 +3387,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4042,7 +4041,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (was_writable) - pte = pte_mkwrite(pte); + pte = maybe_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); diff --git a/mm/migrate.c b/mm/migrate.c index 34a842a8eb6a..b7199a1dc449 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2897,8 +2897,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } } else { entry = mk_pte(page, vma->vm_page_prot); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index ce8b8a5eacbb..a8edbcb3af99 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || !(vma->vm_flags & VM_SOFTDIRTY))) { - ptent = pte_mkwrite(ptent); + ptent = maybe_mkwrite(ptent, vma); } ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); pages++; From patchwork Tue Aug 25 00:25:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734541 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V.
Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v11 16/25] mm: Add guard pages around a shadow stack.
Date: Mon, 24 Aug 2020 17:25:31 -0700
Message-Id: <20200825002540.3351-17-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

INCSSP(Q/D) increments the shadow stack pointer and 'pops and discards' the first and the last elements in the range, which effectively touches those memory areas. The maximum distance the pointer can move is 255 * 8 = 2040 bytes for INCSSPQ and 255 * 4 = 1020 bytes for INCSSPD. Both distances are smaller than PAGE_SIZE. Thus, putting a guard page at each end of a shadow stack prevents INCSSP, CALL, and RET from moving the pointer beyond it.

Signed-off-by: Yu-cheng Yu --- v10: - Define ARCH_SHADOW_STACK_GUARD_GAP. arch/x86/include/asm/processor.h | 10 ++++++++++ include/linux/mm.h | 24 ++++++++++++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 97143d87994c..01acbd63cad8 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -840,6 +840,16 @@ static inline void spin_lock_prefetch(const void *x) #define STACK_TOP TASK_SIZE_LOW #define STACK_TOP_MAX TASK_SIZE_MAX +/* + * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D). INCSSPQ + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and + * touches the first and the last element in the range, which triggers a + * page fault if the range is not in a shadow stack.
Because of this, + * creating 4-KB guard pages around a shadow stack prevents these + * instructions from going beyond. + */ +#define ARCH_SHADOW_STACK_GUARD_GAP PAGE_SIZE + #define INIT_THREAD { \ .addr_limit = KERNEL_DS, \ } diff --git a/include/linux/mm.h b/include/linux/mm.h index 9c2efefb9281..d437ce0c85ac 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2608,6 +2608,10 @@ extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf); int __must_check write_one_page(struct page *page); void task_dirty_inc(struct task_struct *tsk); +#ifndef ARCH_SHADOW_STACK_GUARD_GAP +#define ARCH_SHADOW_STACK_GUARD_GAP 0 +#endif + extern unsigned long stack_guard_gap; /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */ extern int expand_stack(struct vm_area_struct *vma, unsigned long address); @@ -2640,9 +2644,15 @@ static inline struct vm_area_struct * find_vma_intersection(struct mm_struct * m static inline unsigned long vm_start_gap(struct vm_area_struct *vma) { unsigned long vm_start = vma->vm_start; + unsigned long gap = 0; - if (vma->vm_flags & VM_GROWSDOWN) { - vm_start -= stack_guard_gap; + if (vma->vm_flags & VM_GROWSDOWN) + gap = stack_guard_gap; + else if (vma->vm_flags & VM_SHSTK) + gap = ARCH_SHADOW_STACK_GUARD_GAP; + + if (gap != 0) { + vm_start -= gap; if (vm_start > vma->vm_start) vm_start = 0; } @@ -2652,9 +2662,15 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma) static inline unsigned long vm_end_gap(struct vm_area_struct *vma) { unsigned long vm_end = vma->vm_end; + unsigned long gap = 0; + + if (vma->vm_flags & VM_GROWSUP) + gap = stack_guard_gap; + else if (vma->vm_flags & VM_SHSTK) + gap = ARCH_SHADOW_STACK_GUARD_GAP; - if (vma->vm_flags & VM_GROWSUP) { - vm_end += stack_guard_gap; + if (gap != 0) { + vm_end += gap; if (vm_end < vma->vm_end) vm_end = -PAGE_SIZE; } From patchwork Tue Aug 25 00:25:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734543 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V.
Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v11 17/25] mm/mmap: Add shadow stack pages to memory accounting
Date: Mon, 24 Aug 2020 17:25:32 -0700
Message-Id: <20200825002540.3351-18-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

Account shadow stack pages as stack memory.

Signed-off-by: Yu-cheng Yu --- v10: - Use arch_shadow_stack_mapping() to make meaning clear. v8: - Change shadow stack pages from data_vm to stack_vm. arch/x86/mm/pgtable.c | 7 +++++++ include/linux/pgtable.h | 11 +++++++++++ mm/mmap.c | 5 +++++ 3 files changed, 23 insertions(+) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index a9666b64bc05..68e98f70298b 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -893,3 +893,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK +bool arch_shadow_stack_mapping(vm_flags_t vm_flags) +{ + return (vm_flags & VM_SHSTK); +} +#endif diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 4445d009f5ec..ea92b592c053 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1378,6 +1378,17 @@ static inline pmd_t arch_maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma #endif /* CONFIG_ARCH_MAYBE_MKWRITE */ #endif /* CONFIG_MMU */ +#ifdef CONFIG_MMU +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK +bool arch_shadow_stack_mapping(vm_flags_t vm_flags); +#else +static inline bool arch_shadow_stack_mapping(vm_flags_t vm_flags) +{ + return false; +} +#endif /*
CONFIG_ARCH_HAS_SHADOW_STACK */ +#endif /* CONFIG_MMU */ + /* * Architecture PAGE_KERNEL_* fallbacks * diff --git a/mm/mmap.c b/mm/mmap.c index 40248d84ad5f..574b3f273462 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1682,6 +1682,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags) if (file && is_file_hugepages(file)) return 0; + if (arch_shadow_stack_mapping(vm_flags)) + return 1; + return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE; } @@ -3352,6 +3355,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages) mm->stack_vm += npages; else if (is_data_mapping(flags)) mm->data_vm += npages; + else if (arch_shadow_stack_mapping(flags)) + mm->stack_vm += npages; } static vm_fault_t special_mapping_fault(struct vm_fault *vmf); From patchwork Tue Aug 25 00:25:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734545 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 50C05739 for ; Tue, 25 Aug 2020 00:30:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1EA25207D3 for ; Tue, 25 Aug 2020 00:30:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1EA25207D3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 25C1B8E001E; Mon, 24 Aug 2020 20:29:43 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1E4748E001B; Mon, 24 Aug 2020 20:29:43 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 
63042) id F06308E001E; Mon, 24 Aug 2020 20:29:42 -0400 (EDT)
From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu
Subject: [PATCH v11 18/25] mm: Update can_follow_write_pte() for shadow stack
Date: Mon, 24 Aug 2020 17:25:33 -0700
Message-Id: <20200825002540.3351-19-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

can_follow_write_pte() ensures that a read-only page is COWed by checking the FOLL_COW flag, and uses pte_dirty() to confirm that the flag is still valid.

Like a writable data page, a shadow stack page is writable and becomes read-only during copy-on-write, but it is always dirty. Thus, in the can_follow_write_pte() check, it belongs to the writable-page case and should be excluded from the read-only page pte_dirty() check. Apply the same change to can_follow_write_pmd().
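
The resulting predicate can be modelled in user space as follows. This is a sketch, not the kernel function: struct toy_pte, TOY_FOLL_COW and the shstk_vma argument are hypothetical stand-ins for pte_write()/pte_dirty(), FOLL_COW and arch_shadow_stack_mapping(vma->vm_flags).

```c
#include <stdbool.h>

#define TOY_FOLL_COW 0x1u	/* hypothetical value, for this sketch only */

/* Just the two PTE properties the check looks at. */
struct toy_pte {
	bool write;
	bool dirty;
};

/*
 * Mirrors the logic of the patched check: a dirty, non-writable entry
 * only counts as "COW already done" when the mapping is not a shadow
 * stack, because a shadow stack entry is dirty from the start.
 */
static inline bool toy_can_follow_write_pte(struct toy_pte pte,
					    unsigned int flags,
					    bool shstk_vma)
{
	return pte.write ||
	       ((flags & TOY_FOLL_COW) && pte.dirty && !shstk_vma);
}
```

With a non-writable but dirty entry and TOY_FOLL_COW set, the model returns true for an ordinary mapping and false for a shadow stack mapping, so the dirty bit alone no longer lets a forced write through.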
Signed-off-by: Yu-cheng Yu --- v10: - Reverse name changes to can_follow_write_*(). mm/gup.c | 8 +++++--- mm/huge_memory.c | 8 +++++--- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index ae096ea7583f..f6a640f2ad57 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -384,9 +384,11 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, * FOLL_FORCE or a forced COW break can write even to unwritable pte's, * but only after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, + struct vm_area_struct *vma) { - return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte)); + return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte) && + !arch_shadow_stack_mapping(vma->vm_flags)); } /* @@ -439,7 +441,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } if ((flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; - if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { + if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) { pte_unmap_unlock(ptep, ptl); return NULL; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a580b5fb6e1a..0f4ed61a77f1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1296,9 +1296,11 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) * FOLL_FORCE or a forced COW break can write even to unwritable pmd's, * but only after we've gone through a COW cycle and they are dirty. 
*/ -static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags) +static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags, + struct vm_area_struct *vma) { - return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd)); + return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd) && + !arch_shadow_stack_mapping(vma->vm_flags)); } struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, @@ -1311,7 +1313,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags)) + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma)) goto out; /* Avoid dumping huge zero page */ From patchwork Tue Aug 25 00:25:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734549 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 490C6739 for ; Tue, 25 Aug 2020 00:30:21 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 13DBA206EB for ; Tue, 25 Aug 2020 00:30:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 13DBA206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8F449900007; Mon, 24 Aug 2020 20:29:44 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 540558E001B; Mon, 24 Aug 2020 20:29:44 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 367FD900007; Mon, 24 Aug 2020 20:29:44 -0400 (EDT) 
From: Yu-cheng Yu
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar", Vedvyas Shanbhogue, Dave Martin, Weijiang Yang
Cc: Yu-cheng Yu, Peter Collingbourne, Andrew Morton
Subject: [PATCH v11 19/25] mm: Re-introduce do_mmap_pgoff()
Date: Mon, 24 Aug 2020 17:25:34 -0700
Message-Id: <20200825002540.3351-20-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com>
References: <20200825002540.3351-1-yu-cheng.yu@intel.com>

No caller passed vm_flags to do_mmap() any more, so the parameter was removed from the function's arguments by commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()").

There is now a new user: shadow stack allocation passes VM_SHSTK to do_mmap(). Re-introduce the vm_flags parameter and do_mmap_pgoff().
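
The wrapper pattern described above can be sketched in isolation. This is a toy model: the toy_* functions and TOY_MMAP_* values are hypothetical, and the only behaviour taken from the patch is that the callee ORs its computed bits into the caller-supplied vm_flags while the wrapper passes 0.

```c
typedef unsigned long toy_vm_flags_t;

/* Hypothetical flag values, for this sketch only. */
#define TOY_MMAP_VM_READ   0x1ul
#define TOY_MMAP_VM_SHSTK  0x100ul

/*
 * Stand-in for do_mmap(): flags supplied by the caller are kept and the
 * bits derived from prot are OR-ed in (the '=' to '|=' change in the diff).
 */
static inline toy_vm_flags_t toy_do_mmap(toy_vm_flags_t prot_bits,
					 toy_vm_flags_t vm_flags)
{
	vm_flags |= prot_bits;
	return vm_flags;
}

/* Stand-in for do_mmap_pgoff(): existing callers pass no extra flags. */
static inline toy_vm_flags_t toy_do_mmap_pgoff(toy_vm_flags_t prot_bits)
{
	return toy_do_mmap(prot_bits, 0);
}
```

Existing callers go through toy_do_mmap_pgoff() and see no change in the resulting flags, while a shadow stack allocator would call toy_do_mmap() directly with TOY_MMAP_VM_SHSTK.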
Signed-off-by: Yu-cheng Yu Cc: Peter Collingbourne Cc: Andrew Morton Cc: Oleg Nesterov Cc: linux-mm@kvack.org --- fs/aio.c | 6 +++--- fs/hugetlbfs/inode.c | 2 +- include/linux/fs.h | 2 +- include/linux/mm.h | 12 +++++++++++- ipc/shm.c | 2 +- mm/mmap.c | 16 ++++++++-------- mm/nommu.c | 6 +++--- mm/shmem.c | 2 +- mm/util.c | 4 ++-- 9 files changed, 31 insertions(+), 21 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 5736bff48e9e..91e7cc4a9f17 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -525,9 +525,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) return -EINTR; } - ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, - PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, + PROT_READ | PROT_WRITE, + MAP_SHARED, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index b5c109703daa..f936bcf02cce 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -140,7 +140,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) * already been checked by prepare_hugepage_range. If you add * any error returns here, do so after setting VM_HUGETLB, so * is_vm_hugetlb_page tests below unmap_region go the right - * way when do_mmap unwinds (may be important on powerpc + * way when do_mmap_pgoff unwinds (may be important on powerpc * and ia64). */ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND; diff --git a/include/linux/fs.h b/include/linux/fs.h index e019ea2f1347..75a98288e7c5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -538,7 +538,7 @@ static inline int mapping_mapped(struct address_space *mapping) /* * Might pages of this file have been modified in userspace? 
- * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap + * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap_pgoff * marks vma as VM_SHARED if it is shared, and the file was opened for * writing i.e. vma may be mprotected writable even if now readonly. * diff --git a/include/linux/mm.h b/include/linux/mm.h index d437ce0c85ac..36b239aa5aa7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2553,13 +2553,23 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, + struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf, bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); extern int do_madvise(unsigned long start, size_t len_in, int behavior); +static inline unsigned long +do_mmap_pgoff(struct file *file, unsigned long addr, + unsigned long len, unsigned long prot, unsigned long flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) +{ + return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate, uf); +} + #ifdef CONFIG_MMU extern int __mm_populate(unsigned long addr, unsigned long len, int ignore_errors); diff --git a/ipc/shm.c b/ipc/shm.c index f1ed36e3ac9f..6cf24a5994ec 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1556,7 +1556,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 574b3f273462..81d4a00092da 100644 --- a/mm/mmap.c +++ b/mm/mmap.c 
@@ -1030,7 +1030,7 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1, * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. * * We don't check here for the merged mmap wrapping around the end of pagecache - * indices (16TB on ia32) because do_mmap() does not permit mmap's which + * indices (16TB on ia32) because do_mmap_pgoff() does not permit mmap's which * wrap, nor mmaps which cover the final page at index -1UL. */ static int @@ -1365,11 +1365,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; *populate = 0; @@ -1431,7 +1431,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2233,7 +2233,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, /* * mmap_region() will call shmem_zero_setup() to create a file, * so use shmem's get_unmapped_area in case it can be huge. - * do_mmap() will clear pgoff, so match alignment. + * do_mmap_pgoff() will clear pgoff, so match alignment. 
*/ pgoff = 0; get_area = shmem_get_unmapped_area; @@ -3006,7 +3006,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, } file = get_file(vma->vm_file); - ret = do_mmap(vma->vm_file, start, size, + ret = do_mmap_pgoff(vma->vm_file, start, size, prot, flags, pgoff, &populate, NULL); fput(file); out: @@ -3226,7 +3226,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma) * By setting it to reflect the virtual start address of the * vma, merges and splits can happen in a seamless way, just * using the existing file pgoff checks and manipulations. - * Similarly in do_mmap and in do_brk. + * Similarly in do_mmap_pgoff and in do_brk. */ if (vma_is_anonymous(vma)) { BUG_ON(vma->anon_vma); diff --git a/mm/nommu.c b/mm/nommu.c index 75a327149af1..71a4ea828f06 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1078,6 +1078,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1085,7 +1086,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; @@ -1104,7 +1104,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL); @@ -1763,7 +1763,7 @@ EXPORT_SYMBOL_GPL(access_process_vm); * * Check the shared mappings on an inode on behalf of a shrinking truncate to * make sure that any outstanding VMAs aren't broken and then shrink the - * vm_regions that extend beyond so that do_mmap() doesn't + * vm_regions that extend beyond so that do_mmap_pgoff() 
doesn't * automatically grant mappings that are too large. */ int nommu_shrink_inode_mappings(struct inode *inode, size_t size, diff --git a/mm/shmem.c b/mm/shmem.c index 271548ca20f3..dea76ecc849b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -4246,7 +4246,7 @@ EXPORT_SYMBOL_GPL(shmem_file_setup_with_mnt); /** * shmem_zero_setup - setup a shared anonymous mapping - * @vma: the vma to be mmapped is prepared by do_mmap + * @vma: the vma to be mmapped is prepared by do_mmap_pgoff */ int shmem_zero_setup(struct vm_area_struct *vma) { diff --git a/mm/util.c b/mm/util.c index 5ef378a2a038..8d6280c05238 100644 --- a/mm/util.c +++ b/mm/util.c @@ -503,8 +503,8 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, - &uf); + ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, + &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) From patchwork Tue Aug 25 00:25:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734547 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B35FC109B for ; Tue, 25 Aug 2020 00:30:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7D488207D3 for ; Tue, 25 Aug 2020 00:30:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7D488207D3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5448B8E001F; Mon, 24 Aug 2020 20:29:44 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org 
X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:42 -0700 IronPort-SDR: EE6oepsxvnikXsSwobPjqAW0f7gzd28tdP2eRsPg2VsJbjIn0NtBo5mOEBfJT5pT51HdyWWX1P wBS5kbqd0ISg== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474135022" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:41 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 20/25] x86/cet/shstk: User-mode shadow stack support Date: Mon, 24 Aug 2020 17:25:35 -0700 Message-Id: <20200825002540.3351-21-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 815A718140B69 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch adds basic shadow stack enabling/disabling routines. A task's shadow stack is allocated from memory with VM_SHSTK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Signed-off-by: Yu-cheng Yu --- v11: - Modify alloc_shstk() to take flags and pass to do_mmap(). 
This is to be used by an arch_prctl() introduced later. v10: - Change no_cet_shstk to no_user_shstk. - Limit shadow stack size to 4 GB, and round_up to PAGE_SIZE. - Replace checking shstk_enabled with shstk_size being zero. - WARN_ON_ONCE() when vm_munmap() fails. v9: - Change cpu_feature_enabled() to static_cpu_has(). - Merge cet_disable_shstk to cet_disable_free_shstk. - Remove the empty slot at the top of the shadow stack, as it is not needed. - Move do_mmap_locked() to alloc_shstk(), which is a static function. v6: - Create a function do_mmap_locked() for shadow stack allocation. v2: - Change noshstk to no_cet_shstk. arch/x86/include/asm/cet.h | 26 ++++ arch/x86/include/asm/disabled-features.h | 8 +- arch/x86/include/asm/processor.h | 5 + arch/x86/kernel/Makefile | 2 + arch/x86/kernel/cet.c | 138 ++++++++++++++++++ arch/x86/kernel/cpu/common.c | 28 ++++ arch/x86/kernel/process.c | 1 + .../arch/x86/include/asm/disabled-features.h | 8 +- 8 files changed, 214 insertions(+), 2 deletions(-) create mode 100644 arch/x86/include/asm/cet.h create mode 100644 arch/x86/kernel/cet.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h new file mode 100644 index 000000000000..caac0687c8e4 --- /dev/null +++ b/arch/x86/include/asm/cet.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_CET_H +#define _ASM_X86_CET_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; +/* + * Per-thread CET status + */ +struct cet_status { + unsigned long shstk_base; + unsigned long shstk_size; +}; + +#ifdef CONFIG_X86_INTEL_CET +int cet_setup_shstk(void); +void cet_disable_free_shstk(struct task_struct *p); +#else +static inline void cet_disable_free_shstk(struct task_struct *p) {} +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_CET_H */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 4ea8584682f9..a0e1b24cfa02 100644 --- a/arch/x86/include/asm/disabled-features.h +++ 
b/arch/x86/include/asm/disabled-features.h @@ -56,6 +56,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER +#define DISABLE_SHSTK 0 +#else +#define DISABLE_SHSTK (1<<(X86_FEATURE_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -75,7 +81,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 01acbd63cad8..8c874d6ce871 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -27,6 +27,7 @@ struct vm86; #include #include #include +#include #include #include @@ -542,6 +543,10 @@ struct thread_struct { unsigned int sig_on_uaccess_err:1; +#ifdef CONFIG_X86_INTEL_CET + struct cet_status cet; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index e77261db2391..76f27f518266 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -145,6 +145,8 @@ obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o +obj-$(CONFIG_X86_INTEL_CET) += cet.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c new file mode 100644 index 000000000000..a1be8ccee1cc --- /dev/null +++ b/arch/x86/kernel/cet.c @@ -0,0 +1,138 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * cet.c - Control-flow Enforcement (CET) + * + * Copyright (c) 2019, Intel Corporation. 
+ * Yu-cheng Yu + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static void start_update_msrs(void) +{ + fpregs_lock(); + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + __fpregs_load_activate(); +} + +static void end_update_msrs(void) +{ + fpregs_unlock(); +} + +static unsigned long cet_get_shstk_addr(void) +{ + struct fpu *fpu = &current->thread.fpu; + unsigned long ssp = 0; + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + rdmsrl(MSR_IA32_PL3_SSP, ssp); + } else { + struct cet_user_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER); + if (p) + ssp = p->user_ssp; + } + + fpregs_unlock(); + return ssp; +} + +static unsigned long alloc_shstk(unsigned long size, int flags) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, populate; + + /* VM_SHSTK requires MAP_ANONYMOUS, MAP_PRIVATE */ + flags |= MAP_ANONYMOUS | MAP_PRIVATE; + + mmap_write_lock(mm); + addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHSTK, 0, + &populate, NULL); + mmap_write_unlock(mm); + + if (populate) + mm_populate(addr, populate); + + return addr; +} + +int cet_setup_shstk(void) +{ + unsigned long addr, size; + struct cet_status *cet = &current->thread.cet; + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + size = round_up(min(rlimit(RLIMIT_STACK), 1UL << 32), PAGE_SIZE); + addr = alloc_shstk(size, 0); + + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + cet->shstk_base = addr; + cet->shstk_size = size; + + start_update_msrs(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + end_update_msrs(); + return 0; +} + +void cet_disable_free_shstk(struct task_struct *tsk) +{ + struct cet_status *cet = &tsk->thread.cet; + + if (!static_cpu_has(X86_FEATURE_SHSTK) || + !cet->shstk_size || !cet->shstk_base) + return; + + if (!tsk->mm || (tsk->mm != current->mm)) + return; + + if (tsk == current) { + u64
msr_val; + + start_update_msrs(); + rdmsrl(MSR_IA32_U_CET, msr_val); + wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_SHSTK_EN); + wrmsrl(MSR_IA32_PL3_SSP, 0); + end_update_msrs(); + } + + while (1) { + int r; + + r = vm_munmap(cet->shstk_base, cet->shstk_size); + + /* + * Retry if mmap_lock is not available. + */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + WARN_ON_ONCE(r); + break; + } + cet->shstk_base = 0; + cet->shstk_size = 0; +} diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index c5d6f17d9b9d..5f60ddaabc46 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include "cpu.h" @@ -509,6 +510,32 @@ static __init int setup_disable_pku(char *arg) __setup("nopku", setup_disable_pku); #endif /* CONFIG_X86_64 */ +static __always_inline void setup_cet(struct cpuinfo_x86 *c) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) && + !cpu_feature_enabled(X86_FEATURE_IBT)) + return; + + cr4_set_bits(X86_CR4_CET); +} + +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER +static __init int setup_disable_shstk(char *s) +{ + /* require an exact match without trailing characters */ + if (s[0] != '\0') + return 0; + + if (!boot_cpu_has(X86_FEATURE_SHSTK)) + return 1; + + setup_clear_cpu_cap(X86_FEATURE_SHSTK); + pr_info("x86: 'no_user_shstk' specified, disabling user Shadow Stack\n"); + return 1; +} +__setup("no_user_shstk", setup_disable_shstk); +#endif + /* * Some CPU features depend on higher CPUID levels, which may not always * be available due to CPUID level capping or broken virtualization @@ -1544,6 +1571,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) x86_init_rdrand(c); setup_pku(c); + setup_cet(c); /* * Clear/Set all flags overridden by options, need do it diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 994d8393f2f7..e41f1a468ee3 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -42,6 +42,7 @@ #include 
#include #include +#include #include "process.h" diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h index 4ea8584682f9..a0e1b24cfa02 100644 --- a/tools/arch/x86/include/asm/disabled-features.h +++ b/tools/arch/x86/include/asm/disabled-features.h @@ -56,6 +56,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER +#define DISABLE_SHSTK 0 +#else +#define DISABLE_SHSTK (1<<(X86_FEATURE_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -75,7 +81,7 @@ #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 -#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) From patchwork Tue Aug 25 00:25:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734551 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58F95739 for ; Tue, 25 Aug 2020 00:30:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 16B82206EB for ; Tue, 25 Aug 2020 00:30:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 16B82206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2D222900009; Mon, 24 Aug 2020 20:29:45 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 236A7900008; Mon, 24 Aug 2020 
orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:43 -0700 IronPort-SDR: nrCnJaMP+F5srhUHKuRmw6BCDqkCjxfjXEb4//3V7Yk161t9mpmsNBO8PUtVBvqTfZL3qm2mdC ZpvMu80AmNrQ== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474135026" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:42 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 21/25] x86/cet/shstk: Handle signals for shadow stack Date: Mon, 24 Aug 2020 17:25:36 -0700 Message-Id: <20200825002540.3351-22-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 7AA441A4A5 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: To deliver a signal, create a shadow stack restore token and put a restore token and the signal restorer address on the shadow stack. For sigreturn, verify the token and restore the shadow stack pointer. Introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. It is used to construct the user signal stack as described above. 
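As an illustration only (not part of this patch), the restore-token arithmetic described above can be modeled in plain userspace C. This is a minimal sketch under stated assumptions: the names sketch_make_token(), sketch_check_token() and sketch_selftest() are hypothetical, the mode flag in the low bits follows the TOKEN_MODE_MASK/TOKEN_MODE_64 definitions added to cet.c in this patch, and no real shadow stack memory or WRUSS instruction is involved.

```c
#include <assert.h>
#include <stdint.h>

#define TOKEN_MODE_MASK 3ULL
#define TOKEN_MODE_64   1ULL

/*
 * Hypothetical model of token creation: the token lives in the 8-byte
 * slot just below the (8-aligned) ssp and records ssp itself, with
 * bit 0 set for a 64-bit task.
 */
int sketch_make_token(int ia32, uint64_t ssp,
		      uint64_t *token_addr, uint64_t *token)
{
	/* ssp must be 8-byte aligned for 64-bit, 4-byte aligned for ia32 */
	if (ssp % (ia32 ? 4 : 8))
		return -1;

	*token_addr = (ssp & ~7ULL) - 8;
	*token = ia32 ? ssp : (ssp | TOKEN_MODE_64);
	return 0;
}

/*
 * Hypothetical model of token verification at sigreturn: the mode bits
 * must match the task, and the recorded ssp must point just above the
 * slot the token was read from.
 */
int sketch_check_token(int ia32, uint64_t token_addr, uint64_t token,
		       uint64_t *new_ssp)
{
	uint64_t mode = token & TOKEN_MODE_MASK;
	uint64_t ssp = token & ~TOKEN_MODE_MASK;

	if (token_addr % 8)
		return -1;
	if (mode != (ia32 ? 0 : TOKEN_MODE_64))
		return -1;
	if (ssp % (ia32 ? 4 : 8))
		return -1;
	if ((ssp & ~7ULL) - 8 != token_addr)
		return -1;

	*new_ssp = ssp;
	return 0;
}

/* Round trip: a token made for ssp must verify back to the same ssp. */
void sketch_selftest(void)
{
	uint64_t addr, tok, out;

	assert(sketch_make_token(0, 0x7000, &addr, &tok) == 0);
	assert(addr == 0x6ff8 && tok == 0x7001);
	assert(sketch_check_token(0, addr, tok, &out) == 0 && out == 0x7000);
	/* a 64-bit token is rejected when verified as ia32 */
	assert(sketch_check_token(1, addr, tok, &out) != 0);
}
```

The point of the placement check is that a token is only accepted from the slot it was written to, so a token copied elsewhere on the shadow stack does not verify.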
Introduce a signal context extension struct 'sc_ext', which is used to save shadow stack restore token address and WAIT_ENDBR status. WAIT_ENDBR will be introduced later in the Indirect Branch Tracking (IBT) series, but add that into sc_ext now to keep the struct stable in case the IBT series is applied later. Signed-off-by: Yu-cheng Yu --- v10: - Combine with WRUSS instruction patch, since it is used only here. - Revise signal restore code to the latest supervisor states handling. Move shadow stack restore token checking out of the fast path. v9: - Update CET MSR access according to XSAVES supervisor state changes. - Add 'wait_endbr' to struct 'sc_ext'. - Update and simplify signal frame allocation, setup, and restoration. - Update commit log text. v2: - Move CET status from sigcontext to a separate struct sc_ext, which is located above the fpstate on the signal frame. - Add a restore token for sigreturn address. arch/x86/ia32/ia32_signal.c | 17 +++ arch/x86/include/asm/cet.h | 8 ++ arch/x86/include/asm/fpu/internal.h | 10 ++ arch/x86/include/asm/special_insns.h | 32 ++++++ arch/x86/include/uapi/asm/sigcontext.h | 9 ++ arch/x86/kernel/cet.c | 152 +++++++++++++++++++++++++ arch/x86/kernel/fpu/signal.c | 100 ++++++++++++++++ arch/x86/kernel/signal.c | 10 ++ 8 files changed, 338 insertions(+) diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index 81cf22398cd1..cec9cf0a00cf 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -35,6 +35,7 @@ #include #include #include +#include static inline void reload_segments(struct sigcontext_32 *sc) { @@ -205,6 +206,7 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, void __user **fpstate) { unsigned long sp, fx_aligned, math_size; + void __user *restorer = NULL; /* Default to using normal stack */ sp = regs->sp; @@ -218,8 +220,23 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, ksig->ka.sa.sa_restorer) sp = (unsigned long) 
ksig->ka.sa.sa_restorer; + if (ksig->ka.sa.sa_flags & SA_RESTORER) { + restorer = ksig->ka.sa.sa_restorer; + } else if (current->mm->context.vdso) { + if (ksig->ka.sa.sa_flags & SA_SIGINFO) + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_rt_sigreturn; + else + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_sigreturn; + } + sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size); *fpstate = (struct _fpstate_32 __user *) sp; + + if (save_cet_to_sigframe(1, *fpstate, (unsigned long)restorer)) + return (void __user *) -1L; + if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned, math_size) < 0) return (void __user *) -1L; diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index caac0687c8e4..56fe08eebae6 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -6,6 +6,8 @@ #include struct task_struct; +struct sc_ext; + /* * Per-thread CET status */ @@ -17,8 +19,14 @@ struct cet_status { #ifdef CONFIG_X86_INTEL_CET int cet_setup_shstk(void); void cet_disable_free_shstk(struct task_struct *p); +int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); +void cet_restore_signal(struct sc_ext *sc); +int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else static inline void cet_disable_free_shstk(struct task_struct *p) {} +static inline void cet_restore_signal(struct sc_ext *sc) { return; } +static inline int cet_setup_signal(bool ia32, unsigned long rstor, + struct sc_ext *sc) { return -EINVAL; } #endif #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 0a460f2a3f90..cd4249b37f45 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -442,6 +442,16 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate) __copy_kernel_to_fpregs(fpstate, -1); } +#ifdef CONFIG_X86_INTEL_CET +extern int save_cet_to_sigframe(int ia32, 
void __user *fp, + unsigned long restorer); +#else +static inline int save_cet_to_sigframe(int ia32, void __user *fp, + unsigned long restorer) +{ + return 0; +} +#endif extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size); /* diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 59a3e13204c3..21c42cef12a9 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -232,6 +232,38 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_INTEL_CET +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32) +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + asm_volatile_goto("1: wrussd %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#else +static inline int write_user_shstk_32(unsigned long addr, unsigned int val) +{ + WARN_ONCE(1, "%s used but not supported.\n", __func__); + return -EFAULT; +} +#endif + +static inline int write_user_shstk_64(unsigned long addr, unsigned long val) +{ + asm_volatile_goto("1: wrussq %1, (%0)\n" + _ASM_EXTABLE(1b, %l[fail]) + :: "r" (addr), "r" (val) + :: fail); + return 0; +fail: + return -EPERM; +} +#endif /* CONFIG_X86_INTEL_CET */ + #define nop() asm volatile ("nop") #endif /* __KERNEL__ */ diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h index 844d60eb1882..cf2d55db3be4 100644 --- a/arch/x86/include/uapi/asm/sigcontext.h +++ b/arch/x86/include/uapi/asm/sigcontext.h @@ -196,6 +196,15 @@ struct _xstate { /* New processor state extensions go here: */ }; +/* + * Located at the end of sigcontext->fpstate, aligned to 8. 
+ */ +struct sc_ext { + unsigned long total_size; + unsigned long ssp; + unsigned long wait_endbr; +}; + /* * The 32-bit signal frame: */ diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index a1be8ccee1cc..a08a956ac3aa 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include static void start_update_msrs(void) { @@ -72,6 +74,80 @@ static unsigned long alloc_shstk(unsigned long size, int flags) return addr; } +#define TOKEN_MODE_MASK 3UL +#define TOKEN_MODE_64 1UL +#define IS_TOKEN_64(token) ((token & TOKEN_MODE_MASK) == TOKEN_MODE_64) +#define IS_TOKEN_32(token) ((token & TOKEN_MODE_MASK) == 0) + +/* + * Verify the restore token at the address of 'ssp' is + * valid and then set shadow stack pointer according to the + * token. + */ +int cet_verify_rstor_token(bool ia32, unsigned long ssp, + unsigned long *new_ssp) +{ + unsigned long token; + + *new_ssp = 0; + + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + if (get_user(token, (unsigned long __user *)ssp)) + return -EFAULT; + + /* Is 64-bit mode flag correct? */ + if (!ia32 && !IS_TOKEN_64(token)) + return -EINVAL; + else if (ia32 && !IS_TOKEN_32(token)) + return -EINVAL; + + token &= ~TOKEN_MODE_MASK; + + /* + * Restore address properly aligned? + */ + if ((!ia32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4)) + return -EINVAL; + + /* + * Token was placed properly? + */ + if ((ALIGN_DOWN(token, 8) - 8) != ssp) + return -EINVAL; + + *new_ssp = token; + return 0; +} + +/* + * Create a restore token on the shadow stack. + * A token is always 8-byte and aligned to 8. + */ +static int create_rstor_token(bool ia32, unsigned long ssp, + unsigned long *new_ssp) +{ + unsigned long addr; + + *new_ssp = 0; + + if ((!ia32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4)) + return -EINVAL; + + addr = ALIGN_DOWN(ssp, 8) - 8; + + /* Is the token for 64-bit? 
*/ + if (!ia32) + ssp |= TOKEN_MODE_64; + + if (write_user_shstk_64(addr, ssp)) + return -EFAULT; + + *new_ssp = addr; + return 0; +} + int cet_setup_shstk(void) { unsigned long addr, size; @@ -136,3 +212,79 @@ void cet_disable_free_shstk(struct task_struct *tsk) cet->shstk_base = 0; cet->shstk_size = 0; } + +/* + * Called from __fpu__restore_sig() and XSAVES buffer is protected by + * set_thread_flag(TIF_NEED_FPU_LOAD) in the slow path. + */ +void cet_restore_signal(struct sc_ext *sc_ext) +{ + struct cet_user_state *cet_user_state; + struct cet_status *cet = &current->thread.cet; + u64 msr_val = 0; + + if (!static_cpu_has(X86_FEATURE_SHSTK)) + return; + + cet_user_state = get_xsave_addr(&current->thread.fpu.state.xsave, + XFEATURE_CET_USER); + if (!cet_user_state) + return; + + if (cet->shstk_size) { + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + cet_user_state->user_ssp = sc_ext->ssp; + else + wrmsrl(MSR_IA32_PL3_SSP, sc_ext->ssp); + + msr_val |= CET_SHSTK_EN; + } + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + cet_user_state->user_cet = msr_val; + else + wrmsrl(MSR_IA32_U_CET, msr_val); +} + +/* + * Setup the shadow stack for the signal handler: first, + * create a restore token to keep track of the current ssp, + * and then the return address of the signal handler.
+ */ +int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext) +{ + struct cet_status *cet = &current->thread.cet; + unsigned long ssp = 0, new_ssp = 0; + int err; + + if (cet->shstk_size) { + if (!rstor_addr) + return -EINVAL; + + ssp = cet_get_shstk_addr(); + err = create_rstor_token(ia32, ssp, &new_ssp); + if (err) + return err; + + if (ia32) { + ssp = new_ssp - sizeof(u32); + err = write_user_shstk_32(ssp, (unsigned int)rstor_addr); + } else { + ssp = new_ssp - sizeof(u64); + err = write_user_shstk_64(ssp, rstor_addr); + } + + if (err) + return err; + + sc_ext->ssp = new_ssp; + } + + if (ssp) { + start_update_msrs(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + end_update_msrs(); + } + + return 0; +} diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index a4ec65317a7f..d02ea8c11128 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -52,6 +52,74 @@ static inline int check_for_xstate(struct fxregs_state __user *buf, return 0; } +#ifdef CONFIG_X86_INTEL_CET +int save_cet_to_sigframe(int ia32, void __user *fp, unsigned long restorer) +{ + int err = 0; + + if (!current->thread.cet.shstk_size) + return 0; + + if (fp) { + struct sc_ext ext = {0, 0, 0}; + + err = cet_setup_signal(ia32, restorer, &ext); + if (!err) { + void __user *p = fp; + + ext.total_size = sizeof(ext); + + if (ia32) + p += sizeof(struct fregs_state); + + p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE; + p = (void __user *)ALIGN((unsigned long)p, 8); + + if (copy_to_user(p, &ext, sizeof(ext))) + return -EFAULT; + } + } + + return err; +} + +static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext) +{ + int err = 0; + + memset(ext, 0, sizeof(*ext)); + + if (!current->thread.cet.shstk_size) + return 0; + + if (fp) { + void __user *p = fp; + + if (ia32) + p += sizeof(struct fregs_state); + + p += fpu_user_xstate_size + FP_XSTATE_MAGIC2_SIZE; + p = (void __user *)ALIGN((unsigned long)p, 8); + + if (copy_from_user(ext, p,
sizeof(*ext))) + return -EFAULT; + + if (ext->total_size != sizeof(*ext)) + return -EFAULT; + + if (current->thread.cet.shstk_size) + err = cet_verify_rstor_token(ia32, ext->ssp, &ext->ssp); + } + + return err; +} +#else +static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext) +{ + return 0; +} +#endif + /* * Signal frame handlers. */ @@ -295,6 +363,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) struct task_struct *tsk = current; struct fpu *fpu = &tsk->thread.fpu; struct user_i387_ia32_struct env; + struct sc_ext sc_ext; u64 user_xfeatures = 0; int fx_only = 0; int ret = 0; @@ -335,6 +404,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) if ((unsigned long)buf_fx % 64) fx_only = 1; + ret = get_cet_from_sigframe(ia32_fxstate, buf, &sc_ext); + if (ret) + return ret; + if (!ia32_fxstate) { /* * Attempt to restore the FPU registers directly from user @@ -349,6 +422,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) pagefault_enable(); if (!ret) { + cet_restore_signal(&sc_ext); + /* * Restore supervisor states: previous context switch * etc has done XSAVES and saved the supervisor states @@ -423,6 +498,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) if (unlikely(init_bv)) copy_kernel_to_xregs(&init_fpstate.xsave, init_bv); + cet_restore_signal(&sc_ext); + /* * Restore previously saved supervisor xstates along with * copied-in user xstates. @@ -491,12 +568,35 @@ int fpu__restore_sig(void __user *buf, int ia32_frame) return __fpu__restore_sig(buf, buf_fx, size); } +#ifdef CONFIG_X86_INTEL_CET +static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp) +{ + struct cet_status *cet = &current->thread.cet; + + /* + * sigcontext_ext is at: fpu + fpu_user_xstate_size + + * FP_XSTATE_MAGIC2_SIZE, then aligned to 8. 
+ */ + if (cet->shstk_size) + sp -= (sizeof(struct sc_ext) + 8); + + return sp; +} +#else +static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp) +{ + return sp; +} +#endif + unsigned long fpu__alloc_mathframe(unsigned long sp, int ia32_frame, unsigned long *buf_fx, unsigned long *size) { unsigned long frame_size = xstate_sigframe_size(); + sp = fpu__alloc_sigcontext_ext(sp); + *buf_fx = sp = round_down(sp - frame_size, 64); if (ia32_frame && use_fxsr()) { frame_size += sizeof(struct fregs_state); diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index d5fa494c2304..c5f465f5aeb4 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -46,6 +46,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 /* @@ -239,6 +240,9 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, unsigned long buf_fx = 0; int onsigstack = on_sig_stack(sp); int ret; +#ifdef CONFIG_X86_64 + void __user *restorer = NULL; +#endif /* redzone */ if (IS_ENABLED(CONFIG_X86_64)) @@ -270,6 +274,12 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size, if (onsigstack && !likely(on_sig_stack(sp))) return (void __user *)-1L; +#ifdef CONFIG_X86_64 + if (ka->sa.sa_flags & SA_RESTORER) + restorer = ka->sa.sa_restorer; + ret = save_cet_to_sigframe(0, *fpstate, (unsigned long)restorer); +#endif + /* save i387 and extended state */ ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size); if (ret < 0) From patchwork Tue Aug 25 00:25:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734553 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9039739 for ; Tue, 25 Aug 2020 00:30:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org 
(Postfix) with ESMTP id 8FF00206EB for ; Tue, 25 Aug 2020 00:30:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8FF00206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BFEAF90000A; Mon, 24 Aug 2020 20:29:45 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B5BFD900008; Mon, 24 Aug 2020 20:29:45 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 95B8390000A; Mon, 24 Aug 2020 20:29:45 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0158.hostedemail.com [216.40.44.158]) by kanga.kvack.org (Postfix) with ESMTP id 6FA0C900008 for ; Mon, 24 Aug 2020 20:29:45 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3826C180AD817 for ; Tue, 25 Aug 2020 00:29:45 +0000 (UTC) X-FDA: 77187207930.21.leg94_23044e427057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 09C30180442C0 for ; Tue, 25 Aug 2020 00:29:45 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30054:30056:30064,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04yr539s5bnycsi8hbpxzszfizkm8opptqj71fci5rndinhfazixr8uujj5bsj8.hqyfbrfqyi9epxcm4q687oxwe7zqnq4arow7hbr9j4tead6epgwue647bzgjggd.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: leg94_23044e427057 X-Filterd-Recvd-Size: 3677 Received: from 
mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:44 +0000 (UTC) IronPort-SDR: MOQLudUlqU4d0kxqfSMomo7H6aEbjn0v4GjWoUxU2teIwFKC/NbnmYTsRqES9QPp8qa8yoxqmk vaBnqZsIRdgQ== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061757" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061757" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:44 -0700 IronPort-SDR: nqn/oHxYk+W8TwVi7hu91ZrST/KUFbw43UdK/LoFf3WVMGZWQYSt+Akcvc9rTUBtDYR3DR8XX9 N1d4Oi4cqtng== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474135031" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:43 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. 
Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 22/25] binfmt_elf: Define GNU_PROPERTY_X86_FEATURE_1_AND properties Date: Mon, 24 Aug 2020 17:25:37 -0700 Message-Id: <20200825002540.3351-23-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 09C30180442C0 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ELF file's .note.gnu.property indicates architecture features of the file. Introduce feature definitions for Shadow Stack and Indirect Branch Tracking. Signed-off-by: Yu-cheng Yu --- include/uapi/linux/elf.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index 22220945a5fd..ca5875f384f6 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -454,4 +454,13 @@ typedef struct elf64_note { /* Bits for GNU_PROPERTY_AARCH64_FEATURE_1_BTI */ #define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +/* .note.gnu.property types for x86: */ +#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002 + +/* Bits for GNU_PROPERTY_X86_FEATURE_1_AND */ +#define GNU_PROPERTY_X86_FEATURE_1_IBT 0x00000001 +#define GNU_PROPERTY_X86_FEATURE_1_SHSTK 0x00000002 +#define GNU_PROPERTY_X86_FEATURE_1_INVAL ~(GNU_PROPERTY_X86_FEATURE_1_IBT | \ + GNU_PROPERTY_X86_FEATURE_1_SHSTK) + #endif /* _UAPI_LINUX_ELF_H */ From patchwork Tue Aug 25 00:25:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734555 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org 
(Postfix) with ESMTP id 1E7B7739 for ; Tue, 25 Aug 2020 00:30:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E97E12065F for ; Tue, 25 Aug 2020 00:30:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E97E12065F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A216590000B; Mon, 24 Aug 2020 20:29:46 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9AC1E900008; Mon, 24 Aug 2020 20:29:46 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D6F690000B; Mon, 24 Aug 2020 20:29:46 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0058.hostedemail.com [216.40.44.58]) by kanga.kvack.org (Postfix) with ESMTP id 55E1B900008 for ; Mon, 24 Aug 2020 20:29:46 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 1C1F3180AD815 for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) X-FDA: 77187207972.27.brain60_250340127057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id E76F63D668 for ; Tue, 25 Aug 2020 00:29:45 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30045:30054:30056:30064:30070,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 
62.18.0.100;04yrfhwpx9kqt1fy8zyo1yhz3nqueop3c9e6ek9efzycpssqx8wzuy1rkiy83fc.suga1gn7mn68cokn8ye5t8hp8dgk3k3smn1dis5knqu8ew73pdaqbrixs4mtrg8.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: brain60_250340127057 X-Filterd-Recvd-Size: 7726 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:45 +0000 (UTC) IronPort-SDR: 6yBrW2NAt0mzbvmXlEquXaveOeNq7y6vfdlrMIr02xn+ECMSiYtJI60wRKP4wSAjBuk+I+H5om 9UbnQksxlF8g== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061759" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061759" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:44 -0700 IronPort-SDR: O38BpPAaXuiCSMvWvFl4YcZutv8k41p7B/0eD76u20d4BlVXdqQJz8EA+P1aUczqdsnCa39Y8Z FxFXlc5t4gwg== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474135035" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:44 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. 
Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu , Mark Brown , Catalin Marinas Subject: [PATCH v11 23/25] ELF: Introduce arch_setup_elf_property() Date: Mon, 24 Aug 2020 17:25:38 -0700 Message-Id: <20200825002540.3351-24-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: E76F63D668 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ELF file's .note.gnu.property indicates arch features supported by the file. These features are extracted by arch_parse_elf_property() and stored in 'arch_elf_state'. Introduce arch_setup_elf_property() for enabling such features. The first use-case of this function is shadow stack. ARM64 is the other arch that has ARCH_USE_GNU_PROPERTY and arch_parse_elf_property(). Add arch_setup_elf_property() for it. Signed-off-by: Yu-cheng Yu Cc: Mark Brown Cc: Catalin Marinas Cc: Dave Martin --- v11: - Combine three patches of arch_setup_elf_property() into one. - Add empty arch_setup_elf_property() for arm64. v9: - Change cpu_feature_enabled() to static_cpu_has(). 
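For readers following the flow, the parse-then-setup split described in the commit message above can be sketched in userspace. This is only a model under stated assumptions: the constants are copied from patch 22/25, while `demo_elf_state`, `demo_parse_property()` and `demo_wants_shstk()` are hypothetical names, and the `cpu_has_shstk` argument stands in for the kernel's static_cpu_has(X86_FEATURE_SHSTK) check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the include/uapi/linux/elf.h hunk in patch 22/25. */
#define GNU_PROPERTY_X86_FEATURE_1_AND   0xc0000002U
#define GNU_PROPERTY_X86_FEATURE_1_IBT   0x00000001U
#define GNU_PROPERTY_X86_FEATURE_1_SHSTK 0x00000002U

/* Hypothetical userspace stand-in for the x86 struct arch_elf_state. */
struct demo_elf_state {
	uint32_t gnu_property;
};

/*
 * Models arch_parse_elf_property(): properties of other types are
 * ignored, and a FEATURE_1_AND property must be exactly 4 bytes.
 */
static int demo_parse_property(uint32_t type, const void *data,
			       size_t datasz, struct demo_elf_state *state)
{
	if (type != GNU_PROPERTY_X86_FEATURE_1_AND)
		return 0;

	if (datasz != sizeof(uint32_t))
		return -ENOEXEC;

	memcpy(&state->gnu_property, data, sizeof(uint32_t));
	return 0;
}

/*
 * Models the decision in arch_setup_elf_property(): shadow stack is
 * set up only when the CPU supports it and the binary is marked.
 */
static int demo_wants_shstk(const struct demo_elf_state *state,
			    int cpu_has_shstk)
{
	return cpu_has_shstk &&
	       (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_SHSTK) != 0;
}
```

The point of the split is that parsing only records the feature bits; nothing is enabled until the setup step runs later in load_elf_binary().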
arch/arm64/include/asm/elf.h | 5 +++++ arch/x86/Kconfig | 2 ++ arch/x86/include/asm/elf.h | 13 +++++++++++++ arch/x86/kernel/process_64.c | 32 ++++++++++++++++++++++++++++++++ fs/binfmt_elf.c | 4 ++++ include/linux/elf.h | 6 ++++++ 6 files changed, 62 insertions(+) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 8d1c8dcb87fd..d37bc7915935 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -281,6 +281,11 @@ static inline int arch_parse_elf_property(u32 type, const void *data, return 0; } +static inline int arch_setup_elf_property(struct arch_elf_state *arch) +{ + return 0; +} + static inline int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *f, bool is_interp, struct arch_elf_state *state) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e93be385cd04..6b6dad011763 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1950,6 +1950,8 @@ config X86_INTEL_SHADOW_STACK_USER select X86_INTEL_CET select ARCH_MAYBE_MKWRITE select ARCH_HAS_SHADOW_STACK + select ARCH_USE_GNU_PROPERTY + select ARCH_BINFMT_ELF_STATE help Shadow Stacks provides protection against program stack corruption. It's a hardware feature. This only matters diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index b9a5d488f1a5..0e1be2a13359 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -385,6 +385,19 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages +#ifdef CONFIG_ARCH_BINFMT_ELF_STATE +struct arch_elf_state { + unsigned int gnu_property; +}; + +#define INIT_ARCH_ELF_STATE { \ + .gnu_property = 0, \ +} + +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0) +#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0) +#endif + /* Do not change the values. 
See get_align_mask() */ enum align_flags { ALIGN_VA_32 = BIT(0), diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 9afefe325acb..fd4644865a3b 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -837,3 +837,35 @@ unsigned long KSTK_ESP(struct task_struct *task) { return task_pt_regs(task)->sp; } + +#ifdef CONFIG_ARCH_USE_GNU_PROPERTY +int arch_parse_elf_property(u32 type, const void *data, size_t datasz, + bool compat, struct arch_elf_state *state) +{ + if (type != GNU_PROPERTY_X86_FEATURE_1_AND) + return 0; + + if (datasz != sizeof(unsigned int)) + return -ENOEXEC; + + state->gnu_property = *(unsigned int *)data; + return 0; +} + +int arch_setup_elf_property(struct arch_elf_state *state) +{ + int r = 0; + + if (!IS_ENABLED(CONFIG_X86_INTEL_CET)) + return r; + + memset(&current->thread.cet, 0, sizeof(struct cet_status)); + + if (static_cpu_has(X86_FEATURE_SHSTK)) { + if (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_SHSTK) + r = cet_setup_shstk(); + } + + return r; +} +#endif diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 13d053982dd7..2b4cfc256895 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1217,6 +1217,10 @@ static int load_elf_binary(struct linux_binprm *bprm) set_binfmt(&elf_format); + retval = arch_setup_elf_property(&arch_state); + if (retval < 0) + goto out; + #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES retval = arch_setup_additional_pages(bprm, !!interpreter); if (retval < 0) diff --git a/include/linux/elf.h b/include/linux/elf.h index 5d5b0321da0b..4827695ca415 100644 --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -82,9 +82,15 @@ static inline int arch_parse_elf_property(u32 type, const void *data, { return 0; } + +static inline int arch_setup_elf_property(struct arch_elf_state *arch) +{ + return 0; +} #else extern int arch_parse_elf_property(u32 type, const void *data, size_t datasz, bool compat, struct arch_elf_state *arch); +extern int arch_setup_elf_property(struct arch_elf_state 
*arch); #endif #ifdef CONFIG_ARCH_HAVE_ELF_PROT From patchwork Tue Aug 25 00:25:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734557 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EF86739 for ; Tue, 25 Aug 2020 00:30:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 63E862065F for ; Tue, 25 Aug 2020 00:30:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 63E862065F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5678390000C; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 4F6BB900008; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2F78C90000C; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0162.hostedemail.com [216.40.44.162]) by kanga.kvack.org (Postfix) with ESMTP id 1209E900008 for ; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id CEA061EF1 for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) X-FDA: 77187207972.22.elbow96_351242627057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id A07B518038E60 for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) X-Spam-Summary: 
1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30012:30045:30054:30056:30064,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y8emri3jnruqw6b943yrmmks7soycwx7epqe9uoqxkwk4pnfi9q5mdxsawah3.i11t6hpoodgdmhnko336cau6setm4p3fu78ei7hrsx9hhprnx4tg9ku5wbic81a.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: elbow96_351242627057 X-Filterd-Recvd-Size: 7278 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:45 +0000 (UTC) IronPort-SDR: /5v5wgSAl64llgqNT84SDER+rgk0LXHenn6EiTPl50T4mpHjmfHeDV2dHiluYYU42/8uxX+XcQ FL82MvlJ0WUA== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061761" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061761" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:45 -0700 IronPort-SDR: segEIYojpKHxFpXs8IBCkaAUm8hUntJ+kzOooykdpCjv1So4sW4nBRj6wu9Gxof8afvnG7fytS lb2kpsaxPI5Q== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474135041" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:44 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 24/25] x86/cet/shstk: Handle thread shadow stack Date: Mon, 24 Aug 2020 17:25:39 -0700 Message-Id: <20200825002540.3351-25-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: A07B518038E60 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The kernel allocates (and frees on thread exit) a new shadow stack for a pthread child. It is possible for the kernel to complete the clone syscall and set the child's shadow stack pointer to NULL and let the child thread allocate a shadow stack for itself. There are two issues in this approach: It is not compatible with existing code that does inline syscall and it cannot handle signals before the child can successfully allocate a shadow stack. A 64-bit shadow stack has a size of min(RLIMIT_STACK, 4 GB). A compat-mode thread shadow stack has a size of 1/4 min(RLIMIT_STACK, 4 GB). This allows more threads to run in a 32-bit address space. Signed-off-by: Yu-cheng Yu --- v10: - Limit shadow stack size to 4 GB. 
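The sizing rule stated in the commit message above can be worked through with a small userspace sketch. It is an illustration only: `demo_thread_shstk_size()` and `DEMO_PAGE_SIZE` are hypothetical names, a 4 KiB page size is assumed, and the `compat` flag stands in for the kernel's in_compat_syscall() check.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SIZE 4096ULL

/*
 * Models the size computation in cet_setup_thread_shstk():
 * cap the stack rlimit at 4 GB, take a quarter of it for
 * compat-mode threads, then round up to a whole page.
 */
static uint64_t demo_thread_shstk_size(uint64_t rlimit_stack, int compat)
{
	uint64_t size = rlimit_stack;

	/* Cap shadow stack size to 4 GB. */
	if (size > (1ULL << 32))
		size = 1ULL << 32;

	/* Compat-mode threads share a limited address space. */
	if (compat)
		size /= 4;

	/* Round up to a page boundary. */
	return (size + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}
```

With the common 8 MB stack limit this gives an 8 MB shadow stack for a 64-bit thread and 2 MB for a compat-mode thread; an unlimited RLIMIT_STACK is capped at 4 GB.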
arch/x86/include/asm/cet.h | 2 ++ arch/x86/include/asm/mmu_context.h | 3 +++ arch/x86/kernel/cet.c | 41 ++++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 7 +++++ 4 files changed, 53 insertions(+) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 56fe08eebae6..71dc92acd2f2 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -18,11 +18,13 @@ struct cet_status { #ifdef CONFIG_X86_INTEL_CET int cet_setup_shstk(void); +int cet_setup_thread_shstk(struct task_struct *p); void cet_disable_free_shstk(struct task_struct *p); int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp); void cet_restore_signal(struct sc_ext *sc); int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc); #else +static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; } static inline void cet_disable_free_shstk(struct task_struct *p) {} static inline void cet_restore_signal(struct sc_ext *sc) { return; } static inline int cet_setup_signal(bool ia32, unsigned long rstor, diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index d98016b83755..e3ca4397258b 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -11,6 +11,7 @@ #include #include +#include #include extern atomic64_t last_mm_ctx_id; @@ -142,6 +143,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + cet_disable_free_shstk(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index a08a956ac3aa..b30c61a66c8e 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -172,6 +172,47 @@ int cet_setup_shstk(void) return 0; } +int cet_setup_thread_shstk(struct task_struct *tsk) +{ + unsigned long addr, size; + struct cet_user_state *state; + struct cet_status *cet = &tsk->thread.cet; + + if (!cet->shstk_size) + return 0; + + state = 
get_xsave_addr(&tsk->thread.fpu.state.xsave, + XFEATURE_CET_USER); + + if (!state) + return -EINVAL; + + /* Cap shadow stack size to 4 GB */ + size = min(rlimit(RLIMIT_STACK), 1UL << 32); + + /* + * Compat-mode pthreads share a limited address space. + * If each function call takes an average of four slots + * stack space, we need 1/4 of stack size for shadow stack. + */ + if (in_compat_syscall()) + size /= 4; + size = round_up(size, PAGE_SIZE); + addr = alloc_shstk(size, 0); + + if (IS_ERR_VALUE(addr)) { + cet->shstk_base = 0; + cet->shstk_size = 0; + return PTR_ERR((void *)addr); + } + + fpu__prepare_write(&tsk->thread.fpu); + state->user_ssp = (u64)(addr + size); + cet->shstk_base = addr; + cet->shstk_size = size; + return 0; +} + void cet_disable_free_shstk(struct task_struct *tsk) { struct cet_status *cet = &tsk->thread.cet; diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index e41f1a468ee3..b6fe5b061841 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -109,6 +109,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + cet_disable_free_shstk(tsk); fpu__drop(fpu); } @@ -181,6 +182,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg, if (clone_flags & CLONE_SETTLS) ret = set_new_tls(p, tls); +#ifdef CONFIG_X86_64 + /* Allocate a new shadow stack for pthread */ + if (!ret && (clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM) + ret = cet_setup_thread_shstk(p); +#endif + if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); From patchwork Tue Aug 25 00:25:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu-cheng Yu X-Patchwork-Id: 11734559 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB4B3739 for ; Tue, 25 Aug 2020 00:30:33 +0000 (UTC) Received: from 
kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A510120897 for ; Tue, 25 Aug 2020 00:30:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A510120897 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D34FB900008; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C1CE690000D; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A4974900008; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0195.hostedemail.com [216.40.44.195]) by kanga.kvack.org (Postfix) with ESMTP id 6FD1C90000D for ; Mon, 24 Aug 2020 20:29:47 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 349E31EF1 for ; Tue, 25 Aug 2020 00:29:47 +0000 (UTC) X-FDA: 77187208014.11.house11_260b95427057 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 05DA8180F8B82 for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,yu-cheng.yu@intel.com,,RULES_HIT:30003:30046:30051:30054:30056:30064:30069:30070:30075,0,RBL:134.134.136.24:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04ygkkgedzwqyrkqysiz7yqspc7m3ypsj7uft1a7bb1q8zktky3f8381eww7hwj.kq81nnmc5ezyaz8b7o75j9xn3qaf1e7kd5gm1mry3r1y6dz9hr581u6yznpf3a4.g-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not 
bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:22,LUA_SUMMARY:none X-HE-Tag: house11_260b95427057 X-Filterd-Recvd-Size: 10862 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Tue, 25 Aug 2020 00:29:46 +0000 (UTC) IronPort-SDR: 7QIVljFfR/ldjShgzvYBV70sv7Nvq7zYmS5Mlhlqhm6elcmIo+WFSVv4XEE+aick4/fss+13QG hidOuSl3L1BQ== X-IronPort-AV: E=McAfee;i="6000,8403,9723"; a="157061763" X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="157061763" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:46 -0700 IronPort-SDR: vBxtYzbcxKNaKSno6bFQ3E6UFyj551jhcVZWN6WfuCl4+Px/9HFVxgnyYKvLeQXEykBTwHw5PN 5eDLWR18+fdg== X-IronPort-AV: E=Sophos;i="5.76,350,1592895600"; d="scan'208";a="474135046" Received: from yyu32-desk.sc.intel.com ([143.183.136.146]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Aug 2020 17:29:45 -0700 From: Yu-cheng Yu To: x86@kernel.org, "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H.J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. 
Shankar" , Vedvyas Shanbhogue , Dave Martin , Weijiang Yang Cc: Yu-cheng Yu Subject: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack Date: Mon, 24 Aug 2020 17:25:40 -0700 Message-Id: <20200825002540.3351-26-yu-cheng.yu@intel.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200825002540.3351-1-yu-cheng.yu@intel.com> References: <20200825002540.3351-1-yu-cheng.yu@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 05DA8180F8B82 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: arch_prctl(ARCH_X86_CET_STATUS, u64 *args) Get CET feature status. The parameter 'args' is a pointer to a user buffer. The kernel returns the following information: *args = shadow stack/IBT status *(args + 1) = shadow stack base address *(args + 2) = shadow stack size arch_prctl(ARCH_X86_CET_DISABLE, u64 features) Disable CET features specified in 'features'. Return -EPERM if CET is locked. arch_prctl(ARCH_X86_CET_LOCK) Lock in CET features. arch_prctl(ARCH_X86_CET_MMAP_SHSTK, u64 *args) Allocate a new shadow stack. The parameter 'args' is a pointer to a user buffer. *args = desired size *(args + 1) = MAP_32BIT or MAP_POPULATE On returning, *args is the allocated shadow stack address. Also change do_arch_prctl_common()'s parameter 'cpuid_enabled' to 'arg2', as it is now also passed to prctl_cet(). Signed-off-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu --- v11: - Check input for invalid features. - Fix prctl_cet() return values. - Change ARCH_X86_CET_ALLOC_SHSTK to ARCH_X86_CET_MMAP_SHSTK to take MAP_32BIT, MAP_POPULATE as inputs. v10: - Verify CET is enabled before handling arch_prctl. - Change input parameters from unsigned long to u64, to make it clear they are 64-bit. 
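As an illustration of the ARCH_X86_CET_STATUS call described above, a userspace query might look roughly like the following. This is a sketch under assumptions: it only does something useful on a kernel carrying this v11 series, the option value is taken from the uapi/asm/prctl.h hunk of this patch, and `demo_cet_status()` and `demo_print_cet_status()` are hypothetical helper names. On kernels without the series the call is expected to fail, typically with EINVAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for syscall() */
#endif
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Option value from the arch/x86/include/uapi/asm/prctl.h hunk of this patch. */
#define ARCH_X86_CET_STATUS 0x3001

/*
 * Hypothetical helper: ask the kernel to fill
 *   buf[0] = shadow stack/IBT status
 *   buf[1] = shadow stack base address
 *   buf[2] = shadow stack size
 * Returns 0 on success, -1 with errno set on failure.
 */
static int demo_cet_status(uint64_t buf[3])
{
#if defined(__linux__) && defined(__x86_64__) && defined(SYS_arch_prctl)
	return (int)syscall(SYS_arch_prctl, ARCH_X86_CET_STATUS, buf);
#else
	/* arch_prctl() is not available on this platform. */
	(void)buf;
	errno = ENOSYS;
	return -1;
#endif
}

/* Print the three values, or the error if the query is not supported. */
static void demo_print_cet_status(void)
{
	uint64_t buf[3] = { 0, 0, 0 };

	if (demo_cet_status(buf)) {
		perror("ARCH_X86_CET_STATUS");
		return;
	}

	printf("status=%#llx base=%#llx size=%#llx\n",
	       (unsigned long long)buf[0],
	       (unsigned long long)buf[1],
	       (unsigned long long)buf[2]);
}
```

The raw syscall() is used because no libc wrapper for these options is assumed here.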
 arch/x86/include/asm/cet.h              |  4 +
 arch/x86/include/uapi/asm/prctl.h       |  5 ++
 arch/x86/kernel/Makefile                |  2 +-
 arch/x86/kernel/cet.c                   | 26 +++++++
 arch/x86/kernel/cet_prctl.c             | 98 +++++++++++++++++++++++++
 arch/x86/kernel/process.c               |  6 +-
 tools/arch/x86/include/uapi/asm/prctl.h |  5 ++
 7 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cet_prctl.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 71dc92acd2f2..f7eb197998ad 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -14,16 +14,20 @@ struct sc_ext;
 struct cet_status {
 	unsigned long	shstk_base;
 	unsigned long	shstk_size;
+	unsigned int	locked:1;
 };
 
 #ifdef CONFIG_X86_INTEL_CET
+int prctl_cet(int option, u64 arg2);
 int cet_setup_shstk(void);
 int cet_setup_thread_shstk(struct task_struct *p);
+unsigned long cet_alloc_shstk(unsigned long size, int flags);
 void cet_disable_free_shstk(struct task_struct *p);
 int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp);
 void cet_restore_signal(struct sc_ext *sc);
 int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
 #else
+static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; }
 static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; }
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
 static inline void cet_restore_signal(struct sc_ext *sc) { return; }
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..3aaac13cdc87 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -14,4 +14,9 @@
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
 
+#define ARCH_X86_CET_STATUS		0x3001
+#define ARCH_X86_CET_DISABLE		0x3002
+#define ARCH_X86_CET_LOCK		0x3003
+#define ARCH_X86_CET_MMAP_SHSTK		0x3004
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 76f27f518266..97556e4204d6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -145,7 +145,7 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
-obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
+obj-$(CONFIG_X86_INTEL_CET)		+= cet.o cet_prctl.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index b30c61a66c8e..2bf1a6b6abb6 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -148,6 +148,32 @@ static int create_rstor_token(bool ia32, unsigned long ssp,
 	return 0;
 }
 
+unsigned long cet_alloc_shstk(unsigned long len, int flags)
+{
+	unsigned long token;
+	unsigned long addr, ssp;
+
+	addr = alloc_shstk(round_up(len, PAGE_SIZE), flags);
+
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
+	/* Restore token is 8 bytes and aligned to 8 bytes */
+	ssp = addr + len;
+	token = ssp;
+
+	if (!in_ia32_syscall())
+		token |= TOKEN_MODE_64;
+	ssp -= 8;
+
+	if (write_user_shstk_64(ssp, token)) {
+		vm_munmap(addr, len);
+		return -EINVAL;
+	}
+
+	return addr;
+}
+
 int cet_setup_shstk(void)
 {
 	unsigned long addr, size;
diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
new file mode 100644
index 000000000000..cc49eef08ab0
--- /dev/null
+++ b/arch/x86/kernel/cet_prctl.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* See Documentation/x86/intel_cet.rst. */
+
+static int copy_status_to_user(struct cet_status *cet, u64 arg2)
+{
+	u64 buf[3] = {0, 0, 0};
+
+	if (cet->shstk_size) {
+		buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
+		buf[1] = (u64)cet->shstk_base;
+		buf[2] = (u64)cet->shstk_size;
+	}
+
+	return copy_to_user((u64 __user *)arg2, buf, sizeof(buf));
+}
+
+static int handle_mmap_shstk(u64 arg2)
+{
+	u64 buf[3];
+	unsigned long addr, size;
+	int allowed_flags;
+
+	if (copy_from_user(buf, (unsigned long __user *)arg2, sizeof(buf)))
+		return -EFAULT;
+
+	size = buf[0];
+
+	/*
+	 * Check invalid flags
+	 */
+	allowed_flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_32BIT | MAP_POPULATE;
+
+	if (buf[1] & ~allowed_flags)
+		return -EINVAL;
+
+	addr = cet_alloc_shstk(size, buf[1]);
+	if (IS_ERR_VALUE(addr))
+		return PTR_ERR((void *)addr);
+
+	if (put_user(addr, (u64 __user *)arg2)) {
+		vm_munmap(addr, size);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+int prctl_cet(int option, u64 arg2)
+{
+	struct cet_status *cet;
+
+	/*
+	 * GLIBC's ENOTSUPP == EOPNOTSUPP == 95, and it does not recognize
+	 * the kernel's ENOTSUPP (524).  So return EOPNOTSUPP here.
+	 */
+	if (!IS_ENABLED(CONFIG_X86_INTEL_CET))
+		return -EOPNOTSUPP;
+
+	cet = &current->thread.cet;
+
+	if (option == ARCH_X86_CET_STATUS)
+		return copy_status_to_user(cet, arg2);
+
+	if (!static_cpu_has(X86_FEATURE_SHSTK))
+		return -EOPNOTSUPP;
+
+	switch (option) {
+	case ARCH_X86_CET_DISABLE:
+		if (cet->locked)
+			return -EPERM;
+		if (arg2 & GNU_PROPERTY_X86_FEATURE_1_INVAL)
+			return -EINVAL;
+		if (arg2 & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+			cet_disable_free_shstk(current);
+		return 0;
+
+	case ARCH_X86_CET_LOCK:
+		cet->locked = 1;
+		return 0;
+
+	case ARCH_X86_CET_MMAP_SHSTK:
+		return handle_mmap_shstk(arg2);
+
+	default:
+		return -ENOSYS;
+	}
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b6fe5b061841..5a657a9774dc 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -980,14 +980,14 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled)
+			  unsigned long arg2)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
-		return set_cpuid_mode(task, cpuid_enabled);
+		return set_cpuid_mode(task, arg2);
 	}
 
-	return -EINVAL;
+	return prctl_cet(option, arg2);
 }
diff --git a/tools/arch/x86/include/uapi/asm/prctl.h b/tools/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..3aaac13cdc87 100644
--- a/tools/arch/x86/include/uapi/asm/prctl.h
+++ b/tools/arch/x86/include/uapi/asm/prctl.h
@@ -14,4 +14,9 @@
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
 
+#define ARCH_X86_CET_STATUS		0x3001
+#define ARCH_X86_CET_DISABLE		0x3002
+#define ARCH_X86_CET_LOCK		0x3003
+#define ARCH_X86_CET_MMAP_SHSTK		0x3004
+
 #endif /* _ASM_X86_PRCTL_H */
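
The restore-token arithmetic in cet_alloc_shstk() above can be checked in
isolation with a small userspace sketch. It only mirrors the address
computation; it does not allocate or write a shadow stack. The names
shstk_token, restore_token_for() and round_up_page() are hypothetical.
TOKEN_MODE_64 is defined elsewhere in the series and is not shown in this
patch, so the value 1 (bit 0 of the token marking 64-bit mode) is an
assumption here, as is the 4 KiB page size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 0 of the restore token marks 64-bit mode. */
#define TOKEN_MODE_64	1ULL
/* Assumption: 4 KiB pages. */
#define SHSTK_PAGE_SIZE	4096ULL

/* Hypothetical pair: where the token is written and what it contains. */
struct shstk_token {
	uint64_t slot;	/* address of the 8-byte token */
	uint64_t value;	/* token contents */
};

/* Mirrors round_up(len, PAGE_SIZE) for a power-of-two page size. */
static uint64_t round_up_page(uint64_t len)
{
	return (len + SHSTK_PAGE_SIZE - 1) & ~(SHSTK_PAGE_SIZE - 1);
}

/*
 * Mirrors the computation in cet_alloc_shstk(): the token value is the
 * address just past the requested length (plus the mode bit for 64-bit
 * callers), and it is stored in the 8 bytes immediately below that address.
 */
static struct shstk_token restore_token_for(uint64_t addr, uint64_t len,
					    int is_64bit)
{
	struct shstk_token t;
	uint64_t ssp = addr + len;

	/* The token must be 8-byte aligned. */
	assert((ssp & 7) == 0);

	t.value = ssp;
	if (is_64bit)
		t.value |= TOKEN_MODE_64;
	t.slot = ssp - 8;
	return t;
}
```

Note that the patch maps round_up(len, PAGE_SIZE) bytes but places the token
relative to the unrounded len, so for a length that is not a multiple of the
page size the token sits below the top of the mapping.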