From patchwork Tue Jul 14 07:02:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 43A6A913 for ; Tue, 14 Jul 2020 07:04:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2F58D221FB for ; Tue, 14 Jul 2020 07:04:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726660AbgGNHEC (ORCPT ); Tue, 14 Jul 2020 03:04:02 -0400 Received: from mga04.intel.com ([192.55.52.120]:50307 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725997AbgGNHEC (ORCPT ); Tue, 14 Jul 2020 03:04:02 -0400 IronPort-SDR: LJwFS7fVhzuDFEVHwckuJJENEwBCkIoXPSz4CRz+DB397LYUwTCOqmejJjS8q3rGKApDEa+cnB ZA1aBfL0A3KA== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="146304252" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="146304252" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:00 -0700 IronPort-SDR: SSi5GJb0V/HMXSKbaAyXk6HDmW4dB1Grr5oZA4rgflGb+mamJz9w29PyxLPPix1ZRfBr/DOI4L Lp6bL5FQa6zg== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="429666461" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:00 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, 
linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 01/15] x86/pkeys: Create pkeys_internal.h Date: Tue, 14 Jul 2020 00:02:06 -0700 Message-Id: <20200714070220.3500839-2-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in similar fashions. Share code between them by creating a header with common defines, move those defines into this header, change their names to reflect the new use, and include the header where needed. Signed-off-by: Ira Weiny --- arch/x86/include/asm/pgtable.h | 13 ++++++------- arch/x86/include/asm/pkeys.h | 2 ++ arch/x86/include/asm/pkeys_internal.h | 11 +++++++++++ arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/fpu/xstate.c | 8 ++++---- arch/x86/mm/pkeys.c | 14 ++++++-------- 6 files changed, 30 insertions(+), 19 deletions(-) create mode 100644 arch/x86/include/asm/pkeys_internal.h diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 76aa21e8128d..30e97fc8a683 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1364,9 +1364,7 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ -#define PKRU_AD_BIT 0x1 -#define PKRU_WD_BIT 0x2 -#define PKRU_BITS_PER_PKEY 2 +#include #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS extern u32 init_pkru_value; @@ -1376,18 +1374,19 @@ extern u32 init_pkru_value; static inline bool __pkru_allows_read(u32 pkru, u16 pkey) { - int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY; - return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits)); + int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY; + + return 
!(pkru & (PKR_AD_BIT << pkru_pkey_bits)); } static inline bool __pkru_allows_write(u32 pkru, u16 pkey) { - int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY; + int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY; /* * Access-disable disables writes too so we need to check * both bits here. */ - return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits)); + return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits)); } static inline u16 pte_flags_pkey(unsigned long pte_flags) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 2ff9b98812b7..be8b3e448f76 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PKEYS_H #define _ASM_X86_PKEYS_H +#include + #define ARCH_DEFAULT_PKEY 0 /* diff --git a/arch/x86/include/asm/pkeys_internal.h b/arch/x86/include/asm/pkeys_internal.h new file mode 100644 index 000000000000..a9f086f1e4b4 --- /dev/null +++ b/arch/x86/include/asm/pkeys_internal.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PKEYS_INTERNAL_H +#define _ASM_X86_PKEYS_INTERNAL_H + +#define PKR_AD_BIT 0x1 +#define PKR_WD_BIT 0x2 +#define PKR_BITS_PER_PKEY 2 + +#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY)) + +#endif /*_ASM_X86_PKEYS_INTERNAL_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 03b7c4ca425a..7da9855b5068 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -27,6 +27,7 @@ struct vm86; #include #include #include +#include #include #include diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index bda2e5eaca0e..fc1ec2986e03 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -955,7 +955,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val) { u32 old_pkru; - int pkey_shift = (pkey * PKRU_BITS_PER_PKEY); + int pkey_shift = (pkey * PKR_BITS_PER_PKEY); u32 new_pkru_bits = 
0; /* @@ -974,16 +974,16 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, /* Set the bits we need in PKRU: */ if (init_val & PKEY_DISABLE_ACCESS) - new_pkru_bits |= PKRU_AD_BIT; + new_pkru_bits |= PKR_AD_BIT; if (init_val & PKEY_DISABLE_WRITE) - new_pkru_bits |= PKRU_WD_BIT; + new_pkru_bits |= PKR_WD_BIT; /* Shift the bits in to the correct place in PKRU for pkey: */ new_pkru_bits <<= pkey_shift; /* Get old PKRU and mask off any old bits in place: */ old_pkru = read_pkru(); - old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift); + old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift); /* Write old part along with new part: */ write_pkru(old_pkru | new_pkru_bits); diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 8873ed1438a9..f5efb4007e74 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -111,19 +111,17 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey return vma_pkey(vma); } -#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY)) - /* * Make the default PKRU value (at execve() time) as restrictive * as possible. This ensures that any threads clone()'d early * in the process's lifetime will not accidentally get access * to data which is pkey-protected later on. 
*/ -u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) | - PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) | - PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) | - PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) | - PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15); +u32 init_pkru_value = PKR_AD_KEY( 1) | PKR_AD_KEY( 2) | PKR_AD_KEY( 3) | + PKR_AD_KEY( 4) | PKR_AD_KEY( 5) | PKR_AD_KEY( 6) | + PKR_AD_KEY( 7) | PKR_AD_KEY( 8) | PKR_AD_KEY( 9) | + PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | + PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15); /* * Called from the FPU code when creating a fresh set of FPU @@ -173,7 +171,7 @@ static ssize_t init_pkru_write_file(struct file *file, * up immediately if someone attempts to disable access * or writes to pkey 0. */ - if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT)) + if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT)) return -EINVAL; WRITE_ONCE(init_pkru_value, new_init_pkru); From patchwork Tue Jul 14 07:02:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661789 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 835EA1510 for ; Tue, 14 Jul 2020 07:05:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7514022205 for ; Tue, 14 Jul 2020 07:05:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726795AbgGNHEJ (ORCPT ); Tue, 14 Jul 2020 03:04:09 -0400 Received: from mga07.intel.com ([134.134.136.100]:36396 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725997AbgGNHEG (ORCPT ); Tue, 14 Jul 2020 03:04:06 -0400 IronPort-SDR: a1A9+X8wakkUx+Uj/w3EH88u4fGVLiosXj+N0BEwmO6/F13EZJPPwBXJ64DoTLqpfpVOHhDcWB zLpSTV3u183w== X-IronPort-AV: 
E=McAfee;i="6000,8403,9681"; a="213635611" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="213635611" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:04 -0700 IronPort-SDR: nY/DFM9nVSCaOf42CJfiKh58xiTvUpfFPmJqpldQG8mdSpoKtENIWvEW6a2fSkGue2L/MLfD8V mS8Mw6tqq+8g== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="485774305" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:03 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Fenghua Yu , Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 02/15] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support Date: Tue, 14 Jul 2020 00:02:07 -0700 Message-Id: <20200714070220.3500839-3-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Fenghua Yu Define a helper, get_new_pkr(), which will be used to support both Protection Key User (PKU) and the new Protection Key for Supervisor (PKS) in subsequent patches. 
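As an illustration of what the helper computes, here is a minimal user-space sketch of the same read-modify-write on a 32-bit protection key register image. The name sketch_get_new_pkr() is hypothetical and is not part of the series; the PKR_* values are the ones introduced in pkeys_internal.h, and the PKEY_DISABLE_* values are assumed to match the existing pkey uapi flags (0x1 and 0x2).

```c
#include <stdint.h>

/* Values copied from the series (pkeys_internal.h). */
#define PKR_AD_BIT          0x1u
#define PKR_WD_BIT          0x2u
#define PKR_BITS_PER_PKEY   2

/* Assumed to match the existing pkey uapi flag values. */
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2

/*
 * Hypothetical user-space stand-in for get_new_pkr(): returns old_pkr with
 * the two-bit field of 'pkey' replaced by the bits requested in init_val.
 */
static uint32_t sketch_get_new_pkr(uint32_t old_pkr, int pkey,
				   unsigned long init_val)
{
	int pkey_shift = pkey * PKR_BITS_PER_PKEY;
	uint32_t new_bits = 0;

	/* Translate the PKEY_DISABLE_* flags into register bits. */
	if (init_val & PKEY_DISABLE_ACCESS)
		new_bits |= PKR_AD_BIT;
	if (init_val & PKEY_DISABLE_WRITE)
		new_bits |= PKR_WD_BIT;

	/* Clear the key's old field, then install the new bits. */
	old_pkr &= ~((PKR_AD_BIT | PKR_WD_BIT) << pkey_shift);
	return old_pkr | (new_bits << pkey_shift);
}
```

For example, disabling access for key 1 on an all-zero register sets bit 2, giving 0x4. Only the field of the named key changes; all other keys keep their previous bits.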
Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- arch/x86/include/asm/pkeys.h | 2 ++ arch/x86/kernel/fpu/xstate.c | 17 +++-------------- arch/x86/mm/pkeys.c | 28 ++++++++++++++++++++++++++++ 3 files changed, 33 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index be8b3e448f76..34cef29fed20 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -136,4 +136,6 @@ static inline int vma_pkey(struct vm_area_struct *vma) return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT; } +u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val); + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index fc1ec2986e03..1def71dc8105 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -954,9 +954,7 @@ const void *get_xsave_field_ptr(int xfeature_nr) int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val) { - u32 old_pkru; - int pkey_shift = (pkey * PKR_BITS_PER_PKEY); - u32 new_pkru_bits = 0; + u32 old_pkru, new_pkru; /* * This check implies XSAVE support. 
OSPKE only gets @@ -972,21 +970,12 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, */ WARN_ON_ONCE(pkey >= arch_max_pkey()); - /* Set the bits we need in PKRU: */ - if (init_val & PKEY_DISABLE_ACCESS) - new_pkru_bits |= PKR_AD_BIT; - if (init_val & PKEY_DISABLE_WRITE) - new_pkru_bits |= PKR_WD_BIT; - - /* Shift the bits in to the correct place in PKRU for pkey: */ - new_pkru_bits <<= pkey_shift; - /* Get old PKRU and mask off any old bits in place: */ old_pkru = read_pkru(); - old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift); + new_pkru = get_new_pkr(old_pkru, pkey, init_val); /* Write old part along with new part: */ - write_pkru(old_pkru | new_pkru_bits); + write_pkru(new_pkru); return 0; } diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index f5efb4007e74..a5c680d32930 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -208,3 +208,31 @@ static __init int setup_init_pkru(char *opt) return 1; } __setup("init_pkru=", setup_init_pkru); + +/* + * Get a new pkey register value from the user values specified. 
+ * + * Kernel users use the same flags as user space: + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + */ +u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val) +{ + int pkey_shift = (pkey * PKR_BITS_PER_PKEY); + u32 new_pkr_bits = 0; + + /* Set the bits we need in the register: */ + if (init_val & PKEY_DISABLE_ACCESS) + new_pkr_bits |= PKR_AD_BIT; + if (init_val & PKEY_DISABLE_WRITE) + new_pkr_bits |= PKR_WD_BIT; + + /* Shift the bits in to the correct place: */ + new_pkr_bits <<= pkey_shift; + + /* Mask off any old bits in place: */ + old_pkr &= ~((PKR_AD_BIT | PKR_WD_BIT) << pkey_shift); + + /* Return the old part along with the new part: */ + return old_pkr | new_pkr_bits; +} From patchwork Tue Jul 14 07:02:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2573C722 for ; Tue, 14 Jul 2020 07:05:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 183D522226 for ; Tue, 14 Jul 2020 07:05:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726820AbgGNHEJ (ORCPT ); Tue, 14 Jul 2020 03:04:09 -0400 Received: from mga03.intel.com ([134.134.136.65]:9058 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726782AbgGNHEI (ORCPT ); Tue, 14 Jul 2020 03:04:08 -0400 IronPort-SDR: SzGkCToIK3nfVNmosCLbrrk7sKpWM7JgUNnxunx6ht3c1989Kmde/5luXomiQSSmUt6xwKW/UH 55ztlkoPxDKw== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="148828649" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="148828649" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:06 -0700 IronPort-SDR: zCgGXEqBmYqnjzD5g1tvTYbJG8Sugsaqt04O3q+7B6aA9Rho2GxTyEUJZYn+eiJhFS/hLT1lYm 76SjNBY0o0KQ== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="307752705" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:05 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Fenghua Yu , Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 03/15] x86/pks: Enable Protection Keys Supervisor (PKS) Date: Tue, 14 Jul 2020 00:02:08 -0700 Message-Id: <20200714070220.3500839-4-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Fenghua Yu Protection Keys for Supervisor pages (PKS) enables fast, hardware-thread-specific manipulation of permission restrictions on supervisor page mappings. It uses the same Protection Keys mechanism as user mappings but applies that mechanism to supervisor mappings using a supervisor-specific MSR. Kernel users can thus define 'domains' of page mappings which have an extra level of protection beyond that specified in the supervisor page table entries. Define ARCH_HAS_SUPERVISOR_PKEYS to distinguish this functionality from the existing ARCH_HAS_PKEYS and then enable PKS when it is configured and the CPU reports support for it.
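The enable decision described above can be sketched in user space as a pure function over a CR4 image. This is only an illustration under stated assumptions: the sketch_* names are hypothetical, the CR4 bit number (24) is taken from X86_CR4_PKS_BIT in this patch, and the CPUID bit is assumed to be CPUID.(EAX=7,ECX=0):ECX[31], which is how X86_FEATURE_PKS (16*32+31) reads if feature word 16 is leaf 7 ECX.

```c
#include <stdbool.h>
#include <stdint.h>

/* From this patch: X86_CR4_PKS_BIT is 24. */
#define SKETCH_CR4_PKS_BIT        24
/* Assumption: word 16 is CPUID.(EAX=7,ECX=0):ECX, so PKS is ECX bit 31. */
#define SKETCH_CPUID7_ECX_PKS_BIT 31

/* Hypothetical helper: does the given leaf-7 ECX value advertise PKS? */
static bool sketch_cpu_has_pks(uint32_t cpuid7_ecx)
{
	return (cpuid7_ecx >> SKETCH_CPUID7_ECX_PKS_BIT) & 1u;
}

/*
 * Hypothetical model of setup_pks(): return the CR4 image to use. CR4.PKS
 * is set only when the config option is on and the CPU advertises PKS.
 */
static uint64_t sketch_setup_pks(uint64_t cr4, uint32_t cpuid7_ecx,
				 bool config_enabled)
{
	if (!config_enabled)
		return cr4;
	if (!sketch_cpu_has_pks(cpuid7_ecx))
		return cr4;

	return cr4 | (1ULL << SKETCH_CR4_PKS_BIT);
}
```

Both checks leave CR4 untouched when they fail, which mirrors the early returns in setup_pks().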
While not strictly necessary in this patch, ARCH_HAS_SUPERVISOR_PKEYS separates this functionality through the patch series so it is introduced here. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- arch/x86/Kconfig | 1 + arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/uapi/asm/processor-flags.h | 2 ++ arch/x86/kernel/cpu/common.c | 15 +++++++++++++++ mm/Kconfig | 2 ++ 5 files changed, 21 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 883da0abf779..c3ecbed2cfa0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1872,6 +1872,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD) select ARCH_USES_HIGH_VMA_FLAGS select ARCH_HAS_PKEYS + select ARCH_HAS_SUPERVISOR_PKEYS help Memory Protection Keys provides a mechanism for enforcing page-based protections, but without requiring modification of the diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 02dabc9e77b0..a832ed8820c0 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -351,6 +351,7 @@ #define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */ #define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */ #define X86_FEATURE_MOVDIR64B (16*32+28) /* MOVDIR64B instruction */ +#define X86_FEATURE_PKS (16*32+31) /* Protection Keys for Supervisor pages */ /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */ diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index bcba3c643e63..191c574b2390 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) 
+#define X86_CR4_PKS_BIT 24 /* enable Protection Keys for Supervisor */ +#define X86_CR4_PKS _BITUL(X86_CR4_PKS_BIT) /* * x86-64 Task Priority Register, CR8 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 95c090a45b4b..f34bcefeda42 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1430,6 +1430,20 @@ static void validate_apic_and_package_id(struct cpuinfo_x86 *c) #endif } +/* + * PKS is independent of PKU and either or both may be supported on a CPU. + * Configure PKS if the cpu supports the feature. + */ +static void setup_pks(void) +{ + if (!IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS)) + return; + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + cr4_set_bits(X86_CR4_PKS); +} + /* * This does the hard work of actually picking apart the CPU stuff... */ @@ -1521,6 +1535,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) x86_init_rdrand(c); setup_pku(c); + setup_pks(); /* * Clear/Set all flags overridden by options, need do it diff --git a/mm/Kconfig b/mm/Kconfig index f2104cc0d35c..e541d2c0dcac 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -826,6 +826,8 @@ config ARCH_USES_HIGH_VMA_FLAGS bool config ARCH_HAS_PKEYS bool +config ARCH_HAS_SUPERVISOR_PKEYS + bool config PERCPU_STATS bool "Collect percpu memory statistics" From patchwork Tue Jul 14 07:02:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6FF3813B1 for ; Tue, 14 Jul 2020 07:05:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5C98B2220F for ; Tue, 14 Jul 2020 07:05:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726859AbgGNHEM (ORCPT ); Tue, 14 Jul 2020 
03:04:12 -0400 Received: from mga18.intel.com ([134.134.136.126]:49022 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726788AbgGNHEK (ORCPT ); Tue, 14 Jul 2020 03:04:10 -0400 IronPort-SDR: GAAIRizmK1AB+z/yfOq2KKL7aipoUiE8GySR5ej1CiY1KFaQD6q/FvteIE8tzaVbl9Avd8g6a8 P8AKOvIS1e9A== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="136290672" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="136290672" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:08 -0700 IronPort-SDR: qFqECLUNnlweAM8WminwN2DdYVuOS4PKUDe954fjJlk/ekiidgBcy6ckm8/rsoFwFQ2qgU6l0k /SgluK7O37dg== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="285662518" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:07 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Fenghua Yu , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 04/15] x86/pks: Preserve the PKRS MSR on context switch Date: Tue, 14 Jul 2020 00:02:09 -0700 Message-Id: <20200714070220.3500839-5-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny The PKRS MSR is defined as a per-core register. This isolates memory access by CPU. 
Unfortunately, the MSR is not preserved by XSAVE. Therefore, we must preserve the protections for individual tasks even if they are context switched out and later placed on another CPU. Define a saved PKRS value in the task's thread_struct, as well as a cached per-cpu value which mirrors the MSR value of the current CPU. Initialize all tasks with the default MSR value. Then, on schedule in, compare the task's saved value with the per-cpu value. If they differ, write the MSR; otherwise, skip the MSR write and avoid its overhead. Follow-on patches will update the saved PKRS as well as the MSR if needed. Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/pkeys_internal.h | 20 +++++++++++++++ arch/x86/include/asm/processor.h | 12 +++++++++ arch/x86/kernel/cpu/common.c | 2 ++ arch/x86/kernel/process.c | 35 +++++++++++++++++++++++++++ arch/x86/mm/pkeys.c | 13 ++++++++++ 6 files changed, 83 insertions(+) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index e8370e64a155..b6ffdfc3f388 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -727,6 +727,7 @@ #define MSR_IA32_TSC_DEADLINE 0x000006E0 +#define MSR_IA32_PKRS 0x000006E1 #define MSR_TSX_FORCE_ABORT 0x0000010F diff --git a/arch/x86/include/asm/pkeys_internal.h b/arch/x86/include/asm/pkeys_internal.h index a9f086f1e4b4..05257cdc7200 100644 --- a/arch/x86/include/asm/pkeys_internal.h +++ b/arch/x86/include/asm/pkeys_internal.h @@ -8,4 +8,24 @@ #define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY)) +/* + * Define a default PKRS value for each task. + * + * Key 0 has no restriction. All other keys are set to the most restrictive + * value which is access disabled (AD=1). + * + * NOTE: This needs to be a macro to be used as part of the INIT_THREAD macro.
+ */ +#define INIT_PKRS_VALUE (PKR_AD_KEY(1) | PKR_AD_KEY(2) | PKR_AD_KEY(3) | \ + PKR_AD_KEY(4) | PKR_AD_KEY(5) | PKR_AD_KEY(6) | \ + PKR_AD_KEY(7) | PKR_AD_KEY(8) | PKR_AD_KEY(9) | \ + PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \ + PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) + +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +void write_pkrs(u32 pkrs_val); +#else +static inline void write_pkrs(u32 pkrs_val) { } +#endif + #endif /*_ASM_X86_PKEYS_INTERNAL_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7da9855b5068..704d9f28fd4e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -545,6 +545,11 @@ struct thread_struct { unsigned int sig_on_uaccess_err:1; +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS + /* Saved Protection key register for supervisor mappings */ + u32 saved_pkrs; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* @@ -907,8 +912,15 @@ static inline void spin_lock_prefetch(const void *x) #define STACK_TOP TASK_SIZE_LOW #define STACK_TOP_MAX TASK_SIZE_MAX +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +#define INIT_THREAD_PKRS .saved_pkrs = INIT_PKRS_VALUE, +#else +#define INIT_THREAD_PKRS +#endif + #define INIT_THREAD { \ .addr_limit = KERNEL_DS, \ + INIT_THREAD_PKRS \ } extern unsigned long KSTK_ESP(struct task_struct *task); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f34bcefeda42..b8241936cbbf 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -56,6 +56,7 @@ #include #include #include +#include #include "cpu.h" @@ -1442,6 +1443,7 @@ static void setup_pks(void) return; cr4_set_bits(X86_CR4_PKS); + write_pkrs(INIT_PKRS_VALUE); } /* diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index f362ce0d5ac0..d69250a7c1bf 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -42,6 +42,7 @@ #include #include #include +#include #include "process.h" @@ -184,6 
+185,36 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, return ret; } +/* + * NOTE: We wrap pks_init_task() and pks_sched_in() with + * CONFIG_ARCH_HAS_SUPERVISOR_PKEYS because using IS_ENABLED() fails + * due to the lack of task_struct->saved_pkrs in this configuration. + * Furthermore, we place them here because of the complexity introduced by + * header conflicts introduced to get the task_struct definition in the pkeys + * headers. + */ +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +DECLARE_PER_CPU(u32, pkrs_cache); +static inline void pks_init_task(struct task_struct *tsk) +{ + /* New tasks get the most restrictive PKRS value */ + tsk->thread.saved_pkrs = INIT_PKRS_VALUE; +} +static inline void pks_sched_in(void) +{ + u64 current_pkrs = current->thread.saved_pkrs; + + /* Only update the MSR when current's pkrs is different from the MSR. */ + if (this_cpu_read(pkrs_cache) == current_pkrs) + return; + + write_pkrs(current_pkrs); +} +#else +static inline void pks_init_task(struct task_struct *tsk) { } +static inline void pks_sched_in(void) { } +#endif + void flush_thread(void) { struct task_struct *tsk = current; @@ -192,6 +223,8 @@ void flush_thread(void) memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array)); fpu__clear_all(&tsk->thread.fpu); + + pks_init_task(tsk); } void disable_TSC(void) @@ -655,6 +688,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p) if ((tifp ^ tifn) & _TIF_SLD) switch_to_sld(tifn); + + pks_sched_in(); } /* diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index a5c680d32930..0f86f2374bd7 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -236,3 +236,16 @@ u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val) /* Return the old part along with the new part: */ return old_pkr | new_pkr_bits; } + +DEFINE_PER_CPU(u32, pkrs_cache); + +/* + * Write the PKey Register Supervisor. 
This must be run with preemption + * disabled as it does not guarantee the atomicity of updating the pkrs_cache + * and MSR on its own. + */ +void write_pkrs(u32 pkrs_val) +{ + this_cpu_write(pkrs_cache, pkrs_val); + wrmsrl(MSR_IA32_PKRS, pkrs_val); +} From patchwork Tue Jul 14 07:02:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661673 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA18E1510 for ; Tue, 14 Jul 2020 07:04:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B66DD221F7 for ; Tue, 14 Jul 2020 07:04:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726870AbgGNHEM (ORCPT ); Tue, 14 Jul 2020 03:04:12 -0400 Received: from mga03.intel.com ([134.134.136.65]:9058 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726848AbgGNHEK (ORCPT ); Tue, 14 Jul 2020 03:04:10 -0400 IronPort-SDR: JBuIvf2YtFsNi22v3/RUtr75DO/ciMR9WkCpEfzGFai07bchYwUQfvtxSRkpQk/XHCXSSrlKQ/ htWxveCSbjVA== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="148828664" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="148828664" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:10 -0700 IronPort-SDR: 2BwForAF8NIBLQ8QKx/Fab82qfbti+rxwheupRip9gzdF6hvypzmf9dFXHNlerP8hNn7uHyG1i egetIafJxTNA== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="281659457" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:09 -0700 From: ira.weiny@intel.com 
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Fenghua Yu , Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 05/15] x86/pks: Add PKS kernel API Date: Tue, 14 Jul 2020 00:02:10 -0700 Message-Id: <20200714070220.3500839-6-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Fenghua Yu PKS allows kernel users to define domains of page mappings which have additional protections beyond the paging protections. Add an API to allocate, use, and free a protection key which identifies such a domain. Export two new symbols, pks_key_alloc() and pks_key_free(); pks_update_protection() is exposed as an inline function in a header file. pks_key_alloc() reserves pkey 0 for default kernel pages. The other 15 keys are dynamically allocated to allow better use of the limited key space. This, and the fact that PKS may not be available on all architectures, means callers of the allocator _must_ be prepared for it to fail and take appropriate action to run without their allocated domain. This is not anticipated to be a problem as these protections only serve to harden memory and users should be no worse off than before the introduction of PKS. PAGE_KERNEL_PKEY(key) and _PAGE_PKEY(pkey) help kernel users set the pkey bits in page table entries. Note these defines will be used in follow-on patches but are included here for a complete interface.
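To make the role of _PAGE_PKEY(pkey) concrete, here is a minimal user-space sketch of how a 4-bit key could be packed into, and read back from, a 64-bit page table entry value. The sketch_* names are hypothetical. The bit position (59) is an assumption taken from the mainline x86-64 definition of _PAGE_BIT_PKEY_BIT0; it is not shown in this series.

```c
#include <stdint.h>

typedef uint64_t sketch_pteval_t;

/* Assumption: _PAGE_BIT_PKEY_BIT0 is 59 on x86-64 (keys use bits 59..62). */
#define SKETCH_PAGE_BIT_PKEY_BIT0 59
#define SKETCH_PAGE_PKEY_MASK \
	((sketch_pteval_t)0xF << SKETCH_PAGE_BIT_PKEY_BIT0)

/* Hypothetical equivalent of _PAGE_PKEY(pkey): shift the key into place. */
static sketch_pteval_t sketch_page_pkey(unsigned int pkey)
{
	return (sketch_pteval_t)pkey << SKETCH_PAGE_BIT_PKEY_BIT0;
}

/* Replace whatever key the entry carries with 'pkey'; keep all other bits. */
static sketch_pteval_t sketch_pte_set_pkey(sketch_pteval_t pte,
					   unsigned int pkey)
{
	return (pte & ~SKETCH_PAGE_PKEY_MASK) | sketch_page_pkey(pkey);
}

/* Read the key back out of an entry value. */
static unsigned int sketch_pte_pkey(sketch_pteval_t pte)
{
	return (unsigned int)((pte & SKETCH_PAGE_PKEY_MASK) >>
			      SKETCH_PAGE_BIT_PKEY_BIT0);
}
```

In this model, PAGE_KERNEL_PKEY(pkey) corresponds to OR-ing sketch_page_pkey(pkey) into the normal kernel page protection bits, so a mapping created with it carries the key of the caller's domain.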
pks_update_protection() is inlined for performance and allows kernel users to change the protections for the domain identified by the pkey specified. It is undefined behavior to call this on a pkey not allocated by the allocator, and it will WARN_ON if called on architectures which do not support PKS. (Again, callers are expected to check the return value of pks_key_alloc() before using this API further.) Finally, pks_key_free() allows a user to return the key to the allocator for use by others. The interface maintains Access Disabled (AD=1) for all keys not currently allocated. Therefore, the user can depend on access being disabled when pks_key_alloc() returns a key. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- arch/x86/include/asm/pgtable_types.h | 4 ++ arch/x86/include/asm/pkeys.h | 30 ++++++++++ arch/x86/include/asm/pkeys_internal.h | 4 ++ arch/x86/mm/pkeys.c | 79 +++++++++++++++++++++++++++ include/linux/pkeys.h | 14 +++++ 5 files changed, 131 insertions(+) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 816b31c68550..2ab45ef89c7d 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -73,6 +73,8 @@ _PAGE_PKEY_BIT2 | \ _PAGE_PKEY_BIT3) +#define _PAGE_PKEY(pkey) (_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0) + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) #else @@ -229,6 +231,8 @@ enum page_cache_mode { #define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO) #define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE) +#define PAGE_KERNEL_PKEY(pkey) __pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey)) + #endif /* __ASSEMBLY__ */ /* xwr */ diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 34cef29fed20..e30ea907abb6 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -138,4 +138,34 @@ static inline int
vma_pkey(struct vm_area_struct *vma) u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val); +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +int pks_key_alloc(const char *const pkey_user); +void pks_key_free(int pkey); +u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val); + +/* + * pks_update_protection - Update the protection of the specified key + * + * @pkey: Key for the domain to change + * @protection: protection bits to be used + * + * Protection utilizes the same protection bits specified for User pkeys + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + * + * This is not a global update. It only affects the current running thread. + * + * It is undefined and a bug for users to call this without having allocated a + * pkey and using it as pkey here. + */ +static inline void pks_update_protection(int pkey, unsigned long protection) +{ + current->thread.saved_pkrs = get_new_pkr(current->thread.saved_pkrs, + pkey, protection); + preempt_disable(); + write_pkrs(current->thread.saved_pkrs); + preempt_enable(); +} +#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/include/asm/pkeys_internal.h b/arch/x86/include/asm/pkeys_internal.h index 05257cdc7200..e34f380c66d1 100644 --- a/arch/x86/include/asm/pkeys_internal.h +++ b/arch/x86/include/asm/pkeys_internal.h @@ -22,6 +22,10 @@ PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \ PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) +/* PKS supports 16 keys. Key 0 is reserved for the kernel. */ +#define PKS_KERN_DEFAULT_KEY 0 +#define PKS_NUM_KEYS 16 + #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS void write_pkrs(u32 pkrs_val); #else diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 0f86f2374bd7..16f735c12fcd 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -3,6 +3,9 @@ * Intel Memory Protection Keys management * Copyright (c) 2015, Intel Corporation. 
*/ +#undef pr_fmt +#define pr_fmt(fmt) "x86/pkeys: " fmt + #include /* debugfs_create_u32() */ #include /* mm_struct, vma, etc... */ #include /* PKEY_* */ @@ -249,3 +252,79 @@ void write_pkrs(u32 pkrs_val) this_cpu_write(pkrs_cache, pkrs_val); wrmsrl(MSR_IA32_PKRS, pkrs_val); } + +DEFINE_MUTEX(pks_lock); +static const char pks_key_user0[] = "kernel"; + +/* Store names of allocated keys for debug. Key 0 is reserved for the kernel. */ +static const char *pks_key_users[PKS_NUM_KEYS] = { + pks_key_user0 +}; + +/* + * Each key is represented by a bit. Bit 0 is set for key 0 and reserved for + * its use. We use ulong for the bit operations but only 16 bits are used. + */ +static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY; + +/* + * pks_key_alloc - Allocate a PKS key + * + * @pkey_user: String stored for debugging of key exhaustion. The caller is + * responsible to maintain this memory until pks_key_free(). + */ +int pks_key_alloc(const char * const pkey_user) +{ + int nr, old_bit, pkey; + + might_sleep(); + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return -EINVAL; + + mutex_lock(&pks_lock); + /* Find a free bit (0) in the bit map. 
*/ + old_bit = 1; + while (old_bit) { + nr = ffz(pks_key_allocation_map); + old_bit = __test_and_set_bit(nr, &pks_key_allocation_map); + } + + if (nr < PKS_NUM_KEYS) { + pkey = nr; + /* for debugging key exhaustion */ + pks_key_users[pkey] = pkey_user; + } else { + pkey = -ENOSPC; + pr_info("Cannot allocate supervisor key for %s.\n", + pkey_user); + } + + mutex_unlock(&pks_lock); + return pkey; +} +EXPORT_SYMBOL_GPL(pks_key_alloc); + +/* + * pks_key_free - Free a previously allocated PKS key + * + * @pkey: Key to be freed + */ +void pks_key_free(int pkey) +{ + might_sleep(); + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY) + return; + + mutex_lock(&pks_lock); + __clear_bit(pkey, &pks_key_allocation_map); + pks_key_users[pkey] = NULL; + /* Restore to default AD=1 and WD=0. */ + pks_update_protection(pkey, PKEY_DISABLE_ACCESS); + mutex_unlock(&pks_lock); +} +EXPORT_SYMBOL_GPL(pks_key_free); diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 2955ba976048..e4bff77d7b49 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -50,4 +50,18 @@ static inline void copy_init_pkru_to_fpregs(void) #endif /* ! CONFIG_ARCH_HAS_PKEYS */ +#ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +static inline int pks_key_alloc(const char * const pkey_user) +{ + return -EINVAL; +} +static inline void pks_key_free(int pkey) +{ +} +static inline void pks_update_protection(int pkey, unsigned long protection) +{ + WARN_ON_ONCE(1); +} +#endif /* !
CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #endif /* _LINUX_PKEYS_H */ From patchwork Tue Jul 14 07:02:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661771 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Fenghua Yu , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Ira Weiny , Andrew
Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 06/15] x86/pks: Add a debugfs file for allocated PKS keys Date: Tue, 14 Jul 2020 00:02:11 -0700 Message-Id: <20200714070220.3500839-7-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Fenghua Yu The sysadmin may need to know which PKS keys are currently being used. Add a debugfs file to show the allocated PKS keys and their names. Signed-off-by: Fenghua Yu --- arch/x86/mm/pkeys.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 16f735c12fcd..e565fadd74d7 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -328,3 +328,43 @@ void pks_key_free(int pkey) mutex_unlock(&pks_lock); } EXPORT_SYMBOL_GPL(pks_key_free); + +static int pks_keys_allocated_show(struct seq_file *m, void *p) +{ + int i; + + mutex_lock(&pks_lock); + for (i = PKS_KERN_DEFAULT_KEY; i < PKS_NUM_KEYS; i++) { + /* It is ok for pks_key_users[i] to be NULL */ + if (test_bit(i, &pks_key_allocation_map)) + seq_printf(m, "%d: %s\n", i, pks_key_users[i]); + } + mutex_unlock(&pks_lock); + + return 0; +} + +static int pks_keys_allocated_open(struct inode *inode, struct file *file) +{ + return single_open(file, pks_keys_allocated_show, NULL); +} + +static const struct file_operations pks_keys_allocated_fops = { + .open = pks_keys_allocated_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int __init pks_keys_initcall(void) +{ + if (cpu_feature_enabled(X86_FEATURE_PKS)) { + /* Create a debugfs file to show allocated PKS 
keys. */ + debugfs_create_file("pks_keys_allocated", 0400, + arch_debugfs_dir, NULL, + &pks_keys_allocated_fops); + } + + return 0; +} +late_initcall(pks_keys_initcall); From patchwork Tue Jul 14 07:02:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661761 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter
Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 07/15] Documentation/pkeys: Update documentation for kernel pkeys Date: Tue, 14 Jul 2020 00:02:12 -0700 Message-Id: <20200714070220.3500839-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Future Intel CPUs will support Protection Keys Supervisor (PKS). Update the protection key documentation to cover pkeys on supervisor pages. Signed-off-by: Ira Weiny --- Documentation/core-api/protection-keys.rst | 81 +++++++++++++++++----- 1 file changed, 63 insertions(+), 18 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index ec575e72d0b2..5ac400a5a306 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -4,25 +4,33 @@ Memory Protection Keys ====================== -Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature -which is found on Intel's Skylake (and later) "Scalable Processor" -Server CPUs. It will be available in future non-server Intel parts -and future AMD processors. - -For anyone wishing to test or use this feature, it is available in -Amazon's EC2 C5 instances and is known to work there using an Ubuntu -17.04 image. - Memory Protection Keys provides a mechanism for enforcing page-based protections, but without requiring modification of the page tables -when an application changes protection domains.
It works by -dedicating 4 previously ignored bits in each page table entry to a -"protection key", giving 16 possible keys. +when an application changes protection domains. + +PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Scalable +Processor" Server CPUs and later. It will be available in future +non-server Intel parts and future AMD processors. + +Future Intel processors will support Protection Keys for Supervisor pages +(PKS). + +For anyone wishing to test or use user space pkeys, it is available in Amazon's +EC2 C5 instances and is known to work there using an Ubuntu 17.04 image. + +Pkeys work by dedicating 4 previously reserved bits in each page table entry to +a "protection key", giving 16 possible keys. User and Supervisor pages are +treated separately. -There is also a new user-accessible register (PKRU) with two separate -bits (Access Disable and Write Disable) for each key. Being a CPU -register, PKRU is inherently thread-local, potentially giving each -thread a different set of protections from every other thread. +Protections for each page are controlled with per-CPU registers for each type +of page, User and Supervisor. Each of these 32-bit registers stores two separate +bits (Access Disable and Write Disable) for each key. + +For Userspace the register is user-accessible (rdpkru/wrpkru). For +Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kernel. + +Being held in CPU registers, pkey protections are inherently thread-local, potentially giving +each thread an independent set of protections from every other thread. There are two new instructions (RDPKRU/WRPKRU) for reading and writing to the new register. The feature is only available in 64-bit mode, @@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches. -Syscalls -======== +For kernel space rdmsr/wrmsr are used to access the kernel MSRs.
+ + +Syscalls for user space keys +============================ There are 3 system calls which directly interact with pkeys:: @@ -98,3 +109,37 @@ with a read():: The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. + + +Kernel API for PKS support +========================== + +PKS is intended to harden against unwanted access to kernel pages, but it does +not completely restrict access under all conditions. For example, the MSR +setting is not saved/restored during irqs. Thus the use of PKS is a mitigation +strategy rather than a form of strict security. + +The following calls are used to allocate, use, and deallocate a pkey which +defines a 'protection domain' within the kernel. Setting a pkey value in a +supervisor mapping adds that mapping to the protection domain. Then calls can be +used to enable/disable read and/or write access to all of the pages mapped with +that key: + + int pks_key_alloc(const char * const pkey_user); + #define PAGE_KERNEL_PKEY(pkey) + #define _PAGE_PKEY(pkey) + void pks_update_protection(int pkey, unsigned long protection); + void pks_key_free(int pkey); + +In-kernel users must be prepared to set PAGE_KERNEL_PKEY() permission in the +page table entries for the mappings they want to protect. + +WARNING: It is imperative that callers check for errors from pks_key_alloc() +because pkeys are a limited resource and so callers should be prepared to work +without PKS support.
+ +For admins a debugfs interface provides a list of the current keys in use at: + + /sys/kernel/debug/x86/pks_keys_allocated + +Some example code can be found in lib/pks/pks_test.c From patchwork Tue Jul 14 07:02:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661817 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski ,
Peter Zijlstra Cc: Ira Weiny , Fenghua Yu , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 08/15] x86/pks: Add PKS Test code Date: Tue, 14 Jul 2020 00:02:13 -0700 Message-Id: <20200714070220.3500839-9-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny The core PKS functionality provides an interface for kernel users to reserve keys for their domains, set up the page tables with those keys, and control access to those domains when needed. Define test code which exercises the core functionality of PKS via a debugfs entry. Basic checks can be triggered on boot with a kernel command line option, while both basic and preemption checks can be triggered with separate debugfs values. debugfs controls are: '0' -- Run access tests with a single pkey '1' -- Set up the pkey register with no access for the pkey allocated to this fd '2' -- Check that the pkey register updated in '1' is still the same. (To be used after a forced context switch.) '3' -- Allocate all pkeys possible and run tests on each pkey allocated. DEFAULT when run at boot.
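The controls listed above could be driven from user space roughly as follows. This is a minimal sketch, not the selftest shipped with this series: the function name `pks_context_switch_check` is hypothetical, the caller supplies the debugfs path, and sched_yield() is only a hint to the scheduler, not a forced context switch. It issues '0' first on the assumption, drawn from the test module in this patch, that the result reads PASS only after the access test has run.

```c
#include <fcntl.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 if the file reports PASS, 1 if it reports anything else,
 * and -1 if the file cannot be opened, written, or read.
 */
static int pks_context_switch_check(const char *path)
{
	char buf[32] = { 0 };
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	/* '0': run the single-pkey access tests (assumed to arm the PASS flag). */
	if (write(fd, "0", 1) != 1)
		goto fail;

	/* '1': set the pkey allocated to this fd to no access. */
	if (write(fd, "1", 1) != 1)
		goto fail;

	/* Let something else run; this does not guarantee a context switch. */
	sched_yield();

	/* '2': check that the register still holds the value set by '1'. */
	if (write(fd, "2", 1) != 1)
		goto fail;

	/* Read the result string back from the start of the file. */
	if (lseek(fd, 0, SEEK_SET) < 0 || read(fd, buf, sizeof(buf) - 1) < 0)
		goto fail;

	close(fd);	/* the pkey stays allocated only while the fd is open */
	return strncmp(buf, "PASS", 4) == 0 ? 0 : 1;

fail:
	close(fd);
	return -1;
}
```

The file descriptor is held open across all three writes because the pkey belongs to the fd. On a kernel built with this series the path would be /sys/kernel/debug/x86/run_pks.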
Closing the fd will cleanup and release the pkey, therefore to fully exercise context switch testing a user space program is provided in: .../tools/testing/selftests/x86/test_pks.c Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- arch/x86/include/asm/pkeys.h | 9 + arch/x86/mm/fault.c | 16 +- include/linux/pkeys.h | 4 + lib/Kconfig.debug | 12 + lib/Makefile | 3 + lib/pks/Makefile | 3 + lib/pks/pks_test.c | 452 +++++++++++++++++++++++++ tools/testing/selftests/x86/Makefile | 3 +- tools/testing/selftests/x86/test_pks.c | 65 ++++ 9 files changed, 562 insertions(+), 5 deletions(-) create mode 100644 lib/pks/Makefile create mode 100644 lib/pks/pks_test.c create mode 100644 tools/testing/selftests/x86/test_pks.c diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index e30ea907abb6..097abca7784c 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -168,4 +168,13 @@ static inline void pks_update_protection(int pkey, unsigned long protection) } #endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ +#if defined(CONFIG_PKS_TESTING) +bool pks_test_armed_and_clear(void); +#else +static inline bool pks_test_armed_and_clear(void) +{ + return false; +} +#endif + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 1ead568c0101..483fbf5b7957 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -18,6 +18,7 @@ #include /* faulthandler_disabled() */ #include /* efi_recover_from_page_fault()*/ #include +#include #include /* boot_cpu_has, ... */ #include /* dotraplinkage, ... */ @@ -1105,11 +1106,18 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { /* - * Protection keys exceptions only happen on user pages. We - * have no user pages in the kernel portion of the address - * space, so do not expect them here. + * If we get a protection key exception it could be because we are + * running the PKS test. 
If so, pks_test_armed_and_clear() will clear + * the protection mechanism and we can safely return. + * + * Otherwise we warn the user that something has gone wrong and + * continue with the fault. */ - WARN_ON_ONCE(hw_error_code & X86_PF_PK); + if (hw_error_code & X86_PF_PK) { + if (pks_test_armed_and_clear()) + return; + WARN_ON_ONCE(hw_error_code & X86_PF_PK); + } /* Was the fault spurious, caused by lazy TLB invalidation? */ if (spurious_kernel_fault(hw_error_code, address)) diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index e4bff77d7b49..1d84ab7c12d4 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -48,6 +48,10 @@ static inline void copy_init_pkru_to_fpregs(void) { } +static inline bool pks_test_armed_and_clear(void) +{ + return false; +} #endif /* ! CONFIG_ARCH_HAS_PKEYS */ #ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 9ad9210d70a1..aa876ebb4c8b 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2329,6 +2329,18 @@ config HYPERV_TESTING help Select this option to enable Hyper-V vmbus testing. +config PKS_TESTING + bool "PKey(S)upervisor testing" + default n + depends on ARCH_HAS_SUPERVISOR_PKEYS + help + Select this option to enable testing of PKS core software and + hardware. The PKS core provides a mechanism to allocate keys as well + as maintain the protection settings across context switches. + Answer N if you don't know what supervisor keys are. + + If unsure, say N. 
+ endmenu # "Kernel Testing and Coverage" endmenu # Kernel hacking diff --git a/lib/Makefile b/lib/Makefile index b1c42c10073b..667dea28cf7b 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -318,3 +318,6 @@ obj-$(CONFIG_OBJAGG) += objagg.o # KUnit tests obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o + +# PKS test +obj-y += pks/ diff --git a/lib/pks/Makefile b/lib/pks/Makefile new file mode 100644 index 000000000000..7d1df7563db9 --- /dev/null +++ b/lib/pks/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_PKS_TESTING) += pks_test.o diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c new file mode 100644 index 000000000000..6d8172734f97 --- /dev/null +++ b/lib/pks/pks_test.c @@ -0,0 +1,452 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright(c) 2020 Intel Corporation. All rights reserved. + * + * Implement PKS testing + * Access to run this test can be with a command line parameter + * ("pks-test-on-boot") or more detailed tests can be triggered through: + * + * /sys/kernel/debug/x86/run_pks + * + * debugfs controls are: + * + * '0' -- Run access tests with a single pkey + * + * '1' -- Set up the pkey register with no access for the pkey allocated to + * this fd + * '2' -- Check that the pkey register updated in '1' is still the same. (To + * be used after a forced context switch.) + * + * '3' -- Allocate all pkeys possible and run tests on each pkey allocated. + * DEFAULT when run at boot. + * + * Closing the fd will cleanup and release the pkey. + * + * A companion user space program is provided in: + * + * .../tools/testing/selftests/x86/test_pks.c + * + * which will better test the context switching. 
+ * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#define PKS_TEST_MEM_SIZE (PAGE_SIZE) + +/* + * run_on_boot default '= false' which checkpatch complains about initializing; + * so we don't + */ +static bool run_on_boot; +static struct dentry *pks_test_dentry; + +/* + * We must lock the following globals for brief periods while the fault handler + * checks/updates them. + */ +static DEFINE_SPINLOCK(test_lock); +static int test_armed_key; +static unsigned long prev_cnt; +static unsigned long fault_cnt; + +struct pks_test_ctx { + bool pass; + bool pks_cpu_enabled; + int pkey; + char data[64]; +}; + +static pte_t *walk_table(void *ptr) +{ + struct page *page = NULL; + pgd_t *pgdp; + p4d_t *p4dp; + pud_t *pudp; + pmd_t *pmdp; + pte_t *ret = NULL; + + pgdp = pgd_offset_k((unsigned long)ptr); + if (pgd_none(*pgdp) || pgd_bad(*pgdp)) + goto error; + + p4dp = p4d_offset(pgdp, (unsigned long)ptr); + if (p4d_none(*p4dp) || p4d_bad(*p4dp)) + goto error; + + pudp = pud_offset(p4dp, (unsigned long)ptr); + if (pud_none(*pudp) || pud_bad(*pudp)) + goto error; + + pmdp = pmd_offset(pudp, (unsigned long)ptr); + if (pmd_none(*pmdp) || pmd_bad(*pmdp)) + goto error; + + ret = pte_offset_map(pmdp, (unsigned long)ptr); + if (pte_present(*ret)) { + page = pte_page(*ret); + if (!page) { + pte_unmap(ret); + goto error; + } + pr_info("page 0x%lx; flags 0x%lx\n", + (unsigned long)page, page->flags); + } + +error: + return ret; +} + +/** + * pks_test_armed_and_clear() is exported so that the fault handler can detect + * and report back status of intentional faults. + * + * NOTE: It clears the protection key from the page such that the fault handler + * will not re-trigger. 
+ */ +bool pks_test_armed_and_clear(void) +{ + bool armed = (test_armed_key != 0); + + if (armed) { + /* Enable read and write to stop faults */ + pks_update_protection(test_armed_key, 0); + fault_cnt++; + } + + return armed; +} +EXPORT_SYMBOL(pks_test_armed_and_clear); + +static bool exception_caught(void) +{ + bool ret = (fault_cnt != prev_cnt); + + prev_cnt = fault_cnt; + return ret; +} + +static void report_pkey_settings(void *unused) +{ + u8 pkey; + unsigned long long msr = 0; + unsigned int cpu = smp_processor_id(); + + rdmsrl(MSR_IA32_PKRS, msr); + + pr_info("for CPU %d : 0x%llx\n", cpu, msr); + for (pkey = 0; pkey < PKS_NUM_KEYS; pkey++) { + int ad, wd; + + ad = (msr >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_ACCESS; + wd = (msr >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_WRITE; + pr_info(" %u: A:%d W:%d\n", pkey, ad, wd); + } +} + +struct pks_access_test { + int ad; + int wd; + bool write; + bool exception; +}; + +static struct pks_access_test pkey_test_ary[] = { + /* disable both */ + { PKEY_DISABLE_ACCESS, PKEY_DISABLE_WRITE, true, true }, + { PKEY_DISABLE_ACCESS, PKEY_DISABLE_WRITE, false, true }, + + /* enable both */ + { 0, 0, true, false }, + { 0, 0, false, false }, + + /* enable read only */ + { 0, PKEY_DISABLE_WRITE, true, true }, + { 0, PKEY_DISABLE_WRITE, false, false }, +}; + +static int run_access_test(struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + int ret = 0; + bool exception; + + pks_update_protection(ctx->pkey, test->ad | test->wd); + + spin_lock(&test_lock); + test_armed_key = ctx->pkey; + + if (test->write) + memcpy(ptr, ctx->data, 8); + else + memcpy(ctx->data, ptr, 8); + + exception = exception_caught(); + + test_armed_key = 0; + spin_unlock(&test_lock); + + if (test->exception != exception) { + pr_err("pkey test FAILED: ad %d; wd %d; write %s; exception %s != %s\n", + test->ad, test->wd, + test->write ? "TRUE" : "FALSE", + test->exception ? "TRUE" : "FALSE", + exception ? 
"TRUE" : "FALSE"); + ret = -EFAULT; + } + + return ret; +} + +static void test_mem_access(struct pks_test_ctx *ctx) +{ + int i, rc; + u8 pkey; + void *ptr = NULL; + pte_t *ptep; + + ptr = __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_END, + GFP_KERNEL, PAGE_KERNEL_PKEY(ctx->pkey), + 0, NUMA_NO_NODE, __builtin_return_address(0)); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + ctx->pass = false; + return; + } + + ptep = walk_table(ptr); + if (!ptep) { + pr_err("Failed to walk table???\n"); + ctx->pass = false; + goto done; + } + + pkey = pte_flags_pkey(ptep->pte); + pr_info("ptep flags 0x%lx pkey %u\n", + (unsigned long)ptep->pte, pkey); + + if (pkey != ctx->pkey) { + pr_err("invalid pkey found: %u, test_pkey: %u\n", + pkey, ctx->pkey); + ctx->pass = false; + goto unmap; + } + + if (!ctx->pks_cpu_enabled) { + pr_err("not CPU enabled; skipping access tests...\n"); + ctx->pass = true; + goto unmap; + } + + for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) { + rc = run_access_test(ctx, &pkey_test_ary[i], ptr); + + /* only save last error is fine */ + if (rc) + ctx->pass = false; + } + +unmap: + pte_unmap(ptep); +done: + vfree(ptr); +} + +static void pks_run_test(struct pks_test_ctx *ctx) +{ + ctx->pass = true; + + pr_info("\n"); + pr_info("\n"); + pr_info(" ***** BEGIN: Testing (CPU enabled : %s) *****\n", + ctx->pks_cpu_enabled ? "TRUE" : "FALSE"); + + if (ctx->pks_cpu_enabled) + on_each_cpu(report_pkey_settings, NULL, 1); + + pr_info(" BEGIN: pkey %d Testing\n", ctx->pkey); + test_mem_access(ctx); + pr_info(" END: PAGE_KERNEL_PKEY Testing : %s\n", + ctx->pass ? 
"PASS" : "FAIL"); + + pr_info(" ***** END: Testing *****\n"); + pr_info("\n"); + pr_info("\n"); +} + +static ssize_t pks_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct pks_test_ctx *ctx = file->private_data; + char buf[32]; + unsigned int len; + + if (!ctx) + len = sprintf(buf, "not run\n"); + else + len = sprintf(buf, "%s\n", ctx->pass ? "PASS" : "FAIL"); + + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +static struct pks_test_ctx *alloc_ctx(const char *name) +{ + struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + + if (!ctx) { + pr_err("Failed to allocate memory for test context\n"); + return ERR_PTR(-ENOMEM); + } + + ctx->pkey = pks_key_alloc(name); + if (ctx->pkey <= 0) { + pr_err("Failed to allocate memory for test context\n"); + kfree(ctx); + return ERR_PTR(-ENOMEM); + } + + ctx->pks_cpu_enabled = cpu_feature_enabled(X86_FEATURE_PKS); + sprintf(ctx->data, "%s", "DEADBEEF"); + return ctx; +} + +static void free_ctx(struct pks_test_ctx *ctx) +{ + pks_key_free(ctx->pkey); + kfree(ctx); +} + +static void run_all(void) +{ + struct pks_test_ctx *ctx[PKS_NUM_KEYS]; + static char name[PKS_NUM_KEYS][64]; + int i; + + for (i = 1; i < PKS_NUM_KEYS; i++) { + sprintf(name[i], "pks ctx %d", i); + ctx[i] = alloc_ctx((const char *)name[i]); + } + + for (i = 1; i < PKS_NUM_KEYS; i++) { + if (!IS_ERR(ctx[i])) + pks_run_test(ctx[i]); + } + + for (i = 1; i < PKS_NUM_KEYS; i++) { + if (!IS_ERR(ctx[i])) + free_ctx(ctx[i]); + } +} + +static ssize_t pks_write_file(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + char buf[2]; + struct pks_test_ctx *ctx = file->private_data; + + if (copy_from_user(buf, user_buf, 1)) + return -EFAULT; + buf[1] = '\0'; + + /* + * Test "3" will test allocating all keys. Do it first without + * using "ctx". 
+ */ + if (!strcmp(buf, "3")) + run_all(); + + if (!ctx) { + ctx = alloc_ctx("pks test"); + if (IS_ERR(ctx)) + return -ENOMEM; + file->private_data = ctx; + } + + if (!strcmp(buf, "0")) + pks_run_test(ctx); + + /* start of context switch test */ + if (!strcmp(buf, "1")) { + /* Ensure a known state to test context switch */ + pks_update_protection(ctx->pkey, + PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE); + } + + /* After context switch msr should be restored */ + if (!strcmp(buf, "2") && ctx->pks_cpu_enabled) { + unsigned long reg_pkrs; + int access; + + rdmsrl(MSR_IA32_PKRS, reg_pkrs); + + access = (reg_pkrs >> (ctx->pkey * PKR_BITS_PER_PKEY)) & + PKEY_ACCESS_MASK; + if (access != (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) { + ctx->pass = false; + pr_err("Context switch check failed\n"); + } + } + + return count; +} + +static int pks_release_file(struct inode *inode, struct file *file) +{ + struct pks_test_ctx *ctx = file->private_data; + + if (!ctx) + return 0; + + free_ctx(ctx); + return 0; +} + +static const struct file_operations fops_init_pks = { + .read = pks_read_file, + .write = pks_write_file, + .llseek = default_llseek, + .release = pks_release_file, +}; + +static int __init parse_pks_test_options(char *str) +{ + run_on_boot = true; + + return 0; +} +early_param("pks-test-on-boot", parse_pks_test_options); + +static int __init pks_test_init(void) +{ + if (cpu_feature_enabled(X86_FEATURE_PKS)) { + if (run_on_boot) + run_all(); + + pks_test_dentry = debugfs_create_file("run_pks", 0600, arch_debugfs_dir, + NULL, &fops_init_pks); + } + + return 0; +} +late_initcall(pks_test_init); + +static void __exit pks_test_exit(void) +{ + debugfs_remove(pks_test_dentry); + pr_info("test exit\n"); +} +module_exit(pks_test_exit); + +MODULE_AUTHOR("Intel Corporation"); +MODULE_LICENSE("GPL v2"); diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index d2796ea98c5a..3572dfb25c0a 100644 --- a/tools/testing/selftests/x86/Makefile +++ 
b/tools/testing/selftests/x86/Makefile @@ -13,7 +13,8 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ test_vdso test_vsyscall mov_ss_trap \ - syscall_arg_fault + syscall_arg_fault test_pks + TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c new file mode 100644 index 000000000000..8037a2a9ff5f --- /dev/null +++ b/tools/testing/selftests/x86/test_pks.c @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +int main(void) +{ + cpu_set_t cpuset; + char result[32]; + pid_t pid; + int fd; + + CPU_ZERO(&cpuset); + CPU_SET(0, &cpuset); + /* Two processes run on CPU 0 so that they go through context switch. */ + sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset); + + pid = fork(); + if (pid == 0) { + fd = open("/sys/kernel/debug/x86/run_pks", O_RDWR); + if (fd < 0) { + printf("cannot open file\n"); + return -1; + } + + /* Allocate test_pkey1 and run test. */ + write(fd, "0", 1); + + /* Arm for context switch test */ + write(fd, "1", 1); + + /* Context switch out... 
*/ + sleep(4); + + /* Check msr restored */ + write(fd, "2", 1); + } else { + sleep(2); + + fd = open("/sys/kernel/debug/x86/run_pks", O_RDWR); + if (fd < 0) { + printf("cannot open file\n"); + return -1; + } + + /* run test with alternate pkey */ + write(fd, "0", 1); + } + + read(fd, result, 10); + printf("#PF, context switch, pkey allocation and free tests: %s\n", + result); + + close(fd); + + return 0; +} From patchwork Tue Jul 14 07:02:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6259A1510 for ; Tue, 14 Jul 2020 07:05:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5410222210 for ; Tue, 14 Jul 2020 07:05:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726892AbgGNHFI (ORCPT ); Tue, 14 Jul 2020 03:05:08 -0400 Received: from mga07.intel.com ([134.134.136.100]:36442 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726936AbgGNHEV (ORCPT ); Tue, 14 Jul 2020 03:04:21 -0400 IronPort-SDR: ACgwm0Q1Kl4+u+a+0fm2nJRhaByd2yiJ5R4dOGP70IW8ekzBS+C0hthslOcxkRtLLi+v0gj07v dP5G1chAvo8A== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="213635649" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="213635649" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:17 -0700 IronPort-SDR: 5oFSt+rQ81PQdl8QBXCqp9lx6EuUp2oSohr8H0g6Sb5uObVFDfEd/Sews/ccuD5tPz01v6Cf0X n+qUtmLpJo0g== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="268570272" Received: from iweiny-desk2.sc.intel.com 
(HELO localhost) ([10.3.52.147]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:17 -0700
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski, Peter Zijlstra
Cc: Ira Weiny, Ben Widawsky, Dan Williams, x86@kernel.org, Dave Hansen, Vishal Verma, Andrew Morton, Fenghua Yu, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 09/15] fs/dax: Remove unused size parameter
Date: Tue, 14 Jul 2020 00:02:14 -0700
Message-Id: <20200714070220.3500839-10-ira.weiny@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com>
References: <20200714070220.3500839-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-kselftest-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

From: Ira Weiny

Passing size to copy_user_dax() implies it can copy variable sizes of
data when in fact it calls copy_user_page(), which copies exactly one
page.

We are safe because the only caller uses PAGE_SIZE anyway, so just
remove the parameter for clarity.

While we are at it, change copy_user_dax() to copy_cow_page_dax() to
make it clear it is a singleton helper for this one case, not
implementing what dax_iomap_actor() does.

Reviewed-by: Ben Widawsky Reviewed-by: Dan Williams Signed-off-by: Ira Weiny --- fs/dax.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 11b16729b86f..3e0babeb0365 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -680,21 +680,20 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, return __dax_invalidate_entry(mapping, index, false); } -static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev, - sector_t sector, size_t size, struct page *to, - unsigned long vaddr) +static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev, + sector_t sector, struct page *to, unsigned long vaddr) { void *vto, *kaddr; pgoff_t pgoff; long rc; int id; - rc = bdev_dax_pgoff(bdev, sector, size, &pgoff); + rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff); if (rc) return rc; id = dax_read_lock(); - rc = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size), &kaddr, NULL); + rc = dax_direct_access(dax_dev, pgoff, PHYS_PFN(PAGE_SIZE), &kaddr, NULL); if (rc < 0) { dax_read_unlock(id); return rc; @@ -1305,8 +1304,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, clear_user_highpage(vmf->cow_page, vaddr); break; case IOMAP_MAPPED: - error = copy_user_dax(iomap.bdev, iomap.dax_dev, - sector, PAGE_SIZE, vmf->cow_page, vaddr); + error = copy_cow_page_dax(iomap.bdev, iomap.dax_dev, + sector, vmf->cow_page, vaddr); break; default: WARN_ON_ONCE(1); From patchwork Tue Jul 14 07:02:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661753 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B89E114DD for ; Tue, 14 Jul 2020 07:05:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 
A09F02075F for ; Tue, 14 Jul 2020 07:05:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727810AbgGNHFI (ORCPT ); Tue, 14 Jul 2020 03:05:08 -0400 Received: from mga04.intel.com ([192.55.52.120]:50365 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726930AbgGNHET (ORCPT ); Tue, 14 Jul 2020 03:04:19 -0400 IronPort-SDR: fwgGQ7YNDCwQCEUIT2phI1ATj/Qu6M9vCHJZzzk27SSttayB+89Un9q37EB3RVQaaelIib8SR0 y/oiYUdEnOKA== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="146304344" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="146304344" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:19 -0700 IronPort-SDR: 9LKgIdM/SdUreJg1navD+iwUluxTNA29jANU+9W/GbR+y4NJhTqBe5QwxAfUIq7aVqFXyOWk5Q vWEckWQDPPOw== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="317627579" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:18 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , x86@kernel.org, Dave Hansen , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 10/15] drivers/dax: Expand lock scope to cover the use of addresses Date: Tue, 14 Jul 2020 00:02:15 -0700 Message-Id: <20200714070220.3500839-11-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org 
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

From: Ira Weiny

The addition of PKS protection to dax read lock/unlock will require that
the address returned by dax_direct_access() be protected by this lock.
While not technically necessary for this series, this corrects the
locking by ensuring that the use of kaddr and end_kaddr is covered by
the dax read lock/unlock.

Change the lock scope to cover the kaddr and end_kaddr use.

Reviewed-by: Dan Williams
Signed-off-by: Ira Weiny
---
 drivers/dax/super.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 8e32345be0f7..021739768093 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -103,11 +103,11 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
 	id = dax_read_lock();
 	len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
 	len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-	dax_read_unlock(id);
 
 	if (len < 1 || len2 < 1) {
 		pr_debug("%s: error: dax access failed (%ld)\n",
 				bdevname(bdev, buf), len < 1 ?
len : len2); + dax_read_unlock(id); return false; } @@ -137,6 +137,7 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev, put_dev_pagemap(end_pgmap); } + dax_read_unlock(id); if (!dax_enabled) { pr_debug("%s: error: dax support not enabled\n", From patchwork Tue Jul 14 07:02:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661693 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E668913 for ; Tue, 14 Jul 2020 07:04:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 534B72220F for ; Tue, 14 Jul 2020 07:04:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726970AbgGNHEX (ORCPT ); Tue, 14 Jul 2020 03:04:23 -0400 Received: from mga18.intel.com ([134.134.136.126]:49045 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726944AbgGNHEW (ORCPT ); Tue, 14 Jul 2020 03:04:22 -0400 IronPort-SDR: ZSEd0WtpBNcDnNwHG/LZ2GVm0ilG0lQtQJXiT/n8aRFC3s4oDoICGcaq0JwENIiLbEw9ydFQxz kKjRbitdZtqg== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="136290701" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="136290701" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:20 -0700 IronPort-SDR: KzKWQ5XDPI9f4CPOecYYqtYqcrY0x1fWz3YLL7Fu5urcUA62nhYAffor9aUAx/LtWw4rqvfv9U KvNOlS82/Gng== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="316298233" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:20 -0700 From: ira.weiny@intel.com 
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski, Peter Zijlstra
Cc: Ira Weiny, x86@kernel.org, Dave Hansen, Dan Williams, Vishal Verma, Andrew Morton, Fenghua Yu, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 11/15] memremap: Add zone device access protection
Date: Tue, 14 Jul 2020 00:02:16 -0700
Message-Id: <20200714070220.3500839-12-ira.weiny@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com>
References: <20200714070220.3500839-1-ira.weiny@intel.com>
MIME-Version: 1.0
Sender: linux-kselftest-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

From: Ira Weiny

Device managed memory exposes itself to the kernel direct map which
allows stray pointers to access these device memories.

Stray pointers to normal memory may result in a crash or other
undesirable behavior which, while unfortunate, is usually recoverable
with a reboot. Stray writes to areas such as non-volatile memory are
permanent in nature and thus are more likely to result in permanent user
data loss vs a stray write to other memory areas.

Set up an infrastructure for extra device access protection. Then
implement the new protection using the new Protection Keys Supervisor
(PKS) on architectures which support it.

To enable this extra protection, devices specify a flag in the pgmap to
indicate that these areas wish to use additional protection.

Kernel code which intends to access this memory can do so automatically
through the use of the kmap infrastructure calling into
dev_access_[enable|disable]() described here. The kmap infrastructure is
implemented in a follow-on patch.

In addition, users can directly enable/disable the access through
dev_access_[enable|disable]() if they have a priori knowledge of the
type of pages they are accessing.

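As a rough illustration of the intended calling convention (not part of
this patch), the following userspace C model treats a flagged region as
inaccessible unless an access window is open. All model_* names and
MODEL_PROT_ENABLED are hypothetical stand-ins; the real code uses
PGMAP_PROT_ENABLED and dev_access_[enable|disable](), and a stray access
takes a hardware fault rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag; plays the role the pgmap flag plays in the patch. */
#define MODEL_PROT_ENABLED (1u << 1)

struct model_region {
	unsigned int flags;
	char data[16];
};

/* false: flagged regions reject access, as with the key set to no-access */
static bool model_access_open;

static void model_enable(void)  { model_access_open = true; }
static void model_disable(void) { model_access_open = false; }

/* Returns false where the real hardware would raise a fault. */
static bool model_write(struct model_region *r, const char *src, size_t len)
{
	if (len > sizeof(r->data))
		return false;
	if ((r->flags & MODEL_PROT_ENABLED) && !model_access_open)
		return false;
	memcpy(r->data, src, len);
	return true;
}

/* Intended pattern: open the window, touch the memory, close the window. */
static bool model_bracketed_write(struct model_region *r, const char *src,
				  size_t len)
{
	bool ok;

	model_enable();
	ok = model_write(r, src, len);
	model_disable();
	return ok;
}
```

In this model a write to a flagged region outside the window is
rejected, which corresponds to the fault a stray write would take, while
regions without the flag are unaffected.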
All calls to enable/disable protection flow through
dev_access_[enable|disable]() and are nestable through the use of a
per-task reference count. This reference count does two things:

1) Allows a thread to nest calls to disable protection such that the
   first call to re-enable protection does not 'break' the last access
   of the pmem device memory.

2) Provides faster performance by avoiding lots of MSR writes, for
   example when looping over a sequence of pmem pages.

IRQ context borrows the reference count of the interrupted task. This is
a trade-off vs saving/restoring on interrupt entry/exit. The following
example shows how this works:

	...
				// ref == 0
	dev_access_enable()	// ref += 1 ==> disable protection
		irq()
			dev_access_enable()	// ref += 1 ==> 2
			dev_access_disable()	// ref -= 1 ==> 1
	do_pmem_thing()		// all good here
	dev_access_disable()	// ref -= 1 ==> 0 ==> enable protection
	...

While this does leave some openings for stray writes during IRQs, the
overall protection is much stronger after this patch, and implementing
save/restore during IRQs would have been a much more complicated
implementation. So we compromise.

The pkey value is never freed, as this too optimizes the implementation
to be either on or off using the static branch conditional in the fast
paths.

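The nesting behavior can be sketched in plain userspace C, assuming a
single task and a counter of simulated MSR writes. The model_* names are
hypothetical and only mirror the roles of dev_page_access_ref and
pks_update_protection(); IRQ masking and the static branch are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct model_task {
	unsigned int access_ref;	/* stands in for dev_page_access_ref */
};

static bool model_protected = true;	/* true: the key blocks access */
static unsigned int model_msr_writes;	/* counts simulated MSR updates */

static void model_set_protection(bool on)
{
	model_protected = on;
	model_msr_writes++;
}

static void model_access_enable(struct model_task *t)
{
	/* Only the 0 -> 1 transition touches the simulated MSR. */
	if (t->access_ref == 0)
		model_set_protection(false);
	t->access_ref++;
}

static void model_access_disable(struct model_task *t)
{
	assert(t->access_ref > 0);	/* an unbalanced disable is a bug */
	t->access_ref--;
	/* Only the 1 -> 0 transition restores protection. */
	if (t->access_ref == 0)
		model_set_protection(true);
}

/* Replays the sequence above; returns the number of simulated MSR writes. */
static unsigned int model_replay_example(void)
{
	struct model_task task = { .access_ref = 0 };

	model_protected = true;
	model_msr_writes = 0;

	model_access_enable(&task);	/* ref 0 -> 1, protection off */
	/* simulated irq borrowing the interrupted task's count */
	model_access_enable(&task);	/* ref 1 -> 2, no MSR write */
	model_access_disable(&task);	/* ref 2 -> 1, no MSR write */
	assert(!model_protected);	/* do_pmem_thing() would still work */
	model_access_disable(&task);	/* ref 1 -> 0, protection back on */

	return model_msr_writes;
}
```

Replaying the sequence costs two simulated MSR writes in this model, no
matter how deeply the enable/disable pairs nest in between.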
Signed-off-by: Ira Weiny --- include/linux/memremap.h | 1 + include/linux/mm.h | 33 ++++++++++++ include/linux/sched.h | 3 ++ init/init_task.c | 3 ++ kernel/fork.c | 3 ++ mm/Kconfig | 13 +++++ mm/memremap.c | 111 +++++++++++++++++++++++++++++++++++++++ 7 files changed, 167 insertions(+) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 5f5b2df06e61..87a9772b1aa7 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -90,6 +90,7 @@ struct dev_pagemap_ops { }; #define PGMAP_ALTMAP_VALID (1 << 0) +#define PGMAP_PROT_ENABLED (1 << 1) /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings diff --git a/include/linux/mm.h b/include/linux/mm.h index dc7b87310c10..99d0914e26f9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1123,6 +1123,39 @@ static inline bool is_pci_p2pdma_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; } +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION +DECLARE_STATIC_KEY_FALSE(dev_protection_static_key); + +/* + * We make page_is_access_protected() as quick as possible. + * 1) If no mappings have been enabled with extra protection we skip this + * entirely + * 2) Skip pages which are not ZONE_DEVICE + * 3) Only then check if this particular page was mapped with extra + * protections. 
+ */ +static inline bool page_is_access_protected(struct page *page) +{ + if (!static_branch_unlikely(&dev_protection_static_key)) + return false; + if (!is_zone_device_page(page)) + return false; + if (page->pgmap->flags & PGMAP_PROT_ENABLED) + return true; + return false; +} + +void dev_access_enable(void); +void dev_access_disable(void); +#else +static inline bool page_is_access_protected(struct page *page) +{ + return false; +} +static inline void dev_access_enable(void) { } +static inline void dev_access_disable(void) { } +#endif /* CONFIG_ZONE_DEVICE_ACCESS_PROTECTION */ + /* 127: arbitrary random number, small enough to assemble well */ #define page_ref_zero_or_close_to_overflow(page) \ ((unsigned int) page_ref_count(page) + 127u <= 127u) diff --git a/include/linux/sched.h b/include/linux/sched.h index 692e327d7455..2a8dbbb371ee 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1313,6 +1313,9 @@ struct task_struct { struct callback_head mce_kill_me; #endif +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + unsigned int dev_page_access_ref; +#endif /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. 
diff --git a/init/init_task.c b/init/init_task.c index 15089d15010a..17766b059606 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -204,6 +204,9 @@ struct task_struct init_task #ifdef CONFIG_SECURITY .security = NULL, #endif +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + .dev_page_access_ref = 0, +#endif }; EXPORT_SYMBOL(init_task); diff --git a/kernel/fork.c b/kernel/fork.c index efc5493203ae..a6c14b962a27 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -957,6 +957,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) #ifdef CONFIG_MEMCG tsk->active_memcg = NULL; +#endif +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + tsk->dev_page_access_ref = 0; #endif return tsk; diff --git a/mm/Kconfig b/mm/Kconfig index e541d2c0dcac..f6029f3c2c89 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -798,6 +798,19 @@ config ZONE_DEVICE If FS_DAX is enabled, then say Y. +config ZONE_DEVICE_ACCESS_PROTECTION + bool "Device memory access protection" + depends on ZONE_DEVICE + depends on ARCH_HAS_SUPERVISOR_PKEYS + + help + Enable the option of having access protections on device memory + areas. This protects against access to device memory which is not + intended such as stray writes. This feature is particularly useful + to protect against corruption of persistent memory. + + If in doubt, say 'Y'. + config DEV_PAGEMAP_OPS bool diff --git a/mm/memremap.c b/mm/memremap.c index 03e38b7a38f1..dfbbfd1221a8 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -6,12 +6,16 @@ #include #include #include +#include #include #include #include #include #include #include +#include + +#define PKEY_INVALID (INT_MIN) static DEFINE_XARRAY(pgmap_array); @@ -70,6 +74,110 @@ static void devmap_managed_enable_put(void) } #endif /* CONFIG_DEV_PAGEMAP_OPS */ +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION +/* + * Note all devices which have asked for protections share the same key. The + * key may, or may not, have been provided by the core. 
If not, protection + * will remain disabled. The key acquisition is attempted at init time and + * never again. So we don't have to worry about dev_page_pkey changing. + */ +static int dev_page_pkey = PKEY_INVALID; +DEFINE_STATIC_KEY_FALSE(dev_protection_static_key); +EXPORT_SYMBOL(dev_protection_static_key); +DEFINE_MUTEX(dev_prot_enable_lock); +static int dev_protection_enable; + +static pgprot_t dev_protection_enable_get(struct dev_pagemap *pgmap, pgprot_t prot) +{ + if (pgmap->flags & PGMAP_PROT_ENABLED && dev_page_pkey != PKEY_INVALID) { + pgprotval_t val = pgprot_val(prot); + + mutex_lock(&dev_prot_enable_lock); + dev_protection_enable++; + /* Only enable the static branch 1 time */ + if (dev_protection_enable == 1) + static_branch_enable(&dev_protection_static_key); + mutex_unlock(&dev_prot_enable_lock); + + prot = __pgprot(val | _PAGE_PKEY(dev_page_pkey)); + } + return prot; +} + +static void dev_protection_enable_put(struct dev_pagemap *pgmap) +{ + if (pgmap->flags & PGMAP_PROT_ENABLED && dev_page_pkey != PKEY_INVALID) { + mutex_lock(&dev_prot_enable_lock); + dev_protection_enable--; + if (dev_protection_enable == 0) + static_branch_disable(&dev_protection_static_key); + mutex_unlock(&dev_prot_enable_lock); + } +} + +void dev_access_disable(void) +{ + unsigned long flags; + + if (!static_branch_unlikely(&dev_protection_static_key)) + return; + + local_irq_save(flags); + current->dev_page_access_ref--; + if (current->dev_page_access_ref == 0) + pks_update_protection(dev_page_pkey, PKEY_DISABLE_ACCESS); + local_irq_restore(flags); +} +EXPORT_SYMBOL_GPL(dev_access_disable); + +void dev_access_enable(void) +{ + unsigned long flags; + + if (!static_branch_unlikely(&dev_protection_static_key)) + return; + + local_irq_save(flags); + /* 0 clears the PKEY_DISABLE_ACCESS bit, allowing access */ + if (current->dev_page_access_ref == 0) + pks_update_protection(dev_page_pkey, 0); + current->dev_page_access_ref++; + local_irq_restore(flags); +} 
+EXPORT_SYMBOL_GPL(dev_access_enable);
+
+/**
+ * __dev_access_protection_init: Configure a PKS key domain for device pages
+ *
+ * The domain defaults to the protected state. Device page mappings should set
+ * the PGMAP_PROT_ENABLED flag when mapping pages.
+ *
+ * Note the pkey is never freed. This is run at init time and we either get
+ * the key or we do not. We need to do this to maintain a constant key (or
+ * not) as device memory is added or removed.
+ */
+static int __init __dev_access_protection_init(void)
+{
+	int pkey = pks_key_alloc("Device Memory");
+
+	if (pkey < 0)
+		return 0;
+
+	dev_page_pkey = pkey;
+
+	return 0;
+}
+subsys_initcall(__dev_access_protection_init);
+#else
+static pgprot_t dev_protection_enable_get(struct dev_pagemap *pgmap, pgprot_t prot)
+{
+	return prot;
+}
+static void dev_protection_enable_put(struct dev_pagemap *pgmap)
+{
+}
+#endif /* CONFIG_ZONE_DEVICE_ACCESS_PROTECTION */
+
 static void pgmap_array_delete(struct resource *res)
 {
 	xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end),
@@ -159,6 +267,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 	pgmap_array_delete(res);
 	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
 	devmap_managed_enable_put();
+	dev_protection_enable_put(pgmap);
 }
 EXPORT_SYMBOL_GPL(memunmap_pages);
 
@@ -194,6 +303,8 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 	int error, is_ram;
 	bool need_devmap_managed = true;
 
+	params.pgprot = dev_protection_enable_get(pgmap, params.pgprot);
+
 	switch (pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
 		if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {

From patchwork Tue Jul 14 07:02:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 11661741
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 51D2414DD for ; Tue, 14 Jul 2020
07:04:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4258F22210 for ; Tue, 14 Jul 2020 07:04:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727091AbgGNHEx (ORCPT ); Tue, 14 Jul 2020 03:04:53 -0400 Received: from mga02.intel.com ([134.134.136.20]:7568 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726964AbgGNHEX (ORCPT ); Tue, 14 Jul 2020 03:04:23 -0400 IronPort-SDR: g8izOY92QYbvCb/YVWxnMmgkQ1KonhvoN4mEpeapEZGPRPlN8eTnCQEzzCiGodv4Z2S0UY59Mx c+X4MlOOq2sQ== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="136970732" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="136970732" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:22 -0700 IronPort-SDR: i0LqExrBzUtJkU9eqNM0y424kEtZ8Ql24Gg0EKNp03C+/z+emnIMY4v+O6oNf5WcrrVRGG2n8D +66KNsFjR4FQ== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="360295775" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:21 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 12/15] kmap: Add stray write protection for device pages Date: Tue, 14 Jul 2020 00:02:17 -0700 Message-Id: <20200714070220.3500839-13-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: 
<20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Device managed pages may have additional protections. These protections need to be removed prior to valid use by kernel users. Check for special treatment of device managed pages in kmap and take action if needed. We use kmap as an interface for generic kernel code because under normal circumstances it would be a bug for general kernel code to not use kmap prior to accessing kernel memory. Therefore, this should allow any valid kernel users to seamlessly use these pages without issues. Signed-off-by: Ira Weiny --- include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/include/linux/highmem.h b/include/linux/highmem.h index d6e82e3de027..7f809d8d5a94 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -8,6 +8,7 @@ #include #include #include +#include #include @@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size) #include +static inline void enable_access(struct page *page) +{ + if (!page_is_access_protected(page)) + return; + dev_access_enable(); +} + +static inline void disable_access(struct page *page) +{ + if (!page_is_access_protected(page)) + return; + dev_access_disable(); +} + #ifdef CONFIG_HIGHMEM extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot); extern void kunmap_atomic_high(void *kvaddr); @@ -55,6 +70,11 @@ static inline void *kmap(struct page *page) else addr = kmap_high(page); kmap_flush_tlb((unsigned long)addr); + /* + * Even non-highmem pages may have additional access protections which + * need to be checked and potentially enabled. 
+ */ + enable_access(page); return addr; } @@ -63,6 +83,11 @@ void kunmap_high(struct page *page); static inline void kunmap(struct page *page) { might_sleep(); + /* + * Even non-highmem pages may have additional access protections which + * need to be checked and potentially disabled. + */ + disable_access(page); if (!PageHighMem(page)) return; kunmap_high(page); @@ -85,6 +110,7 @@ static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { preempt_disable(); pagefault_disable(); + enable_access(page); if (!PageHighMem(page)) return page_address(page); return kmap_atomic_high_prot(page, prot); @@ -137,6 +163,7 @@ static inline unsigned long totalhigh_pages(void) { return 0UL; } static inline void *kmap(struct page *page) { might_sleep(); + enable_access(page); return page_address(page); } @@ -146,6 +173,7 @@ static inline void kunmap_high(struct page *page) static inline void kunmap(struct page *page) { + disable_access(page); #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(page_address(page)); #endif @@ -155,6 +183,7 @@ static inline void *kmap_atomic(struct page *page) { preempt_disable(); pagefault_disable(); + enable_access(page); return page_address(page); } #define kmap_atomic_prot(page, prot) kmap_atomic(page) @@ -216,7 +245,8 @@ static inline void kmap_atomic_idx_pop(void) #define kunmap_atomic(addr) \ do { \ BUILD_BUG_ON(__same_type((addr), struct page *)); \ - kunmap_atomic_high(addr); \ + disable_access(kmap_to_page(addr)); \ + kunmap_atomic_high(addr); \ pagefault_enable(); \ preempt_enable(); \ } while (0) From patchwork Tue Jul 14 07:02:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661737 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B6705913 for ; Tue, 14 Jul 2020 07:04:52 +0000 (UTC) Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9F2932222C for ; Tue, 14 Jul 2020 07:04:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727044AbgGNHEd (ORCPT ); Tue, 14 Jul 2020 03:04:33 -0400 Received: from mga09.intel.com ([134.134.136.24]:61109 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727029AbgGNHE3 (ORCPT ); Tue, 14 Jul 2020 03:04:29 -0400 IronPort-SDR: /KgLZW4dFvREbw1hbd/0VuJ+D/5ZkkxHP23O7JcRBzVKZ5kBY8G0qeUyPItldvf6eR8wDCH6pU XnnPra5O9u8g== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="150261214" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="150261214" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:28 -0700 IronPort-SDR: M1818UDpOziaMV+1NuZExXbDGKfxAHE7xxsE/1/8eIxmubbVe3Yh1zWgdqipkeJC59admPi65v orXIGfMQFVpQ== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="390397762" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:26 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 13/15] dax: Stray write protection for dax_direct_access() Date: Tue, 14 Jul 2020 00:02:18 -0700 Message-Id: <20200714070220.3500839-14-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: 
<20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny dax_direct_access() is a special case of accessing pmem via a page offset and without a struct page. Because the dax driver is well aware of the special protections it has mapped memory with, call dev_access_[en|dis]able() directly instead of the unnecessary overhead of trying to get a page to kmap. Like kmap though, leverage the existing dax_read[un]lock() functions because they are already required to surround the use of the memory returned from dax_direct_access(). Signed-off-by: Ira Weiny --- drivers/dax/super.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 021739768093..e8d0a28e6ed2 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -30,12 +30,14 @@ static DEFINE_SPINLOCK(dax_host_lock); int dax_read_lock(void) { + dev_access_enable(); return srcu_read_lock(&dax_srcu); } EXPORT_SYMBOL_GPL(dax_read_lock); void dax_read_unlock(int id) { + dev_access_disable(); srcu_read_unlock(&dax_srcu, id); } EXPORT_SYMBOL_GPL(dax_read_unlock); From patchwork Tue Jul 14 07:02:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661733 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C6C8D913 for ; Tue, 14 Jul 2020 07:04:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B94F722225 for ; Tue, 14 Jul 2020 07:04:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727047AbgGNHEe (ORCPT ); Tue, 14 Jul 2020 03:04:34 -0400 Received: from mga01.intel.com ([192.55.52.88]:38972 "EHLO mga01.intel.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727025AbgGNHE2 (ORCPT ); Tue, 14 Jul 2020 03:04:28 -0400 IronPort-SDR: i7PHFqUKWuxp5lv35VRP21KvIozxwHJg1WLpeQaFVUuVqNIbWjLGxpCJUzDAjII+uR5hshyBHH Bfdkih2LvyHg== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="166930866" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="166930866" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:28 -0700 IronPort-SDR: 2GlF4QiJnNhoGgMJ5XMp7xdyQ0iPfyqW+HUR8m12RTg7zWuuTG19TEkAYr8cw+FwjIa2w1Dc3M MV33WpvpzAfw== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="269934896" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:28 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 14/15] nvdimm/pmem: Stray write protection for pmem->virt_addr Date: Tue, 14 Jul 2020 00:02:19 -0700 Message-Id: <20200714070220.3500839-15-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny The pmem driver uses a cached virtual address to access its memory directly. 
Because the nvdimm driver is well aware of the special protections it has mapped memory with, we call dev_access_[en|dis]able() around the direct pmem->virt_addr (pmem_addr) usage instead of the unnecessary overhead of trying to get a page to kmap. Signed-off-by: Ira Weiny --- drivers/nvdimm/pmem.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index d25e66fd942d..46c11a09b813 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem, if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) return BLK_STS_IOERR; + dev_access_enable(); rc = read_pmem(page, page_off, pmem_addr, len); + dev_access_disable(); flush_dcache_page(page); return rc; } @@ -180,11 +182,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem, * after clear poison. */ flush_dcache_page(page); + dev_access_enable(); write_pmem(pmem_addr, page, page_off, len); if (unlikely(bad_pmem)) { rc = pmem_clear_poison(pmem, pmem_off, len); write_pmem(pmem_addr, page, page_off, len); } + dev_access_disable(); return rc; } From patchwork Tue Jul 14 07:02:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661731 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ADD9913B6 for ; Tue, 14 Jul 2020 07:04:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94F0B22225 for ; Tue, 14 Jul 2020 07:04:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727058AbgGNHEe (ORCPT ); Tue, 14 Jul 2020 03:04:34 -0400 Received: from mga04.intel.com ([192.55.52.120]:50389 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727034AbgGNHEb (ORCPT 
); Tue, 14 Jul 2020 03:04:31 -0400 IronPort-SDR: ich8xV8hFnmzCz3xenXCt5Gwp+Pu22XqcldrFpG6x50tDs0YTP3pCyVSc5MvlINldsLKGEd0I+ MNn2n1F8J6TA== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="146304368" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="146304368" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:30 -0700 IronPort-SDR: OlctpnIJwdlGQxS+Bsa4EvBdFZhgaMgi6E93HtogmUqsYd+MIKqgpv8ZX2Fg+6KeQMAbeYu29A wm1rPhIk0yOg== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="459583558" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:29 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 15/15] [dax|pmem]: Enable stray write protection Date: Tue, 14 Jul 2020 00:02:20 -0700 Message-Id: <20200714070220.3500839-16-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Protecting against stray writes is particularly important for PMEM because, unlike writes to anonymous memory, writes to PMEM persist across a reboot. Thus data corruption could result in permanent loss of data. Therefore, there is no option presented to the user.
Enable stray write protection by setting the flag in pgmap which requests it. Note that if Zone Device Access Protection is not supported, this flag will have no effect. Signed-off-by: Ira Weiny --- drivers/dax/device.c | 2 ++ drivers/nvdimm/pmem.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 4c0af2eb7e19..884f66d73d32 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -430,6 +430,8 @@ int dev_dax_probe(struct device *dev) } dev_dax->pgmap.type = MEMORY_DEVICE_DEVDAX; + dev_dax->pgmap.flags |= PGMAP_PROT_ENABLED; + addr = devm_memremap_pages(dev, &dev_dax->pgmap); if (IS_ERR(addr)) return PTR_ERR(addr); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 46c11a09b813..9416a660eede 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -427,6 +427,8 @@ static int pmem_attach_disk(struct device *dev, return -EBUSY; } + pmem->pgmap.flags |= PGMAP_PROT_ENABLED; + q = blk_alloc_queue(pmem_make_request, dev_to_node(dev)); if (!q) return -ENOMEM;
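Illustration only, not part of the series: the last three patches share one pattern. A driver requests protection by setting a flag on its pgmap, and every legitimate access is then bracketed by an enable/disable pair, so any write outside such a bracket is a stray write. The user-space sketch below models that pattern under stated assumptions. Every mock_* name is hypothetical, and the depth counter is an assumption of this sketch rather than a description of how dev_access_enable() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
#define MOCK_PGMAP_PROT_ENABLED (1u << 0)

struct mock_pgmap {
	unsigned int flags;
};

/* Assumption of this sketch: access rights are tracked as a nesting depth. */
static int mock_access_depth;

static void mock_dev_access_enable(void)
{
	mock_access_depth++;
}

static void mock_dev_access_disable(void)
{
	assert(mock_access_depth > 0);	/* catches unbalanced brackets */
	mock_access_depth--;
}

static bool mock_is_protected(const struct mock_pgmap *pgmap)
{
	return (pgmap->flags & MOCK_PGMAP_PROT_ENABLED) != 0;
}

/*
 * Raw write to "device" memory. Returns false, without touching dst,
 * when the memory is protected and access has not been enabled. This
 * models the fault a stray write would take.
 */
static bool mock_raw_write(const struct mock_pgmap *pgmap, char *dst,
			   const char *src, size_t len)
{
	if (mock_is_protected(pgmap) && mock_access_depth == 0)
		return false;
	memcpy(dst, src, len);
	return true;
}

/* Driver-style write: the access is bracketed, as in pmem_do_write(). */
static bool mock_driver_write(const struct mock_pgmap *pgmap, char *dst,
			      const char *src, size_t len)
{
	bool ok;

	mock_dev_access_enable();
	ok = mock_raw_write(pgmap, dst, src, len);
	mock_dev_access_disable();
	return ok;
}
```

With MOCK_PGMAP_PROT_ENABLED set, a direct mock_raw_write() is refused, while mock_driver_write() succeeds and leaves the depth at zero afterwards. With the flag clear, both paths succeed, which mirrors the note that the flag has no effect when the protection is not in force.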