From patchwork Tue Jul 14 07:02:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 11661679 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5276114DD for ; Tue, 14 Jul 2020 07:04:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1BF9122200 for ; Tue, 14 Jul 2020 07:04:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1BF9122200 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4389F8D0008; Tue, 14 Jul 2020 03:04:16 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 343558D0006; Tue, 14 Jul 2020 03:04:16 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1970D8D0008; Tue, 14 Jul 2020 03:04:16 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0217.hostedemail.com [216.40.44.217]) by kanga.kvack.org (Postfix) with ESMTP id ECDBD8D0006 for ; Tue, 14 Jul 2020 03:04:15 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id AA904181AC9C6 for ; Tue, 14 Jul 2020 07:04:15 +0000 (UTC) X-FDA: 77035792470.09.cart91_1c0242526eef Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin09.hostedemail.com (Postfix) with ESMTP id 8EA75180AD804 for ; Tue, 14 Jul 2020 07:04:15 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ira.weiny@intel.com,,RULES_HIT:30030:30051:30054:30055:30064:30067:30069:30070:30090,0,RBL:134.134.136.65:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100;04y89kkww7eca4wfkwqqyp3a4hh8roc4edqatjakmu9xops4g5sy6z1s98e3jdc.nteoj439cc4w75gstcyqcsjc8ptckyqkq6zu414jop99w66dij6us5s5abd1rg6.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:22,LUA_SUMMARY:none X-HE-Tag: cart91_1c0242526eef X-Filterd-Recvd-Size: 10448 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf15.hostedemail.com (Postfix) with ESMTP for ; Tue, 14 Jul 2020 07:04:14 +0000 (UTC) IronPort-SDR: 8lglwifqCG4wsumaOdguaiN5KHagEgVW54TgZi4rxdsFre5GpRBq7Zb8IGDxuIzPD44IvTQzBH kX3y51NsXb9w== X-IronPort-AV: E=McAfee;i="6000,8403,9681"; a="148828666" X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="148828666" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:10 -0700 IronPort-SDR: 2BwForAF8NIBLQ8QKx/Fab82qfbti+rxwheupRip9gzdF6hvypzmf9dFXHNlerP8hNn7uHyG1i egetIafJxTNA== X-IronPort-AV: E=Sophos;i="5.75,350,1589266800"; d="scan'208";a="281659457" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2020 00:04:09 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Fenghua Yu , Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Vishal Verma , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [RFC PATCH 05/15] x86/pks: Add PKS kernel API Date: Tue, 14 Jul 2020 00:02:10 -0700 Message-Id: <20200714070220.3500839-6-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com> References: <20200714070220.3500839-1-ira.weiny@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 8EA75180AD804 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Fenghua Yu PKS allows kernel users to define domains of page mappings which have additional protections beyond the paging protections. Add an API to allocate, use, and free a protection key which identifies such a domain. We export 2 new symbols pks_key_alloc() and pks_key_free() while pks_update_protection() is exposed as an inline function via header file. pks_key_alloc() reserves pkey 0 for default kernel pages. The other 15 keys are dynamically allocated to allow better use of the limited key space. This, and the fact that PKS may not be available on all arch's, means callers of the allocator _must_ be prepared for it to fail and take appropriate action to run without their allocated domain. This is not anticipated to be a problem as these protections only serve to harden memory and users should be no worse off than before the introduction of PKS. PAGE_KERNEL_PKEY(key) and _PAGE_PKEY(pkey) aid in setting page table entry bits by kernel users. Note these defines will be used in follow on patches but are included here for a complete interface. pks_update_protection() is inlined for performance and allows kernel users the ability to change the protections for the domain identified by the pkey specified. It is undefined behavior to call this on a pkey not allocated by the allocator. And will WARN_ON if called on architectures which do not support PKS. (Again callers are expected to check the return of pks_key_alloc() before using this API further.) Finally, pks_key_free() allows a user to return the key to the allocator for use by others. The interface maintains Access Disabled (AD=1) for all keys not currently allocated. Therefore, the user can depend on access being disabled when pks_key_alloc() returns a key. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- arch/x86/include/asm/pgtable_types.h | 4 ++ arch/x86/include/asm/pkeys.h | 30 ++++++++++ arch/x86/include/asm/pkeys_internal.h | 4 ++ arch/x86/mm/pkeys.c | 79 +++++++++++++++++++++++++++ include/linux/pkeys.h | 14 +++++ 5 files changed, 131 insertions(+) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 816b31c68550..2ab45ef89c7d 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -73,6 +73,8 @@ _PAGE_PKEY_BIT2 | \ _PAGE_PKEY_BIT3) +#define _PAGE_PKEY(pkey) (_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0) + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) #else @@ -229,6 +231,8 @@ enum page_cache_mode { #define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO) #define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE) +#define PAGE_KERNEL_PKEY(pkey) __pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey)) + #endif /* __ASSEMBLY__ */ /* xwr */ diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 34cef29fed20..e30ea907abb6 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -138,4 +138,34 @@ static inline int vma_pkey(struct vm_area_struct *vma) u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val); +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +int pks_key_alloc(const char *const pkey_user); +void pks_key_free(int pkey); +u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val); + +/* + * pks_update_protection - Update the protection of the specified key + * + * @pkey: Key for the domain to change + * @protection: protection bits to be used + * + * Protection utilizes the same protection bits specified for User pkeys + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + * + * This is not a global update. It only affects the current running thread. + * + * It is undefined and a bug for users to call this without having allocated a + * pkey and using it as pkey here. + */ +static inline void pks_update_protection(int pkey, unsigned long protection) +{ + current->thread.saved_pkrs = get_new_pkr(current->thread.saved_pkrs, + pkey, protection); + preempt_disable(); + write_pkrs(current->thread.saved_pkrs); + preempt_enable(); +} +#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/include/asm/pkeys_internal.h b/arch/x86/include/asm/pkeys_internal.h index 05257cdc7200..e34f380c66d1 100644 --- a/arch/x86/include/asm/pkeys_internal.h +++ b/arch/x86/include/asm/pkeys_internal.h @@ -22,6 +22,10 @@ PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \ PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) +/* PKS supports 16 keys. Key 0 is reserved for the kernel. */ +#define PKS_KERN_DEFAULT_KEY 0 +#define PKS_NUM_KEYS 16 + #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS void write_pkrs(u32 pkrs_val); #else diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 0f86f2374bd7..16f735c12fcd 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -3,6 +3,9 @@ * Intel Memory Protection Keys management * Copyright (c) 2015, Intel Corporation. */ +#undef pr_fmt +#define pr_fmt(fmt) "x86/pkeys: " fmt + #include /* debugfs_create_u32() */ #include /* mm_struct, vma, etc... */ #include /* PKEY_* */ @@ -249,3 +252,79 @@ void write_pkrs(u32 pkrs_val) this_cpu_write(pkrs_cache, pkrs_val); wrmsrl(MSR_IA32_PKRS, pkrs_val); } + +DEFINE_MUTEX(pks_lock); +static const char pks_key_user0[] = "kernel"; + +/* Store names of allocated keys for debug. Key 0 is reserved for the kernel. */ +static const char *pks_key_users[PKS_NUM_KEYS] = { + pks_key_user0 +}; + +/* + * Each key is represented by a bit. Bit 0 is set for key 0 and reserved for + * its use. We use ulong for the bit operations but only 16 bits are used. + */ +static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY; + +/* + * pks_key_alloc - Allocate a PKS key + * + * @pkey_user: String stored for debugging of key exhaustion. The caller is + * responsible to maintain this memory until pks_key_free(). + */ +int pks_key_alloc(const char * const pkey_user) +{ + int nr, old_bit, pkey; + + might_sleep(); + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return -EINVAL; + + mutex_lock(&pks_lock); + /* Find a free bit (0) in the bit map. */ + old_bit = 1; + while (old_bit) { + nr = ffz(pks_key_allocation_map); + old_bit = __test_and_set_bit(nr, &pks_key_allocation_map); + } + + if (nr < PKS_NUM_KEYS) { + pkey = nr; + /* for debugging key exhaustion */ + pks_key_users[pkey] = pkey_user; + } else { + pkey = -ENOSPC; + pr_info("Cannot allocate supervisor key for %s.\n", + pkey_user); + } + + mutex_unlock(&pks_lock); + return pkey; +} +EXPORT_SYMBOL_GPL(pks_key_alloc); + +/* + * pks_key_free - Free a previously allocate PKS key + * + * @pkey: Key to be free'ed + */ +void pks_key_free(int pkey) +{ + might_sleep(); + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY) + return; + + mutex_lock(&pks_lock); + __clear_bit(pkey, &pks_key_allocation_map); + pks_key_users[pkey] = NULL; + /* Restore to default AD=1 and WD=0. */ + pks_update_protection(pkey, PKEY_DISABLE_ACCESS); + mutex_unlock(&pks_lock); +} +EXPORT_SYMBOL_GPL(pks_key_free); diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 2955ba976048..e4bff77d7b49 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -50,4 +50,18 @@ static inline void copy_init_pkru_to_fpregs(void) #endif /* ! CONFIG_ARCH_HAS_PKEYS */ +#ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +static inline int pks_key_alloc(const char * const pkey_user) +{ + return -EINVAL; +} +static inline void pks_key_free(int pkey) +{ +} +static inline void pks_update_protection(int pkey, unsigned long protection) +{ + WARN_ON_ONCE(1); +} +#endif /* ! CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #endif /* _LINUX_PKEYS_H */