From patchwork Tue Jul 14 07:02:12 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 11661689
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
 Peter Zijlstra
Cc: Ira Weiny, x86@kernel.org, Dave Hansen, Dan Williams, Vishal Verma,
 Andrew Morton, Fenghua Yu, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 07/15] Documentation/pkeys: Update documentation for
 kernel pkeys
Date:
 Tue, 14 Jul 2020 00:02:12 -0700
Message-Id: <20200714070220.3500839-8-ira.weiny@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200714070220.3500839-1-ira.weiny@intel.com>
References: <20200714070220.3500839-1-ira.weiny@intel.com>
MIME-Version: 1.0

From: Ira Weiny

Future Intel CPUs will support Protection Key Supervisor (PKS).

Update the protection key documentation to cover pkeys on supervisor pages.

Signed-off-by: Ira Weiny
---
 Documentation/core-api/protection-keys.rst | 81 +++++++++++++++++-----
 1 file changed, 63 insertions(+), 18 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..5ac400a5a306 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,25 +4,33 @@
 Memory Protection Keys
 ======================
 
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
 Memory Protection Keys provides a mechanism for enforcing page-based
 protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
+when an application changes protection domains.
+
+Protection Keys for Userspace (PKU) is a feature found on Intel's Skylake
+"Scalable Processor" Server CPUs and later. It will be available in future
+non-server Intel parts and future AMD processors.
+
+Future Intel processors will support Protection Keys for Supervisor pages
+(PKS).
+
+For anyone wishing to test or use user space pkeys, they are available in
+Amazon's EC2 C5 instances and are known to work there using an Ubuntu 17.04 image.
+
+pkeys work by dedicating 4 previously reserved bits in each page table entry to
+a "protection key", giving 16 possible keys. User and Supervisor pages are
+treated separately.
 
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
+Protections for each page are controlled with per-CPU registers for each type
+of page: User and Supervisor. Each of these 32-bit registers stores two separate
+bits (Access Disable and Write Disable) for each key.
+
+For Userspace the register is user-accessible (rdpkru/wrpkru). For
+Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kernel.
+
+Being stored in CPU registers, pkeys are inherently thread-local, potentially
+giving each thread an independent set of protections from every other thread.
 
 There are two new instructions (RDPKRU/WRPKRU) for reading and writing
 to the new register. The feature is only available in 64-bit mode,
@@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PTEs. These
 permissions are enforced on data access only and have no effect on
 instruction fetches.
 
-Syscalls
-========
+For kernel space, rdmsr/wrmsr are used to access the kernel MSRs.
+
+
+Syscalls for user space keys
+============================
 
 There are 3 system calls which directly interact with pkeys::
 
@@ -98,3 +109,37 @@ with a read()::
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+PKS is intended to harden against unwanted access to kernel pages. However, it
+does not completely restrict access under all conditions. For example, the MSR
+setting is not saved/restored during irqs. Thus the use of PKS is a mitigation
+strategy rather than a form of strict security.
+
+The following calls are used to allocate, use, and deallocate a pkey which
+defines a 'protection domain' within the kernel. Setting a pkey value in a
+supervisor mapping adds that mapping to the protection domain. Calls can then
+be used to enable/disable read and/or write access to all of the pages mapped
+with that key::
+
+        int pks_key_alloc(const char * const pkey_user);
+        #define PAGE_KERNEL_PKEY(pkey)
+        #define _PAGE_KEY(pkey)
+        int pks_update_protection(int pkey, unsigned long protection);
+        void pks_key_free(int pkey);
+
+In-kernel users must be prepared to set PAGE_KERNEL_PKEY() permission in the
+page table entries for the mappings they want to protect.
+
+WARNING: It is imperative that callers check for errors from pks_key_alloc()
+because pkeys are a limited resource and so callers should be prepared to work
+without PKS support.
+
+For admins, a debugfs interface provides a list of the current keys in use at::
+
+        /sys/kernel/debug/x86/pks_keys_allocated
+
+Some example code can be found in lib/pks/pks_test.c.
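
Note on the register layout described in the documentation text above: the
following is a minimal user-space sketch in C, not kernel code, of how a 32-bit
protection key register can pack two bits for each of the 16 keys. The helper
names (pkey_reg_set, pkey_reg_read_ok, pkey_reg_write_ok) and the macro names
are hypothetical and do not come from this patch. The bit order (Access Disable
in the low bit of each pair, Write Disable in the high bit) is an assumption
taken from Intel's documented PKRU layout; the text itself only says there are
two bits per key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout from the text: 16 keys, two bits per key, 32 bits in total. */
#define PKEY_COUNT        16u
#define PKEY_BITS_PER_KEY 2u
#define PKEY_AD           0x1u /* Access Disable (assumed low bit of the pair) */
#define PKEY_WD           0x2u /* Write Disable (assumed high bit of the pair) */

/* Hypothetical helper: return reg with the two bits for key replaced by flags. */
static inline uint32_t pkey_reg_set(uint32_t reg, unsigned int key, uint32_t flags)
{
    assert(key < PKEY_COUNT);
    assert((flags & ~(PKEY_AD | PKEY_WD)) == 0);

    unsigned int shift = key * PKEY_BITS_PER_KEY;

    reg &= ~((PKEY_AD | PKEY_WD) << shift); /* clear the old pair */
    return reg | (flags << shift);          /* install the new pair */
}

/* Reads are allowed unless Access Disable is set for the key. */
static inline bool pkey_reg_read_ok(uint32_t reg, unsigned int key)
{
    assert(key < PKEY_COUNT);
    return !((reg >> (key * PKEY_BITS_PER_KEY)) & PKEY_AD);
}

/* In this simplified model, writes are blocked by either AD or WD. */
static inline bool pkey_reg_write_ok(uint32_t reg, unsigned int key)
{
    assert(key < PKEY_COUNT);
    return !((reg >> (key * PKEY_BITS_PER_KEY)) & (PKEY_AD | PKEY_WD));
}
```

Under these assumptions, pkey_reg_set(0, 1, PKEY_WD) returns 0x8: key 1 stays
readable but is no longer writable. Real code would go through wrpkru for user
pages, or through the kernel's handling of MSR_IA32_PKRS for supervisor pages,
rather than manipulating a plain integer.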