From patchwork Fri Jul 17 07:20:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 11669419
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
 Peter Zijlstra
Cc: Ira Weiny, x86@kernel.org, Dave Hansen, Dan Williams, Vishal Verma,
 Andrew Morton, Fenghua Yu, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH RFC V2 00/17] PKS: Add Protection Keys Supervisor (PKS)
 support
Date: Fri, 17 Jul 2020 00:20:39 -0700
Message-Id: <20200717072056.73134-1-ira.weiny@intel.com>
X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9

From: Ira Weiny

This RFC series has been reviewed by Dave Hansen.

Changes from RFC:
	Clean up commit messages based on Peter Zijlstra's and Dave Hansen's
		feedback
	Fix static branch anti-pattern
	New patch:
	(memremap: Convert devmap static branch to {inc,dec})
		This was the code I used as a model for my static branch,
		which I now believe is wrong.
	New patch:
	(x86/entry: Preserve PKRS MSR through exceptions)
		This attempts to preserve the per-logical-processor MSR and
		the reference counting during exceptions.  I'd really like
		feedback on this because I _think_ it should work, but I'm
		afraid I'm missing something as my testing has shown a lot of
		spotty crashes which don't make sense to me.

This patch set introduces a new page protection mechanism for supervisor
pages, Protection Keys Supervisor (PKS), and an initial user of it,
persistent memory (PMEM).

PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to those pages beyond the normal paging protections.  Supervisor
pkeys work in a similar fashion to user space pkeys.  Like user page pkeys
(PKU), supervisor pkeys are checked in addition to normal paging protections,
and access or writes can be disabled via an MSR update without TLB flushes
when permissions change.  A page mapping is assigned to a domain by setting a
pkey in the page table entry.

Unlike user pkeys, no new instructions are added; rather WRMSR/RDMSR are used
to update the PKRS register.

XSAVE is not supported for the PKRS MSR.
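
As a rough illustration of the register and PTE layout (not code from this
series), the sketch below models PKRS as a 32-bit value with two bits per
key.  It assumes the layout described in the Intel SDM as understood here:
an access-disable (AD) bit at bit 2*pkey, a write-disable (WD) bit at bit
2*pkey + 1, and the pkey stored in PTE bits 62:59.  All macro and function
names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; these are not the identifiers used by the series. */
#define PKS_NUM_KEYS		16	/* a 4-bit pkey field gives 16 keys */
#define PKR_BITS_PER_KEY	2	/* one AD bit and one WD bit per key */
#define PKR_AD_BIT		0x1u	/* access disable */
#define PKR_WD_BIT		0x2u	/* write disable */
#define PTE_PKEY_SHIFT		59	/* assumed: pkey lives in PTE bits 62:59 */

/* Return 'pkrs' with the two permission bits of 'pkey' replaced by 'perm'. */
static inline uint32_t pkrs_update(uint32_t pkrs, unsigned int pkey,
				   uint32_t perm)
{
	unsigned int shift = pkey * PKR_BITS_PER_KEY;

	assert(pkey < PKS_NUM_KEYS);
	assert((perm & ~(PKR_AD_BIT | PKR_WD_BIT)) == 0);

	/* Clear the old AD/WD pair, then install the new one. */
	pkrs &= ~((uint32_t)(PKR_AD_BIT | PKR_WD_BIT) << shift);
	return pkrs | (perm << shift);
}

/* Extract the 4-bit pkey from a 64-bit page table entry. */
static inline unsigned int pte_pkey(uint64_t pte)
{
	return (unsigned int)((pte >> PTE_PKEY_SHIFT) & 0xf);
}
```

In a real kernel the resulting value would be written with WRMSR; the sketch
only computes it.  Because the permission change is a register update and the
PTE is untouched, no TLB flush is involved.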
To reduce software complexity the implementation saves/restores the MSR
across context switches but not during irqs.  This is a compromise which
results in a hardening against unwanted access without absolute restriction.

For consistent behavior with current paging protections, pkey 0 is reserved
and configured to allow full access via the pkey mechanism, thus preserving
the default paging protections on mappings with the default pkey value of 0.

Other keys (1-15) are allocated by an allocator, which prepares us for key
contention from day one.  Kernel users should be prepared for the allocator
to fail either because of key exhaustion or due to PKS not being supported on
the arch and/or CPU instance.

Protecting against stray writes is particularly important for PMEM because,
unlike writes to anonymous memory, writes to PMEM persist across a reboot.
Thus data corruption could result in permanent loss of data.

The following attributes of PKS make it perfect as a mechanism to protect
PMEM from stray access within the kernel:

   1) Fast switching of permissions
   2) Prevents access without page table manipulations
   3) Works on a per thread basis
   4) No TLB flushes required

The second half of this series thus uses the PKS mechanism to protect PMEM
from stray access.

PKS is available with 4 and 5 level paging.  Like PKU, it takes 4 bits from
the PTE to store the pkey within the entry.


Implementation details
----------------------

Modifications of task struct in patches:
	(x86/pks: Preserve the PKRS MSR on context switch)
	(memremap: Add zone device access protection)

Because pkey access is per-thread, 2 modifications are made to the task
struct.  The first is a saved copy of the MSR during context switches.  The
second reference counts access to the device domain to handle kmap nesting
correctly.
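
The two per-thread fields can be pictured with a small user-space model.
This is only a sketch under stated assumptions: struct task_sketch,
dev_access_enable(), dev_access_disable(), DEV_PKEY and pkrs_msr_model are
hypothetical names, the MSR is modeled by a plain variable instead of WRMSR,
and interrupt masking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_PKEY	1u				/* hypothetical key for the device domain */
#define DEV_DENY	(0x3u << (DEV_PKEY * 2))	/* AD and WD bits for that key */

/* Hypothetical stand-in for the two fields added to the task struct. */
struct task_sketch {
	uint32_t saved_pkrs;	/* copy of the MSR kept across context switches */
	unsigned int dev_ref;	/* nesting depth of device-domain access */
};

/* Models the PKRS MSR; a kernel would issue WRMSR instead. */
static uint32_t pkrs_msr_model;

/* Open the device domain on the first (outermost) request only. */
static void dev_access_enable(struct task_sketch *t)
{
	if (t->dev_ref++ == 0) {
		t->saved_pkrs &= ~DEV_DENY;
		pkrs_msr_model = t->saved_pkrs;
	}
}

/* Close the device domain when the last nested user is done. */
static void dev_access_disable(struct task_sketch *t)
{
	assert(t->dev_ref > 0);
	if (--t->dev_ref == 0) {
		t->saved_pkrs |= DEV_DENY;
		pkrs_msr_model = t->saved_pkrs;
	}
}
```

With this shape, a nested enable/disable pair inside an outer pair leaves the
domain open until the outer disable runs, which is the nesting behavior the
reference count is there to provide.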

Maintain PKS setting in a re-entrant manner in patches:
	(memremap: Add zone device access protection)
	(x86/entry: Preserve PKRS MSR through exceptions)

Using local_irq_save() seems to be the safest and fastest way to maintain
kmap as re-entrant.  But there may be a better way.  spin_lock_irq() and
atomic counters were considered, but atomic counters do not properly protect
the pkey update and spin_lock_irq() would deadlock.  Suggestions are welcome.

Also, preserving the PKS state requires the exception handling code to store
the ref count during exception processing.  This seems like a layering
violation but it works.

The use of kmap in patch:
	(kmap: Add stray write protection for device pages)

To keep general access to PMEM pages general, we piggyback on the kmap()
interface, as there are many places in the kernel which do not have, nor
should be required to have, a priori knowledge that a page is PMEM.  The
modifications to the kmap code are careful to quickly determine which pages
don't require special handling, to reduce overhead for non-PMEM pages.
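
The kind of early-out described for kmap might look like the model below.
It is a sketch, not the code in the patch: struct page_sketch, the
devmap_protection_enabled flag (standing in for a static branch),
page_needs_dev_access() and kmap_sketch() are all hypothetical, and the slow
path is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page descriptor carrying only the property of interest. */
struct page_sketch {
	bool is_protected_devmap;
};

/* Models a static branch that stays off until protected devmaps exist. */
static bool devmap_protection_enabled;

/* Counts how often the expensive path was taken, for illustration only. */
static unsigned int slow_path_calls;

/* Cheap test meant to reject ordinary pages as early as possible. */
static inline bool page_needs_dev_access(const struct page_sketch *page)
{
	if (!devmap_protection_enabled)
		return false;
	return page->is_protected_devmap;
}

/* Stand-in for the kmap hook: only protected pages reach the slow path. */
static void kmap_sketch(const struct page_sketch *page)
{
	if (page_needs_dev_access(page))
		slow_path_calls++;	/* real code would enable device access */
}
```

Ordinary pages pay for one flag test, and when no protected device memory
exists at all the first test already fails.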


Breakdown of patches
--------------------

Implement PKS within x86 arch:

	x86/pkeys: Create pkeys_internal.h
	x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
	x86/pks: Enable Protection Keys Supervisor (PKS)
	x86/pks: Preserve the PKRS MSR on context switch
	x86/pks: Add PKS kernel API
	x86/pks: Add a debugfs file for allocated PKS keys
	Documentation/pkeys: Update documentation for kernel pkeys
	x86/pks: Add PKS Test code

Pre-req bug fixes for dax:

	fs/dax: Remove unused size parameter
	drivers/dax: Expand lock scope to cover the use of addresses

Add stray write protection to PMEM:

	memremap: Add zone device access protection
	kmap: Add stray write protection for device pages
	dax: Stray write protection for dax_direct_access()
	nvdimm/pmem: Stray write protection for pmem->virt_addr
	[dax|pmem]: Enable stray write protection


Fenghua Yu (4):
  x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
  x86/pks: Enable Protection Keys Supervisor (PKS)
  x86/pks: Add PKS kernel API
  x86/pks: Add a debugfs file for allocated PKS keys

Ira Weiny (13):
  x86/pkeys: Create pkeys_internal.h
  x86/pks: Preserve the PKRS MSR on context switch
  Documentation/pkeys: Update documentation for kernel pkeys
  x86/pks: Add PKS Test code
  memremap: Convert devmap static branch to {inc,dec}
  fs/dax: Remove unused size parameter
  drivers/dax: Expand lock scope to cover the use of addresses
  memremap: Add zone device access protection
  kmap: Add stray write protection for device pages
  dax: Stray write protection for dax_direct_access()
  nvdimm/pmem: Stray write protection for pmem->virt_addr
  [dax|pmem]: Enable stray write protection
  x86/entry: Preserve PKRS MSR across exceptions

 Documentation/core-api/protection-keys.rst  |  81 +++-
 arch/x86/Kconfig                            |   1 +
 arch/x86/entry/common.c                     |  78 +++-
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/idtentry.h             |   2 +
 arch/x86/include/asm/msr-index.h            |   1 +
 arch/x86/include/asm/pgtable.h              |  13 +-
 arch/x86/include/asm/pgtable_types.h        |   4 +
 arch/x86/include/asm/pkeys.h                |  43 ++
 arch/x86/include/asm/pkeys_internal.h       |  36 ++
 arch/x86/include/asm/processor.h            |  13 +
 arch/x86/include/uapi/asm/processor-flags.h |   2 +
 arch/x86/kernel/cpu/common.c                |  17 +
 arch/x86/kernel/fpu/xstate.c                |  17 +-
 arch/x86/kernel/process.c                   |  34 ++
 arch/x86/mm/fault.c                         |  16 +-
 arch/x86/mm/pkeys.c                         | 174 +++++++-
 drivers/dax/device.c                        |   2 +
 drivers/dax/super.c                         |   5 +-
 drivers/nvdimm/pmem.c                       |   6 +
 fs/dax.c                                    |  13 +-
 include/linux/highmem.h                     |  32 +-
 include/linux/memremap.h                    |   1 +
 include/linux/mm.h                          |  33 ++
 include/linux/pkeys.h                       |  18 +
 include/linux/sched.h                       |   3 +
 init/init_task.c                            |   3 +
 kernel/fork.c                               |   3 +
 lib/Kconfig.debug                           |  12 +
 lib/Makefile                                |   3 +
 lib/pks/Makefile                            |   3 +
 lib/pks/pks_test.c                          | 452 ++++++++++++++++++++
 mm/Kconfig                                  |  15 +
 mm/memremap.c                               | 105 ++++-
 tools/testing/selftests/x86/Makefile        |   3 +-
 tools/testing/selftests/x86/test_pks.c      |  65 +++
 36 files changed, 1243 insertions(+), 67 deletions(-)
 create mode 100644 arch/x86/include/asm/pkeys_internal.h
 create mode 100644 lib/pks/Makefile
 create mode 100644 lib/pks/pks_test.c
 create mode 100644 tools/testing/selftests/x86/test_pks.c