From: Rick Edgecombe
To: dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org,
    linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
    linux-hardening@vger.kernel.org, kernel-hardening@lists.openwall.com
Cc: ira.weiny@intel.com, rppt@kernel.org, dan.j.williams@intel.com,
    linux-kernel@vger.kernel.org, Rick Edgecombe
Subject: [PATCH RFC 0/9] PKS write protected page tables
Date: Tue, 4 May 2021 17:30:23 -0700
Message-Id: <20210505003032.489164-1-rick.p.edgecombe@intel.com>

This is a POC for write protecting page tables with PKS (Protection Keys
for Supervisor) [1]. The basic idea is to make the page tables read only,
except temporarily on a per-cpu basis when they need to be modified. I’m
looking for opinions on whether people like the general direction of this
in terms of value and implementation.

Why would people want this?
===========================
Page tables are the basis for many types of protections and as such, are a
juicy target for attackers. Mapping them read-only will make them harder
to use in attacks.

This protects against an attacker that has acquired the ability to write
to the page tables. It's not foolproof because an attacker who can execute
arbitrary code can either disable PKS directly, or simply call the same
functions that the kernel uses for legitimate page table writes.

Why use PKS for this?
=====================
PKS is an upcoming CPU feature that allows supervisor virtual memory
permissions to be changed without flushing the TLB, like PKU does for user
memory. Protecting page tables would normally be really expensive because
you would have to do it with paging itself. PKS helps by providing a way
to toggle the writability of the page tables with just a per-cpu MSR.

Performance impacts
===================
Setting direct map permissions on whatever random page gets allocated for
a page table would result in a lot of kernel range shootdowns and direct
map large page shattering. So the way the PKS page table memory is created
is similar to this module page clustering series [2], where a cache of
pages is replenished from 2MB pages such that the direct map permissions
and associated breakage is localized on the direct map. In the PKS page
tables case, a PKS key is pre-applied to the direct map for pages in the
cache.

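To make the cache idea above concrete, here is a rough userspace model of
it, not code from the series. Every name in it (pt_page_cache,
pt_cache_alloc(), pt_cache_free(), apply_pkey_to_chunk()) is hypothetical.
aligned_alloc() stands in for grabbing a 2MB page, and the key-setting
step is a no-op placeholder. It only shows that 4KB page table pages are
carved out of one 2MB chunk at a time, so the permission change (and the
shootdown that goes with it) would happen once per refill rather than once
per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE        4096UL
#define CHUNK_SIZE       (2UL * 1024 * 1024)       /* one 2MB large page */
#define PAGES_PER_CHUNK  (CHUNK_SIZE / PAGE_SIZE)  /* 512 */

/* A free page stores the link to the next free page in its own memory. */
struct free_page {
	struct free_page *next;
};

/* Hypothetical cache; not the structure used by the actual patches. */
struct pt_page_cache {
	struct free_page *free_list;
	size_t nr_free;
	size_t nr_refills;	/* number of 2MB chunks carved up so far */
};

/*
 * Placeholder (assumption): in the kernel this is where the PKS key would
 * be applied to the direct map alias of the whole chunk, once per refill.
 */
static void apply_pkey_to_chunk(void *chunk)
{
	(void)chunk;
}

/* Return a page to the cache; it keeps its key, so no permission change. */
static void pt_cache_free(struct pt_page_cache *c, void *page)
{
	struct free_page *p = page;

	assert((uintptr_t)page % PAGE_SIZE == 0);
	p->next = c->free_list;
	c->free_list = p;
	c->nr_free++;
}

/* Replenish the cache from one 2MB-aligned chunk. */
static int pt_cache_refill(struct pt_page_cache *c)
{
	char *chunk = aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);

	if (!chunk)
		return -1;

	apply_pkey_to_chunk(chunk);
	for (size_t i = 0; i < PAGES_PER_CHUNK; i++)
		pt_cache_free(c, chunk + i * PAGE_SIZE);
	c->nr_refills++;
	return 0;
}

/* Hand out one page for use as a page table, refilling only when empty. */
static void *pt_cache_alloc(struct pt_page_cache *c)
{
	struct free_page *p;

	if (!c->free_list && pt_cache_refill(c) != 0)
		return NULL;

	p = c->free_list;
	c->free_list = p->next;
	c->nr_free--;
	return p;
}
```

Starting from a zeroed struct pt_page_cache, the first pt_cache_alloc()
triggers a single refill and later allocations come from the same 2MB
chunk until it is used up. Freed pages go back on the list still tagged,
which is the point of keeping a cache instead of returning them to the
general allocator.
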
There would be some memory overhead in order to protect the direct map
page tables. There would also be some extra kernel range shootdowns to
replenish the cache on occasion, from setting the PKS key on the direct
map of the new pages. I don’t have any actual performance data yet.

This is based on V6 [1] of the core PKS infrastructure patches. PKS
infrastructure follow-ons are planned to enable keys to be set to the same
permissions globally. Since this usage needs a key to be set globally
read-only by default, a small temporary solution is hacked up in patch 8.
Long term, PKS protected page tables would use a better and more generic
solution to achieve this.

[1] https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/

Thanks,

Rick

Rick Edgecombe (9):
  list: Support getting most recent element in list_lru
  list: Support list head not in object for list_lru
  x86/mm/cpa: Add grouped page allocations
  mm: Explicitly zero page table lock ptr
  x86, mm: Use cache of page tables
  x86/mm/cpa: Add set_memory_pks()
  x86/mm/cpa: Add perm callbacks to grouped pages
  x86, mm: Protect page tables with PKS
  x86, cpa: PKS protect direct map page tables

 arch/x86/boot/compressed/ident_map_64.c |   5 +
 arch/x86/include/asm/pgalloc.h          |   6 +
 arch/x86/include/asm/pgtable.h          |  26 +-
 arch/x86/include/asm/pgtable_64.h       |  33 ++-
 arch/x86/include/asm/pkeys_common.h     |   8 +-
 arch/x86/include/asm/set_memory.h       |  23 ++
 arch/x86/mm/init.c                      |  40 +++
 arch/x86/mm/pat/set_memory.c            | 312 +++++++++++++++++++++++-
 arch/x86/mm/pgtable.c                   | 144 ++++++++++-
 include/asm-generic/pgalloc.h           |  42 +++-
 include/linux/list_lru.h                |  26 ++
 include/linux/mm.h                      |   7 +
 mm/Kconfig                              |   6 +-
 mm/list_lru.c                           |  38 ++-
 mm/memory.c                             |   1 +
 mm/swap.c                               |   7 +
 mm/swap_state.c                         |   6 +
 17 files changed, 705 insertions(+), 25 deletions(-)