From patchwork Tue Jun 13 00:10:37 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13277725
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H. J. Lu", Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	Weijiang Yang, "Kirill A. Shutemov", John Allen, kcc@google.com,
	eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
	dethoma@microsoft.com, akpm@linux-foundation.org,
	Andrew.Cooper3@citrix.com, christina.schimpe@intel.com,
	david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com,
	torvalds@linux-foundation.org, broonie@kernel.org
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu, Pengfei Xu
Subject: [PATCH v9 11/42] x86/mm: Update ptep/pmdp_set_wrprotect() for _PAGE_SAVED_DIRTY
Date: Mon, 12 Jun 2023 17:10:37 -0700
Message-Id: <20230613001108.3040476-12-rick.p.edgecombe@intel.com>
In-Reply-To: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
References: <20230613001108.3040476-1-rick.p.edgecombe@intel.com>
MIME-Version: 1.0

When shadow stack is in use, Write=0,Dirty=1 PTEs are preserved for
shadow stack. Copy-on-write PTEs then have Write=0,SavedDirty=1.

When a PTE goes from Write=1,Dirty=1 to Write=0,SavedDirty=1, it could
become a transient shadow stack PTE in two cases:

1. Some processors can start a write but end up seeing a Write=0 PTE by
   the time they get to the Dirty bit, creating a transient shadow stack
   PTE. However, this will not occur on processors supporting shadow
   stack, and a TLB flush is not necessary.

2. When _PAGE_DIRTY is replaced with _PAGE_SAVED_DIRTY non-atomically, a
   transient shadow stack PTE can be created as a result.

Prevent the second case by doing the write protection and the
Dirty->SavedDirty shift at the same time with a CMPXCHG loop. The first
case does not need to be handled here since, as noted above, it cannot
occur on processors that support shadow stack.

Note, in the PAE case CMPXCHG will need to operate on 8 bytes, but
try_cmpxchg() will not use CMPXCHG8B, so it cannot operate on a full PAE
PTE. However, the existing logic is not operating on a full 8-byte
region either, and relies on the fact that the Write bit is in the first
4 bytes when doing the clear_bit(). Since the Dirty, SavedDirty and
Write bits are all in the first 4 bytes, casting to a long behaves like
the existing code, which also casts to a long.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided
many insights to the issue. Jann Horn provided the CMPXCHG solution.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v9:
 - Use bit shifting helpers that don't need any extra conditional
   logic. (Linus)
 - Always do the SavedDirty shifting (Linus)
---
 arch/x86/include/asm/pgtable.h | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a95f872c7429..99b54ab0a919 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1189,7 +1189,17 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
-	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	pte_t old_pte, new_pte;
+
+	old_pte = READ_ONCE(*ptep);
+	do {
+		new_pte = pte_wrprotect(old_pte);
+	} while (!try_cmpxchg((long *)&ptep->pte, (long *)&old_pte, *(long *)&new_pte));
 }
 
 #define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
@@ -1241,7 +1251,17 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
-	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	pmd_t old_pmd, new_pmd;
+
+	old_pmd = READ_ONCE(*pmdp);
+	do {
+		new_pmd = pmd_wrprotect(old_pmd);
+	} while (!try_cmpxchg((long *)pmdp, (long *)&old_pmd, *(long *)&new_pmd));
 }
 
 #ifndef pmdp_establish
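
[Editor's illustration, not part of the patch] For readers following the
reasoning outside the kernel tree, below is a minimal user-space sketch of
the same compare-and-exchange retry pattern the hunks above apply in
ptep_set_wrprotect()/pmdp_set_wrprotect(). The Write (bit 1) and Dirty
(bit 6) positions match the x86 PTE layout; the SavedDirty position and
the helper names (wrprotect_shift, PTE_SAVED_DIRTY) are assumptions made
up for this sketch and are not the kernel's definitions.

/*
 * Illustrative, user-space only: clear Write and shift Dirty to a
 * software SavedDirty bit in a single atomic update, retrying if the
 * word changes underneath us, so no transient Write=0,Dirty=1 value is
 * ever stored.
 */
#include <stdatomic.h>
#include <stdio.h>

#define PTE_RW		(1UL << 1)	/* x86 _PAGE_BIT_RW */
#define PTE_DIRTY	(1UL << 6)	/* x86 _PAGE_BIT_DIRTY */
#define PTE_SAVED_DIRTY	(1UL << 10)	/* illustrative soft bit only */

static void wrprotect_shift(_Atomic unsigned long *pte)
{
	unsigned long old = atomic_load(pte);
	unsigned long new;

	do {
		new = old & ~PTE_RW;
		if (new & PTE_DIRTY) {
			new &= ~PTE_DIRTY;
			new |= PTE_SAVED_DIRTY;
		}
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(pte, &old, new));
}

int main(void)
{
	_Atomic unsigned long pte = PTE_RW | PTE_DIRTY;

	wrprotect_shift(&pte);
	/* Result: Write=0, Dirty=0, SavedDirty=1. */
	printf("pte = %#lx\n", atomic_load(&pte));
	return 0;
}

As in the patch, a failed compare-and-exchange simply reloads the current
value and recomputes the write-protected result, which is what keeps a
concurrent Dirty=1 update from being combined with Write=0 at any
intermediate point.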