From patchwork Thu Oct 11 15:15:10 2018
X-Patchwork-Submitter: Yu-cheng Yu <yu-cheng.yu@intel.com>
X-Patchwork-Id: 10636881
From: Yu-cheng Yu <yu-cheng.yu@intel.com>
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
    Arnd Bergmann, Andy Lutomirski, Balbir Singh, Cyrill Gorcunov,
    Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn,
    Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
    Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar",
    Vedvyas Shanbhogue
Cc: Yu-cheng Yu
Subject: [PATCH v5 14/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW
Date: Thu, 11 Oct 2018 08:15:10 -0700
Message-Id: <20181011151523.27101-15-yu-cheng.yu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181011151523.27101-1-yu-cheng.yu@intel.com>
References: <20181011151523.27101-1-yu-cheng.yu@intel.com>

When Shadow Stack is enabled, the [R/O + PAGE_DIRTY_HW] setting is
reserved only for the Shadow Stack.  For non-Shadow Stack R/O PTEs, we
use [R/O + PAGE_DIRTY_SW].

When a PTE goes from [R/W + PAGE_DIRTY_HW] to [R/O + PAGE_DIRTY_SW], it
could become a transient Shadow Stack PTE in two cases.
The first case is that some processors can start a write but end up
seeing a read-only PTE by the time they get to the Dirty bit, creating
a transient Shadow Stack PTE.  However, this will not occur on
processors supporting Shadow Stack, and therefore we do not need a TLB
flush here.

The second case is that when software, without an atomic operation,
tests and replaces PAGE_DIRTY_HW with PAGE_DIRTY_SW, a transient
Shadow Stack PTE can exist.  This is prevented with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided
many insights to the issue.  Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/pgtable.h | 58 ++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3ee554d81480..b6e0ee5c5503 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1203,7 +1203,36 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+	pte_t new_pte, pte = READ_ONCE(*ptep);
+
+	/*
+	 * Some processors can start a write, but end up
+	 * seeing a read-only PTE by the time they get
+	 * to the Dirty bit.  In this case, they will
+	 * set the Dirty bit, leaving a read-only, Dirty
+	 * PTE which looks like a Shadow Stack PTE.
+	 *
+	 * However, this behavior has been improved and
+	 * will not occur on processors supporting
+	 * Shadow Stacks.  Without this guarantee, a
+	 * transition to a non-present PTE and a TLB
+	 * flush would be needed.
+	 *
+	 * When changing a writable PTE to read-only and
+	 * if the PTE has _PAGE_DIRTY_HW set, we move
+	 * that bit to _PAGE_DIRTY_SW so that the PTE is
+	 * not a valid Shadow Stack PTE.
+	 */
+	do {
+		new_pte = pte_wrprotect(pte);
+		new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
+			       _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+		new_pte.pte &= ~_PAGE_DIRTY_HW;
+	} while (!try_cmpxchg(ptep, &pte, new_pte));
+#else
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+#endif
 }
 
 #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
@@ -1266,7 +1295,36 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+	pmd_t new_pmd, pmd = READ_ONCE(*pmdp);
+
+	/*
+	 * Some processors can start a write, but end up
+	 * seeing a read-only PMD by the time they get
+	 * to the Dirty bit.  In this case, they will
+	 * set the Dirty bit, leaving a read-only, Dirty
+	 * PMD which looks like a Shadow Stack PMD.
+	 *
+	 * However, this behavior has been improved and
+	 * will not occur on processors supporting
+	 * Shadow Stacks.  Without this guarantee, a
+	 * transition to a non-present PMD and a TLB
+	 * flush would be needed.
+	 *
+	 * When changing a writable PMD to read-only and
+	 * if the PMD has _PAGE_DIRTY_HW set, we move
+	 * that bit to _PAGE_DIRTY_SW so that the PMD is
+	 * not a valid Shadow Stack PMD.
+	 */
+	do {
+		new_pmd = pmd_wrprotect(pmd);
+		new_pmd.pmd |= (new_pmd.pmd & _PAGE_DIRTY_HW) >>
+			       _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+		new_pmd.pmd &= ~_PAGE_DIRTY_HW;
+	} while (!try_cmpxchg(pmdp, &pmd, new_pmd));
+#else
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+#endif
 }
 
 #define pud_write pud_write
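
For reference, the race in the second case can be illustrated outside the
kernel with a minimal, stand-alone user-space sketch.  This is illustrative
only and not part of the patch: the bit positions and the function names
(RW, DIRTY_HW, DIRTY_SW, wrprotect_racy, wrprotect_cmpxchg) are invented
for the example and do not correspond to the real _PAGE_BIT_* definitions
or to the kernel helpers above.  It contrasts a non-atomic "clear R/W,
then move the hardware Dirty bit" sequence, which briefly exposes a
[R/O + DIRTY_HW] value to a concurrent observer, with a compare-exchange
loop that only ever publishes the final value.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define RW        (1ULL << 1)	/* invented bit positions, for illustration only */
#define DIRTY_HW  (1ULL << 6)
#define DIRTY_SW  (1ULL << 9)

/*
 * Racy version: a concurrent observer can see the value between the two
 * stores, i.e. with RW already cleared but DIRTY_HW still set -- the
 * transient "shadow-stack-looking" state described in the changelog.
 */
static void wrprotect_racy(_Atomic uint64_t *pte)
{
	uint64_t v = atomic_load(pte);

	atomic_store(pte, v & ~RW);		/* transient window opens here */
	v = atomic_load(pte);
	if (v & DIRTY_HW)
		atomic_store(pte, (v & ~DIRTY_HW) | DIRTY_SW);
}

/*
 * cmpxchg-style version: compute the final value and install it in one
 * atomic step, retrying if the value changed underneath us.
 */
static void wrprotect_cmpxchg(_Atomic uint64_t *pte)
{
	uint64_t old = atomic_load(pte);
	uint64_t new;

	do {
		new = old & ~RW;
		if (new & DIRTY_HW)
			new = (new & ~DIRTY_HW) | DIRTY_SW;
	} while (!atomic_compare_exchange_weak(pte, &old, new));
}

int main(void)
{
	_Atomic uint64_t a = RW | DIRTY_HW;
	_Atomic uint64_t b = RW | DIRTY_HW;

	wrprotect_racy(&a);	/* harmless single-threaded; racy with concurrent observers */
	wrprotect_cmpxchg(&b);
	printf("racy: %#llx  cmpxchg: %#llx\n",
	       (unsigned long long)atomic_load(&a),
	       (unsigned long long)atomic_load(&b));
	return 0;
}

Both variants end at the same final value; the difference is only in what
intermediate states another CPU could observe, which is exactly what the
try_cmpxchg() loops in the patch are protecting against.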