From patchwork Fri Apr 30 19:52:15 2021
X-Patchwork-Submitter: Michel Lespinasse
X-Patchwork-Id: 12234169
From: Michel Lespinasse
To: Linux-MM, Linux-Kernel
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox,
	Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan,
	Joel Fernandes, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH 14/29] mm: refactor __handle_mm_fault() / handle_pte_fault()
Date: Fri, 30 Apr 2021 12:52:15 -0700
Message-Id: <20210430195232.30491-15-michel@lespinasse.org>
In-Reply-To: <20210430195232.30491-1-michel@lespinasse.org>
References: <20210430195232.30491-1-michel@lespinasse.org>

Move the code that initializes vmf->pte and vmf->orig_pte from
handle_pte_fault() to its single call site in __handle_mm_fault().
This ensures vmf->pte is now initialized together with the higher levels
of the page table hierarchy.

This also prepares for speculative page fault handling, where the entire
page table walk (higher levels down to ptes) needs special care in the
speculative case.
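
For illustration only, and not part of this patch: a minimal userspace
sketch of the resulting split, assuming toy stand-ins. The names
toy_pte_t, toy_fault, toy_handle_mm_fault() and toy_handle_pte_fault()
are hypothetical and are not kernel APIs; a NULL table models pmd_none()
and a zero entry models pte_none(). The caller performs the whole walk,
including the last level, and the handler only dispatches on the state
it was handed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in, NOT the kernel type: a "pte" is an integer, 0 means none. */
typedef unsigned long toy_pte_t;

enum toy_result { TOY_ALLOC_NEW, TOY_HANDLE_EXISTING };

struct toy_fault {
	toy_pte_t *pmd;		/* NULL models pmd_none(): no pte table yet */
	size_t index;		/* slot within the pte table */
	toy_pte_t *pte;		/* set by the caller, NULL if nothing is mapped */
	toy_pte_t orig_pte;	/* snapshot taken by the caller */
};

/* After the refactor: only dispatches on state the caller prepared. */
static enum toy_result toy_handle_pte_fault(struct toy_fault *f)
{
	/* The caller never hands over a mapped slot with an empty snapshot. */
	assert(f->pte == NULL || f->orig_pte != 0);

	if (!f->pte)
		return TOY_ALLOC_NEW;
	return TOY_HANDLE_EXISTING;
}

/* The walk, including the last level, now lives in one place. */
static enum toy_result toy_handle_mm_fault(toy_pte_t *pmd, size_t index)
{
	struct toy_fault f = { .pmd = pmd, .index = index };

	if (!f.pmd) {
		/* No pte table: leave allocation to the handler's callees. */
		f.pte = NULL;
	} else {
		f.pte = &f.pmd[f.index];
		f.orig_pte = *f.pte;
		if (f.orig_pte == 0)
			f.pte = NULL;	/* empty slot: treat as unmapped */
	}
	return toy_handle_pte_fault(&f);
}
```

In this sketch, a later variant of the walk (for example a speculative
one) would only need to replace toy_handle_mm_fault(); the handler is
unaware of how pte and orig_pte were obtained.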

Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 98 ++++++++++++++++++++++++++---------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b28047765de7..45696166b10f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3538,7 +3538,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (pte_alloc(vma->vm_mm, vmf->pmd))
 		return VM_FAULT_OOM;
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
@@ -3819,7 +3819,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
 		return 0;
 
@@ -4275,53 +4275,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
-	if (unlikely(pmd_none(*vmf->pmd))) {
-		/*
-		 * Leave __pte_alloc() until later: because vm_ops->fault may
-		 * want to allocate huge page, and if we expose page table
-		 * for an instant, it will be difficult to retract from
-		 * concurrent faults and from rmap lookups.
-		 */
-		vmf->pte = NULL;
-	} else {
-		/*
-		 * If a huge pmd materialized under us just retry later.  Use
-		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
-		 * of pmd_trans_huge() to ensure the pmd didn't become
-		 * pmd_trans_huge under us and then back to pmd_none, as a
-		 * result of MADV_DONTNEED running immediately after a huge pmd
-		 * fault in a different thread of this mm, in turn leading to a
-		 * misleading pmd_trans_huge() retval. All we have to ensure is
-		 * that it is a regular pmd that we can walk with
-		 * pte_offset_map() and we can do that through an atomic read
-		 * in C, which is what pmd_trans_unstable() provides.
-		 */
-		if (pmd_devmap_trans_unstable(vmf->pmd))
-			return 0;
-		/*
-		 * A regular pmd is established and it can't morph into a huge
-		 * pmd from under us anymore at this point because we hold the
-		 * mmap_lock read mode and khugepaged takes it in write mode.
-		 * So now it's safe to run pte_offset_map().
-		 */
-		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-		vmf->orig_pte = *vmf->pte;
-
-		/*
-		 * some architectures can have larger ptes than wordsize,
-		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
-		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
-		 * accesses.  The code below just needs a consistent view
-		 * for the ifs and we later double check anyway with the
-		 * ptl lock held. So here a barrier will do.
-		 */
-		barrier();
-		if (pte_none(vmf->orig_pte)) {
-			pte_unmap(vmf->pte);
-			vmf->pte = NULL;
-		}
-	}
-
 	if (!vmf->pte) {
 		if (vma_is_anonymous(vmf->vma))
 			return do_anonymous_page(vmf);
@@ -4461,6 +4414,53 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		}
 	}
 
+	if (unlikely(pmd_none(*vmf.pmd))) {
+		/*
+		 * Leave __pte_alloc() until later: because vm_ops->fault may
+		 * want to allocate huge page, and if we expose page table
+		 * for an instant, it will be difficult to retract from
+		 * concurrent faults and from rmap lookups.
+		 */
+		vmf.pte = NULL;
+	} else {
+		/*
+		 * If a huge pmd materialized under us just retry later.  Use
+		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
+		 * of pmd_trans_huge() to ensure the pmd didn't become
+		 * pmd_trans_huge under us and then back to pmd_none, as a
+		 * result of MADV_DONTNEED running immediately after a huge pmd
+		 * fault in a different thread of this mm, in turn leading to a
+		 * misleading pmd_trans_huge() retval. All we have to ensure is
+		 * that it is a regular pmd that we can walk with
+		 * pte_offset_map() and we can do that through an atomic read
+		 * in C, which is what pmd_trans_unstable() provides.
+		 */
+		if (pmd_devmap_trans_unstable(vmf.pmd))
+			return 0;
+		/*
+		 * A regular pmd is established and it can't morph into a huge
+		 * pmd from under us anymore at this point because we hold the
+		 * mmap_lock read mode and khugepaged takes it in write mode.
+		 * So now it's safe to run pte_offset_map().
+		 */
+		vmf.pte = pte_offset_map(vmf.pmd, vmf.address);
+		vmf.orig_pte = *vmf.pte;
+
+		/*
+		 * some architectures can have larger ptes than wordsize,
+		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
+		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
+		 * accesses.  The code below just needs a consistent view
+		 * for the ifs and we later double check anyway with the
+		 * ptl lock held. So here a barrier will do.
+		 */
+		barrier();
+		if (pte_none(vmf.orig_pte)) {
+			pte_unmap(vmf.pte);
+			vmf.pte = NULL;
+		}
+	}
+
 	return handle_pte_fault(&vmf);
 }
 