From patchwork Fri Nov 30 19:58:09 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707079
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 1/4] mm: infrastructure for page fault page caching
Date: Fri, 30 Nov 2018 14:58:09 -0500
Message-Id: <20181130195812.19536-2-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

We want to be able to cache the result of a previous loop of a page fault
in the case that we use VM_FAULT_RETRY, so introduce
handle_mm_fault_cacheable(), which takes a struct vm_fault directly, add a
->cached_page field to struct vm_fault, and add helpers to init/cleanup
the struct vm_fault. I've converted x86; other arches can follow suit if
they wish, as it's relatively straightforward.
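The init/cleanup lifetime the patch introduces can be sketched in plain userspace C. All names and types below are illustrative stand-ins, not the kernel's implementation: a fault context is initialized once, may carry a cached page reference across a VM_FAULT_RETRY-style round trip, and drops that reference exactly once in cleanup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of struct vm_fault's ->cached_page. */
struct page {
    int refcount;
};

struct fault_ctx {
    unsigned long address;
    unsigned int flags;
    struct page *cached_page;   /* survives one retry round trip */
};

/* Mirrors the shape of vm_fault_init(): set up the inputs, clear the cache. */
static void fault_ctx_init(struct fault_ctx *ctx, unsigned long address,
                           unsigned int flags)
{
    ctx->address = address;
    ctx->flags = flags;
    ctx->cached_page = NULL;
}

/* Toy put_page(): drop a reference, free on the last one. */
static void put_page_ref(struct page *p)
{
    if (p && --p->refcount == 0)
        free(p);
}

/* Mirrors vm_fault_cleanup(): drop the cached reference, idempotently. */
static void fault_ctx_cleanup(struct fault_ctx *ctx)
{
    if (ctx->cached_page) {
        put_page_ref(ctx->cached_page);
        ctx->cached_page = NULL;
    }
}
```

The key property, as in the patch, is that the caller can run the fault loop any number of times and call cleanup on every exit path without double-freeing.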
Signed-off-by: Josef Bacik
---
 arch/x86/mm/fault.c |  6 +++-
 include/linux/mm.h  | 31 +++++++++++++++++++++
 mm/memory.c         | 79 ++++++++++++++++++++++++++++++++---------------------
 3 files changed, 84 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 71d4b9d4d43f..8060ad6a34da 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1230,6 +1230,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 		unsigned long hw_error_code,
 		unsigned long address)
 {
+	struct vm_fault vmf = {};
 	unsigned long sw_error_code;
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -1420,7 +1421,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * userland). The return to userland is identified whenever
 	 * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
 	 */
-	fault = handle_mm_fault(vma, address, flags);
+	vm_fault_init(&vmf, vma, address, flags);
+	fault = handle_mm_fault_cacheable(&vmf);
 	major |= fault & VM_FAULT_MAJOR;
 
 	/*
@@ -1436,6 +1438,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			if (!fatal_signal_pending(tsk))
 				goto retry;
 		}
+		vm_fault_cleanup(&vmf);
 
 		/* User mode? Just return to handle the fatal exception */
 		if (flags & FAULT_FLAG_USER)
@@ -1446,6 +1449,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
+	vm_fault_cleanup(&vmf);
 	up_read(&mm->mmap_sem);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		mm_fault_error(regs, sw_error_code, address, fault);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5411de93a363..3f1dda389aa7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -360,6 +360,12 @@ struct vm_fault {
 					 * is set (which is also implied by
 					 * VM_FAULT_ERROR).
 					 */
+	struct page *cached_page;	/* ->fault handlers that return
+					 * VM_FAULT_RETRY can store their
+					 * previous page here to be reused the
+					 * next time we loop through the fault
+					 * handler for faster lookup.
+					 */
 	/* These three entries are valid only while holding ptl lock */
 	pte_t *pte;			/* Pointer to pte entry matching
 					 * the 'address'. NULL if the page
@@ -378,6 +384,16 @@ struct vm_fault {
 					 */
 };
 
+static inline void vm_fault_init(struct vm_fault *vmf,
+				 struct vm_area_struct *vma,
+				 unsigned long address,
+				 unsigned int flags)
+{
+	vmf->vma = vma;
+	vmf->address = address;
+	vmf->flags = flags;
+}
+
 /* page entry size for vm->huge_fault() */
 enum page_entry_size {
 	PE_SIZE_PTE = 0,
@@ -963,6 +979,14 @@ static inline void put_page(struct page *page)
 		__put_page(page);
 }
 
+static inline void vm_fault_cleanup(struct vm_fault *vmf)
+{
+	if (vmf->cached_page) {
+		put_page(vmf->cached_page);
+		vmf->cached_page = NULL;
+	}
+}
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif
@@ -1425,6 +1449,7 @@ int invalidate_inode_page(struct page *page);
 #ifdef CONFIG_MMU
 extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
+extern vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long address, unsigned int fault_flags,
 			    bool *unlocked);
@@ -1440,6 +1465,12 @@ static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
 	BUG();
 	return VM_FAULT_SIGBUS;
 }
+static inline vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf)
+{
+	/* should never happen if there's no MMU */
+	BUG();
+	return VM_FAULT_SIGBUS;
+}
 static inline int fixup_user_fault(struct task_struct *tsk,
 			struct mm_struct *mm, unsigned long address,
 			unsigned int fault_flags, bool *unlocked)
diff --git a/mm/memory.c b/mm/memory.c
index 4ad2d293ddc2..d16bb4816f9d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3806,36 +3806,34 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
  * The mmap_sem may have been released depending on flags and our
  * return value. See filemap_fault() and __lock_page_or_retry().
  */
-static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
-		unsigned long address, unsigned int flags)
+static vm_fault_t __handle_mm_fault(struct vm_fault *vmf)
 {
-	struct vm_fault vmf = {
-		.vma = vma,
-		.address = address & PAGE_MASK,
-		.flags = flags,
-		.pgoff = linear_page_index(vma, address),
-		.gfp_mask = __get_fault_gfp_mask(vma),
-	};
-	unsigned int dirty = flags & FAULT_FLAG_WRITE;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long address = vmf->address;
+	unsigned int dirty = vmf->flags & FAULT_FLAG_WRITE;
 	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	vm_fault_t ret;
 
+	vmf->address = address & PAGE_MASK;
+	vmf->pgoff = linear_page_index(vma, address);
+	vmf->gfp_mask = __get_fault_gfp_mask(vma);
+
 	pgd = pgd_offset(mm, address);
 	p4d = p4d_alloc(mm, pgd, address);
 	if (!p4d)
 		return VM_FAULT_OOM;
 
-	vmf.pud = pud_alloc(mm, p4d, address);
-	if (!vmf.pud)
+	vmf->pud = pud_alloc(mm, p4d, address);
+	if (!vmf->pud)
 		return VM_FAULT_OOM;
-	if (pud_none(*vmf.pud) && transparent_hugepage_enabled(vma)) {
-		ret = create_huge_pud(&vmf);
+	if (pud_none(*vmf->pud) && transparent_hugepage_enabled(vma)) {
+		ret = create_huge_pud(vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
 	} else {
-		pud_t orig_pud = *vmf.pud;
+		pud_t orig_pud = *vmf->pud;
 
 		barrier();
 		if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) {
@@ -3843,50 +3841,50 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 
 			/* NUMA case for anonymous PUDs would go here */
 
 			if (dirty && !pud_write(orig_pud)) {
-				ret = wp_huge_pud(&vmf, orig_pud);
+				ret = wp_huge_pud(vmf, orig_pud);
 				if (!(ret & VM_FAULT_FALLBACK))
 					return ret;
 			} else {
-				huge_pud_set_accessed(&vmf, orig_pud);
+				huge_pud_set_accessed(vmf, orig_pud);
 				return 0;
 			}
 		}
 	}
 
-	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
-	if (!vmf.pmd)
+	vmf->pmd = pmd_alloc(mm, vmf->pud, address);
+	if (!vmf->pmd)
 		return VM_FAULT_OOM;
-	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
-		ret = create_huge_pmd(&vmf);
+	if (pmd_none(*vmf->pmd) && transparent_hugepage_enabled(vma)) {
+		ret = create_huge_pmd(vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
 	} else {
-		pmd_t orig_pmd = *vmf.pmd;
+		pmd_t orig_pmd = *vmf->pmd;
 
 		barrier();
 		if (unlikely(is_swap_pmd(orig_pmd))) {
 			VM_BUG_ON(thp_migration_supported() &&
 					  !is_pmd_migration_entry(orig_pmd));
 			if (is_pmd_migration_entry(orig_pmd))
-				pmd_migration_entry_wait(mm, vmf.pmd);
+				pmd_migration_entry_wait(mm, vmf->pmd);
 			return 0;
 		}
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
-				return do_huge_pmd_numa_page(&vmf, orig_pmd);
+				return do_huge_pmd_numa_page(vmf, orig_pmd);
 
 			if (dirty && !pmd_write(orig_pmd)) {
-				ret = wp_huge_pmd(&vmf, orig_pmd);
+				ret = wp_huge_pmd(vmf, orig_pmd);
 				if (!(ret & VM_FAULT_FALLBACK))
 					return ret;
 			} else {
-				huge_pmd_set_accessed(&vmf, orig_pmd);
+				huge_pmd_set_accessed(vmf, orig_pmd);
 				return 0;
 			}
 		}
 	}
 
-	return handle_pte_fault(&vmf);
+	return handle_pte_fault(vmf);
 }
 
 /*
@@ -3895,9 +3893,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
  * The mmap_sem may have been released depending on flags and our
  * return value. See filemap_fault() and __lock_page_or_retry().
  */
-vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
-		unsigned int flags)
+static vm_fault_t do_handle_mm_fault(struct vm_fault *vmf)
 {
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned int flags = vmf->flags;
 	vm_fault_t ret;
 
 	__set_current_state(TASK_RUNNING);
@@ -3921,9 +3920,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	mem_cgroup_enter_user_fault();
 
 	if (unlikely(is_vm_hugetlb_page(vma)))
-		ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
+		ret = hugetlb_fault(vma->vm_mm, vma, vmf->address, flags);
 	else
-		ret = __handle_mm_fault(vma, address, flags);
+		ret = __handle_mm_fault(vmf);
 
 	if (flags & FAULT_FLAG_USER) {
 		mem_cgroup_exit_user_fault();
@@ -3939,8 +3938,26 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	return ret;
 }
+
+vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
+		unsigned int flags)
+{
+	struct vm_fault vmf = {};
+	vm_fault_t ret;
+
+	vm_fault_init(&vmf, vma, address, flags);
+	ret = do_handle_mm_fault(&vmf);
+	vm_fault_cleanup(&vmf);
+	return ret;
+}
 EXPORT_SYMBOL_GPL(handle_mm_fault);
 
+vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf)
+{
+	return do_handle_mm_fault(vmf);
+}
+EXPORT_SYMBOL_GPL(handle_mm_fault_cacheable);
+
 #ifndef __PAGETABLE_P4D_FOLDED
 /*
  * Allocate p4d page table.
From patchwork Fri Nov 30 19:58:10 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707083
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 2/4] filemap: kill page_cache_read usage in filemap_fault
Date: Fri, 30 Nov 2018 14:58:10 -0500
Message-Id: <20181130195812.19536-3-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

If we do not have a page at filemap_fault() time, we currently do a forced
page_cache_read() to populate the page, then drop it again and loop around
to find it. This gives us two ways to read a page in filemap_fault(), and
it's not really needed. Instead, add a FGP_FOR_MMAP flag so that
pagecache_get_page() will return an unlocked page that's in the pagecache,
and then use the normal page locking and readpage logic already in
filemap_fault(). This simplifies the no-page-in-page-cache case
significantly.
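What an FGP_FOR_MMAP-style flag buys can be modeled in a few lines of userspace C. The table, entry type, and flag values below are hypothetical stand-ins, not the kernel's pagecache: a get-or-create lookup normally hands a newly created entry back locked, but an mmap-style caller gets it unlocked so it can run its own locking protocol afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flags mirroring the FGP_* pattern (values are illustrative). */
#define FGP_LOCK     0x01
#define FGP_CREAT    0x02
#define FGP_FOR_MMAP 0x04

struct entry {
    int present;
    int locked;
};

#define TABLE_SIZE 16
static struct entry table[TABLE_SIZE];   /* stand-in for the page cache */

/*
 * Get-or-create lookup. As in the patch, the FOR_MMAP unlock only applies
 * on the creation path; an already-present entry is returned as-is.
 */
static struct entry *cache_get_entry(unsigned int idx, int fgp_flags)
{
    struct entry *e = &table[idx % TABLE_SIZE];

    if (!e->present) {
        if (!(fgp_flags & FGP_CREAT))
            return NULL;            /* no page and caller won't create one */
        e->present = 1;
        e->locked = 1;              /* creation returns the entry locked... */
        if (fgp_flags & FGP_FOR_MMAP)
            e->locked = 0;          /* ...unless mmap wants to lock it itself */
    }
    return e;
}
```

The point of the flag, as in the commit message, is that the caller gets exactly one lookup path whether or not the entry existed, and the mmap path never receives a locked entry it would immediately have to unlock.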
Signed-off-by: Josef Bacik
Acked-by: Johannes Weiner
Reviewed-by: Jan Kara
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 73 ++++++++++---------------------------------------
 2 files changed, 16 insertions(+), 58 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 226f96f0dee0..b13c2442281f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -252,6 +252,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_WRITE		0x00000008
 #define FGP_NOFS		0x00000010
 #define FGP_NOWAIT		0x00000020
+#define FGP_FOR_MMAP		0x00000040
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		int fgp_flags, gfp_t cache_gfp_mask);
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..f068712c2525 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1503,6 +1503,9 @@ EXPORT_SYMBOL(find_lock_entry);
  *   @gfp_mask and added to the page cache and the VM's LRU
  *   list. The page is returned locked and with an increased
  *   refcount. Otherwise, NULL is returned.
+ * - FGP_FOR_MMAP: Similar to FGP_CREAT, only it unlocks the page after it has
+ *   added it to pagecache, as the mmap code expects to do its own special
+ *   locking dance.
  *
  * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even
  * if the GFP flags specified for FGP_CREAT are atomic.
@@ -1555,7 +1558,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		if (!page)
 			return NULL;
 
-		if (WARN_ON_ONCE(!(fgp_flags & FGP_LOCK)))
+		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
 		/* Init accessed so avoid atomic mark_page_accessed later */
@@ -1569,6 +1572,13 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 			if (err == -EEXIST)
 				goto repeat;
 		}
+
+		/*
+		 * add_to_page_cache_lru locks the page, and for mmap we expect
+		 * an unlocked page.
+		 */
+		if (fgp_flags & FGP_FOR_MMAP)
+			unlock_page(page);
 	}
 
 	return page;
@@ -2293,39 +2303,6 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 EXPORT_SYMBOL(generic_file_read_iter);
 
 #ifdef CONFIG_MMU
-/**
- * page_cache_read - adds requested page to the page cache if not already there
- * @file:	file to read
- * @offset:	page index
- * @gfp_mask:	memory allocation flags
- *
- * This adds the requested page to the page cache if it isn't already there,
- * and schedules an I/O to read in its contents from disk.
- */
-static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
-{
-	struct address_space *mapping = file->f_mapping;
-	struct page *page;
-	int ret;
-
-	do {
-		page = __page_cache_alloc(gfp_mask);
-		if (!page)
-			return -ENOMEM;
-
-		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
-		if (ret == 0)
-			ret = mapping->a_ops->readpage(file, page);
-		else if (ret == -EEXIST)
-			ret = 0; /* losing race to add is OK */
-
-		put_page(page);
-
-	} while (ret == AOP_TRUNCATED_PAGE);
-
-	return ret;
-}
-
 #define MMAP_LOTSAMISS  (100)
 
 /*
@@ -2449,9 +2426,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
 retry_find:
-		page = find_get_page(mapping, offset);
+		page = pagecache_get_page(mapping, offset,
+					  FGP_CREAT|FGP_FOR_MMAP,
+					  vmf->gfp_mask);
 		if (!page)
-			goto no_cached_page;
+			return vmf_error(-ENOMEM);
 	}
 
 	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
@@ -2488,28 +2467,6 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
-no_cached_page:
-	/*
-	 * We're only likely to ever get here if MADV_RANDOM is in
-	 * effect.
-	 */
-	error = page_cache_read(file, offset, vmf->gfp_mask);
-
-	/*
-	 * The page we want has now been added to the page cache.
-	 * In the unlikely event that someone removed it in the
-	 * meantime, we'll just come back here and read it again.
-	 */
-	if (error >= 0)
-		goto retry_find;
-
-	/*
-	 * An error return from page_cache_read can result if the
-	 * system is low on memory, or a problem occurs while trying
-	 * to schedule I/O.
-	 */
-	return vmf_error(error);
-
 page_not_uptodate:
 	/*
 	 * Umm, take care of errors if the page isn't up-to-date.

From patchwork Fri Nov 30 19:58:11 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707085
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations
Date: Fri, 30 Nov 2018 14:58:11 -0500
Message-Id: <20181130195812.19536-4-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

Currently we only drop the mmap_sem if there is contention on the page
lock.  The idea is that we issue readahead and then go to lock the page
while it is under IO, and we want to not hold the mmap_sem during the
IO.

The problem with this is the assumption that the readahead does
anything.  In the case that the box is under extreme memory or IO
pressure we may end up not reading anything at all for readahead, which
means we will end up reading in the page under the mmap_sem.

Instead, rework the filemap fault path to drop the mmap_sem at any point
that we may do IO or block for an extended period of time.
This includes while issuing readahead, locking the page, or needing to
call ->readpage because readahead did not occur.  Then once we have a
fully uptodate page we can return with VM_FAULT_RETRY and come back
again to find our nicely in-cache page that was brought in outside of
the mmap_sem.

Signed-off-by: Josef Bacik
Acked-by: Johannes Weiner
---
 mm/filemap.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 93 insertions(+), 20 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index f068712c2525..5e76b24b2a0f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter);
 #ifdef CONFIG_MMU
 #define MMAP_LOTSAMISS  (100)
 
+static struct file *maybe_unlock_mmap_for_io(struct file *fpin,
+					     struct vm_area_struct *vma,
+					     int flags)
+{
+	if (fpin)
+		return fpin;
+	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
+	    FAULT_FLAG_ALLOW_RETRY) {
+		fpin = get_file(vma->vm_file);
+		up_read(&vma->vm_mm->mmap_sem);
+	}
+	return fpin;
+}
+
 /*
  * Synchronous readahead happens when we don't even find
  * a page in the page cache at all.
  */
-static void do_sync_mmap_readahead(struct vm_area_struct *vma,
-				   struct file_ra_state *ra,
-				   struct file *file,
-				   pgoff_t offset)
+static struct file *do_sync_mmap_readahead(struct vm_area_struct *vma,
+					   struct file_ra_state *ra,
+					   struct file *file,
+					   pgoff_t offset,
+					   int flags)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct file *fpin = NULL;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vma->vm_flags & VM_RAND_READ)
-		return;
+		return fpin;
 	if (!ra->ra_pages)
-		return;
+		return fpin;
 
 	if (vma->vm_flags & VM_SEQ_READ) {
+		fpin = maybe_unlock_mmap_for_io(fpin, vma, flags);
 		page_cache_sync_readahead(mapping, ra, file, offset,
 					  ra->ra_pages);
-		return;
+		return fpin;
 	}
 
 	/* Avoid banging the cache line if not needed */
@@ -2337,37 +2353,43 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	 * stop bothering with read-ahead. It will only hurt.
 	 */
 	if (ra->mmap_miss > MMAP_LOTSAMISS)
-		return;
+		return fpin;
 
 	/*
 	 * mmap read-around
 	 */
+	fpin = maybe_unlock_mmap_for_io(fpin, vma, flags);
 	ra->start = max_t(long, 0, offset - ra->ra_pages / 2);
 	ra->size = ra->ra_pages;
 	ra->async_size = ra->ra_pages / 4;
 	ra_submit(ra, mapping, file);
+	return fpin;
 }
 
 /*
  * Asynchronous readahead happens when we find the page and PG_readahead,
  * so we want to possibly extend the readahead further..
  */
-static void do_async_mmap_readahead(struct vm_area_struct *vma,
-				    struct file_ra_state *ra,
-				    struct file *file,
-				    struct page *page,
-				    pgoff_t offset)
+static struct file *do_async_mmap_readahead(struct vm_area_struct *vma,
+					    struct file_ra_state *ra,
+					    struct file *file,
+					    struct page *page,
+					    pgoff_t offset, int flags)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct file *fpin = NULL;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vma->vm_flags & VM_RAND_READ)
-		return;
+		return fpin;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-	if (PageReadahead(page))
+	if (PageReadahead(page)) {
+		fpin = maybe_unlock_mmap_for_io(fpin, vma, flags);
 		page_cache_async_readahead(mapping, ra, file,
 					   page, offset, ra->ra_pages);
+	}
+	return fpin;
 }
 
 /**
@@ -2397,6 +2419,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 {
 	int error;
 	struct file *file = vmf->vma->vm_file;
+	struct file *fpin = NULL;
 	struct address_space *mapping = file->f_mapping;
 	struct file_ra_state *ra = &file->f_ra;
 	struct inode *inode = mapping->host;
@@ -2418,10 +2441,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
-		do_async_mmap_readahead(vmf->vma, ra, file, page, offset);
+		fpin = do_async_mmap_readahead(vmf->vma, ra, file, page, offset,
+					       vmf->flags);
 	} else if (!page) {
 		/* No page in the page cache at all */
-		do_sync_mmap_readahead(vmf->vma, ra, file, offset);
+		fpin = do_sync_mmap_readahead(vmf->vma, ra, file, offset,
+					      vmf->flags);
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
@@ -2433,9 +2458,32 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 			return vmf_error(-ENOMEM);
 	}
 
-	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
-		put_page(page);
-		return ret | VM_FAULT_RETRY;
+	/*
+	 * We are open-coding lock_page_or_retry here because we want to do the
+	 * readpage if necessary while the mmap_sem is dropped.  If there
+	 * happens to be a lock on the page but it wasn't being faulted in we'd
+	 * come back around without ALLOW_RETRY set and then have to do the IO
+	 * under the mmap_sem, which would be a bummer.
+	 */
+	if (!trylock_page(page)) {
+		fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags);
+		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
+			goto out_retry;
+		if (vmf->flags & FAULT_FLAG_KILLABLE) {
+			if (__lock_page_killable(page)) {
+				/*
+				 * If we don't have the right flags for
+				 * maybe_unlock_mmap_for_io to do its thing we
+				 * still need to drop the sem and return
+				 * VM_FAULT_RETRY so the upper layer checks the
+				 * signal and takes the appropriate action.
+				 */
+				if (!fpin)
+					up_read(&vmf->vma->vm_mm->mmap_sem);
+				goto out_retry;
+			}
+		} else
+			__lock_page(page);
 	}
 
 	/* Did it get truncated? */
@@ -2453,6 +2501,16 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	if (unlikely(!PageUptodate(page)))
 		goto page_not_uptodate;
 
+	/*
+	 * We've made it this far and we had to drop our mmap_sem, now is the
+	 * time to return to the upper layer and have it re-find the vma and
+	 * redo the fault.
+	 */
+	if (fpin) {
+		unlock_page(page);
+		goto out_retry;
+	}
+
 	/*
 	 * Found the page and have a reference on it.
 	 * We must recheck i_size under page lock.
@@ -2475,12 +2533,15 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 * and we need to check for errors.
 	 */
 	ClearPageError(page);
+	fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags);
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
 		if (!PageUptodate(page))
 			error = -EIO;
 	}
+	if (fpin)
+		goto out_retry;
 	put_page(page);
 
 	if (!error || error == AOP_TRUNCATED_PAGE)
@@ -2489,6 +2550,18 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	/* Things didn't work out. Return zero to tell the mm layer so. */
 	shrink_readahead_size_eio(file, ra);
 	return VM_FAULT_SIGBUS;
+
+out_retry:
+	/*
+	 * We dropped the mmap_sem, we need to return to the fault handler to
+	 * re-find the vma and come back and find our hopefully still populated
+	 * page.
+	 */
+	if (page)
+		put_page(page);
+	if (fpin)
+		fput(fpin);
+	return ret | VM_FAULT_RETRY;
 }
 EXPORT_SYMBOL(filemap_fault);

From patchwork Fri Nov 30 19:58:12 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707087
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 4/4] mm: use the cached page for filemap_fault
Date: Fri, 30 Nov 2018 14:58:12 -0500
Message-Id:
<20181130195812.19536-5-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

If we drop the mmap_sem we have to redo the vma lookup, which requires
redoing the fault handler.  Chances are we will just come back to the
same page, so save this page in our vmf->cached_page and reuse it on the
next pass through the fault handler.

Signed-off-by: Josef Bacik
---
 mm/filemap.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 5e76b24b2a0f..d4385b704e04 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2392,6 +2392,35 @@ static struct file *do_async_mmap_readahead(struct vm_area_struct *vma,
 	return fpin;
 }
 
+static int vmf_has_cached_page(struct vm_fault *vmf, struct page **page)
+{
+	struct page *cached_page = vmf->cached_page;
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	pgoff_t offset = vmf->pgoff;
+
+	if (!cached_page)
+		return 0;
+
+	if (vmf->flags & FAULT_FLAG_KILLABLE) {
+		int ret = lock_page_killable(cached_page);
+		if (ret) {
+			up_read(&mm->mmap_sem);
+			return ret;
+		}
+	} else
+		lock_page(cached_page);
+	vmf->cached_page = NULL;
+	if (cached_page->mapping == mapping &&
+	    cached_page->index == offset) {
+		*page = cached_page;
+	} else {
+		unlock_page(cached_page);
+		put_page(cached_page);
+	}
+	return 0;
+}
+
 /**
  * filemap_fault - read in file data for page fault handling
  * @vmf:	struct vm_fault containing details of the fault
@@ -2425,13 +2454,24 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	struct inode *inode = mapping->host;
 	pgoff_t offset = vmf->pgoff;
 	pgoff_t max_off;
-	struct page *page;
+	struct page *page = NULL;
 	vm_fault_t ret = 0;
 
 	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 	if (unlikely(offset >= max_off))
 		return VM_FAULT_SIGBUS;
 
+	/*
+	 * We may have read in the page already and have a page from an earlier
+	 * loop.  If so we need to see if this page is still valid, and if not
+	 * do the whole dance over again.
+	 */
+	error = vmf_has_cached_page(vmf, &page);
+	if (error)
+		goto out_retry;
+	if (page)
+		goto have_cached_page;
+
 	/*
 	 * Do we have something in the page cache already?
 	 */
@@ -2492,6 +2532,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		put_page(page);
 		goto retry_find;
 	}
+have_cached_page:
 	VM_BUG_ON_PAGE(page->index != offset, page);
 
 	/*
@@ -2558,7 +2599,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 * page.
 	 */
 	if (page)
-		put_page(page);
+		vmf->cached_page = page;
 	if (fpin)
 		fput(fpin);
 	return ret | VM_FAULT_RETRY;