From patchwork Fri Nov 30 19:58:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707093
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 1/4] mm: infrastructure for page fault page caching
Date: Fri, 30 Nov 2018 14:58:09 -0500
Message-Id: <20181130195812.19536-2-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

We want to be able to cache the result of a previous loop of a page fault in the case that we use VM_FAULT_RETRY, so introduce handle_mm_fault_cacheable(), which takes a struct vm_fault directly, add a ->cached_page field to struct vm_fault, and add helpers to init and clean up the struct vm_fault.

I've converted x86; other arches can follow suit if they so wish, since the conversion is relatively straightforward.

Signed-off-by: Josef Bacik
---
 arch/x86/mm/fault.c | 6 +++-
 include/linux/mm.h | 31 +++++++++++++++++++++
 mm/memory.c | 79 ++++++++++++++++++++++++++++++++---------------------
 3 files changed, 84 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 71d4b9d4d43f..8060ad6a34da 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1230,6 +1230,7 @@ void do_user_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { + struct vm_fault vmf = {}; unsigned long sw_error_code; struct vm_area_struct *vma; struct task_struct *tsk; @@ -1420,7 +1421,8 @@ void do_user_addr_fault(struct pt_regs *regs, * userland). The return to userland is identified whenever * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags. */ - fault = handle_mm_fault(vma, address, flags); + vm_fault_init(&vmf, vma, address, flags); + fault = handle_mm_fault_cacheable(&vmf); major |= fault & VM_FAULT_MAJOR; /* @@ -1436,6 +1438,7 @@ void do_user_addr_fault(struct pt_regs *regs, if (!fatal_signal_pending(tsk)) goto retry; } + vm_fault_cleanup(&vmf); /* User mode? 
Just return to handle the fatal exception */ if (flags & FAULT_FLAG_USER) @@ -1446,6 +1449,7 @@ void do_user_addr_fault(struct pt_regs *regs, return; } + vm_fault_cleanup(&vmf); up_read(&mm->mmap_sem); if (unlikely(fault & VM_FAULT_ERROR)) { mm_fault_error(regs, sw_error_code, address, fault); diff --git a/include/linux/mm.h b/include/linux/mm.h index 5411de93a363..3f1dda389aa7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -360,6 +360,12 @@ struct vm_fault { * is set (which is also implied by * VM_FAULT_ERROR). */ + struct page *cached_page; /* ->fault handlers that return + * VM_FAULT_RETRY can store their + * previous page here to be reused the + * next time we loop through the fault + * handler for faster lookup. + */ /* These three entries are valid only while holding ptl lock */ pte_t *pte; /* Pointer to pte entry matching * the 'address'. NULL if the page @@ -378,6 +384,16 @@ struct vm_fault { */ }; +static inline void vm_fault_init(struct vm_fault *vmf, + struct vm_area_struct *vma, + unsigned long address, + unsigned int flags) +{ + vmf->vma = vma; + vmf->address = address; + vmf->flags = flags; +} + /* page entry size for vm->huge_fault() */ enum page_entry_size { PE_SIZE_PTE = 0, @@ -963,6 +979,14 @@ static inline void put_page(struct page *page) __put_page(page); } +static inline void vm_fault_cleanup(struct vm_fault *vmf) +{ + if (vmf->cached_page) { + put_page(vmf->cached_page); + vmf->cached_page = NULL; + } +} + #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTION_IN_PAGE_FLAGS #endif @@ -1425,6 +1449,7 @@ int invalidate_inode_page(struct page *page); #ifdef CONFIG_MMU extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, unsigned int flags); +extern vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf); extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, unsigned long address, unsigned int fault_flags, bool *unlocked); @@ -1440,6 +1465,12 @@ 
static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma, BUG(); return VM_FAULT_SIGBUS; } +static inline vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf) +{ + /* should never happen if there's no MMU */ + BUG(); + return VM_FAULT_SIGBUS; +} static inline int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, unsigned long address, unsigned int fault_flags, bool *unlocked) diff --git a/mm/memory.c b/mm/memory.c index 4ad2d293ddc2..d16bb4816f9d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3806,36 +3806,34 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) * The mmap_sem may have been released depending on flags and our * return value. See filemap_fault() and __lock_page_or_retry(). */ -static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, - unsigned long address, unsigned int flags) +static vm_fault_t __handle_mm_fault(struct vm_fault *vmf) { - struct vm_fault vmf = { - .vma = vma, - .address = address & PAGE_MASK, - .flags = flags, - .pgoff = linear_page_index(vma, address), - .gfp_mask = __get_fault_gfp_mask(vma), - }; - unsigned int dirty = flags & FAULT_FLAG_WRITE; + struct vm_area_struct *vma = vmf->vma; + unsigned long address = vmf->address; + unsigned int dirty = vmf->flags & FAULT_FLAG_WRITE; struct mm_struct *mm = vma->vm_mm; pgd_t *pgd; p4d_t *p4d; vm_fault_t ret; + vmf->address = address & PAGE_MASK; + vmf->pgoff = linear_page_index(vma, address); + vmf->gfp_mask = __get_fault_gfp_mask(vma); + pgd = pgd_offset(mm, address); p4d = p4d_alloc(mm, pgd, address); if (!p4d) return VM_FAULT_OOM; - vmf.pud = pud_alloc(mm, p4d, address); - if (!vmf.pud) + vmf->pud = pud_alloc(mm, p4d, address); + if (!vmf->pud) return VM_FAULT_OOM; - if (pud_none(*vmf.pud) && transparent_hugepage_enabled(vma)) { - ret = create_huge_pud(&vmf); + if (pud_none(*vmf->pud) && transparent_hugepage_enabled(vma)) { + ret = create_huge_pud(vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret; } else { - pud_t orig_pud = *vmf.pud; + pud_t 
orig_pud = *vmf->pud; barrier(); if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) { @@ -3843,50 +3841,50 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, /* NUMA case for anonymous PUDs would go here */ if (dirty && !pud_write(orig_pud)) { - ret = wp_huge_pud(&vmf, orig_pud); + ret = wp_huge_pud(vmf, orig_pud); if (!(ret & VM_FAULT_FALLBACK)) return ret; } else { - huge_pud_set_accessed(&vmf, orig_pud); + huge_pud_set_accessed(vmf, orig_pud); return 0; } } } - vmf.pmd = pmd_alloc(mm, vmf.pud, address); - if (!vmf.pmd) + vmf->pmd = pmd_alloc(mm, vmf->pud, address); + if (!vmf->pmd) return VM_FAULT_OOM; - if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) { - ret = create_huge_pmd(&vmf); + if (pmd_none(*vmf->pmd) && transparent_hugepage_enabled(vma)) { + ret = create_huge_pmd(vmf); if (!(ret & VM_FAULT_FALLBACK)) return ret; } else { - pmd_t orig_pmd = *vmf.pmd; + pmd_t orig_pmd = *vmf->pmd; barrier(); if (unlikely(is_swap_pmd(orig_pmd))) { VM_BUG_ON(thp_migration_supported() && !is_pmd_migration_entry(orig_pmd)); if (is_pmd_migration_entry(orig_pmd)) - pmd_migration_entry_wait(mm, vmf.pmd); + pmd_migration_entry_wait(mm, vmf->pmd); return 0; } if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { if (pmd_protnone(orig_pmd) && vma_is_accessible(vma)) - return do_huge_pmd_numa_page(&vmf, orig_pmd); + return do_huge_pmd_numa_page(vmf, orig_pmd); if (dirty && !pmd_write(orig_pmd)) { - ret = wp_huge_pmd(&vmf, orig_pmd); + ret = wp_huge_pmd(vmf, orig_pmd); if (!(ret & VM_FAULT_FALLBACK)) return ret; } else { - huge_pmd_set_accessed(&vmf, orig_pmd); + huge_pmd_set_accessed(vmf, orig_pmd); return 0; } } } - return handle_pte_fault(&vmf); + return handle_pte_fault(vmf); } /* @@ -3895,9 +3893,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, * The mmap_sem may have been released depending on flags and our * return value. See filemap_fault() and __lock_page_or_retry(). 
*/ -vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, - unsigned int flags) +static vm_fault_t do_handle_mm_fault(struct vm_fault *vmf) { + struct vm_area_struct *vma = vmf->vma; + unsigned int flags = vmf->flags; vm_fault_t ret; __set_current_state(TASK_RUNNING); @@ -3921,9 +3920,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, mem_cgroup_enter_user_fault(); if (unlikely(is_vm_hugetlb_page(vma))) - ret = hugetlb_fault(vma->vm_mm, vma, address, flags); + ret = hugetlb_fault(vma->vm_mm, vma, vmf->address, flags); else - ret = __handle_mm_fault(vma, address, flags); + ret = __handle_mm_fault(vmf); if (flags & FAULT_FLAG_USER) { mem_cgroup_exit_user_fault(); @@ -3939,8 +3938,26 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, return ret; } + +vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, + unsigned int flags) +{ + struct vm_fault vmf = {}; + vm_fault_t ret; + + vm_fault_init(&vmf, vma, address, flags); + ret = do_handle_mm_fault(&vmf); + vm_fault_cleanup(&vmf); + return ret; +} EXPORT_SYMBOL_GPL(handle_mm_fault); +vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf) +{ + return do_handle_mm_fault(vmf); +} +EXPORT_SYMBOL_GPL(handle_mm_fault_cacheable); + #ifndef __PAGETABLE_P4D_FOLDED /* * Allocate p4d page table. 
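
Editorial sketch, not part of the patch: the init/retry/cleanup pattern above can be modelled in plain userspace C. Every mock_* name and MOCK_FAULT_RETRY below is a hypothetical stand-in rather than kernel API, and the way the mock handler stashes and reuses the page is an assumption made for illustration, since this patch only adds the field and the helpers.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mock_page {
	int refcount;
};

#define MOCK_FAULT_RETRY 0x1u

struct mock_fault {
	unsigned long address;
	unsigned int flags;
	struct mock_page *cached_page;	/* survives across retry passes */
};

/* Like vm_fault_init(): sets the per-pass fields, leaves cached_page alone. */
static void mock_fault_init(struct mock_fault *f, unsigned long address,
			    unsigned int flags)
{
	f->address = address;
	f->flags = flags;
}

/* Like vm_fault_cleanup(): drops the reference held by the cache, if any. */
static void mock_fault_cleanup(struct mock_fault *f)
{
	if (f->cached_page) {
		f->cached_page->refcount--;
		f->cached_page = NULL;
	}
}

/*
 * Assumed handler behaviour: the first pass looks the page up, stashes it
 * with an extra reference and asks for a retry; the next pass finds the
 * stashed page and completes without another lookup.
 */
static unsigned int mock_handle_fault(struct mock_fault *f,
				      struct mock_page *backing, int *lookups)
{
	if (f->cached_page)
		return 0;
	(*lookups)++;
	backing->refcount++;
	f->cached_page = backing;
	return MOCK_FAULT_RETRY;
}

/* Caller loop shaped like the x86 conversion: init, handle, retry, cleanup. */
static int demo_fault_loop(struct mock_page *page)
{
	struct mock_fault f = {0};
	int lookups = 0;
	unsigned int ret;

	do {
		mock_fault_init(&f, 0x1000UL, 0);
		ret = mock_handle_fault(&f, page, &lookups);
	} while (ret & MOCK_FAULT_RETRY);

	mock_fault_cleanup(&f);
	return lookups;
}
```

The point of the sketch is that the init helper must not clear cached_page, otherwise the retry pass would lose the stashed page, and that the cleanup helper is the single place where the extra reference is dropped.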
From patchwork Fri Nov 30 19:58:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707081
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 2/4] filemap: kill page_cache_read usage in filemap_fault
Date: Fri, 30 Nov 2018 14:58:10 -0500
Message-Id: <20181130195812.19536-3-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

If we do not have a page at filemap_fault time we'll do this weird forced page_cache_read thing to populate the page, and then drop it again and loop around and find it. This makes for two ways we can read a page in filemap_fault, and it's not really needed.

Instead, add an FGP_FOR_MMAP flag so that pagecache_get_page() will return an unlocked page that's in pagecache. Then use the normal page locking and readpage logic already in filemap_fault. This simplifies the no-page-in-page-cache case significantly.

Signed-off-by: Josef Bacik
Acked-by: Johannes Weiner
Reviewed-by: Jan Kara
---
 include/linux/pagemap.h | 1 +
 mm/filemap.c | 73 ++++++++++---------------------------------------
 2 files changed, 16 insertions(+), 58 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 226f96f0dee0..b13c2442281f 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -252,6 +252,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping, #define FGP_WRITE 0x00000008 #define FGP_NOFS 0x00000010 #define FGP_NOWAIT 0x00000020 +#define FGP_FOR_MMAP 0x00000040 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, int fgp_flags, gfp_t cache_gfp_mask); diff --git a/mm/filemap.c b/mm/filemap.c index 81adec8ee02c..f068712c2525 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1503,6 +1503,9 @@ EXPORT_SYMBOL(find_lock_entry); * @gfp_mask and added to the page cache and the VM's LRU * list. The page is returned locked and with an increased * refcount. Otherwise, NULL is returned. + * - FGP_FOR_MMAP: Similar to FGP_CREAT, only it unlocks the page after it has + * added it to pagecache, as the mmap code expects to do its own special + * locking dance. * * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even * if the GFP flags specified for FGP_CREAT are atomic. 
@@ -1555,7 +1558,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, if (!page) return NULL; - if (WARN_ON_ONCE(!(fgp_flags & FGP_LOCK))) + if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP)))) fgp_flags |= FGP_LOCK; /* Init accessed so avoid atomic mark_page_accessed later */ @@ -1569,6 +1572,13 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, if (err == -EEXIST) goto repeat; } + + /* + * add_to_page_cache_lru locks the page, and for mmap we expect + * an unlocked page. + */ + if (fgp_flags & FGP_FOR_MMAP) + unlock_page(page); } return page; @@ -2293,39 +2303,6 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) EXPORT_SYMBOL(generic_file_read_iter); #ifdef CONFIG_MMU -/** - * page_cache_read - adds requested page to the page cache if not already there - * @file: file to read - * @offset: page index - * @gfp_mask: memory allocation flags - * - * This adds the requested page to the page cache if it isn't already there, - * and schedules an I/O to read in its contents from disk. 
- */ -static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask) -{ - struct address_space *mapping = file->f_mapping; - struct page *page; - int ret; - - do { - page = __page_cache_alloc(gfp_mask); - if (!page) - return -ENOMEM; - - ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask); - if (ret == 0) - ret = mapping->a_ops->readpage(file, page); - else if (ret == -EEXIST) - ret = 0; /* losing race to add is OK */ - - put_page(page); - - } while (ret == AOP_TRUNCATED_PAGE); - - return ret; -} - #define MMAP_LOTSAMISS (100) /* @@ -2449,9 +2426,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT); ret = VM_FAULT_MAJOR; retry_find: - page = find_get_page(mapping, offset); + page = pagecache_get_page(mapping, offset, + FGP_CREAT|FGP_FOR_MMAP, + vmf->gfp_mask); if (!page) - goto no_cached_page; + return vmf_error(-ENOMEM); } if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { @@ -2488,28 +2467,6 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) vmf->page = page; return ret | VM_FAULT_LOCKED; -no_cached_page: - /* - * We're only likely to ever get here if MADV_RANDOM is in - * effect. - */ - error = page_cache_read(file, offset, vmf->gfp_mask); - - /* - * The page we want has now been added to the page cache. - * In the unlikely event that someone removed it in the - * meantime, we'll just come back here and read it again. - */ - if (error >= 0) - goto retry_find; - - /* - * An error return from page_cache_read can result if the - * system is low on memory, or a problem occurs while trying - * to schedule I/O. - */ - return vmf_error(error); - page_not_uptodate: /* * Umm, take care of errors if the page isn't up-to-date. 
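
Editorial sketch, not part of the patch: the locking behaviour that FGP_FOR_MMAP adds on the create path can be modelled with a toy one-slot "page cache" in userspace C. The MOCK_FGP_* values, struct mock_page and mock_get_page() are hypothetical stand-ins, not the kernel's definitions, and allocation failure and the -EEXIST race are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for the sketch; not the kernel's FGP_* values. */
#define MOCK_FGP_LOCK		0x1
#define MOCK_FGP_CREAT		0x2
#define MOCK_FGP_FOR_MMAP	0x4

/* A one-slot toy "page cache": the slot is either populated or empty. */
struct mock_page {
	bool in_cache;
	bool locked;
};

static struct mock_page *mock_get_page(struct mock_page *slot, int flags)
{
	if (slot->in_cache) {
		/* Existing page: lock it only when the caller asked for that. */
		if (flags & MOCK_FGP_LOCK)
			slot->locked = true;
		return slot;
	}

	if (!(flags & MOCK_FGP_CREAT))
		return NULL;

	/* Inserting a new page leaves it locked, as add_to_page_cache_lru does. */
	slot->in_cache = true;
	slot->locked = true;

	/* The mmap fault path does its own locking, so hand the page back unlocked. */
	if (flags & MOCK_FGP_FOR_MMAP)
		slot->locked = false;

	return slot;
}
```

With MOCK_FGP_CREAT | MOCK_FGP_FOR_MMAP the caller gets a populated but unlocked page, which is what lets filemap_fault fall through to its normal page locking and readpage logic instead of a separate read path.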
From patchwork Fri Nov 30 19:58:11 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10707089
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz
Subject: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations
Date: Fri, 30 Nov 2018 14:58:11 -0500
Message-Id: <20181130195812.19536-4-josef@toxicpanda.com>
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>

Currently we only drop the mmap_sem if there is contention on the page lock. The idea is that we issue readahead and then go to lock the page while it is under IO and we want to not hold the mmap_sem during the IO.

The problem with this is the assumption that the readahead does anything. In the case that the box is under extreme memory or IO pressure we may end up not reading anything at all for readahead, which means we will end up reading in the page under the mmap_sem.

Instead, rework the filemap fault path to drop the mmap_sem at any point that we may do IO or block for an extended period of time. This includes while issuing readahead, locking the page, or needing to call ->readpage because readahead did not occur. Then once we have a fully uptodate page we can return with VM_FAULT_RETRY and come back again to find our nicely in-cache page that was gotten outside of the mmap_sem.

Signed-off-by: Josef Bacik
Acked-by: Johannes Weiner
---
 mm/filemap.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 93 insertions(+), 20 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c index f068712c2525..5e76b24b2a0f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter); #ifdef CONFIG_MMU #define MMAP_LOTSAMISS (100) +static struct file *maybe_unlock_mmap_for_io(struct file *fpin, + struct vm_area_struct *vma, + int flags) +{ + if (fpin) + return fpin; + if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == + FAULT_FLAG_ALLOW_RETRY) { + fpin = get_file(vma->vm_file); + up_read(&vma->vm_mm->mmap_sem); + } + return fpin; +} /* * Synchronous readahead happens when we don't even find * a page in the page cache at all. 
*/ -static void do_sync_mmap_readahead(struct vm_area_struct *vma, - struct file_ra_state *ra, - struct file *file, - pgoff_t offset) +static struct file *do_sync_mmap_readahead(struct vm_area_struct *vma, + struct file_ra_state *ra, + struct file *file, + pgoff_t offset, + int flags) { struct address_space *mapping = file->f_mapping; + struct file *fpin = NULL; /* If we don't want any read-ahead, don't bother */ if (vma->vm_flags & VM_RAND_READ) - return; + return fpin; if (!ra->ra_pages) - return; + return fpin; if (vma->vm_flags & VM_SEQ_READ) { + fpin = maybe_unlock_mmap_for_io(fpin, vma, flags); page_cache_sync_readahead(mapping, ra, file, offset, ra->ra_pages); - return; + return fpin; } /* Avoid banging the cache line if not needed */ @@ -2337,37 +2353,43 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma, * stop bothering with read-ahead. It will only hurt. */ if (ra->mmap_miss > MMAP_LOTSAMISS) - return; + return fpin; /* * mmap read-around */ + fpin = maybe_unlock_mmap_for_io(fpin, vma, flags); ra->start = max_t(long, 0, offset - ra->ra_pages / 2); ra->size = ra->ra_pages; ra->async_size = ra->ra_pages / 4; ra_submit(ra, mapping, file); + return fpin; } /* * Asynchronous readahead happens when we find the page and PG_readahead, * so we want to possibly extend the readahead further.. 
*/ -static void do_async_mmap_readahead(struct vm_area_struct *vma, - struct file_ra_state *ra, - struct file *file, - struct page *page, - pgoff_t offset) +static struct file *do_async_mmap_readahead(struct vm_area_struct *vma, + struct file_ra_state *ra, + struct file *file, + struct page *page, + pgoff_t offset, int flags) { struct address_space *mapping = file->f_mapping; + struct file *fpin = NULL; /* If we don't want any read-ahead, don't bother */ if (vma->vm_flags & VM_RAND_READ) - return; + return fpin; if (ra->mmap_miss > 0) ra->mmap_miss--; - if (PageReadahead(page)) + if (PageReadahead(page)) { + fpin = maybe_unlock_mmap_for_io(fpin, vma, flags); page_cache_async_readahead(mapping, ra, file, page, offset, ra->ra_pages); + } + return fpin; } /** @@ -2397,6 +2419,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) { int error; struct file *file = vmf->vma->vm_file; + struct file *fpin = NULL; struct address_space *mapping = file->f_mapping; struct file_ra_state *ra = &file->f_ra; struct inode *inode = mapping->host; @@ -2418,10 +2441,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) * We found the page, so try async readahead before * waiting for the lock. */ - do_async_mmap_readahead(vmf->vma, ra, file, page, offset); + fpin = do_async_mmap_readahead(vmf->vma, ra, file, page, offset, + vmf->flags); } else if (!page) { /* No page in the page cache at all */ - do_sync_mmap_readahead(vmf->vma, ra, file, offset); + fpin = do_sync_mmap_readahead(vmf->vma, ra, file, offset, + vmf->flags); count_vm_event(PGMAJFAULT); count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT); ret = VM_FAULT_MAJOR; @@ -2433,9 +2458,32 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) return vmf_error(-ENOMEM); } - if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { - put_page(page); - return ret | VM_FAULT_RETRY; + /* + * We are open-coding lock_page_or_retry here because we want to do the + * readpage if necessary while the mmap_sem is dropped. 
If there + * happens to be a lock on the page but it wasn't being faulted in we'd + * come back around without ALLOW_RETRY set and then have to do the IO + * under the mmap_sem, which would be a bummer. + */ + if (!trylock_page(page)) { + fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags); + if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) + goto out_retry; + if (vmf->flags & FAULT_FLAG_KILLABLE) { + if (__lock_page_killable(page)) { + /* + * If we don't have the right flags for + * maybe_unlock_mmap_for_io to do it's thing we + * still need to drop the sem and return + * VM_FAULT_RETRY so the upper layer checks the + * signal and takes the appropriate action. + */ + if (!fpin) + up_read(&vmf->vma->vm_mm->mmap_sem); + goto out_retry; + } + } else + __lock_page(page); } /* Did it get truncated? */ @@ -2453,6 +2501,16 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (unlikely(!PageUptodate(page))) goto page_not_uptodate; + /* + * We've made it this far and we had to drop our mmap_sem, now is the + * time to return to the upper layer and have it re-find the vma and + * redo the fault. + */ + if (fpin) { + unlock_page(page); + goto out_retry; + } + /* * Found the page and have a reference on it. * We must recheck i_size under page lock. @@ -2475,12 +2533,15 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) * and we need to check for errors. */ ClearPageError(page); + fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags); error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); if (!PageUptodate(page)) error = -EIO; } + if (fpin) + goto out_retry; put_page(page); if (!error || error == AOP_TRUNCATED_PAGE) @@ -2489,6 +2550,18 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) /* Things didn't work out. Return zero to tell the mm layer so. 
*/ shrink_readahead_size_eio(file, ra); return VM_FAULT_SIGBUS; + +out_retry: + /* + * We dropped the mmap_sem, we need to return to the fault handler to + * re-find the vma and come back and find our hopefully still populated + * page. + */ + if (page) + put_page(page); + if (fpin) + fput(fpin); + return ret | VM_FAULT_RETRY; } EXPORT_SYMBOL(filemap_fault); From patchwork Fri Nov 30 19:58:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Josef Bacik X-Patchwork-Id: 10707091 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3A8CB1057 for ; Fri, 30 Nov 2018 19:58:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E29030453 for ; Fri, 30 Nov 2018 19:58:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 223F3304F0; Fri, 30 Nov 2018 19:58:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BE79430453 for ; Fri, 30 Nov 2018 19:58:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727072AbeLAHIp (ORCPT ); Sat, 1 Dec 2018 02:08:45 -0500 Received: from mail-yb1-f194.google.com ([209.85.219.194]:34490 "EHLO mail-yb1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726808AbeLAHIo (ORCPT ); Sat, 1 Dec 2018 02:08:44 -0500 Received: by mail-yb1-f194.google.com with SMTP id a67-v6so2720899ybg.1 for ; Fri, 30 Nov 2018 11:58:21 -0800 (PST) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20150623.gappssmtp.com; s=20150623; h=from:to:subject:date:message-id:in-reply-to:references; bh=ZiSWalXRo2lAHNoxFcnCBmYqM80BUgBK6TZ//qlBoW0=; b=qE17nE2Jo3x4abm13SJxAAIFGmnZyoF8C/6nu8aRpvS4rDGOAT+NB7nFb5qR/q9r0W wyEq3cqlr+cDklN/eobicAnmyPZ5O00B6CLcEWwkMfbaQv2f1e/zarf+qIW/wPERWt9z QGzxh3p+tv3y8zML5KreAJ0J0fdK1gQiiTp1jNjrMW9vgTdx5CDjGysHbm7NtzHtqj2z G48SlfhBIGMBXljirIlfet56WObqccekan4prBch/7ywFpwIc9gEWxYUBnEKPj4xUmnP p7oYI1XwYAfhLaNWOYg0r74CjigClE0qUNLmKqCBjSEdRHgY4p9ac7AqgtmRfyHOfhUM 7DqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=ZiSWalXRo2lAHNoxFcnCBmYqM80BUgBK6TZ//qlBoW0=; b=oz8xGFl2o2JNMrroBQgCEcnKNTTnM8TTqMb23sC8r3EzberpiOBz1mhrobzugZ2jQG TGlQ26u67/pGDIo3NlRM2hHyE/6BYST6wj+UDm6G8vbLSq8iFj0pKrywP7H+bi3Bz7Ck drXNmSOApRRL/HDt+OtxErcSBmRnqmfmLW5JE0+Uj/UF3rt1mD14w/7135Ut6aZ5BFTN mD0vwh5x6tLjKOZdWLGGFnh7kAOipfmn853exiOPEoquoR2RUtPTUkYGKBh4drU3qo0K OzsOIOpivFLdsid0kTTRrmx/Q5yG88CSe1tqnQpiQeT+mzOPFOhZOEpScizAq+l8GFi3 Q03A== X-Gm-Message-State: AA+aEWZPocxlJEnWN7VZIDFh8yZNWF+VyfKWtScRzrCKEibXkrRmqOLg hverR+qRRqMzX1Mr94vzk/BqQg== X-Google-Smtp-Source: AFSGD/WUySeblOm5lO6iEZroC7YfnS5osaYyKXSy7/i64tvWyXdByqa0y4VKPk5GYCaEeI3VijK3IQ== X-Received: by 2002:a5b:b09:: with SMTP id z9-v6mr6591976ybp.483.1543607900825; Fri, 30 Nov 2018 11:58:20 -0800 (PST) Received: from localhost ([107.15.81.208]) by smtp.gmail.com with ESMTPSA id z74sm3209536ywz.51.2018.11.30.11.58.19 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 30 Nov 2018 11:58:20 -0800 (PST) From: Josef Bacik To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz Subject: [PATCH 4/4] mm: use the cached page for filemap_fault Date: Fri, 30 Nov 2018 14:58:12 -0500 
Message-Id: <20181130195812.19536-5-josef@toxicpanda.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20181130195812.19536-1-josef@toxicpanda.com>
References: <20181130195812.19536-1-josef@toxicpanda.com>
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

If we drop the mmap_sem we have to redo the vma lookup which requires
redoing the fault handler.  Chances are we will just come back to the
same page, so save this page in our vmf->cached_page and reuse it in the
next loop through the fault handler.

Signed-off-by: Josef Bacik
---
 mm/filemap.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 5e76b24b2a0f..d4385b704e04 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2392,6 +2392,35 @@ static struct file *do_async_mmap_readahead(struct vm_area_struct *vma,
 	return fpin;
 }
 
+static int vmf_has_cached_page(struct vm_fault *vmf, struct page **page)
+{
+	struct page *cached_page = vmf->cached_page;
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	pgoff_t offset = vmf->pgoff;
+
+	if (!cached_page)
+		return 0;
+
+	if (vmf->flags & FAULT_FLAG_KILLABLE) {
+		int ret = lock_page_killable(cached_page);
+		if (ret) {
+			up_read(&mm->mmap_sem);
+			return ret;
+		}
+	} else
+		lock_page(cached_page);
+	vmf->cached_page = NULL;
+	if (cached_page->mapping == mapping &&
+	    cached_page->index == offset) {
+		*page = cached_page;
+	} else {
+		unlock_page(cached_page);
+		put_page(cached_page);
+	}
+	return 0;
+}
+
 /**
  * filemap_fault - read in file data for page fault handling
  * @vmf:	struct vm_fault containing details of the fault
@@ -2425,13 +2454,24 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	struct inode *inode = mapping->host;
 	pgoff_t offset = vmf->pgoff;
 	pgoff_t max_off;
-	struct page *page;
+	struct page *page = NULL;
 	vm_fault_t ret = 0;
 
 	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 	if (unlikely(offset >= max_off))
 		return VM_FAULT_SIGBUS;
 
+	/*
+	 * We may have read in the page already and have a page from an earlier
+	 * loop.  If so we need to see if this page is still valid, and if not
+	 * do the whole dance over again.
+	 */
+	error = vmf_has_cached_page(vmf, &page);
+	if (error)
+		goto out_retry;
+	if (page)
+		goto have_cached_page;
+
 	/*
 	 * Do we have something in the page cache already?
 	 */
@@ -2492,6 +2532,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		put_page(page);
 		goto retry_find;
 	}
+have_cached_page:
 	VM_BUG_ON_PAGE(page->index != offset, page);
 
 	/*
@@ -2558,7 +2599,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 * page.
 	 */
 	if (page)
-		put_page(page);
+		vmf->cached_page = page;
 	if (fpin)
 		fput(fpin);
 	return ret | VM_FAULT_RETRY;
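
Illustration only, not part of the patch: the stash-and-revalidate
pattern that the out_retry path and vmf_has_cached_page() implement can
be sketched in user space. Everything named toy_* below is a
hypothetical stand-in, not a kernel type or function; page locking and
refcounting are reduced to plain fields, and the killable-lock error
path is left out. The point is only the ordering: take the lock, clear
the cached slot, then check that the page still belongs to the same
mapping at the same offset before reusing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for address_space, page and vm_fault. */
struct toy_mapping {
	int id;
};

struct toy_page {
	struct toy_mapping *mapping;	/* NULL once the page is truncated */
	unsigned long index;		/* page offset within the file */
	int refcount;			/* references held on the page */
	bool locked;			/* stands in for the page lock */
};

struct toy_fault {
	struct toy_mapping *mapping;	/* mapping the fault is against */
	unsigned long pgoff;		/* faulting page offset */
	struct toy_page *cached_page;	/* page stashed by an earlier pass */
};

/* Retry path: keep the reference and hand the page to the next pass. */
static void toy_stash_page_for_retry(struct toy_fault *vmf,
				     struct toy_page *page)
{
	vmf->cached_page = page;
}

/*
 * Next pass: return the stashed page, locked and still referenced, if it
 * still matches the fault.  Otherwise unlock it, drop the reference and
 * return NULL so the caller falls back to a normal page cache lookup.
 */
static struct toy_page *toy_take_cached_page(struct toy_fault *vmf)
{
	struct toy_page *page = vmf->cached_page;

	if (!page)
		return NULL;

	page->locked = true;		/* lock_page() in the patch */
	vmf->cached_page = NULL;
	if (page->mapping == vmf->mapping && page->index == vmf->pgoff)
		return page;

	page->locked = false;		/* unlock_page() in the patch */
	page->refcount--;		/* put_page() in the patch */
	return NULL;
}
```

A caller would invoke toy_take_cached_page() at the top of its fault
routine and only do the regular lookup when it returns NULL, which
mirrors the "goto have_cached_page" shortcut in the hunk above.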