From patchwork Tue Dec 11 17:37:59 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10724289
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
    tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com,
    jack@suse.cz
Subject: [PATCH 1/3] filemap: kill page_cache_read usage in filemap_fault
Date: Tue, 11 Dec 2018 12:37:59 -0500
Message-Id: <20181211173801.29535-2-josef@toxicpanda.com>
In-Reply-To: <20181211173801.29535-1-josef@toxicpanda.com>
References: <20181211173801.29535-1-josef@toxicpanda.com>

If we do not have a page at filemap_fault time we'll do this weird forced
page_cache_read thing to populate the page, then drop it again, loop around,
and find it. That gives us two ways to read a page in filemap_fault, and it's
not really needed. Instead add a FGP_FOR_MMAP flag so that
pagecache_get_page() will return an unlocked page that's in the pagecache.
Then use the normal page locking and readpage logic already in filemap_fault.
This simplifies the no-page-in-page-cache case significantly.
Acked-by: Johannes Weiner
Reviewed-by: Jan Kara
Signed-off-by: Josef Bacik
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 73 ++++++++++---------------------------------
 2 files changed, 16 insertions(+), 58 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 226f96f0dee0..b13c2442281f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -252,6 +252,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_WRITE		0x00000008
 #define FGP_NOFS		0x00000010
 #define FGP_NOWAIT		0x00000020
+#define FGP_FOR_MMAP		0x00000040
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		int fgp_flags, gfp_t cache_gfp_mask);
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..03bce38d8f2b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1503,6 +1503,9 @@ EXPORT_SYMBOL(find_lock_entry);
  *   @gfp_mask and added to the page cache and the VM's LRU
  *   list. The page is returned locked and with an increased
  *   refcount. Otherwise, NULL is returned.
+ * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do
+ *   its own locking dance if the page is already in cache, or unlock the page
+ *   before returning if we had to add the page to pagecache.
  *
  * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even
  * if the GFP flags specified for FGP_CREAT are atomic.
@@ -1555,7 +1558,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		if (!page)
 			return NULL;
 
-		if (WARN_ON_ONCE(!(fgp_flags & FGP_LOCK)))
+		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
 		/* Init accessed so avoid atomic mark_page_accessed later */
@@ -1569,6 +1572,13 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 			if (err == -EEXIST)
 				goto repeat;
 		}
+
+		/*
+		 * add_to_page_cache_lru locks the page, and for mmap we expect
+		 * an unlocked page.
+		 */
+		if (fgp_flags & FGP_FOR_MMAP)
+			unlock_page(page);
 	}
 
 	return page;
@@ -2293,39 +2303,6 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 EXPORT_SYMBOL(generic_file_read_iter);
 
 #ifdef CONFIG_MMU
-/**
- * page_cache_read - adds requested page to the page cache if not already there
- * @file:	file to read
- * @offset:	page index
- * @gfp_mask:	memory allocation flags
- *
- * This adds the requested page to the page cache if it isn't already there,
- * and schedules an I/O to read in its contents from disk.
- */
-static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
-{
-	struct address_space *mapping = file->f_mapping;
-	struct page *page;
-	int ret;
-
-	do {
-		page = __page_cache_alloc(gfp_mask);
-		if (!page)
-			return -ENOMEM;
-
-		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
-		if (ret == 0)
-			ret = mapping->a_ops->readpage(file, page);
-		else if (ret == -EEXIST)
-			ret = 0; /* losing race to add is OK */
-
-		put_page(page);
-
-	} while (ret == AOP_TRUNCATED_PAGE);
-
-	return ret;
-}
-
 #define MMAP_LOTSAMISS  (100)
 
 /*
@@ -2449,9 +2426,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
 retry_find:
-		page = find_get_page(mapping, offset);
+		page = pagecache_get_page(mapping, offset,
+					  FGP_CREAT|FGP_FOR_MMAP,
+					  vmf->gfp_mask);
 		if (!page)
-			goto no_cached_page;
+			return vmf_error(-ENOMEM);
 	}
 
 	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
@@ -2488,28 +2467,6 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
-no_cached_page:
-	/*
-	 * We're only likely to ever get here if MADV_RANDOM is in
-	 * effect.
-	 */
-	error = page_cache_read(file, offset, vmf->gfp_mask);
-
-	/*
-	 * The page we want has now been added to the page cache.
-	 * In the unlikely event that someone removed it in the
-	 * meantime, we'll just come back here and read it again.
-	 */
-	if (error >= 0)
-		goto retry_find;
-
-	/*
-	 * An error return from page_cache_read can result if the
-	 * system is low on memory, or a problem occurs while trying
-	 * to schedule I/O.
-	 */
-	return vmf_error(error);
-
 page_not_uptodate:
 	/*
 	 * Umm, take care of errors if the page isn't up-to-date.

From patchwork Tue Dec 11 17:38:01 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10724291
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
    tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com,
    jack@suse.cz
Subject: [PATCH 3/3] filemap: drop the mmap_sem for all blocking operations
Date: Tue, 11 Dec 2018 12:38:01 -0500
Message-Id: <20181211173801.29535-4-josef@toxicpanda.com>
In-Reply-To: <20181211173801.29535-1-josef@toxicpanda.com>
References: <20181211173801.29535-1-josef@toxicpanda.com>

Currently we only drop the mmap_sem if there is contention on the page lock.
The idea is that we issue readahead and then go to lock the page while it is
under IO, so that we do not hold the mmap_sem during the IO.

The problem with this is the assumption that the readahead does anything. If
the box is under extreme memory or IO pressure we may end up not reading
anything at all for readahead, which means we will end up reading in the page
under the mmap_sem. Even if the readahead does something, it could get
throttled because of IO pressure on the system and the process being in a
lower-priority cgroup.
Holding the mmap_sem while doing IO is problematic because it can cause
system-wide priority inversions. Consider some large company that does a lot
of web traffic. This large company has load balancing logic in its core web
server, because some engineer thought this was a brilliant plan. This load
balancing logic gets statistics from /proc about the system, which trips over
processes' mmap_sems for various reasons. Now the web server application is in
a protected cgroup, but these other processes may not be, and if they are
being throttled while their mmap_sem is held we'll stall, and cause this nice
death spiral.

Instead, rework the filemap fault path to drop the mmap_sem at any point where
we may do IO or block for an extended period of time. This includes while
issuing readahead, locking the page, or needing to call ->readpage because
readahead did not occur. Then, once we have a fully uptodate page, we can
return with VM_FAULT_RETRY and come back again to find our nicely in-cache
page that was read in outside of the mmap_sem.

This patch also adds a new helper for locking the page with the mmap_sem
dropped. This doesn't make sense currently, as generally speaking if the page
is already locked it'll have been read in (unless there was an error) before
it was unlocked. However, a forthcoming patchset will change this with the
ability to abort readahead bios if necessary, making it more likely that we
could contend for a page lock and still have a page that is not uptodate.
This allows us to deal with that case by grabbing the lock and issuing the IO
without the mmap_sem held, then returning VM_FAULT_RETRY to come back around.
Acked-by: Johannes Weiner
Signed-off-by: Josef Bacik
---
 mm/filemap.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 92 insertions(+), 12 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8fc45f24b201..10084168eff1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2304,28 +2304,76 @@ EXPORT_SYMBOL(generic_file_read_iter);
 #ifdef CONFIG_MMU
 
 #define MMAP_LOTSAMISS  (100)
 
+static struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
+					     struct file *fpin)
+{
+	int flags = vmf->flags;
+
+	if (fpin)
+		return fpin;
+	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
+	    FAULT_FLAG_ALLOW_RETRY) {
+		fpin = get_file(vmf->vma->vm_file);
+		up_read(&vmf->vma->vm_mm->mmap_sem);
+	}
+	return fpin;
+}
+
+/*
+ * Works similar to lock_page_or_retry, except it will pin the file and drop the
+ * mmap_sem if necessary and then lock the page, and return 1 in this case.
+ * This means the caller needs to deal with the fpin appropriately.  0 return is
+ * the same as in lock_page_or_retry.
+ */
+static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
+				     struct file **fpin)
+{
+	if (trylock_page(page))
+		return 1;
+
+	*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
+	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
+		return 0;
+	if (vmf->flags & FAULT_FLAG_KILLABLE) {
+		if (__lock_page_killable(page)) {
+			/*
+			 * We didn't have the right flags to drop the mmap_sem,
+			 * but all fault_handlers only check for fatal signals
+			 * if we return VM_FAULT_RETRY, so we need to drop the
+			 * mmap_sem here and return 0 if we don't have a fpin.
+			 */
+			if (*fpin == NULL)
+				up_read(&vmf->vma->vm_mm->mmap_sem);
+			return 0;
+		}
+	} else
+		__lock_page(page);
+	return 1;
+}
+
 /*
  * Synchronous readahead happens when we don't even find
  * a page in the page cache at all.
  */
-static void do_sync_mmap_readahead(struct vm_fault *vmf)
+static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 {
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
 	struct address_space *mapping = file->f_mapping;
+	struct file *fpin = NULL;
 	pgoff_t offset = vmf->pgoff;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vmf->vma->vm_flags & VM_RAND_READ)
-		return;
+		return fpin;
 	if (!ra->ra_pages)
-		return;
+		return fpin;
 
 	if (vmf->vma->vm_flags & VM_SEQ_READ) {
+		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 		page_cache_sync_readahead(mapping, ra, file, offset,
 					  ra->ra_pages);
-		return;
+		return fpin;
 	}
 
 	/* Avoid banging the cache line if not needed */
@@ -2337,37 +2385,43 @@ static void do_sync_mmap_readahead(struct vm_fault *vmf)
 	 * stop bothering with read-ahead. It will only hurt.
 	 */
 	if (ra->mmap_miss > MMAP_LOTSAMISS)
-		return;
+		return fpin;
 
 	/*
	 * mmap read-around
	 */
+	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
	ra->start = max_t(long, 0, offset - ra->ra_pages / 2);
	ra->size = ra->ra_pages;
	ra->async_size = ra->ra_pages / 4;
	ra_submit(ra, mapping, file);
+	return fpin;
 }
 
 /*
  * Asynchronous readahead happens when we find the page and PG_readahead,
  * so we want to possibly extend the readahead further..
  */
-static void do_async_mmap_readahead(struct vm_fault *vmf,
-				    struct page *page)
+static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
+					    struct page *page)
 {
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
 	struct address_space *mapping = file->f_mapping;
+	struct file *fpin = NULL;
 	pgoff_t offset = vmf->pgoff;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vmf->vma->vm_flags & VM_RAND_READ)
-		return;
+		return fpin;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-	if (PageReadahead(page))
+	if (PageReadahead(page)) {
+		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 		page_cache_async_readahead(mapping, ra, file,
 					   page, offset, ra->ra_pages);
+	}
+	return fpin;
 }
 
 /**
@@ -2397,6 +2451,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 {
 	int error;
 	struct file *file = vmf->vma->vm_file;
+	struct file *fpin = NULL;
 	struct address_space *mapping = file->f_mapping;
 	struct file_ra_state *ra = &file->f_ra;
 	struct inode *inode = mapping->host;
@@ -2418,10 +2473,10 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
-		do_async_mmap_readahead(vmf, page);
+		fpin = do_async_mmap_readahead(vmf, page);
 	} else if (!page) {
 		/* No page in the page cache at all */
-		do_sync_mmap_readahead(vmf);
+		fpin = do_sync_mmap_readahead(vmf);
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
@@ -2433,7 +2488,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 			return vmf_error(-ENOMEM);
 	}
 
-	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
+	if (!lock_page_maybe_drop_mmap(vmf, page, &fpin)) {
 		put_page(page);
 		return ret | VM_FAULT_RETRY;
 	}
@@ -2453,6 +2508,16 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	if (unlikely(!PageUptodate(page)))
 		goto page_not_uptodate;
 
+	/*
+	 * We've made it this far and we had to drop our mmap_sem, now is the
+	 * time to return to the upper layer and have it re-find the vma and
+	 * redo the fault.
+	 */
+	if (fpin) {
+		unlock_page(page);
+		goto out_retry;
+	}
+
 	/*
	 * Found the page and have a reference on it.
	 * We must recheck i_size under page lock.
@@ -2475,12 +2540,15 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
	 * and we need to check for errors.
	 */
	ClearPageError(page);
+	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
	error = mapping->a_ops->readpage(file, page);
	if (!error) {
		wait_on_page_locked(page);
		if (!PageUptodate(page))
			error = -EIO;
	}
+	if (fpin)
+		goto out_retry;
	put_page(page);
 
	if (!error || error == AOP_TRUNCATED_PAGE)
@@ -2489,6 +2557,18 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
	/* Things didn't work out. Return zero to tell the mm layer so. */
	shrink_readahead_size_eio(file, ra);
	return VM_FAULT_SIGBUS;
+
+out_retry:
+	/*
+	 * We dropped the mmap_sem, we need to return to the fault handler to
+	 * re-find the vma and come back and find our hopefully still populated
+	 * page.
+	 */
+	if (page)
+		put_page(page);
+	if (fpin)
+		fput(fpin);
+	return ret | VM_FAULT_RETRY;
 }
 EXPORT_SYMBOL(filemap_fault);