From patchwork Wed Aug 15 23:26:31 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kent Overstreet
X-Patchwork-Id: 10566887
From: Kent Overstreet
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk
Cc: Kent Overstreet
Subject: [PATCH 1/2] fs: Break generic_file_buffered_read up into multiple
 functions
Date: Wed, 15 Aug 2018 19:26:31 -0400
Message-Id: <20180815232632.32548-2-kent.overstreet@gmail.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180815232632.32548-1-kent.overstreet@gmail.com>
References: <20180815232632.32548-1-kent.overstreet@gmail.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little of the flow control as possible. More refactoring
could be done; this patch is intended to be relatively minimal.

Signed-off-by: Kent Overstreet
---
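A note on the structure (an illustration, not part of the change): the new
helpers below share one return convention - a valid page pointer on success,
NULL when the caller should retry the pagecache lookup (the page was
truncated under us, readpage returned AOP_TRUNCATED_PAGE, or the page
already existed at add-to-cache time), and an ERR_PTR() for a fatal error.
A minimal standalone C model of that tri-state convention; every name in it
is made up:

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int the_page = 42;

/* Stand-in for a helper like generic_file_buffered_read_no_cached_page():
 * page on success, NULL = "retry the lookup", ERR_PTR() = fatal. */
static void *lookup_page(int attempt)
{
	if (attempt < 2)
		return NULL;		/* lost a race; caller retries */
	if (attempt > 5)
		return ERR_PTR(-EIO);	/* fatal; caller reports errno */
	return &the_page;		/* success */
}

int main(void)
{
	for (int attempt = 0;; attempt++) {
		void *page = lookup_page(attempt);

		if (!page)
			continue;	/* the patch's "goto find_page" */
		if (IS_ERR(page)) {
			fprintf(stderr, "error %ld\n", PTR_ERR(page));
			return 1;
		}
		printf("got page %d\n", *(int *)page);
		return 0;
	}
}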
 mm/filemap.c | 418 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 228 insertions(+), 190 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c7723bbbbd..308bdd466f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2110,6 +2110,210 @@ static void shrink_readahead_size_eio(struct file *filp,
 	ra->ra_pages /= 4;
 }
 
+static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
+			struct iov_iter *iter,
+			struct page *page)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	struct inode *inode = mapping->host;
+	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
+	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+	unsigned bytes, copied;
+	loff_t isize, end_offset;
+
+	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
+
+	/*
+	 * i_size must be checked after we know the page is Uptodate.
+	 *
+	 * Checking i_size after the check allows us to calculate
+	 * the correct value for "bytes", which means the zero-filled
+	 * part of the page is not copied back to userspace (unless
+	 * another truncate extends the file - this is desired though).
+	 */
+
+	isize = i_size_read(inode);
+	if (unlikely(iocb->ki_pos >= isize))
+		return 1;
+
+	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
+
+	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
+
+	/* If users can be writing to this page using arbitrary
+	 * virtual addresses, take care about potential aliasing
+	 * before reading the page on the kernel side.
+	 */
+	if (mapping_writably_mapped(mapping))
+		flush_dcache_page(page);
+
+	/*
+	 * Ok, we have the page, and it's up-to-date, so
+	 * now we can copy it to user space...
+	 */
+
+	copied = copy_page_to_iter(page, offset, bytes, iter);
+
+	iocb->ki_pos += copied;
+
+	/*
+	 * When a sequential read accesses a page several times,
+	 * only mark it as accessed the first time.
+	 */
+	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+		mark_page_accessed(page);
+
+	ra->prev_pos = iocb->ki_pos;
+
+	if (copied < bytes)
+		return -EFAULT;
+
+	return !iov_iter_count(iter) || iocb->ki_pos == isize;
+}
+
+static struct page *
+generic_file_buffered_read_readpage(struct file *filp,
+				    struct address_space *mapping,
+				    struct page *page)
+{
+	struct file_ra_state *ra = &filp->f_ra;
+	int error;
+
+	/*
+	 * A previous I/O error may have been due to temporary
+	 * failures, eg. multipath errors.
+	 * PG_error will be set again if readpage fails.
+	 */
+	ClearPageError(page);
+	/* Start the actual read. The read will unlock the page. */
+	error = mapping->a_ops->readpage(filp, page);
+
+	if (unlikely(error)) {
+		put_page(page);
+		return error != AOP_TRUNCATED_PAGE ? ERR_PTR(error) : NULL;
+	}
+
+	if (!PageUptodate(page)) {
+		error = lock_page_killable(page);
+		if (unlikely(error)) {
+			put_page(page);
+			return ERR_PTR(error);
+		}
+		if (!PageUptodate(page)) {
+			if (page->mapping == NULL) {
+				/*
+				 * invalidate_mapping_pages got it
+				 */
+				unlock_page(page);
+				put_page(page);
+				return NULL;
+			}
+			unlock_page(page);
+			shrink_readahead_size_eio(filp, ra);
+			put_page(page);
+			return ERR_PTR(-EIO);
+		}
+		unlock_page(page);
+	}
+
+	return page;
+}
+
+static struct page *
+generic_file_buffered_read_pagenotuptodate(struct file *filp,
+					   struct iov_iter *iter,
+					   struct page *page,
+					   loff_t pos, loff_t count)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	int error;
+
+	/*
+	 * See comment in do_read_cache_page on why
+	 * wait_on_page_locked is used to avoid unnecessarily
+	 * serialisations and why it's safe.
+	 */
+	error = wait_on_page_locked_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+	if (PageUptodate(page))
+		return page;
+
+	if (inode->i_blkbits == PAGE_SHIFT ||
+	    !mapping->a_ops->is_partially_uptodate)
+		goto page_not_up_to_date;
+	/* pipes can't handle partially uptodate pages */
+	if (unlikely(iter->type & ITER_PIPE))
+		goto page_not_up_to_date;
+	if (!trylock_page(page))
+		goto page_not_up_to_date;
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping)
+		goto page_not_up_to_date_locked;
+
+	if (!mapping->a_ops->is_partially_uptodate(page,
+				pos & ~PAGE_MASK, count))
+		goto page_not_up_to_date_locked;
+	unlock_page(page);
+	return page;
+
+page_not_up_to_date:
+	/* Get exclusive access to the page ... */
+	error = lock_page_killable(page);
+	if (unlikely(error)) {
+		put_page(page);
+		return ERR_PTR(error);
+	}
+
+page_not_up_to_date_locked:
+	/* Did it get truncated before we got the lock? */
+	if (!page->mapping) {
+		unlock_page(page);
+		put_page(page);
+		return NULL;
+	}
+
+	/* Did somebody else fill it already? */
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		return page;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
+static struct page *
+generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
+					  struct iov_iter *iter)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	struct page *page;
+	int error;
+
+	/*
+	 * Ok, it wasn't cached, so we need to create a new
+	 * page..
+	 */
+	page = page_cache_alloc(mapping);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	error = add_to_page_cache_lru(page, mapping, index,
+			mapping_gfp_constraint(mapping, GFP_KERNEL));
+	if (error) {
+		put_page(page);
+		return error != -EEXIST ? ERR_PTR(error) : NULL;
+	}
+
+	return generic_file_buffered_read_readpage(filp, mapping, page);
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2129,29 +2333,19 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
 	struct file_ra_state *ra = &filp->f_ra;
-	loff_t *ppos = &iocb->ki_pos;
-	pgoff_t index;
+	size_t orig_count = iov_iter_count(iter);
 	pgoff_t last_index;
-	pgoff_t prev_index;
-	unsigned long offset;      /* offset into pagecache page */
-	unsigned int prev_offset;
 	int error = 0;
 
-	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
+	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	index = *ppos >> PAGE_SHIFT;
-	prev_index = ra->prev_pos >> PAGE_SHIFT;
-	prev_offset = ra->prev_pos & (PAGE_SIZE-1);
-	last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-	offset = *ppos & ~PAGE_MASK;
+	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
 
 	for (;;) {
+		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
 		struct page *page;
-		pgoff_t end_index;
-		loff_t isize;
-		unsigned long nr, ret;
 
 		cond_resched();
 find_page:
@@ -2168,8 +2362,15 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 					ra, filp,
 					index, last_index - index);
 			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL))
-				goto no_cached_page;
+			if (unlikely(page == NULL)) {
+				page = generic_file_buffered_read_no_cached_page(iocb, iter);
+				if (!page)
+					goto find_page;
+				if (IS_ERR(page)) {
+					error = PTR_ERR(page);
+					goto out;
+				}
+			}
 		}
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(mapping,
@@ -2179,199 +2380,36 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		if (!PageUptodate(page)) {
 			if (iocb->ki_flags & IOCB_NOWAIT) {
 				put_page(page);
-				goto would_block;
-			}
-
-			/*
-			 * See comment in do_read_cache_page on why
-			 * wait_on_page_locked is used to avoid unnecessarily
-			 * serialisations and why it's safe.
-			 */
-			error = wait_on_page_locked_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (PageUptodate(page))
-				goto page_ok;
-
-			if (inode->i_blkbits == PAGE_SHIFT ||
-					!mapping->a_ops->is_partially_uptodate)
-				goto page_not_up_to_date;
-			/* pipes can't handle partially uptodate pages */
-			if (unlikely(iter->type & ITER_PIPE))
-				goto page_not_up_to_date;
-			if (!trylock_page(page))
-				goto page_not_up_to_date;
-			/* Did it get truncated before we got the lock? */
-			if (!page->mapping)
-				goto page_not_up_to_date_locked;
-			if (!mapping->a_ops->is_partially_uptodate(page,
-							offset, iter->count))
-				goto page_not_up_to_date_locked;
-			unlock_page(page);
-		}
-page_ok:
-		/*
-		 * i_size must be checked after we know the page is Uptodate.
-		 *
-		 * Checking i_size after the check allows us to calculate
-		 * the correct value for "nr", which means the zero-filled
-		 * part of the page is not copied back to userspace (unless
-		 * another truncate extends the file - this is desired though).
-		 */
-
-		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_SHIFT;
-		if (unlikely(!isize || index > end_index)) {
-			put_page(page);
-			goto out;
-		}
-
-		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_SIZE;
-		if (index == end_index) {
-			nr = ((isize - 1) & ~PAGE_MASK) + 1;
-			if (nr <= offset) {
-				put_page(page);
+				error = -EAGAIN;
 				goto out;
 			}
-		}
-		nr = nr - offset;
-
-		/* If users can be writing to this page using arbitrary
-		 * virtual addresses, take care about potential aliasing
-		 * before reading the page on the kernel side.
-		 */
-		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
-
-		/*
-		 * When a sequential read accesses a page several times,
-		 * only mark it as accessed the first time.
-		 */
-		if (prev_index != index || offset != prev_offset)
-			mark_page_accessed(page);
-		prev_index = index;
-
-		/*
-		 * Ok, we have the page, and it's up-to-date, so
-		 * now we can copy it to user space...
-		 */
-
-		ret = copy_page_to_iter(page, offset, nr, iter);
-		offset += ret;
-		index += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
-		prev_offset = offset;
-
-		put_page(page);
-		written += ret;
-		if (!iov_iter_count(iter))
-			goto out;
-		if (ret < nr) {
-			error = -EFAULT;
-			goto out;
-		}
-		continue;
-
-page_not_up_to_date:
-		/* Get exclusive access to the page ... */
-		error = lock_page_killable(page);
-		if (unlikely(error))
-			goto readpage_error;
-
-page_not_up_to_date_locked:
-		/* Did it get truncated before we got the lock? */
-		if (!page->mapping) {
-			unlock_page(page);
-			put_page(page);
-			continue;
-		}
-
-		/* Did somebody else fill it already? */
-		if (PageUptodate(page)) {
-			unlock_page(page);
-			goto page_ok;
-		}
-
-readpage:
-		/*
-		 * A previous I/O error may have been due to temporary
-		 * failures, eg. multipath errors.
-		 * PG_error will be set again if readpage fails.
-		 */
-		ClearPageError(page);
-		/* Start the actual read. The read will unlock the page. */
-		error = mapping->a_ops->readpage(filp, page);
-
-		if (unlikely(error)) {
-			if (error == AOP_TRUNCATED_PAGE) {
-				put_page(page);
-				error = 0;
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+					iter, page, iocb->ki_pos, iter->count);
+			if (!page)
 				goto find_page;
+			if (IS_ERR(page)) {
+				error = PTR_ERR(page);
+				goto out;
 			}
-			goto readpage_error;
-		}
-
-		if (!PageUptodate(page)) {
-			error = lock_page_killable(page);
-			if (unlikely(error))
-				goto readpage_error;
-			if (!PageUptodate(page)) {
-				if (page->mapping == NULL) {
-					/*
-					 * invalidate_mapping_pages got it
-					 */
-					unlock_page(page);
-					put_page(page);
-					goto find_page;
-				}
-				unlock_page(page);
-				shrink_readahead_size_eio(filp, ra);
-				error = -EIO;
-				goto readpage_error;
-			}
-			unlock_page(page);
 		}
 
-		goto page_ok;
-
-readpage_error:
-		/* UHHUH! A synchronous read error occurred. Report it */
+		error = generic_file_buffered_read_page_ok(iocb, iter, page);
 		put_page(page);
-		goto out;
 
-no_cached_page:
-		/*
-		 * Ok, it wasn't cached, so we need to create a new
-		 * page..
-		 */
-		page = page_cache_alloc(mapping);
-		if (!page) {
-			error = -ENOMEM;
-			goto out;
-		}
-		error = add_to_page_cache_lru(page, mapping, index,
-				mapping_gfp_constraint(mapping, GFP_KERNEL));
 		if (error) {
-			put_page(page);
-			if (error == -EEXIST) {
+			if (error > 0)
 				error = 0;
-				goto find_page;
-			}
 			goto out;
 		}
-		goto readpage;
 	}
 
 would_block:
 	error = -EAGAIN;
 out:
-	ra->prev_pos = prev_index;
-	ra->prev_pos <<= PAGE_SHIFT;
-	ra->prev_pos |= prev_offset;
-
-	*ppos = ((loff_t)index << PAGE_SHIFT) + offset;
 	file_accessed(filp);
+	written += orig_count - iov_iter_count(iter);
+
 	return written ? written : error;
 }
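One more illustrative aside (again not from the patch itself):
generic_file_buffered_read_page_ok() packs three outcomes into its int
return - a negative errno, 0 for "more to read, keep looping", and a
positive value for "request satisfied or EOF, stop" - which the caller
collapses with the `if (error > 0) error = 0;` test above. A toy
standalone model of that calling convention, all names hypothetical:

#include <stdio.h>
#include <errno.h>

/* Toy stand-in for generic_file_buffered_read_page_ok(): <0 = errno,
 * 0 = keep looping, >0 = done (request satisfied or EOF reached). */
static int copy_one_page(int pages_left)
{
	if (pages_left < 0)
		return -EFAULT;	/* userspace buffer faulted mid-copy */
	if (pages_left == 0)
		return 1;	/* done: stop the read loop */
	return 0;		/* more pages wanted: continue */
}

int main(void)
{
	int error;

	for (int pages_left = 3;; pages_left--) {
		error = copy_one_page(pages_left);
		if (error) {
			if (error > 0)	/* "done" is success, not an error */
				error = 0;
			break;
		}
	}
	printf("loop finished, error = %d\n", error);
	return error ? 1 : 0;
}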
From patchwork Wed Aug 15 23:26:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kent Overstreet
X-Patchwork-Id: 10566885
From: Kent Overstreet
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk
Cc: Kent Overstreet
Subject: [PATCH 2/2] fs: generic_file_buffered_read() now uses
 find_get_pages_contig
Date: Wed, 15 Aug 2018 19:26:32 -0400
Message-Id: <20180815232632.32548-3-kent.overstreet@gmail.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180815232632.32548-1-kent.overstreet@gmail.com>
References: <20180815232632.32548-1-kent.overstreet@gmail.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Convert generic_file_buffered_read() to get the pages to read from in
batches, and then copy data to userspace from many pages at once - in
particular, we now don't touch any cachelines that might be contended
while we're in the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance
improvement (1%, within the margin of error).

Signed-off-by: Kent Overstreet
---
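To make the new shape concrete (a sketch under simplified assumptions, not
kernel code): one batched lookup fills a pages[] array up front, and the
copy loop that follows touches only that array and the destination buffer -
no mapping lookups, no locks, no shared cachelines. All names below are
invented for illustration:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096UL
#define BATCH	  64
#define NPAGES	  16

/* Stand-in for find_get_pages_contig(): fill pages[] with up to nr
 * consecutive "cached pages" starting at index; return how many. */
static unsigned get_pages_contig(char *cache, unsigned index, unsigned nr,
				 char **pages)
{
	unsigned i;

	for (i = 0; i < nr && index + i < NPAGES; i++)
		pages[i] = cache + (size_t)(index + i) * PAGE_SIZE;
	return i;
}

int main(void)
{
	static char cache[NPAGES * PAGE_SIZE];	/* fake pagecache */
	static char buf[6 * PAGE_SIZE];		/* fake userspace buffer */
	char *pages[BATCH];
	size_t done = 0, want = sizeof(buf);

	memset(cache, 'x', sizeof(cache));

	while (done < want) {
		/* one batched lookup per iteration... */
		unsigned nr = get_pages_contig(cache, done / PAGE_SIZE,
					       BATCH, pages);
		if (!nr)
			break;

		/* ...then a tight copy loop over pages[] and buf only */
		for (unsigned i = 0; i < nr && done < want; i++) {
			size_t bytes = want - done < PAGE_SIZE ?
				       want - done : PAGE_SIZE;

			memcpy(buf + done, pages[i], bytes);
			done += bytes;
		}
	}
	printf("copied %zu bytes\n", done);
	return 0;
}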
 mm/filemap.c | 266 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 144 insertions(+), 122 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 308bdd466f..0de9fe4c61 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2110,67 +2110,6 @@ static void shrink_readahead_size_eio(struct file *filp,
 	ra->ra_pages /= 4;
 }
 
-static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
-			struct iov_iter *iter,
-			struct page *page)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
-	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
-	unsigned bytes, copied;
-	loff_t isize, end_offset;
-
-	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
-
-	/*
-	 * i_size must be checked after we know the page is Uptodate.
-	 *
-	 * Checking i_size after the check allows us to calculate
-	 * the correct value for "bytes", which means the zero-filled
-	 * part of the page is not copied back to userspace (unless
-	 * another truncate extends the file - this is desired though).
-	 */
-
-	isize = i_size_read(inode);
-	if (unlikely(iocb->ki_pos >= isize))
-		return 1;
-
-	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
-
-	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
-
-	/* If users can be writing to this page using arbitrary
-	 * virtual addresses, take care about potential aliasing
-	 * before reading the page on the kernel side.
-	 */
-	if (mapping_writably_mapped(mapping))
-		flush_dcache_page(page);
-
-	/*
-	 * Ok, we have the page, and it's up-to-date, so
-	 * now we can copy it to user space...
-	 */
-
-	copied = copy_page_to_iter(page, offset, bytes, iter);
-
-	iocb->ki_pos += copied;
-
-	/*
-	 * When a sequential read accesses a page several times,
-	 * only mark it as accessed the first time.
-	 */
-	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-		mark_page_accessed(page);
-
-	ra->prev_pos = iocb->ki_pos;
-
-	if (copied < bytes)
-		return -EFAULT;
-
-	return !iov_iter_count(iter) || iocb->ki_pos == isize;
-}
-
 static struct page *
 generic_file_buffered_read_readpage(struct file *filp,
 				    struct address_space *mapping,
@@ -2314,6 +2253,79 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
 	return generic_file_buffered_read_readpage(filp, mapping, page);
 }
 
+static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
+			struct iov_iter *iter,
+			struct page **pages,
+			unsigned nr)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	int i, j, ret, err = 0;
+
+	nr = min_t(unsigned long, last_index - index, nr);
+find_page:
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
+	err = PTR_ERR_OR_ZERO(pages[0]);
+	ret = !IS_ERR_OR_NULL(pages[0]);
+got_pages:
+	for (i = 0; i < ret; i++) {
+		struct page *page = pages[i];
+		pgoff_t pg_index = index + i;
+		loff_t pg_pos = max(iocb->ki_pos,
+				    (loff_t) pg_index << PAGE_SHIFT);
+		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
+
+		if (PageReadahead(page))
+			page_cache_async_readahead(mapping, ra, filp, page,
+					pg_index, last_index - pg_index);
+
+		if (!PageUptodate(page)) {
+			if (iocb->ki_flags & IOCB_NOWAIT) {
+				for (j = i; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = -EAGAIN;
+				break;
+			}
+
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+					iter, page, pg_pos, pg_count);
+			if (IS_ERR_OR_NULL(page)) {
+				for (j = i + 1; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = PTR_ERR_OR_ZERO(page);
+				break;
+			}
+		}
+	}
+
+	if (likely(ret))
+		return ret;
+	if (err)
+		return err;
+	goto find_page;
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2330,83 +2342,93 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		struct iov_iter *iter, ssize_t written)
 {
 	struct file *filp = iocb->ki_filp;
+	struct file_ra_state *ra = &filp->f_ra;
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
 	size_t orig_count = iov_iter_count(iter);
-	pgoff_t last_index;
-	int error = 0;
+	struct page *pages[64];
+	int i, pg_nr, error = 0;
+	bool writably_mapped;
+	loff_t isize, end_offset;
 
 	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-
-	for (;;) {
-		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-		struct page *page;
-
+	do {
 		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL)) {
-				page = generic_file_buffered_read_no_cached_page(iocb, iter);
-				if (!page)
-					goto find_page;
-				if (IS_ERR(page)) {
-					error = PTR_ERR(page);
-					goto out;
-				}
-			}
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
+		i = 0;
+		pg_nr = generic_file_buffered_read_get_pages(iocb, iter, pages,
+							     ARRAY_SIZE(pages));
+		if (pg_nr < 0) {
+			error = pg_nr;
+			break;
 		}
 
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				error = -EAGAIN;
-				goto out;
-			}
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(iocb->ki_pos >= isize))
+			goto put_pages;
 
-			page = generic_file_buffered_read_pagenotuptodate(filp,
-					iter, page, iocb->ki_pos, iter->count);
-			if (!page)
-				goto find_page;
-			if (IS_ERR(page)) {
-				error = PTR_ERR(page);
-				goto out;
-			}
-		}
+		end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
 
-		error = generic_file_buffered_read_page_ok(iocb, iter, page);
-		put_page(page);
+		while ((iocb->ki_pos >> PAGE_SHIFT) + pg_nr >
+		       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+			put_page(pages[--pg_nr]);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(mapping);
+
+		/*
+		 * When a sequential read accesses a page several times, only
+		 * mark it as accessed the first time.
+		 */
+		if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
+			mark_page_accessed(pages[0]);
+		for (i = 1; i < pg_nr; i++)
+			mark_page_accessed(pages[i]);
+
+		for (i = 0; i < pg_nr; i++) {
+			unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+			unsigned bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+					       PAGE_SIZE - offset);
+			unsigned copied;
 
-		if (error) {
-			if (error > 0)
-				error = 0;
-			goto out;
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_page(pages[i]);
+
+			copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+			iocb->ki_pos += copied;
+			ra->prev_pos = iocb->ki_pos;
+
+			if (copied < bytes) {
+				error = -EFAULT;
+				break;
+			}
 		}
-	}
+put_pages:
+		for (i = 0; i < pg_nr; i++)
+			put_page(pages[i]);
+	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-would_block:
-	error = -EAGAIN;
-out:
 	file_accessed(filp);
 	written += orig_count - iov_iter_count(iter);
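As a closing worked example (standalone, with made-up numbers): the trimming
loop in the hunk above drops batch pages that lie wholly past end_offset, so
the copy loop never has to re-check i_size:

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

int main(void)
{
	unsigned long pos = 5 * PAGE_SIZE;		/* iocb->ki_pos */
	unsigned long end_offset = 6 * PAGE_SIZE + 100;	/* min(isize, pos + count) */
	int pg_nr = 4;					/* batch holds pages 5..8 */

	while ((pos >> PAGE_SHIFT) + pg_nr >
	       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
		pg_nr--;		/* i.e. put_page(pages[--pg_nr]) */

	/* (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT == 7, so pg_nr drops
	 * from 4 to 2: only pages 5 and 6 hold bytes below end_offset. */
	printf("pg_nr = %d\n", pg_nr);
	return 0;
}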