From patchwork Mon Jan 11 22:07:58 2016
From: Benjamin LaHaise
Date: Mon, 11 Jan 2016 17:07:58 -0500
Subject: [PATCH 12/13] aio: add support for aio readahead
To: linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-mm@kvack.org
Cc: Alexander Viro, Andrew Morton, Linus Torvalds
Message-ID: <130a393a298209223b5ed3c3d3fe9023e56eddcb.1452549431.git.bcrl@kvack.org>

Introduce an asynchronous operation to populate the page cache with pages
at a given offset and length.  This operation is conceptually similar to
performing an asynchronous read, except that it does not actually copy the
data from the page cache into userspace; rather, it performs readahead and
notifies userspace once all of the pages have been read.

The motivation for this came out of investigating a performance degradation
when reading from disk.  On a heavily loaded system, the copy_to_user()
performed for an asynchronous read was temporally quite distant from when
the data was actually used.  By only reading the data into the kernel's
page cache, the cache pollution caused by copying the data into userspace
is avoided, and overall system performance improves.
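
For illustration only (not part of the patch): a minimal userspace sketch of
how the new opcode is intended to be driven, assuming the IOCB_CMD_READAHEAD
value (12) introduced below and driving the aio syscalls directly via
syscall(2).  Per the checks and return values in aio_readahead() below,
aio_buf must be zero, aio_offset/aio_nbytes select the byte range to
populate, and the completion's res field is expected to report how many
bytes were made uptodate (or a negative errno).

/* readahead_example.c -- hypothetical usage sketch, not part of this patch. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#ifndef IOCB_CMD_READAHEAD
#define IOCB_CMD_READAHEAD 12	/* value added by this patch */
#endif

int main(int argc, char **argv)
{
	aio_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || syscall(SYS_io_setup, 1, &ctx) < 0)
		return 1;

	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = IOCB_CMD_READAHEAD;
	cb.aio_fildes = fd;
	cb.aio_buf = 0;				/* must be 0: no data is copied out */
	cb.aio_offset = 0;			/* start of the range to populate */
	cb.aio_nbytes = 16 * 1024 * 1024;	/* length of the range */

	if (syscall(SYS_io_submit, ctx, 1, cbs) != 1)
		return 1;
	if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
		return 1;

	/* ev.res: bytes now uptodate in the page cache, or -errno */
	printf("readahead result: %lld\n", (long long)ev.res);
	syscall(SYS_io_destroy, ctx);
	return 0;
}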
Signed-off-by: Benjamin LaHaise
Signed-off-by: Benjamin LaHaise
---
 fs/aio.c                     | 141 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/aio_abi.h |   1 +
 2 files changed, 142 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 3a70492..5cb3d74 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include <../mm/internal.h>
 #include
 #include
@@ -238,6 +239,8 @@ long aio_do_openat(int fd, const char *filename, int flags, int mode);
 long aio_do_unlinkat(int fd, const char *filename, int flags, int mode);
 long aio_foo_at(struct aio_kiocb *req, do_foo_at_t do_foo_at);
 
+long aio_readahead(struct aio_kiocb *iocb, unsigned long len);
+
 static __always_inline bool aio_may_use_threads(void)
 {
 #if IS_ENABLED(CONFIG_AIO_THREAD)
@@ -1812,6 +1815,137 @@ long aio_foo_at(struct aio_kiocb *req, do_foo_at_t do_foo_at)
 			    AIO_THREAD_NEED_FILES | AIO_THREAD_NEED_CRED);
 }
 
+static int aio_ra_filler(void *data, struct page *page)
+{
+	struct file *file = data;
+
+	return file->f_mapping->a_ops->readpage(file, page);
+}
+
+static long aio_ra_wait_on_pages(struct file *file, pgoff_t start,
+				 unsigned long nr)
+{
+	struct address_space *mapping = file->f_mapping;
+	unsigned long i;
+
+	/* Wait on pages starting at the end to hopefully avoid too many
+	 * wakeups.
+	 */
+	for (i = nr; i-- > 0; ) {
+		pgoff_t index = start + i;
+		struct page *page;
+
+		/* First do the quick check to see if the page is present and
+		 * uptodate.
+		 */
+		rcu_read_lock();
+		page = radix_tree_lookup(&mapping->page_tree, index);
+		rcu_read_unlock();
+
+		if (page && !radix_tree_exceptional_entry(page) &&
+		    PageUptodate(page)) {
+			continue;
+		}
+
+		page = read_cache_page(mapping, index, aio_ra_filler, file);
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+		page_cache_release(page);
+	}
+	return 0;
+}
+
+static long aio_thread_op_readahead(struct aio_kiocb *iocb)
+{
+	pgoff_t start, end, nr, offset;
+	long ret = 0;
+
+	start = iocb->common.ki_pos >> PAGE_CACHE_SHIFT;
+	end = (iocb->common.ki_pos + iocb->ki_data - 1) >> PAGE_CACHE_SHIFT;
+	nr = end - start + 1;
+
+	for (offset = 0; offset < nr; ) {
+		pgoff_t chunk = nr - offset;
+		unsigned long max_chunk = (2 * 1024 * 1024) / PAGE_CACHE_SIZE;
+
+		if (chunk > max_chunk)
+			chunk = max_chunk;
+
+		ret = __do_page_cache_readahead(iocb->common.ki_filp->f_mapping,
+						iocb->common.ki_filp,
+						start + offset, chunk, 0, 1);
+		if (ret <= 0)
+			break;
+		offset += ret;
+	}
+
+	if (!offset && ret < 0)
+		return ret;
+
+	if (offset > 0) {
+		ret = aio_ra_wait_on_pages(iocb->common.ki_filp, start, offset);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (offset == nr)
+		return iocb->ki_data;
+	if (offset > 0)
+		return ((start + offset) << PAGE_CACHE_SHIFT) -
+		       iocb->common.ki_pos;
+	return 0;
+}
+
+long aio_readahead(struct aio_kiocb *iocb, unsigned long len)
+{
+	struct address_space *mapping = iocb->common.ki_filp->f_mapping;
+	pgoff_t index, end;
+	loff_t epos, isize;
+	int do_io = 0;
+
+	if (!mapping || !mapping->a_ops)
+		return -EBADF;
+	if (!mapping->a_ops->readpage && !mapping->a_ops->readpages)
+		return -EBADF;
+	if (!len)
+		return 0;
+
+	epos = iocb->common.ki_pos + len;
+	if (epos < 0)
+		return -EINVAL;
+	isize = i_size_read(mapping->host);
+	if (isize < epos) {
+		epos = isize - iocb->common.ki_pos;
+		if (epos <= 0)
+			return 0;
+		if ((unsigned long)epos != epos)
+			return -EINVAL;
+		len = epos;
+	}
+
+	index = iocb->common.ki_pos >> PAGE_CACHE_SHIFT;
+	end = (iocb->common.ki_pos + len - 1) >> PAGE_CACHE_SHIFT;
+	iocb->ki_data = len;
+	if (end < index)
+		return -EINVAL;
+
+	do {
+		struct page *page;
+
+		rcu_read_lock();
+		page = radix_tree_lookup(&mapping->page_tree, index);
+		rcu_read_unlock();
+
+		if (!page || radix_tree_exceptional_entry(page) ||
+		    !PageUptodate(page))
+			do_io = 1;
+	} while (!do_io && (index++ < end));
+
+	if (do_io)
+		return aio_thread_queue_iocb(iocb, aio_thread_op_readahead, 0);
+	return len;
+}
 #endif /* IS_ENABLED(CONFIG_AIO_THREAD) */
 
 /*
@@ -1922,6 +2056,13 @@ rw_common:
 		ret = aio_foo_at(req, aio_do_unlinkat);
 		break;
 
+	case IOCB_CMD_READAHEAD:
+		if (user_iocb->aio_buf)
+			return -EINVAL;
+		if (aio_may_use_threads())
+			ret = aio_readahead(req, user_iocb->aio_nbytes);
+		break;
+
 	default:
 		pr_debug("EINVAL: no operation provided\n");
 		return -EINVAL;
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 63a0d41..4def682 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -47,6 +47,7 @@ enum {
 
 	IOCB_CMD_OPENAT = 9,
 	IOCB_CMD_UNLINKAT = 10,
+	IOCB_CMD_READAHEAD = 12,
 };
 
 /*
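
For reference, a standalone sketch (illustration only, not kernel code,
assuming 4 KiB pages) of the chunking performed by aio_thread_op_readahead():
the range is issued in chunks of at most 2 MiB worth of pages, so a 5 MiB
request starting at offset 0 becomes three __do_page_cache_readahead() calls
of 512, 512 and 256 pages.  If only a prefix of the range can be read ahead,
the iocb completes with ((start + offset) << PAGE_CACHE_SHIFT) - ki_pos
bytes, i.e. the length of the contiguous prefix that is now uptodate.

/* chunk_example.c -- userspace model of the chunking loop, not kernel code */
#include <stdio.h>

#define PAGE_SHIFT	12			/* assume 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define MAX_CHUNK	((2 * 1024 * 1024) / PAGE_SIZE)	/* 512 pages */

int main(void)
{
	unsigned long pos = 0, len = 5 * 1024 * 1024;	/* 5 MiB request */
	unsigned long start = pos >> PAGE_SHIFT;
	unsigned long end = (pos + len - 1) >> PAGE_SHIFT;
	unsigned long nr = end - start + 1;		/* 1280 pages */
	unsigned long offset = 0;

	while (offset < nr) {
		unsigned long chunk = nr - offset;

		if (chunk > MAX_CHUNK)
			chunk = MAX_CHUNK;
		/* the kernel issues __do_page_cache_readahead() here */
		printf("readahead pages %lu..%lu (%lu pages)\n",
		       start + offset, start + offset + chunk - 1, chunk);
		offset += chunk;
	}
	return 0;
}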