From patchwork Tue Dec 17 14:39:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297749 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56502112B for ; Tue, 17 Dec 2019 14:39:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2A7A12467B for ; Tue, 17 Dec 2019 14:39:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="Rz0PTf0W" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728551AbfLQOjx (ORCPT ); Tue, 17 Dec 2019 09:39:53 -0500 Received: from mail-il1-f196.google.com ([209.85.166.196]:36252 "EHLO mail-il1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728493AbfLQOjx (ORCPT ); Tue, 17 Dec 2019 09:39:53 -0500 Received: by mail-il1-f196.google.com with SMTP id b15so8576343iln.3 for ; Tue, 17 Dec 2019 06:39:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qEQyHfYbXoJej+s6/OMXT2BLZEPX6SiZ+5HLaKOeShc=; b=Rz0PTf0WwZEmsAUdF5wBZZkfVKCAltgfrzi74jjEA3lLneDy8dSauI/jbG3xoO7X7R e/VLmB+AiDGb4AjO2B/a6EyP+ygtn5D4ibbttq9rQLt5QiJNuzGA8PKC/CE9cdH4s9f+ dNaoH3JbLMpqNSpla+Y1wRrLG2jXJyiYRfyNa/cTX2EX/ow4JZkh0XhafeJtifcvPcPw /mA6Xil4l219I3wA5uND8mxzy81yPgODPqEFgFMTmd/OtU9Ds3F3kx/oD5fjJd118/Sd EO3NABK5jRgz+ogHrcK6ROc0cGWS6zuF8O/S6gQWgDglVQaB14e5ws/hq5XF5GmnU0g9 mBug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qEQyHfYbXoJej+s6/OMXT2BLZEPX6SiZ+5HLaKOeShc=; b=lr6G58S3iqNCVfuQp3OJYBSHBGkXPb/e6lutspfxNiC2RqfSAwzL6e1ktob0I2I3b4 udcHCF7eznKKLw4BxFG2Z6OC464rAxHWAYiIRnp17yEFJ9nnCSQ25aBavnPa+qF9CaWb krss5xTbqsgAR4QZNMD6uj8UQW2RygMkzA6z8Wz+CLPT148nUozWqje3E6ryOwaeQxDF kT0WtgpKo/20SZatNEKXQz6tWRDt/Qv0pl9sJd88gt7VykCZAwsVqcL0pru9+YKQ6Q9i dW/g8jUCpG3Yk4e0CpA8z+1dQlEKA8kJ19r6XjkQwz5G28NIAZlLg2b5KhKf3DnW8VgO VD2Q== X-Gm-Message-State: APjAAAWrg+ryiYTS0tW61spihbayK4QmXJGw6u7pceGHlKMe/bHAxlq5 XFI15IdOwfzRGsS2D3hgzAtDSQ== X-Google-Smtp-Source: APXvYqxNs1UwvqiKd0kuf9aHlqRHi8PBLH1NSYmwr4Xu4fISsW1/lVDKFz8aN9GKiiU+f+n2z5OIYg== X-Received: by 2002:a92:d809:: with SMTP id y9mr18025008ilm.261.1576593592098; Tue, 17 Dec 2019 06:39:52 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:51 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 1/6] fs: add read support for RWF_UNCACHED Date: Tue, 17 Dec 2019 07:39:43 -0700 Message-Id: <20191217143948.26380-2-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private pages for the buffered reads. These pages will never be inserted into the page cache, and they are simply droped when we have done the copy at the end of IO. If pages in the read range are already in the page cache, then use those for just copying the data instead of starting IO on private pages. A previous solution used the page cache even for non-cached ranges, but the cost of doing so was too high. Removing nodes at the end is expensive, even with LRU bypass. On top of that, repeatedly instantiating new xarray nodes is very costly, as it needs to memset 576 bytes of data, and freeing said nodes involve an RCU call per node as well. All that adds up, making uncached somewhat slower than O_DIRECT. With the current solition, we're basically at O_DIRECT levels of performance for RWF_UNCACHED IO. Protect against truncate the same way O_DIRECT does, by calling inode_dio_begin() to elevate the inode->i_dio_count. Signed-off-by: Jens Axboe --- include/linux/fs.h | 3 +++ include/uapi/linux/fs.h | 5 ++++- mm/filemap.c | 38 ++++++++++++++++++++++++++++++++------ 3 files changed, 39 insertions(+), 7 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 98e0349adb52..092ea2a4319b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -314,6 +314,7 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +#define IOCB_UNCACHED (1 << 8) struct kiocb { struct file *ki_filp; @@ -3418,6 +3419,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags) ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC); if (flags & RWF_APPEND) ki->ki_flags |= IOCB_APPEND; + if (flags & RWF_UNCACHED) + ki->ki_flags |= IOCB_UNCACHED; return 0; } diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 379a612f8f1d..357ebb0e0c5d 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t; /* per-IO O_APPEND */ #define RWF_APPEND ((__force __kernel_rwf_t)0x00000010) +/* drop cache after reading or writing data */ +#define RWF_UNCACHED ((__force __kernel_rwf_t)0x00000040) + /* mask of flags supported by the kernel */ #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ - RWF_APPEND) + RWF_APPEND | RWF_UNCACHED) #endif /* _UAPI_LINUX_FS_H */ diff --git a/mm/filemap.c b/mm/filemap.c index bf6aa30be58d..7ddc4d8386cf 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1990,6 +1990,13 @@ static void shrink_readahead_size_eio(struct file *filp, ra->ra_pages /= 4; } +static void buffered_put_page(struct page *page, bool clear_mapping) +{ + if (clear_mapping) + page->mapping = NULL; + put_page(page); +} + /** * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read @@ -2013,6 +2020,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, struct address_space *mapping = filp->f_mapping; struct inode *inode = mapping->host; struct file_ra_state *ra = &filp->f_ra; + bool did_dio_begin = false; loff_t *ppos = &iocb->ki_pos; pgoff_t index; pgoff_t last_index; @@ -2032,6 +2040,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, offset = *ppos & ~PAGE_MASK; for (;;) { + bool clear_mapping = false; struct page *page; pgoff_t end_index; loff_t isize; @@ -2048,6 +2057,13 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (!page) { if (iocb->ki_flags & IOCB_NOWAIT) goto would_block; + /* UNCACHED implies no read-ahead */ + if (iocb->ki_flags & IOCB_UNCACHED) { + did_dio_begin = true; + /* block truncate for UNCACHED reads */ + inode_dio_begin(inode); + goto no_cached_page; + } page_cache_sync_readahead(mapping, ra, filp, index, last_index - index); @@ -2106,7 +2122,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, isize = i_size_read(inode); end_index = (isize - 1) >> PAGE_SHIFT; if (unlikely(!isize || index > end_index)) { - put_page(page); + buffered_put_page(page, clear_mapping); goto out; } @@ -2115,7 +2131,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (index == end_index) { nr = ((isize - 1) & ~PAGE_MASK) + 1; if (nr <= offset) { - put_page(page); + buffered_put_page(page, clear_mapping); goto out; } } @@ -2147,7 +2163,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, offset &= ~PAGE_MASK; prev_offset = offset; - put_page(page); + buffered_put_page(page, clear_mapping); written += ret; if (!iov_iter_count(iter)) goto out; @@ -2189,7 +2205,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (unlikely(error)) { if (error == AOP_TRUNCATED_PAGE) { - put_page(page); + buffered_put_page(page, clear_mapping); error = 0; goto find_page; } @@ -2206,7 +2222,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, * invalidate_mapping_pages got it */ unlock_page(page); - put_page(page); + buffered_put_page(page, clear_mapping); goto find_page; } unlock_page(page); @@ -2221,7 +2237,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, readpage_error: /* UHHUH! A synchronous read error occurred. Report it */ - put_page(page); + buffered_put_page(page, clear_mapping); goto out; no_cached_page: @@ -2234,6 +2250,14 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, error = -ENOMEM; goto out; } + if (iocb->ki_flags & IOCB_UNCACHED) { + __SetPageLocked(page); + page->mapping = mapping; + page->index = index; + clear_mapping = true; + goto readpage; + } + error = add_to_page_cache_lru(page, mapping, index, mapping_gfp_constraint(mapping, GFP_KERNEL)); if (error) { @@ -2250,6 +2274,8 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, would_block: error = -EAGAIN; out: + if (did_dio_begin) + inode_dio_end(inode); ra->prev_pos = prev_index; ra->prev_pos <<= PAGE_SHIFT; ra->prev_pos |= prev_offset; From patchwork Tue Dec 17 14:39:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297755 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 141F513B6 for ; Tue, 17 Dec 2019 14:39:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E73C224682 for ; Tue, 17 Dec 2019 14:39:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="2T6/KvYQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728590AbfLQOjz (ORCPT ); Tue, 17 Dec 2019 09:39:55 -0500 Received: from mail-io1-f65.google.com ([209.85.166.65]:37838 "EHLO mail-io1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728493AbfLQOjy (ORCPT ); Tue, 17 Dec 2019 09:39:54 -0500 Received: by mail-io1-f65.google.com with SMTP id k24so9591990ioc.4 for ; Tue, 17 Dec 2019 06:39:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eKvsGoetsIcjvq9qtw7wq8EOlHDh0dLtS5dcS2crnAM=; b=2T6/KvYQuQLJXZ7z69OfzRHfUkIi00yPsJDIgAwSi0vvuhijjnSfxE68no83S8niWu 4jzOvLSgX/c7gaxyBryaYuILa6HvnP+6BHist5GL9yIZAkFEkCd8+cSblazxNaVhiqcv lwZZcg+Ks/yuF1Gm4gOtk+hZKVJcM0Qz6+lpMD1mMWZsqYYqPfStQ6Vj0kmPNZVJrnnv 5T7Px6Qvf4gmwW81lJw98OofW81TjYZFDOT1up8o63YqZ8IBEohTcpk56PkgqTPwO/c8 C06udPw6lIKChQqvmExmfG604lXCIcMoo3fcfguwOup6upWiqBrUA3BH+twqfs/XpiHj 5btQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eKvsGoetsIcjvq9qtw7wq8EOlHDh0dLtS5dcS2crnAM=; b=B5nlqUJAolk3YeQP8zaijVcZLsroypdKzK43SAhVvMhJz9PgyOLayfdgt4Gd42UoRy Rl5qK4uOo7VFkB4Ib3Hg+o++r0f5gIswNWGyf6QITkN0zI7195jRIl/NdR0K8wWHADsx RIoXXGeP0BVW5NTddatTHwjOq6t0DtVyhuNppYjw5Clf2G8U8QPefFdYMkRVjRdG1ZFd IwgAMvYEe/Er+eqVXXj+39AyX2H336yMiLLA5wTolmwixTQn4PG8lJz1wdlJCZgWsgeO 8FHDOrRwGGW5EYdvKBc+EiZVtBOdmKfjFl7KrtqvdvPTwM2Atbj8GZLeSI3H7/IBEP1S YFKw== X-Gm-Message-State: APjAAAV1BWR7LImyJlf8kDZKsizlGibZRttBWG61Xq38lr8Rrotzk1Ex 10rrC2KRpurNFz9l6Dgrpm8rMg== X-Google-Smtp-Source: APXvYqxMdQvhXZ9Q18XgGJtSH0yvOWY1lHHyKek4U1CcHUB2OSSyfo2zR9Bt8k43JQ6faNwcUtoHvg== X-Received: by 2002:a6b:ed15:: with SMTP id n21mr1525183iog.128.1576593593105; Tue, 17 Dec 2019 06:39:53 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:52 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb Date: Tue, 17 Dec 2019 07:39:44 -0700 Message-Id: <20191217143948.26380-3-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Right now all callers pass in iocb->ki_pos, just pass in the iocb. This is in preparation for using the iocb flags in generic_perform_write(). Signed-off-by: Jens Axboe --- fs/ceph/file.c | 2 +- fs/ext4/file.c | 2 +- fs/nfs/file.c | 2 +- include/linux/fs.h | 3 ++- mm/filemap.c | 8 +++++--- 5 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 11929d2bb594..096c009f188f 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1538,7 +1538,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) * are pending vmtruncate. So write and vmtruncate * can not run at the same time */ - written = generic_perform_write(file, from, pos); + written = generic_perform_write(file, from, iocb); if (likely(written >= 0)) iocb->ki_pos = pos + written; ceph_end_io_write(inode); diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 6a7293a5cda2..9ffb857765d5 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -249,7 +249,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, goto out; current->backing_dev_info = inode_to_bdi(inode); - ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos); + ret = generic_perform_write(iocb->ki_filp, from, iocb); current->backing_dev_info = NULL; out: diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 8eb731d9be3e..d8f51a702a4e 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -624,7 +624,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) result = generic_write_checks(iocb, from); if (result > 0) { current->backing_dev_info = inode_to_bdi(inode); - result = generic_perform_write(file, from, iocb->ki_pos); + result = generic_perform_write(file, from, iocb); current->backing_dev_info = NULL; } nfs_end_io_write(inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index 092ea2a4319b..bf58db1bc032 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3103,7 +3103,8 @@ extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *); -extern ssize_t generic_perform_write(struct file *, struct iov_iter *, loff_t); +extern ssize_t generic_perform_write(struct file *, struct iov_iter *, + struct kiocb *); ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos, rwf_t flags); diff --git a/mm/filemap.c b/mm/filemap.c index 7ddc4d8386cf..522152ed86d8 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3292,10 +3292,11 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping, EXPORT_SYMBOL(grab_cache_page_write_begin); ssize_t generic_perform_write(struct file *file, - struct iov_iter *i, loff_t pos) + struct iov_iter *i, struct kiocb *iocb) { struct address_space *mapping = file->f_mapping; const struct address_space_operations *a_ops = mapping->a_ops; + loff_t pos = iocb->ki_pos; long status = 0; ssize_t written = 0; unsigned int flags = 0; @@ -3429,7 +3430,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) goto out; - status = generic_perform_write(file, from, pos = iocb->ki_pos); + pos = iocb->ki_pos; + status = generic_perform_write(file, from, iocb); /* * If generic_perform_write() returned a synchronous error * then we want to return the number of bytes which were @@ -3461,7 +3463,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) */ } } else { - written = generic_perform_write(file, from, iocb->ki_pos); + written = generic_perform_write(file, from, iocb); if (likely(written > 0)) iocb->ki_pos += written; } From patchwork Tue Dec 17 14:39:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297759 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E635B6C1 for ; Tue, 17 Dec 2019 14:39:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C30E22465E for ; Tue, 17 Dec 2019 14:39:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="tLwP2zT0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728639AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: from mail-il1-f196.google.com ([209.85.166.196]:44272 "EHLO mail-il1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728589AbfLQOjy (ORCPT ); Tue, 17 Dec 2019 09:39:54 -0500 Received: by mail-il1-f196.google.com with SMTP id z12so8523054iln.11 for ; Tue, 17 Dec 2019 06:39:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EOt7OEDxASzJpz4aW2OtR1NM6hhZ1aYaA0R6ITxPWMg=; b=tLwP2zT0MtI2kOTe48uSomwZIjJUT0pySzMCDXzO1sNdjdjOCFI1XHkUHEOozqFc8c nD8w1OUooAyprmV1auAjotvgf/lwQQgtJmjU0OOuifVAcFA6tNABcFtRb6VTMGQNvXW8 R+UIQQhagkuaIdQ7XtSKY2NXtNhP26jjoETMW/zxAtTmWwRpQW/LEGudNmZGscJC8QcP 5bfFqpV+WXfMvXGjvCr86gr0YU9WkuoxK8tJyuGnk+U8na/TmrO8XsxhHpezKrsvh+V8 wbxIlP/OlD3s6ghZTqeNhQeYNghwcPJnzfXH2JT64KUq2IgoyaXWFWMfddWHwG8HGDQK 3y0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EOt7OEDxASzJpz4aW2OtR1NM6hhZ1aYaA0R6ITxPWMg=; b=TPg9wNeAdVwA3jawcSb7rM+4z46vyHa4qZbW2t8ukrpz31MY3KYfSHHxMLwaGaTiFV ZZynm+yA7kUrBlNM43lVb1X8+iqwueNuj4Rg5/wUr24py6d16sFB/VF/YHlFISznBgaU 2juJXKWVy4iFMQo4cBshh/pr0wnqwPt8obaVf2WbGTkxaOCZTZTfC7NjCq0fpdL6Vh7O c8GNOpc4SvMIWBe5buo+6We45ncpaH3Culyc5uLKLb9MdTBzDfFb2AZnUz9/RYuE1NZ+ Q/hNKrdx3fAzHG9sSBqCMAG509qAh2fxMtLUoi896WKFZDnwkaM0g4uEYoRz8cLUrjTl 5hNA== X-Gm-Message-State: APjAAAXCvQvAaNuHd7n0d5r4pTCXiZA4WHq89TktT0m/MmRg7Nou2+e/ dMBXNMUtCB6Th/LWC6WJihL9kQ== X-Google-Smtp-Source: APXvYqy373igjl5RaNDqw4GXVKR/B2FPaCQeTRyE7cK02n7sdyI5xVepqdHZGKv7py+DgJxlDb4Q6Q== X-Received: by 2002:a92:cb11:: with SMTP id s17mr10211920ilo.114.1576593593985; Tue, 17 Dec 2019 06:39:53 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:53 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED Date: Tue, 17 Dec 2019 07:39:45 -0700 Message-Id: <20191217143948.26380-4-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org If RWF_UNCACHED is set for io_uring (or pwritev2(2)), we'll drop the cache instantiated for buffered writes. If new pages aren't instantiated, we leave them alone. This provides similar semantics to reads with RWF_UNCACHED set. Signed-off-by: Jens Axboe --- include/linux/fs.h | 1 + mm/filemap.c | 41 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index bf58db1bc032..5ea5fc167524 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -285,6 +285,7 @@ enum positive_aop_returns { #define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct * helper code (eg buffer layer) * to clear GFP_FS from alloc */ +#define AOP_FLAG_UNCACHED 0x0004 /* * oh the beauties of C type declarations. diff --git a/mm/filemap.c b/mm/filemap.c index 522152ed86d8..0b5f29b30c34 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3277,10 +3277,12 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping, pgoff_t index, unsigned flags) { struct page *page; - int fgp_flags = FGP_LOCK|FGP_WRITE|FGP_CREAT; + int fgp_flags = FGP_LOCK|FGP_WRITE; if (flags & AOP_FLAG_NOFS) fgp_flags |= FGP_NOFS; + if (!(flags & AOP_FLAG_UNCACHED)) + fgp_flags |= FGP_CREAT; page = pagecache_get_page(mapping, index, fgp_flags, mapping_gfp_mask(mapping)); @@ -3301,6 +3303,9 @@ ssize_t generic_perform_write(struct file *file, ssize_t written = 0; unsigned int flags = 0; + if (iocb->ki_flags & IOCB_UNCACHED) + flags |= AOP_FLAG_UNCACHED; + do { struct page *page; unsigned long offset; /* Offset into pagecache page */ @@ -3333,10 +3338,16 @@ ssize_t generic_perform_write(struct file *file, break; } +retry: status = a_ops->write_begin(file, mapping, pos, bytes, flags, &page, &fsdata); - if (unlikely(status < 0)) + if (unlikely(status < 0)) { + if (status == -ENOMEM && (flags & AOP_FLAG_UNCACHED)) { + flags &= ~AOP_FLAG_UNCACHED; + goto retry; + } break; + } if (mapping_writably_mapped(mapping)) flush_dcache_page(page); @@ -3372,6 +3383,32 @@ ssize_t generic_perform_write(struct file *file, balance_dirty_pages_ratelimited(mapping); } while (iov_iter_count(i)); + if (written && (iocb->ki_flags & IOCB_UNCACHED)) { + loff_t end; + + pos = iocb->ki_pos; + end = pos + written; + + status = filemap_write_and_wait_range(mapping, pos, end); + if (status) + goto out; + + /* + * No pages were created for this range, we're done + */ + if (flags & AOP_FLAG_UNCACHED) + goto out; + + /* + * Try to invalidate cache pages for the range we just wrote. + * We don't care if invalidation fails as the write has still + * worked and leaving clean uptodate pages in the page cache + * isn't a corruption vector for uncached IO. + */ + invalidate_inode_pages2_range(mapping, + pos >> PAGE_SHIFT, end >> PAGE_SHIFT); + } +out: return written ? written : status; } EXPORT_SYMBOL(generic_perform_write); From patchwork Tue Dec 17 14:39:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4EC3112B for ; Tue, 17 Dec 2019 14:39:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 64E192467B for ; Tue, 17 Dec 2019 14:39:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="HFMnVbnX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728558AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: from mail-io1-f65.google.com ([209.85.166.65]:41443 "EHLO mail-io1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728601AbfLQOj4 (ORCPT ); Tue, 17 Dec 2019 09:39:56 -0500 Received: by mail-io1-f65.google.com with SMTP id c16so11266129ioo.8 for ; Tue, 17 Dec 2019 06:39:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=W8/Oau/0zx1upVgXD58HD2IdXBLquun38bfgvolAqVE=; b=HFMnVbnXL5nozNIYcQ6hd0G1F6cDCr6MWodjZKlkiKMVgERb3yFNCXBVvLbTvUeiPm X4dVlRaSHT8QhoF6Ela8U1soaTakFVvY6c5YW17xeltyddTSof8pRiiQtx3O6PGEFvlM Sa0uUKXQIHHgmfOOrwUdoVqn0hCmO785dqgmann+Uxf2y6TmNsWseofnSfPT1IMYJnUE QAT6oQ0TqmVlR0eG/xarhwSDBmEl7HwC8aIqUgQlzb2RT/ImBcuAhQBUoRLxbL3PdxJD yxJ/j6dInL9jkYTNH2nwwcv86h/OXvoBCKGcSDvlDm98iW2rfyPC/4nUFbDX8jf3VdUu C5Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=W8/Oau/0zx1upVgXD58HD2IdXBLquun38bfgvolAqVE=; b=YTd6vFnQIEGF+1hjpFI+sw3ZnItcBPNhT6PK7SJ6ZTqDKMtGbmot32R5DEXjba0Mgp A1QJYxKcdxQlcCzi5N0i+mXXj6zZzCgrm5X5caQnzShHLEugSHIAMVzFcuUkMIiCn5RA hpw4O2Ht7PtefBT4TVETHWgsEbScRr4qL8MQj/oSHp6mYdas1PpcjBhXMcnm1i3p+ClH ZKdUdibocmF3ZS9CA1SG9cm4l1ZfbbOSW7P+tIQyQFZJuonVpkOaWu0kjGMGfOUjzNWm 4ahrhhPi04K55gwxd4vudtvL3eukRYOwPlSASq5YpBQRc3olTqNqdm4h8xStPwKRxqp6 Xj5Q== X-Gm-Message-State: APjAAAXPsZ7aJ1A6WCg8MptrCFfaiAERozRxtvFQUNev47EI+QZBuL+B 1TB7LQlI/1E00PV0fkLEOGe5+w== X-Google-Smtp-Source: APXvYqyrdRNqE7YVIsRIri3H/jmjcdDKJ/YOrL184+dus7WHLg/xK9HpNsJVhN8e+QKeYiZCaVyzjw== X-Received: by 2002:a5d:8d95:: with SMTP id b21mr3885410ioj.303.1576593594982; Tue, 17 Dec 2019 06:39:54 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:54 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 4/6] iomap: add struct iomap_ctx Date: Tue, 17 Dec 2019 07:39:46 -0700 Message-Id: <20191217143948.26380-5-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org We pass a lot of arguments to iomap_apply(), and subsequently to the actors that it calls. In preparation for adding one more argument, switch them to using a struct iomap_ctx instead. The actor gets a const version of that, they are not supposed to change anything in it. Signed-off-by: Jens Axboe --- fs/dax.c | 25 +++-- fs/iomap/apply.c | 26 +++--- fs/iomap/buffered-io.c | 202 +++++++++++++++++++++++++---------------- fs/iomap/direct-io.c | 57 +++++++----- fs/iomap/fiemap.c | 48 ++++++---- fs/iomap/seek.c | 64 ++++++++----- fs/iomap/swapfile.c | 27 +++--- include/linux/iomap.h | 15 ++- 8 files changed, 278 insertions(+), 186 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 1f1f0201cad1..2637afec30b2 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1090,13 +1090,16 @@ int __dax_zero_page_range(struct block_device *bdev, EXPORT_SYMBOL_GPL(__dax_zero_page_range); static loff_t -dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +dax_iomap_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { struct block_device *bdev = iomap->bdev; struct dax_device *dax_dev = iomap->dax_dev; - struct iov_iter *iter = data; + struct iov_iter *iter = data->priv; + loff_t pos = data->pos; + loff_t length = pos + data->len; loff_t end = pos + length, done = 0; + struct inode *inode = data->inode; ssize_t ret = 0; size_t xfer; int id; @@ -1197,22 +1200,26 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, { struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = mapping->host; - loff_t pos = iocb->ki_pos, ret = 0, done = 0; - unsigned flags = 0; + loff_t ret = 0, done = 0; + struct iomap_ctx data = { + .inode = inode, + .pos = iocb->ki_pos, + .priv = iter, + }; if (iov_iter_rw(iter) == WRITE) { lockdep_assert_held_write(&inode->i_rwsem); - flags |= IOMAP_WRITE; + data.flags |= IOMAP_WRITE; } else { lockdep_assert_held(&inode->i_rwsem); } while (iov_iter_count(iter)) { - ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops, - iter, dax_iomap_actor); + data.len = iov_iter_count(iter); + ret = iomap_apply(&data, ops, dax_iomap_actor); if (ret <= 0) break; - pos += ret; + data.pos += ret; done += ret; } diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index 76925b40b5fd..792079403a22 100644 --- a/fs/iomap/apply.c +++ b/fs/iomap/apply.c @@ -21,15 +21,16 @@ * iomap_end call. */ loff_t -iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, - const struct iomap_ops *ops, void *data, iomap_actor_t actor) +iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops, + iomap_actor_t actor) { struct iomap iomap = { .type = IOMAP_HOLE }; struct iomap srcmap = { .type = IOMAP_HOLE }; loff_t written = 0, ret; u64 end; - trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_); + trace_iomap_apply(data->inode, data->pos, data->len, data->flags, ops, + actor, _RET_IP_); /* * Need to map a range from start position for length bytes. This can @@ -43,17 +44,18 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * expose transient stale data. If the reserve fails, we can safely * back out at this point as there is nothing to undo. */ - ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap); + ret = ops->iomap_begin(data->inode, data->pos, data->len, data->flags, + &iomap, &srcmap); if (ret) return ret; - if (WARN_ON(iomap.offset > pos)) + if (WARN_ON(iomap.offset > data->pos)) return -EIO; if (WARN_ON(iomap.length == 0)) return -EIO; - trace_iomap_apply_dstmap(inode, &iomap); + trace_iomap_apply_dstmap(data->inode, &iomap); if (srcmap.type != IOMAP_HOLE) - trace_iomap_apply_srcmap(inode, &srcmap); + trace_iomap_apply_srcmap(data->inode, &srcmap); /* * Cut down the length to the one actually provided by the filesystem, @@ -62,8 +64,8 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, end = iomap.offset + iomap.length; if (srcmap.type != IOMAP_HOLE) end = min(end, srcmap.offset + srcmap.length); - if (pos + length > end) - length = end - pos; + if (data->pos + data->len > end) + data->len = end - data->pos; /* * Now that we have guaranteed that the space allocation will succeed, @@ -77,7 +79,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * iomap into the actors so that they don't need to have special * handling for the two cases. */ - written = actor(inode, pos, length, data, &iomap, + written = actor(data, &iomap, srcmap.type != IOMAP_HOLE ? &srcmap : &iomap); /* @@ -85,9 +87,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * should not fail unless the filesystem has had a fatal error. */ if (ops->iomap_end) { - ret = ops->iomap_end(inode, pos, length, + ret = ops->iomap_end(data->inode, data->pos, data->len, written > 0 ? written : 0, - flags, &iomap); + data->flags, &iomap); } return written ? written : ret; diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 828444e14d09..7f8300bce767 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -248,14 +248,15 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode, } static loff_t -iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_readpage_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iomap_readpage_ctx *ctx = data; + struct iomap_readpage_ctx *ctx = data->priv; + struct inode *inode = data->inode; struct page *page = ctx->cur_page; struct iomap_page *iop = iomap_page_create(inode, page); bool same_page = false, is_contig = false; - loff_t orig_pos = pos; + loff_t pos = data->pos, orig_pos = data->pos; unsigned poff, plen; sector_t sector; @@ -266,7 +267,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, } /* zero post-eof blocks as the page may be mapped */ - iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen); + iomap_adjust_read_range(inode, iop, &pos, data->len, &poff, &plen); if (plen == 0) goto done; @@ -302,7 +303,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) { gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL); - int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT; + int nr_vecs = (data->len + PAGE_SIZE - 1) >> PAGE_SHIFT; if (ctx->bio) submit_bio(ctx->bio); @@ -333,16 +334,20 @@ int iomap_readpage(struct page *page, const struct iomap_ops *ops) { struct iomap_readpage_ctx ctx = { .cur_page = page }; - struct inode *inode = page->mapping->host; + struct iomap_ctx data = { + .inode = page->mapping->host, + .priv = &ctx, + .flags = 0 + }; unsigned poff; loff_t ret; trace_iomap_readpage(page->mapping->host, 1); for (poff = 0; poff < PAGE_SIZE; poff += ret) { - ret = iomap_apply(inode, page_offset(page) + poff, - PAGE_SIZE - poff, 0, ops, &ctx, - iomap_readpage_actor); + data.pos = page_offset(page) + poff; + data.len = PAGE_SIZE - poff; + ret = iomap_apply(&data, ops, iomap_readpage_actor); if (ret <= 0) { WARN_ON_ONCE(ret == 0); SetPageError(page); @@ -396,28 +401,34 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos, } static loff_t -iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_readpages_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iomap_readpage_ctx *ctx = data; + struct iomap_readpage_ctx *ctx = data->priv; loff_t done, ret; - for (done = 0; done < length; done += ret) { - if (ctx->cur_page && offset_in_page(pos + done) == 0) { + for (done = 0; done < data->len; done += ret) { + struct iomap_ctx rp_data = { + .inode = data->inode, + .pos = data->pos + done, + .len = data->len - done, + .priv = ctx, + }; + + if (ctx->cur_page && offset_in_page(rp_data.pos) == 0) { if (!ctx->cur_page_in_bio) unlock_page(ctx->cur_page); put_page(ctx->cur_page); ctx->cur_page = NULL; } if (!ctx->cur_page) { - ctx->cur_page = iomap_next_page(inode, ctx->pages, - pos, length, &done); + ctx->cur_page = iomap_next_page(data->inode, ctx->pages, + data->pos, data->len, &done); if (!ctx->cur_page) break; ctx->cur_page_in_bio = false; } - ret = iomap_readpage_actor(inode, pos + done, length - done, - ctx, iomap, srcmap); + ret = iomap_readpage_actor(&rp_data, iomap, srcmap); } return done; @@ -431,21 +442,27 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages, .pages = pages, .is_readahead = true, }; - loff_t pos = page_offset(list_entry(pages->prev, struct page, lru)); + struct iomap_ctx data = { + .inode = mapping->host, + .priv = &ctx, + .flags = 0 + }; loff_t last = page_offset(list_entry(pages->next, struct page, lru)); - loff_t length = last - pos + PAGE_SIZE, ret = 0; + loff_t ret = 0; + + data.pos = page_offset(list_entry(pages->prev, struct page, lru)); + data.len = last - data.pos + PAGE_SIZE; - trace_iomap_readpages(mapping->host, nr_pages); + trace_iomap_readpages(data.inode, nr_pages); - while (length > 0) { - ret = iomap_apply(mapping->host, pos, length, 0, ops, - &ctx, iomap_readpages_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_readpages_actor); if (ret <= 0) { WARN_ON_ONCE(ret == 0); goto done; } - pos += ret; - length -= ret; + data.pos += ret; + data.len -= ret; } ret = 0; done: @@ -796,10 +813,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied, } static loff_t -iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iov_iter *i = data; + struct inode *inode = data->inode; + struct iov_iter *i = data->priv; + loff_t length = data->len; + loff_t pos = data->pos; long status = 0; ssize_t written = 0; @@ -879,15 +899,20 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops) { - struct inode *inode = iocb->ki_filp->f_mapping->host; - loff_t pos = iocb->ki_pos, ret = 0, written = 0; + struct iomap_ctx data = { + .inode = iocb->ki_filp->f_mapping->host, + .pos = iocb->ki_pos, + .priv = iter, + .flags = IOMAP_WRITE + }; + loff_t ret = 0, written = 0; while (iov_iter_count(iter)) { - ret = iomap_apply(inode, pos, iov_iter_count(iter), - IOMAP_WRITE, ops, iter, iomap_write_actor); + data.len = iov_iter_count(iter); + ret = iomap_apply(&data, ops, iomap_write_actor); if (ret <= 0) break; - pos += ret; + data.pos += ret; written += ret; } @@ -896,9 +921,11 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, EXPORT_SYMBOL_GPL(iomap_file_buffered_write); static loff_t -iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_unshare_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { + loff_t pos = data->pos; + loff_t length = data->len; long status = 0; ssize_t written = 0; @@ -914,13 +941,13 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data, unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length); struct page *page; - status = iomap_write_begin(inode, pos, bytes, + status = iomap_write_begin(data->inode, pos, bytes, IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap); if (unlikely(status)) return status; - status = iomap_write_end(inode, pos, bytes, bytes, page, iomap, - srcmap); + status = iomap_write_end(data->inode, pos, bytes, bytes, page, + iomap, srcmap); if (unlikely(status <= 0)) { if (WARN_ON_ONCE(status == 0)) return -EIO; @@ -933,7 +960,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data, written += status; length -= status; - balance_dirty_pages_ratelimited(inode->i_mapping); + balance_dirty_pages_ratelimited(data->inode->i_mapping); } while (length); return written; @@ -943,15 +970,20 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops) { + struct iomap_ctx data = { + .inode = inode, + .pos = pos, + .len = len, + .flags = IOMAP_WRITE, + }; loff_t ret; - while (len) { - ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL, - iomap_unshare_actor); + while (data.len) { + ret = iomap_apply(&data, ops, iomap_unshare_actor); if (ret <= 0) return ret; - pos += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } return 0; @@ -982,16 +1014,18 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes, } static loff_t -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_zero_range_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - bool *did_zero = data; + bool *did_zero = data->priv; + loff_t count = data->len; + loff_t pos = data->pos; loff_t written = 0; int status; /* already zeroed? we're done. */ if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) - return count; + return data->len; do { unsigned offset, bytes; @@ -999,11 +1033,11 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = offset_in_page(pos); bytes = min_t(loff_t, PAGE_SIZE - offset, count); - if (IS_DAX(inode)) + if (IS_DAX(data->inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else - status = iomap_zero(inode, pos, offset, bytes, iomap, - srcmap); + status = iomap_zero(data->inode, pos, offset, bytes, + iomap, srcmap); if (status < 0) return status; @@ -1021,16 +1055,22 @@ int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, const struct iomap_ops *ops) { + struct iomap_ctx data = { + .inode = inode, + .pos = pos, + .len = len, + .priv = did_zero, + .flags = IOMAP_ZERO + }; loff_t ret; - while (len > 0) { - ret = iomap_apply(inode, pos, len, IOMAP_ZERO, - ops, did_zero, iomap_zero_range_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_zero_range_actor); if (ret <= 0) return ret; - pos += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } return 0; @@ -1052,57 +1092,59 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, EXPORT_SYMBOL_GPL(iomap_truncate_page); static loff_t -iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_page_mkwrite_actor(const struct iomap_ctx *data, + struct iomap *iomap, struct iomap *srcmap) { - struct page *page = data; + struct page *page = data->priv; int ret; if (iomap->flags & IOMAP_F_BUFFER_HEAD) { - ret = __block_write_begin_int(page, pos, length, NULL, iomap); + ret = __block_write_begin_int(page, data->pos, data->len, NULL, + iomap); if (ret) return ret; - block_commit_write(page, 0, length); + block_commit_write(page, 0, data->len); } else { WARN_ON_ONCE(!PageUptodate(page)); - iomap_page_create(inode, page); + iomap_page_create(data->inode, page); set_page_dirty(page); } - return length; + return data->len; } vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops) { struct page *page = vmf->page; - struct inode *inode = file_inode(vmf->vma->vm_file); - unsigned long length; - loff_t offset, size; + struct iomap_ctx data = { + .inode = file_inode(vmf->vma->vm_file), + .pos = page_offset(page), + .flags = IOMAP_WRITE | IOMAP_FAULT, + .priv = page, + }; ssize_t ret; + loff_t size; lock_page(page); - size = i_size_read(inode); - offset = page_offset(page); - if (page->mapping != inode->i_mapping || offset > size) { + size = i_size_read(data.inode); + if (page->mapping != data.inode->i_mapping || data.pos > size) { /* We overload EFAULT to mean page got truncated */ ret = -EFAULT; goto out_unlock; } /* page is wholly or partially inside EOF */ - if (offset > size - PAGE_SIZE) - length = offset_in_page(size); + if (data.pos > size - PAGE_SIZE) + data.len = offset_in_page(size); else - length = PAGE_SIZE; + data.len = PAGE_SIZE; - while (length > 0) { - ret = iomap_apply(inode, offset, length, - IOMAP_WRITE | IOMAP_FAULT, ops, page, - iomap_page_mkwrite_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_page_mkwrite_actor); if (unlikely(ret <= 0)) goto out_unlock; - offset += ret; - length -= ret; + data.pos += ret; + data.len -= ret; } wait_for_stable_page(page); diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 23837926c0c5..7f1bffa262e1 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -364,24 +364,27 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, } static loff_t -iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_dio_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iomap_dio *dio = data; + struct iomap_dio *dio = data->priv; switch (iomap->type) { case IOMAP_HOLE: if (WARN_ON_ONCE(dio->flags & IOMAP_DIO_WRITE)) return -EIO; - return iomap_dio_hole_actor(length, dio); + return iomap_dio_hole_actor(data->len, dio); case IOMAP_UNWRITTEN: if (!(dio->flags & IOMAP_DIO_WRITE)) - return iomap_dio_hole_actor(length, dio); - return iomap_dio_bio_actor(inode, pos, length, dio, iomap); + return iomap_dio_hole_actor(data->len, dio); + return iomap_dio_bio_actor(data->inode, data->pos, data->len, + dio, iomap); case IOMAP_MAPPED: - return iomap_dio_bio_actor(inode, pos, length, dio, iomap); + return iomap_dio_bio_actor(data->inode, data->pos, data->len, + dio, iomap); case IOMAP_INLINE: - return iomap_dio_inline_actor(inode, pos, length, dio, iomap); + return iomap_dio_inline_actor(data->inode, data->pos, data->len, + dio, iomap); default: WARN_ON_ONCE(1); return -EIO; @@ -404,16 +407,19 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, { struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = file_inode(iocb->ki_filp); - size_t count = iov_iter_count(iter); - loff_t pos = iocb->ki_pos; - loff_t end = iocb->ki_pos + count - 1, ret = 0; - unsigned int flags = IOMAP_DIRECT; + struct iomap_ctx data = { + .inode = inode, + .pos = iocb->ki_pos, + .len = iov_iter_count(iter), + .flags = IOMAP_DIRECT + }; + loff_t end = data.pos + data.len - 1, ret = 0; struct blk_plug plug; struct iomap_dio *dio; lockdep_assert_held(&inode->i_rwsem); - if (!count) + if (!data.len) return 0; if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion)) @@ -436,14 +442,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->submit.cookie = BLK_QC_T_NONE; dio->submit.last_queue = NULL; + data.priv = dio; + if (iov_iter_rw(iter) == READ) { - if (pos >= dio->i_size) + if (data.pos >= dio->i_size) goto out_free_dio; if (iter_is_iovec(iter)) dio->flags |= IOMAP_DIO_DIRTY; } else { - flags |= IOMAP_WRITE; + data.flags |= IOMAP_WRITE; dio->flags |= IOMAP_DIO_WRITE; /* for data sync or sync, we need sync completion processing */ @@ -461,14 +469,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, } if (iocb->ki_flags & IOCB_NOWAIT) { - if (filemap_range_has_page(mapping, pos, end)) { + if (filemap_range_has_page(mapping, data.pos, end)) { ret = -EAGAIN; goto out_free_dio; } - flags |= IOMAP_NOWAIT; + data.flags |= IOMAP_NOWAIT; } - ret = filemap_write_and_wait_range(mapping, pos, end); + ret = filemap_write_and_wait_range(mapping, data.pos, end); if (ret) goto out_free_dio; @@ -479,7 +487,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, * pretty crazy thing to do, so we don't support it 100%. */ ret = invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end >> PAGE_SHIFT); + data.pos >> PAGE_SHIFT, end >> PAGE_SHIFT); if (ret) dio_warn_stale_pagecache(iocb->ki_filp); ret = 0; @@ -495,8 +503,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, blk_start_plug(&plug); do { - ret = iomap_apply(inode, pos, count, flags, ops, dio, - iomap_dio_actor); + ret = iomap_apply(&data, ops, iomap_dio_actor); if (ret <= 0) { /* magic error code to fall back to buffered I/O */ if (ret == -ENOTBLK) { @@ -505,18 +512,18 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, } break; } - pos += ret; + data.pos += ret; - if (iov_iter_rw(iter) == READ && pos >= dio->i_size) { + if (iov_iter_rw(iter) == READ && data.pos >= dio->i_size) { /* * We only report that we've read data up to i_size. * Revert iter to a state corresponding to that as * some callers (such as splice code) rely on it. */ - iov_iter_revert(iter, pos - dio->i_size); + iov_iter_revert(iter, data.pos - dio->i_size); break; } - } while ((count = iov_iter_count(iter)) > 0); + } while ((data.len = iov_iter_count(iter)) > 0); blk_finish_plug(&plug); if (ret < 0) diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c index bccf305ea9ce..61cd264a5d36 100644 --- a/fs/iomap/fiemap.c +++ b/fs/iomap/fiemap.c @@ -43,20 +43,20 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi, } static loff_t -iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_fiemap_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct fiemap_ctx *ctx = data; - loff_t ret = length; + struct fiemap_ctx *ctx = data->priv; + loff_t ret = data->len; if (iomap->type == IOMAP_HOLE) - return length; + return data->len; ret = iomap_to_fiemap(ctx->fi, &ctx->prev, 0); ctx->prev = *iomap; switch (ret) { case 0: /* success */ - return length; + return data->len; case 1: /* extent array full */ return 0; default: @@ -68,6 +68,13 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, loff_t start, loff_t len, const struct iomap_ops *ops) { struct fiemap_ctx ctx; + struct iomap_ctx data = { + .inode = inode, + .pos = start, + .len = len, + .flags = IOMAP_REPORT, + .priv = &ctx + }; loff_t ret; memset(&ctx, 0, sizeof(ctx)); @@ -84,9 +91,8 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, return ret; } - while (len > 0) { - ret = iomap_apply(inode, start, len, IOMAP_REPORT, ops, &ctx, - iomap_fiemap_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_fiemap_actor); /* inode with no (attribute) mapping will give ENOENT */ if (ret == -ENOENT) break; @@ -95,8 +101,8 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, if (ret == 0) break; - start += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } if (ctx.prev.type != IOMAP_HOLE) { @@ -110,13 +116,14 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, EXPORT_SYMBOL_GPL(iomap_fiemap); static loff_t -iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_bmap_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - sector_t *bno = data, addr; + sector_t *bno = data->priv, addr; if (iomap->type == IOMAP_MAPPED) { - addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits; + addr = (data->pos - iomap->offset + iomap->addr) >> + data->inode->i_blkbits; if (addr > INT_MAX) WARN(1, "would truncate bmap result\n"); else @@ -131,16 +138,19 @@ iomap_bmap(struct address_space *mapping, sector_t bno, const struct iomap_ops *ops) { struct inode *inode = mapping->host; - loff_t pos = bno << inode->i_blkbits; - unsigned blocksize = i_blocksize(inode); + struct iomap_ctx data = { + .inode = inode, + .pos = bno << inode->i_blkbits, + .len = i_blocksize(inode), + .priv = &bno + }; int ret; if (filemap_write_and_wait(mapping)) return 0; bno = 0; - ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno, - iomap_bmap_actor); + ret = iomap_apply(&data, ops, iomap_bmap_actor); if (ret) return 0; return bno; diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c index 89f61d93c0bc..5501be02557a 100644 --- a/fs/iomap/seek.c +++ b/fs/iomap/seek.c @@ -118,21 +118,23 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length, static loff_t -iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_seek_hole_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { + loff_t offset = data->pos; + switch (iomap->type) { case IOMAP_UNWRITTEN: - offset = page_cache_seek_hole_data(inode, offset, length, - SEEK_HOLE); + offset = page_cache_seek_hole_data(data->inode, offset, + data->len, SEEK_HOLE); if (offset < 0) - return length; + return data->len; /* fall through */ case IOMAP_HOLE: - *(loff_t *)data = offset; + *(loff_t *)data->priv = offset; return 0; default: - return length; + return data->len; } } @@ -140,23 +142,28 @@ loff_t iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops) { loff_t size = i_size_read(inode); - loff_t length = size - offset; + struct iomap_ctx data = { + .inode = inode, + .len = size - offset, + .priv = &offset, + .flags = IOMAP_REPORT + }; loff_t ret; /* Nothing to be found before or beyond the end of the file. */ if (offset < 0 || offset >= size) return -ENXIO; - while (length > 0) { - ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops, - &offset, iomap_seek_hole_actor); + while (data.len > 0) { + data.pos = offset; + ret = iomap_apply(&data, ops, iomap_seek_hole_actor); if (ret < 0) return ret; if (ret == 0) break; offset += ret; - length -= ret; + data.len -= ret; } return offset; @@ -164,20 +171,22 @@ iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops) EXPORT_SYMBOL_GPL(iomap_seek_hole); static loff_t -iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_seek_data_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { + loff_t offset = data->pos; + switch (iomap->type) { case IOMAP_HOLE: - return length; + return data->len; case IOMAP_UNWRITTEN: - offset = page_cache_seek_hole_data(inode, offset, length, - SEEK_DATA); + offset = page_cache_seek_hole_data(data->inode, offset, + data->len, SEEK_DATA); if (offset < 0) - return length; + return data->len; /*FALLTHRU*/ default: - *(loff_t *)data = offset; + *(loff_t *)data->priv = offset; return 0; } } @@ -186,26 +195,31 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset, const struct iomap_ops *ops) { loff_t size = i_size_read(inode); - loff_t length = size - offset; + struct iomap_ctx data = { + .inode = inode, + .len = size - offset, + .priv = &offset, + .flags = IOMAP_REPORT + }; loff_t ret; /* Nothing to be found before or beyond the end of the file. */ if (offset < 0 || offset >= size) return -ENXIO; - while (length > 0) { - ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops, - &offset, iomap_seek_data_actor); + while (data.len > 0) { + data.pos = offset; + ret = iomap_apply(&data, ops, iomap_seek_data_actor); if (ret < 0) return ret; if (ret == 0) break; offset += ret; - length -= ret; + data.len -= ret; } - if (length <= 0) + if (data.len <= 0) return -ENXIO; return offset; } diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c index a648dbf6991e..aae04b40a3b7 100644 --- a/fs/iomap/swapfile.c +++ b/fs/iomap/swapfile.c @@ -75,11 +75,10 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi) * swap only cares about contiguous page-aligned physical extents and makes no * distinction between written and unwritten extents. */ -static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos, - loff_t count, void *data, struct iomap *iomap, - struct iomap *srcmap) +static loff_t iomap_swapfile_activate_actor(const struct iomap_ctx *data, + struct iomap *iomap, struct iomap *srcmap) { - struct iomap_swapfile_info *isi = data; + struct iomap_swapfile_info *isi = data->priv; int error; switch (iomap->type) { @@ -125,7 +124,7 @@ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos, return error; memcpy(&isi->iomap, iomap, sizeof(isi->iomap)); } - return count; + return data->len; } /* @@ -142,8 +141,13 @@ int iomap_swapfile_activate(struct swap_info_struct *sis, }; struct address_space *mapping = swap_file->f_mapping; struct inode *inode = mapping->host; - loff_t pos = 0; - loff_t len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE); + struct iomap_ctx data = { + .inode = inode, + .pos = 0, + .len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE), + .priv = &isi, + .flags = IOMAP_REPORT + }; loff_t ret; /* @@ -154,14 +158,13 @@ int iomap_swapfile_activate(struct swap_info_struct *sis, if (ret) return ret; - while (len > 0) { - ret = iomap_apply(inode, pos, len, IOMAP_REPORT, - ops, &isi, iomap_swapfile_activate_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_swapfile_activate_actor); if (ret <= 0) return ret; - pos += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } if (isi.iomap.length) { diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 8b09463dae0d..00e439aac8ea 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -145,11 +145,18 @@ struct iomap_ops { /* * Main iomap iterator function. */ -typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len, - void *data, struct iomap *iomap, struct iomap *srcmap); +struct iomap_ctx { + struct inode *inode; + loff_t pos; + loff_t len; + void *priv; + unsigned flags; +}; + +typedef loff_t (*iomap_actor_t)(const struct iomap_ctx *data, + struct iomap *iomap, struct iomap *srcmap); -loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length, - unsigned flags, const struct iomap_ops *ops, void *data, +loff_t iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops, iomap_actor_t actor); ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, From patchwork Tue Dec 17 14:39:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297763 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DEBC186D for ; Tue, 17 Dec 2019 14:39:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 42C5924672 for ; Tue, 17 Dec 2019 14:39:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="msJo2enQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728668AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: from mail-io1-f65.google.com ([209.85.166.65]:33517 "EHLO mail-io1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728612AbfLQOj4 (ORCPT ); Tue, 17 Dec 2019 09:39:56 -0500 Received: by mail-io1-f65.google.com with SMTP id z8so9409844ioh.0 for ; Tue, 17 Dec 2019 06:39:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=q2t1wbHxKKKtUdtcUmZwxHQTSURnhol0rUQmYc3rnRQ=; b=msJo2enQjInrTvHVWg7Q+Z7EQ/nUVNXBraZbHG0oDoPJai9Aw7fJelgGmqmg/yLz3Q IxNq7da227XNWskPs5YZDPkksLrksK+B3OlR4q2IpVoEhR4KXC5DfOgwykS+rNJXLWsr 6M4whoghpniEmcY1DAo891btCNofKb7o9xwPc+DGmYSsLnmhOIoVUkLefX6kwwIigoEP +Y7/MRJx/lIJnHCLqCK2uvHxatGExrnKbHZhcBoLJV3muyKEMeqWgGgsNEpklsg9O5ZE BlgVdEjoljwWHtcsa7FR1vyJ8iNszpL3IemrDGlKfiN/T/qWAtG+ESv//j24ERZiaYYD 3lXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=q2t1wbHxKKKtUdtcUmZwxHQTSURnhol0rUQmYc3rnRQ=; b=Nyjy944JBnieLBOzRbanLzt/RxXeeogCTt1hWh7nVfGQ0w+KwcdVFC0iSaFgiSKOV7 II3kyWynSof1YGDY1MOqT2Vh8Z+YTtVCsaJUyVSMLmzSjPtiNXUyNL2Ejybhy42RXmfU sZw6iuel74D0uOXiighIwv/aEiAk/aUpxWDlRVpG8xR5TrSQEtmCkyq4CpunFLulK4AP LRdRgwZcjiU7YRVJDdzYXe0M4pJeq3WBFegF2CYbezLyjf4eadpG7gTdScW3a7IdX9/a IIrkG/ugCt5xPJ2TcR+euLKlSwIhqJfIUhJgKUGagAyV8ixgjKM/l68jPu1bdaXY3nn+ cn7A== X-Gm-Message-State: APjAAAVX+jo+Ht9luv30oqhJM/vic4go4f9imyZn0q1iGDUJUH3Lmtfa 1fJjmiO1zCR+V4regEL73viiZA== X-Google-Smtp-Source: APXvYqyAnsuB8ZAralfexxDLoU7WvWvM0qDJKnIIFzyd4aLl/ADMJwnw3aX9J/s22QIMOlWMxosrwQ== X-Received: by 2002:a6b:7409:: with SMTP id s9mr4133569iog.197.1576593595835; Tue, 17 Dec 2019 06:39:55 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:55 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Date: Tue, 17 Dec 2019 07:39:47 -0700 Message-Id: <20191217143948.26380-6-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This adds support for RWF_UNCACHED for file systems using iomap to perform buffered writes. We use the generic infrastructure for this, by tracking pages we created and calling write_drop_cached_pages() to issue writeback and prune those pages. Signed-off-by: Jens Axboe --- fs/iomap/apply.c | 35 +++++++++++++++++++++++++++++++++++ fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++---- fs/iomap/trace.h | 4 +++- include/linux/iomap.h | 5 +++++ 4 files changed, 67 insertions(+), 5 deletions(-) diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index 792079403a22..687e86945b27 100644 --- a/fs/iomap/apply.c +++ b/fs/iomap/apply.c @@ -92,5 +92,40 @@ iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops, data->flags, &iomap); } + if (written <= 0) + goto out; + + /* + * If this is an uncached write, then we need to write and sync this + * range of data. This is only true for a buffered write, not for + * O_DIRECT. + */ + if ((data->flags & (IOMAP_WRITE|IOMAP_DIRECT|IOMAP_UNCACHED)) == + (IOMAP_WRITE|IOMAP_UNCACHED)) { + struct address_space *mapping = data->inode->i_mapping; + + end = data->pos + written; + ret = filemap_write_and_wait_range(mapping, data->pos, end); + if (ret) + goto out; + + /* + * No pages were created for this range, we're done. We only + * invalidate the range if no pages were created for the + * entire range. + */ + if (!(iomap.flags & IOMAP_F_PAGE_CREATE)) + goto out; + + /* + * Try to invalidate cache pages for the range we just wrote. + * We don't care if invalidation fails as the write has still + * worked and leaving clean uptodate pages in the page cache + * isn't a corruption vector for uncached IO. + */ + invalidate_inode_pages2_range(mapping, + data->pos >> PAGE_SHIFT, end >> PAGE_SHIFT); + } +out: return written ? written : ret; } diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 7f8300bce767..328afeba950f 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -582,6 +582,7 @@ EXPORT_SYMBOL_GPL(iomap_migrate_page); enum { IOMAP_WRITE_F_UNSHARE = (1 << 0), + IOMAP_WRITE_F_UNCACHED = (1 << 1), }; static void @@ -659,6 +660,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, struct page **pagep, struct iomap *iomap, struct iomap *srcmap) { const struct iomap_page_ops *page_ops = iomap->page_ops; + unsigned aop_flags; struct page *page; int status = 0; @@ -675,8 +677,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, return status; } + aop_flags = AOP_FLAG_NOFS; + if (flags & IOMAP_WRITE_F_UNCACHED) + aop_flags |= AOP_FLAG_UNCACHED; page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT, - AOP_FLAG_NOFS); + aop_flags); if (!page) { status = -ENOMEM; goto out_no_page; @@ -820,9 +825,13 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap, struct iov_iter *i = data->priv; loff_t length = data->len; loff_t pos = data->pos; + unsigned flags = 0; long status = 0; ssize_t written = 0; + if (data->flags & IOMAP_UNCACHED) + flags |= IOMAP_WRITE_F_UNCACHED; + do { struct page *page; unsigned long offset; /* Offset into pagecache page */ @@ -851,10 +860,18 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap, break; } - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, - srcmap); - if (unlikely(status)) +retry: + status = iomap_write_begin(inode, pos, bytes, flags, + &page, iomap, srcmap); + if (unlikely(status)) { + if (status == -ENOMEM && + (flags & IOMAP_WRITE_F_UNCACHED)) { + iomap->flags |= IOMAP_F_PAGE_CREATE; + flags &= ~IOMAP_WRITE_F_UNCACHED; + goto retry; + } break; + } if (mapping_writably_mapped(inode->i_mapping)) flush_dcache_page(page); @@ -907,6 +924,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, }; loff_t ret = 0, written = 0; + if (iocb->ki_flags & IOCB_UNCACHED) + data.flags |= IOMAP_UNCACHED; + while (iov_iter_count(iter)) { data.len = iov_iter_count(iter); ret = iomap_apply(&data, ops, iomap_write_actor); diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h index 6dc227b8c47e..63c771e3eef5 100644 --- a/fs/iomap/trace.h +++ b/fs/iomap/trace.h @@ -93,7 +93,8 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage); { IOMAP_REPORT, "REPORT" }, \ { IOMAP_FAULT, "FAULT" }, \ { IOMAP_DIRECT, "DIRECT" }, \ - { IOMAP_NOWAIT, "NOWAIT" } + { IOMAP_NOWAIT, "NOWAIT" }, \ + { IOMAP_UNCACHED, "UNCACHED" } #define IOMAP_F_FLAGS_STRINGS \ { IOMAP_F_NEW, "NEW" }, \ @@ -101,6 +102,7 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage); { IOMAP_F_SHARED, "SHARED" }, \ { IOMAP_F_MERGED, "MERGED" }, \ { IOMAP_F_BUFFER_HEAD, "BH" }, \ + { IOMAP_F_PAGE_CREATE, "PAGE_CREATE" }, \ { IOMAP_F_SIZE_CHANGED, "SIZE_CHANGED" } DECLARE_EVENT_CLASS(iomap_class, diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 00e439aac8ea..58311b6fdfdd 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -48,12 +48,16 @@ struct vm_fault; * * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of * buffer heads for this mapping. + * + * IOMAP_F_PAGE_CREATE indicates that pages had to be allocated to satisfy + * this operation. */ #define IOMAP_F_NEW 0x01 #define IOMAP_F_DIRTY 0x02 #define IOMAP_F_SHARED 0x04 #define IOMAP_F_MERGED 0x08 #define IOMAP_F_BUFFER_HEAD 0x10 +#define IOMAP_F_PAGE_CREATE 0x20 /* * Flags set by the core iomap code during operations: @@ -121,6 +125,7 @@ struct iomap_page_ops { #define IOMAP_FAULT (1 << 3) /* mapping for page fault */ #define IOMAP_DIRECT (1 << 4) /* direct I/O */ #define IOMAP_NOWAIT (1 << 5) /* do not block */ +#define IOMAP_UNCACHED (1 << 6) /* uncached IO */ struct iomap_ops { /* From patchwork Tue Dec 17 14:39:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297767 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3AF6112B for ; Tue, 17 Dec 2019 14:39:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C0A7C2467E for ; Tue, 17 Dec 2019 14:39:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="1w7hkIWz" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727926AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: from mail-io1-f66.google.com ([209.85.166.66]:44334 "EHLO mail-io1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728642AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: by mail-io1-f66.google.com with SMTP id b10so11207874iof.11 for ; Tue, 17 Dec 2019 06:39:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=hSvlHPa+iglQscNJSf7xQ/5HwRTlC6yUkzeNdFCcqV8=; b=1w7hkIWzvLtMu/dLZZMcZIijlZyfYWT25XG5COOJJfc8oH+ykaNVmhW+kyy47uEmxT 44BfhsIy0PIQPcGGVhchbkR3/gV/mUaau6NjOzs4QWM9ffCFMNcxvMItscKe+DnZ8qPF Po3NAj4x4dgXGv6qHecBmhualhrDq1w6EvAdBXsygv8MWFEzZ7R9YWkEgVggpMNvni6n n37RGtE7bIULBdKOjo8NKGU+SO0IhHzTxOZvEEt83L9ftLfxh3r/Lej6KO9UMyH6X+iN W11mqMTLUelQXj54y55ExU0ini0NqxR9dbme6TbUTRm5QE3/urjzZE6sDy95hbxBcJ6x ntdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=hSvlHPa+iglQscNJSf7xQ/5HwRTlC6yUkzeNdFCcqV8=; b=DAm5u0PEHCgCPT8NtrDP1SSCBmaJUdOVVd8WE0/tk8lCOh2NXL5ot/zQOh7yw9rZ3/ Q3t7MCOzXEQCqoCd5vQ0G0kSGr2PaMdBXxXU19sJkglkBogkQHJTYmJ1C3ZqxtZfuSoQ cF2/8H9XjwqP9awE4SOHSNDyTRfeBjJqWGiQxEVCWwZIJQRyzlFEbTsVEqk95RJSX2HU 8u+uyATEJcP4+x1wwspvNmhVkZxdIwg3vqKgCdkxkMqMi2zCxVEjPopxGP4JYClBttoS SQZXyIlRkCieVv/7ps/n0K+w7buVi7BfYrF4qeMpgEawPhy9Ef9bj5V8D4s5uvfWfYxy T+Hg== X-Gm-Message-State: APjAAAU4OusUOa5HR6KB5nlQfJF8phk6yUY88F3AKouIYR1TU40ekZfU QoRt1y9F1sH1cU7Y8xTpL8+zQw== X-Google-Smtp-Source: APXvYqyLgOsF8+OSoa6LuQKDnmSyMVA3kpVF98IU1cYOBzMtc4vb4+N38upf1F2528sIApqSicZCyw== X-Received: by 2002:a6b:680d:: with SMTP id d13mr3796828ioc.188.1576593596972; Tue, 17 Dec 2019 06:39:56 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:56 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 6/6] xfs: don't do delayed allocations for uncached buffered writes Date: Tue, 17 Dec 2019 07:39:48 -0700 Message-Id: <20191217143948.26380-7-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This data is going to be written immediately, so don't bother trying to do delayed allocation for it. Suggested-by: Dave Chinner Signed-off-by: Jens Axboe Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_iomap.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 28e2d1f37267..d0cd4a05d59f 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -847,8 +847,11 @@ xfs_buffered_write_iomap_begin( int allocfork = XFS_DATA_FORK; int error = 0; - /* we can't use delayed allocations when using extent size hints */ - if (xfs_get_extsz_hint(ip)) + /* + * Don't do delayed allocations when using extent size hints, or + * if we were asked to do uncached buffered writes. + */ + if (xfs_get_extsz_hint(ip) || (flags & IOMAP_UNCACHED)) return xfs_direct_write_iomap_begin(inode, offset, count, flags, iomap, srcmap);