From patchwork Fri Jun 21 19:28:23 2019
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 11010667
From: Goldwyn Rodrigues
To: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, darrick.wong@oracle.com, david@fromorbit.com, Goldwyn Rodrigues
Subject: [PATCH 1/6] iomap: Use a IOMAP_COW/srcmap for a read-modify-write I/O
Date: Fri, 21 Jun 2019 14:28:23 -0500
Message-Id: <20190621192828.28900-2-rgoldwyn@suse.de>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190621192828.28900-1-rgoldwyn@suse.de>
References: <20190621192828.28900-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

Introduce a new type, IOMAP_COW, which means the data at the offset must
be read from a srcmap and copied before the write at the offset is
performed. The srcmap identifies where the read is to be performed from.
It is passed to ->iomap_begin(), which fills in the details for the read,
typically setting its type to IOMAP_READ.

Signed-off-by: Goldwyn Rodrigues
---
 fs/dax.c              |  8 +++++---
 fs/ext2/inode.c       |  2 +-
 fs/ext4/inode.c       |  2 +-
 fs/gfs2/bmap.c        |  3 ++-
 fs/internal.h         |  2 +-
 fs/iomap.c            | 31 ++++++++++++++++---------------
 fs/xfs/xfs_iomap.c    |  9 ++++++---
 include/linux/iomap.h |  4 +++-
 8 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 2e48c7ebb973..80b9e2599223 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1078,7 +1078,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
 static loff_t
 dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct block_device *bdev = iomap->bdev;
 	struct dax_device *dax_dev = iomap->dax_dev;
@@ -1236,6 +1236,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	unsigned long vaddr = vmf->address;
 	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { 0 };
 	unsigned flags = IOMAP_FAULT;
 	int error, major = 0;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -1280,7 +1281,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * the file system block size to be equal the page size, which means
 	 * that we never have to deal with more than a single extent here.
 	 */
-	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
+	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
 	if (iomap_errp)
 		*iomap_errp = error;
 	if (error) {
@@ -1460,6 +1461,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	struct inode *inode = mapping->host;
 	vm_fault_t result = VM_FAULT_FALLBACK;
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { 0 };
 	pgoff_t max_pgoff;
 	void *entry;
 	loff_t pos;
@@ -1534,7 +1536,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * to look up our filesystem block.
 	 */
 	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
+	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap, &srcmap);
 	if (error)
 		goto unlock_entry;
 
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index e474127dd255..f081f11980ad 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
 
 #ifdef CONFIG_FS_DAX
 static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
-		unsigned flags, struct iomap *iomap)
+		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	unsigned int blkbits = inode->i_blkbits;
 	unsigned long first_block = offset >> blkbits;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c7f77c643008..a8017e0c302b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3437,7 +3437,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
 }
 
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
-		unsigned flags, struct iomap *iomap)
+		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	unsigned int blkbits = inode->i_blkbits;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 93ea1d529aa3..affa0c4305b7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1124,7 +1124,8 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 }
 
 static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-			    unsigned flags, struct iomap *iomap)
+			    unsigned flags, struct iomap *iomap,
+			    struct iomap *srcmap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct metapath mp = { .mp_aheight = 1, };
diff --git a/fs/internal.h b/fs/internal.h
index a48ef81be37d..79e495d86165 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -188,7 +188,7 @@ extern int do_vfs_ioctl(struct file *file, unsigned int fd, unsigned int cmd,
  * iomap support:
  */
 typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
-		void *data, struct iomap *iomap);
+		void *data, struct iomap *iomap, struct iomap *srcmap);
 
 loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 		unsigned flags, const struct iomap_ops *ops, void *data,
diff --git a/fs/iomap.c b/fs/iomap.c
index 23ef63fd1669..6648957af268 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -41,6 +41,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	const struct iomap_ops *ops, void *data, iomap_actor_t actor)
 {
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { 0 };
 	loff_t written = 0, ret;
 
 	/*
@@ -55,7 +56,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * expose transient stale data. If the reserve fails, we can safely
 	 * back out at this point as there is nothing to undo.
 	 */
-	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
+	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
 	if (ret)
 		return ret;
 	if (WARN_ON(iomap.offset > pos))
@@ -75,7 +76,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * we can do the copy-in page by page without having to worry about
 	 * failures exposing transient data.
 	 */
-	written = actor(inode, pos, length, data, &iomap);
+	written = actor(inode, pos, length, data, &iomap, &srcmap);
 
 	/*
 	 * Now the data has been copied, commit the range we've copied.  This
@@ -282,7 +283,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
 	struct page *page = ctx->cur_page;
@@ -424,7 +425,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 
 static loff_t
 iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
 	loff_t done, ret;
@@ -444,7 +445,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 			ctx->cur_page_in_bio = false;
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
-				ctx, iomap);
+				ctx, iomap, srcmap);
 	}
 
 	return done;
@@ -796,7 +797,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iov_iter *i = data;
 	long status = 0;
@@ -913,7 +914,7 @@ __iomap_read_page(struct inode *inode, loff_t offset)
 
 static loff_t
 iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	long status = 0;
 	ssize_t written = 0;
@@ -1002,7 +1003,7 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	bool *did_zero = data;
 	loff_t written = 0;
@@ -1071,7 +1072,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
 
 static loff_t
 iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page = data;
 	int ret;
@@ -1169,7 +1170,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 
 static loff_t
 iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct fiemap_ctx *ctx = data;
 	loff_t ret = length;
@@ -1343,7 +1344,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
 
 static loff_t
 iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	switch (iomap->type) {
 	case IOMAP_UNWRITTEN:
@@ -1389,7 +1390,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
 static loff_t
 iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	switch (iomap->type) {
 	case IOMAP_HOLE:
@@ -1790,7 +1791,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 
 static loff_t
 iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_dio *dio = data;
 
@@ -2057,7 +2058,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
  * distinction between written and unwritten extents.
  */
 static loff_t
 iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
-		loff_t count, void *data, struct iomap *iomap)
+		loff_t count, void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_swapfile_info *isi = data;
 	int error;
@@ -2161,7 +2162,7 @@ EXPORT_SYMBOL_GPL(iomap_swapfile_activate);
 
 static loff_t
 iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	sector_t *bno = data, addr;
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 63d323916bba..6116a75f03da 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -925,7 +925,8 @@ xfs_file_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1148,7 +1149,8 @@ xfs_seek_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1234,7 +1236,8 @@ xfs_xattr_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 2103b94cb1bf..f49767c7fd83 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -25,6 +25,7 @@ struct vm_fault;
 #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
 #define IOMAP_INLINE	0x05	/* data inline in the inode */
+#define IOMAP_COW	0x06	/* copy data from srcmap before writing */
 
 /*
  * Flags for all iomap mappings:
@@ -102,7 +103,8 @@ struct iomap_ops {
 	 * The actual length is returned in iomap->length.
 	 */
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
-			unsigned flags, struct iomap *iomap);
+			unsigned flags, struct iomap *iomap,
+			struct iomap *srcmap);
 
 	/*
 	 * Commit and/or unreserve space previous allocated using iomap_begin.

From patchwork Fri Jun 21 19:28:24 2019
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 11010671
From: Goldwyn Rodrigues
To: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, darrick.wong@oracle.com, david@fromorbit.com, Goldwyn Rodrigues
Subject: [PATCH 2/6] iomap: Read page from srcmap for IOMAP_COW
Date: Fri, 21 Jun 2019 14:28:24 -0500
Message-Id: <20190621192828.28900-3-rgoldwyn@suse.de>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190621192828.28900-1-rgoldwyn@suse.de>
References: <20190621192828.28900-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

Signed-off-by: Goldwyn Rodrigues
---
 fs/iomap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 6648957af268..8a7b20e432ef 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -655,7 +655,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap)
+		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	pgoff_t index = pos >> PAGE_SHIFT;
@@ -681,6 +681,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 
 	if (iomap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, iomap);
+	else if (iomap->type == IOMAP_COW)
+		status = __iomap_write_begin(inode, pos, len, page, srcmap);
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
@@ -833,7 +835,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		}
 
 		status = iomap_write_begin(inode, pos, bytes, flags, &page,
-				iomap);
+				iomap, srcmap);
 		if (unlikely(status))
 			break;
 
@@ -932,7 +934,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			return PTR_ERR(rpage);
 
 		status = iomap_write_begin(inode, pos, bytes,
-				AOP_FLAG_NOFS, &page, iomap);
+				AOP_FLAG_NOFS, &page, iomap, srcmap);
 		put_page(rpage);
 		if (unlikely(status))
 			return status;
@@ -978,13 +980,13 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 EXPORT_SYMBOL_GPL(iomap_file_dirty);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap)
+		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
 	int status;
 
 	status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
-			iomap);
+			iomap, srcmap);
 	if (status)
 		return status;
 
@@ -1022,7 +1024,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 	if (IS_DAX(inode))
 		status = iomap_dax_zero(pos, offset, bytes, iomap);
 	else
-		status = iomap_zero(inode, pos, offset, bytes, iomap);
+		status = iomap_zero(inode, pos, offset, bytes, iomap, srcmap);
 
 	if (status < 0)
 		return status;

From patchwork Fri Jun 21 19:28:25 2019
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 11010675
From: Goldwyn Rodrigues
To: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, darrick.wong@oracle.com, david@fromorbit.com, Goldwyn Rodrigues
Subject: [PATCH 3/6] iomap: Check iblocksize before transforming page->private
Date: Fri, 21 Jun 2019 14:28:25 -0500
Message-Id: <20190621192828.28900-4-rgoldwyn@suse.de>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190621192828.28900-1-rgoldwyn@suse.de>
References: <20190621192828.28900-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

btrfs also uses page->private, to store an extent_buffer pointer. Make the
check stricter, so that page->private is only interpreted as an iomap_page
when i_blocksize() < PAGE_SIZE.
Signed-off-by: Goldwyn Rodrigues
---
 include/linux/iomap.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f49767c7fd83..6511124e58b6 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -128,7 +128,8 @@ struct iomap_page {
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
 {
-	if (page_has_private(page))
+	if (i_blocksize(page->mapping->host) < PAGE_SIZE &&
+	    page_has_private(page))
 		return (struct iomap_page *)page_private(page);
 	return NULL;
 }

From patchwork Fri Jun 21 19:28:26 2019
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 11010683
From: Goldwyn Rodrigues
To: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, darrick.wong@oracle.com, david@fromorbit.com, Goldwyn Rodrigues
Subject: [PATCH 4/6] btrfs: Add a simple buffered iomap write
Date: Fri, 21 Jun 2019 14:28:26 -0500
Message-Id: <20190621192828.28900-5-rgoldwyn@suse.de>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190621192828.28900-1-rgoldwyn@suse.de>
References: <20190621192828.28900-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

Introduce a new structure, btrfs_iomap, which carries information from
iomap_begin() to iomap_end(). It primarily holds reservation and extent
locking state.

This is a long patch. Most of the code is "inspired" by fs/btrfs/file.c.
To keep the size small, all removals are in the following patches.

Patch inclusion / coding style question: from a code-readability point of
view I prefer many small functions to one big function, and I would like
to break this down further, but since it is a 1-1 mapping with file.c I
left it as is. Would you prefer the split into smaller functions in this
same patch, or as a separate patch (re)modifying this code?
Signed-off-by: Goldwyn Rodrigues
---
 fs/btrfs/Makefile |   2 +-
 fs/btrfs/ctree.h  |   1 +
 fs/btrfs/file.c   |   4 +-
 fs/btrfs/iomap.c  | 389 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 393 insertions(+), 3 deletions(-)
 create mode 100644 fs/btrfs/iomap.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index ca693dd554e9..bfc1954204cf 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -10,7 +10,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o free-space-tree.o tree-checker.o
+	   uuid-tree.o props.o free-space-tree.o tree-checker.o iomap.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0a61dff27f57..80d37dfff873 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3348,6 +3348,7 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
 			      struct file *file_out, loff_t pos_out,
 			      loff_t len, unsigned int remap_flags);
+size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 89f5be2bfb43..fc3032d8b573 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1839,7 +1839,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		return written;
 
 	pos = iocb->ki_pos;
-	written_buffered = btrfs_buffered_write(iocb, from);
+	written_buffered = btrfs_buffered_iomap_write(iocb, from);
 	if (written_buffered < 0) {
 		err = written_buffered;
 		goto out;
@@ -1976,7 +1976,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		num_written = __btrfs_direct_write(iocb, from);
 	} else {
-		num_written = btrfs_buffered_write(iocb, from);
+		num_written = btrfs_buffered_iomap_write(iocb, from);
 		if (num_written > 0)
 			iocb->ki_pos = pos + num_written;
 		if (clean_page)
diff --git a/fs/btrfs/iomap.c b/fs/btrfs/iomap.c
new file mode 100644
index 000000000000..0435b179d461
--- /dev/null
+++ b/fs/btrfs/iomap.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * iomap support for BTRFS
+ *
+ * Copyright (c) 2019 SUSE Linux
+ * Author: Goldwyn Rodrigues
+ */
+
+#include
+#include "ctree.h"
+#include "btrfs_inode.h"
+#include "volumes.h"
+#include "disk-io.h"
+
+struct btrfs_iomap {
+	u64 start;
+	u64 end;
+	bool nocow;
+	int extents_locked;
+	ssize_t reserved_bytes;
+	struct extent_changeset *data_reserved;
+	struct extent_state *cached_state;
+};
+
+/*
+ * This function locks the extent and properly waits for data=ordered extents
+ * to finish before allowing the pages to be modified if need.
+ *
+ * The return value:
+ * 1 - the extent is locked
+ * 0 - the extent is not locked, and everything is OK
+ * -EAGAIN - need re-prepare the pages
+ * the other < 0 number - Something wrong happens
+ */
+static noinline int
+lock_and_cleanup_extent(struct btrfs_inode *inode, loff_t pos,
+			size_t write_bytes,
+			u64 *lockstart, u64 *lockend,
+			struct extent_state **cached_state)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	u64 start_pos;
+	u64 last_pos;
+	int ret = 0;
+
+	start_pos = round_down(pos, fs_info->sectorsize);
+	last_pos = start_pos +
+		   round_up(pos + write_bytes - start_pos,
+			    fs_info->sectorsize) - 1;
+
+	if (start_pos < inode->vfs_inode.i_size) {
+		struct btrfs_ordered_extent *ordered;
+
+		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
+				 cached_state);
+		ordered = btrfs_lookup_ordered_range(inode, start_pos,
+						     last_pos - start_pos + 1);
+		if (ordered &&
+		    ordered->file_offset + ordered->len > start_pos &&
+		    ordered->file_offset <= last_pos) {
+			unlock_extent_cached(&inode->io_tree, start_pos,
+					     last_pos, cached_state);
+			btrfs_start_ordered_extent(&inode->vfs_inode,
+						   ordered, 1);
+			btrfs_put_ordered_extent(ordered);
+			return -EAGAIN;
+		}
+		if (ordered)
+			btrfs_put_ordered_extent(ordered);
+
+		*lockstart = start_pos;
+		*lockend = last_pos;
+		ret = 1;
+	}
+
+	return ret;
+}
+
+static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+				    size_t *write_bytes)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_root *root = inode->root;
+	struct btrfs_ordered_extent *ordered;
+	u64 lockstart, lockend;
+	u64 num_bytes;
+	int ret;
+
+	ret = btrfs_start_write_no_snapshotting(root);
+	if (!ret)
+		return -ENOSPC;
+
+	lockstart = round_down(pos, fs_info->sectorsize);
+	lockend = round_up(pos + *write_bytes,
+			   fs_info->sectorsize) - 1;
+
+	while (1) {
+		lock_extent(&inode->io_tree, lockstart, lockend);
+		ordered = btrfs_lookup_ordered_range(inode, lockstart,
+						     lockend - lockstart + 1);
+		if (!ordered)
+			break;
+		unlock_extent(&inode->io_tree, lockstart, lockend);
+		btrfs_start_ordered_extent(&inode->vfs_inode, ordered, 1);
+		btrfs_put_ordered_extent(ordered);
+	}
+
+	num_bytes = lockend - lockstart + 1;
+	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
+			       NULL, NULL, NULL);
+	if (ret <= 0) {
+		ret = 0;
+		btrfs_end_write_no_snapshotting(root);
+	} else {
+		*write_bytes = min_t(size_t, *write_bytes,
+				     num_bytes - pos + lockstart);
+	}
+
+	unlock_extent(&inode->io_tree, lockstart, lockend);
+
+	return ret;
+}
+
+static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
+					 const u64 start,
+					 const u64 len,
+					 struct extent_state **cached_state)
+{
+	u64 search_start = start;
+	const u64 end = start + len - 1;
+
+	while (search_start < end) {
+		const u64 search_len = end - search_start + 1;
+		struct extent_map *em;
+		u64 em_len;
+		int ret = 0;
+
+		em = btrfs_get_extent(inode, NULL, 0, search_start,
+				      search_len, 0);
+		if (IS_ERR(em))
+			return PTR_ERR(em);
+
+		if (em->block_start != EXTENT_MAP_HOLE)
+			goto next;
+
+		em_len = em->len;
+		if (em->start < search_start)
+			em_len -= search_start - em->start;
+		if (em_len > search_len)
+			em_len = search_len;
+
+		ret = set_extent_bit(&inode->io_tree, search_start,
+				     search_start + em_len - 1,
+				     EXTENT_DELALLOC_NEW,
+				     NULL, cached_state, GFP_NOFS);
+next:
+		search_start = extent_map_end(em);
+		free_extent_map(em);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int btrfs_buffered_page_prepare(struct inode *inode, loff_t pos,
+				       unsigned len, struct iomap *iomap)
+{
+	//wait_on_page_writeback(page);
+	//set_page_extent_mapped(page);
+	return 0;
+}
+
+static void btrfs_buffered_page_done(struct inode *inode, loff_t pos,
+				     unsigned copied, struct page *page,
+				     struct iomap *iomap)
+{
+	SetPageUptodate(page);
+	ClearPageChecked(page);
+	set_page_dirty(page);
+	get_page(page);
+}
+
+static const struct iomap_page_ops btrfs_buffered_page_ops = {
+	.page_prepare = btrfs_buffered_page_prepare,
+	.page_done = btrfs_buffered_page_done,
+};
+
+static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
+		loff_t length, unsigned flags, struct iomap *iomap,
+		struct iomap *srcmap)
+{
+	int ret;
+	size_t write_bytes = length;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	size_t sector_offset = pos & (fs_info->sectorsize - 1);
+	struct btrfs_iomap *bi;
+
+	bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS);
+	if (!bi)
+		return -ENOMEM;
+
+	bi->reserved_bytes = round_up(write_bytes + sector_offset,
+				      fs_info->sectorsize);
+
+	/* Reserve data space */
+	ret = btrfs_check_data_free_space(inode, &bi->data_reserved, pos,
+					  write_bytes);
+	if (ret < 0) {
+		/*
+		 * Space allocation failed. Let's check if we can
+		 * continue I/O without allocations
+		 */
+		if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					      BTRFS_INODE_PREALLOC)) &&
+		    check_can_nocow(BTRFS_I(inode), pos,
+				    &write_bytes) > 0) {
+			bi->nocow = true;
+			/*
+			 * our prealloc extent may be smaller than
+			 * write_bytes, so scale down.
+ */ + bi->reserved_bytes = round_up(write_bytes + + sector_offset, + fs_info->sectorsize); + } else { + goto error; + } + } + + WARN_ON(bi->reserved_bytes == 0); + + /* We have the data space allocated, reserve the metadata now */ + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), + bi->reserved_bytes); + if (ret) { + struct btrfs_root *root = BTRFS_I(inode)->root; + if (!bi->nocow) + btrfs_free_reserved_data_space(inode, + bi->data_reserved, pos, + write_bytes); + else + btrfs_end_write_no_snapshotting(root); + goto error; + } + + do { + ret = lock_and_cleanup_extent( + BTRFS_I(inode), pos, write_bytes, &bi->start, + &bi->end, &bi->cached_state); + } while (ret == -EAGAIN); + + if (ret < 0) { + btrfs_delalloc_release_extents(BTRFS_I(inode), + bi->reserved_bytes, true); + goto release; + } else { + bi->extents_locked = ret; + } + iomap->private = bi; + iomap->length = round_up(write_bytes, fs_info->sectorsize); + iomap->offset = round_down(pos, fs_info->sectorsize); + iomap->addr = IOMAP_NULL_ADDR; + iomap->type = IOMAP_DELALLOC; + iomap->bdev = fs_info->fs_devices->latest_bdev; + iomap->page_ops = &btrfs_buffered_page_ops; + return 0; +release: + if (bi->extents_locked) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, bi->start, + bi->end, &bi->cached_state); + if (bi->nocow) { + struct btrfs_root *root = BTRFS_I(inode)->root; + btrfs_end_write_no_snapshotting(root); + btrfs_delalloc_release_metadata(BTRFS_I(inode), + bi->reserved_bytes, true); + } else { + btrfs_delalloc_release_space(inode, bi->data_reserved, + round_down(pos, fs_info->sectorsize), + bi->reserved_bytes, true); + } + extent_changeset_free(bi->data_reserved); + +error: + kfree(bi); + return ret; +} + +static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos, + loff_t length, ssize_t written, unsigned flags, + struct iomap *iomap) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_iomap *bi = iomap->private; + ssize_t release_bytes = 
round_down(bi->reserved_bytes - written, + 1 << fs_info->sb->s_blocksize_bits); + unsigned int extra_bits = 0; + u64 start_pos = pos & ~((u64) fs_info->sectorsize - 1); + u64 num_bytes = round_up(written + pos - start_pos, + fs_info->sectorsize); + u64 end_of_last_block = start_pos + num_bytes - 1; + int ret = 0; + + if (release_bytes > 0) { + if (bi->nocow) { + btrfs_delalloc_release_metadata(BTRFS_I(inode), + release_bytes, true); + } else { + u64 __pos = round_down(pos + written, fs_info->sectorsize); + btrfs_delalloc_release_space(inode, bi->data_reserved, + __pos, release_bytes, true); + } + } + + /* + * The pages may have already been dirty, clear out old accounting so + * we can set things up properly + */ + clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block, + EXTENT_DIRTY | EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | + EXTENT_DEFRAG, 0, 0, &bi->cached_state); + + if (!btrfs_is_free_space_inode(BTRFS_I(inode))) { + if (start_pos >= i_size_read(inode) && + !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) { + /* + * There can't be any extents following eof in this case + * so just set the delalloc new bit for the range + * directly. 
+ */ + extra_bits |= EXTENT_DELALLOC_NEW; + } else { + ret = btrfs_find_new_delalloc_bytes(BTRFS_I(inode), + start_pos, num_bytes, + &bi->cached_state); + if (ret) + goto unlock; + } + } + + ret = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block, + extra_bits, &bi->cached_state, 0); +unlock: + if (bi->extents_locked) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, + bi->start, bi->end, &bi->cached_state); + + if (bi->nocow) { + struct btrfs_root *root = BTRFS_I(inode)->root; + btrfs_end_write_no_snapshotting(root); + if (written > 0) { + u64 start = round_down(pos, fs_info->sectorsize); + u64 end = round_up(pos + written, fs_info->sectorsize) - 1; + set_extent_bit(&BTRFS_I(inode)->io_tree, start, end, + EXTENT_NORESERVE, NULL, NULL, GFP_NOFS); + } + + } + btrfs_delalloc_release_extents(BTRFS_I(inode), bi->reserved_bytes, + true); + + if (written < fs_info->nodesize) + btrfs_btree_balance_dirty(fs_info); + + extent_changeset_free(bi->data_reserved); + kfree(bi); + return ret; +} + +static const struct iomap_ops btrfs_buffered_iomap_ops = { + .iomap_begin = btrfs_buffered_iomap_begin, + .iomap_end = btrfs_buffered_iomap_end, +}; + +size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from) +{ + ssize_t written; + struct inode *inode = file_inode(iocb->ki_filp); + written = iomap_file_buffered_write(iocb, from, &btrfs_buffered_iomap_ops); + if (written > 0) + iocb->ki_pos += written; + if (iocb->ki_pos > i_size_read(inode)) + i_size_write(inode, iocb->ki_pos); + return written; +} + From patchwork Fri Jun 21 19:28:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11010679 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B54D614E5 for ; Fri, 21 Jun 2019 19:28:46 +0000 (UTC) Received: from 
mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A604428721 for ; Fri, 21 Jun 2019 19:28:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9A5CF28B9C; Fri, 21 Jun 2019 19:28:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4CCD428B9B for ; Fri, 21 Jun 2019 19:28:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726137AbfFUT2o (ORCPT ); Fri, 21 Jun 2019 15:28:44 -0400 Received: from mx2.suse.de ([195.135.220.15]:33262 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726111AbfFUT2o (ORCPT ); Fri, 21 Jun 2019 15:28:44 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3E619AF3B; Fri, 21 Jun 2019 19:28:43 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, darrick.wong@oracle.com, david@fromorbit.com, Goldwyn Rodrigues Subject: [PATCH 5/6] btrfs: Add CoW in iomap based writes Date: Fri, 21 Jun 2019 14:28:27 -0500 Message-Id: <20190621192828.28900-6-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190621192828.28900-1-rgoldwyn@suse.de> References: <20190621192828.28900-1-rgoldwyn@suse.de> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Goldwyn Rodrigues Set iomap->type to IOMAP_COW and fill up the source map in case the I/O is not page aligned. 
Signed-off-by: Goldwyn Rodrigues
---
 fs/btrfs/iomap.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/fs/btrfs/iomap.c b/fs/btrfs/iomap.c
index 0435b179d461..f4133cf953b4 100644
--- a/fs/btrfs/iomap.c
+++ b/fs/btrfs/iomap.c
@@ -172,6 +172,35 @@ static int btrfs_buffered_page_prepare(struct inode *inode, loff_t pos,
 	return 0;
 }
 
+/*
+ * get_iomap: Get the block map and fill the iomap structure
+ * @inode:  inode being written to
+ * @pos:    file position
+ * @length: I/O length
+ * @iomap:  the iomap structure to fill
+ */
+static int get_iomap(struct inode *inode, loff_t pos, loff_t length,
+		     struct iomap *iomap)
+{
+	struct extent_map *em;
+
+	iomap->addr = IOMAP_NULL_ADDR;
+	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, pos, length, 0);
+	if (IS_ERR(em))
+		return PTR_ERR(em);
+	/* XXX Do we need to check for em->flags here? */
+	if (em->block_start == EXTENT_MAP_HOLE) {
+		iomap->type = IOMAP_HOLE;
+	} else {
+		iomap->addr = em->block_start + (pos - em->start);
+		iomap->type = IOMAP_MAPPED;
+	}
+	iomap->offset = em->start;
+	iomap->bdev = em->bdev;
+	iomap->length = em->len;
+	free_extent_map(em);
+	return 0;
+}
+
 static void btrfs_buffered_page_done(struct inode *inode, loff_t pos,
 				     unsigned copied, struct page *page,
 				     struct iomap *iomap)
@@ -196,6 +225,7 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
 	int ret;
 	size_t write_bytes = length;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	size_t end;
 	size_t sector_offset = pos & (fs_info->sectorsize - 1);
 	struct btrfs_iomap *bi;
 
@@ -263,6 +293,17 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos,
 	iomap->private = bi;
 	iomap->length = round_up(write_bytes, fs_info->sectorsize);
 	iomap->offset = round_down(pos, fs_info->sectorsize);
+	end = pos + write_bytes;
+	/* Set IOMAP_COW if start/end is not page aligned */
+	if ((pos & (PAGE_SIZE - 1)) || (end & (PAGE_SIZE - 1))) {
+		iomap->type = IOMAP_COW;
+		ret = get_iomap(inode, pos, length, srcmap);
+		if (ret < 0)
+			goto release;
+	} else {
+		iomap->type = IOMAP_DELALLOC;
+	}
+
 	iomap->addr = IOMAP_NULL_ADDR;
 	iomap->type = IOMAP_DELALLOC;
 	iomap->bdev = fs_info->fs_devices->latest_bdev;

From patchwork Fri Jun 21 19:28:28 2019
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 11010687
From: Goldwyn Rodrigues
To: linux-btrfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, darrick.wong@oracle.com,
    david@fromorbit.com, Goldwyn Rodrigues
Subject: [PATCH 6/6] btrfs: remove buffered write code made unnecessary
Date: Fri, 21 Jun 2019 14:28:28 -0500
Message-Id: <20190621192828.28900-7-rgoldwyn@suse.de>
In-Reply-To: <20190621192828.28900-1-rgoldwyn@suse.de>
References: <20190621192828.28900-1-rgoldwyn@suse.de>

From: Goldwyn Rodrigues

Remove the buffered write code made unnecessary by the switch to the
iomap infrastructure. Done as a separate patch to keep the main patch
shorter.

Signed-off-by: Goldwyn Rodrigues
---
 fs/btrfs/file.c | 464 --------------------------------------------------------
 1 file changed, 464 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fc3032d8b573..61b5512ef035 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -389,79 +389,6 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
-/* simple helper to fault in pages and copy.  This should go away
- * and be replaced with calls into generic code.
- */
-static noinline int btrfs_copy_from_user(loff_t pos, size_t write_bytes,
-					 struct page **prepared_pages,
-					 struct iov_iter *i)
-{
-	size_t copied = 0;
-	size_t total_copied = 0;
-	int pg = 0;
-	int offset = offset_in_page(pos);
-
-	while (write_bytes > 0) {
-		size_t count = min_t(size_t,
-				     PAGE_SIZE - offset, write_bytes);
-		struct page *page = prepared_pages[pg];
-		/*
-		 * Copy data from userspace to the current page
-		 */
-		copied = iov_iter_copy_from_user_atomic(page, i, offset, count);
-
-		/* Flush processor's dcache for this page */
-		flush_dcache_page(page);
-
-		/*
-		 * if we get a partial write, we can end up with
-		 * partially up to date pages.  These add
-		 * a lot of complexity, so make sure they don't
-		 * happen by forcing this copy to be retried.
-		 *
-		 * The rest of the btrfs_file_write code will fall
-		 * back to page at a time copies after we return 0.
-		 */
-		if (!PageUptodate(page) && copied < count)
-			copied = 0;
-
-		iov_iter_advance(i, copied);
-		write_bytes -= copied;
-		total_copied += copied;
-
-		/* Return to btrfs_file_write_iter to fault page */
-		if (unlikely(copied == 0))
-			break;
-
-		if (copied < PAGE_SIZE - offset) {
-			offset += copied;
-		} else {
-			pg++;
-			offset = 0;
-		}
-	}
-	return total_copied;
-}
-
-/*
- * unlocks pages after btrfs_file_write is done with them
- */
-static void btrfs_drop_pages(struct page **pages, size_t num_pages)
-{
-	size_t i;
-	for (i = 0; i < num_pages; i++) {
-		/* page checked is some magic around finding pages that
-		 * have been modified without going through btrfs_set_page_dirty
-		 * clear it here. There should be no need to mark the pages
-		 * accessed as prepare_pages should have marked them accessed
-		 * in prepare_pages via find_or_create_page()
-		 */
-		ClearPageChecked(pages[i]);
-		unlock_page(pages[i]);
-		put_page(pages[i]);
-	}
-}
-
 static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
 					 const u64 start,
 					 const u64 len,
@@ -1386,165 +1313,6 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-/*
- * on error we return an unlocked page and the error value
- * on success we return a locked page and 0
- */
-static int prepare_uptodate_page(struct inode *inode,
-				 struct page *page, u64 pos,
-				 bool force_uptodate)
-{
-	int ret = 0;
-
-	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
-	    !PageUptodate(page)) {
-		ret = btrfs_readpage(NULL, page);
-		if (ret)
-			return ret;
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			unlock_page(page);
-			return -EIO;
-		}
-		if (page->mapping != inode->i_mapping) {
-			unlock_page(page);
-			return -EAGAIN;
-		}
-	}
-	return 0;
-}
-
-/*
- * this just gets pages into the page cache and locks them down.
- */
-static noinline int prepare_pages(struct inode *inode, struct page **pages,
-				  size_t num_pages, loff_t pos,
-				  size_t write_bytes, bool force_uptodate)
-{
-	int i;
-	unsigned long index = pos >> PAGE_SHIFT;
-	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
-	int err = 0;
-	int faili;
-
-	for (i = 0; i < num_pages; i++) {
-again:
-		pages[i] = find_or_create_page(inode->i_mapping, index + i,
-					       mask | __GFP_WRITE);
-		if (!pages[i]) {
-			faili = i - 1;
-			err = -ENOMEM;
-			goto fail;
-		}
-
-		if (i == 0)
-			err = prepare_uptodate_page(inode, pages[i], pos,
-						    force_uptodate);
-		if (!err && i == num_pages - 1)
-			err = prepare_uptodate_page(inode, pages[i],
-						    pos + write_bytes, false);
-		if (err) {
-			put_page(pages[i]);
-			if (err == -EAGAIN) {
-				err = 0;
-				goto again;
-			}
-			faili = i - 1;
-			goto fail;
-		}
-		wait_on_page_writeback(pages[i]);
-	}
-
-	return 0;
-fail:
-	while (faili >= 0) {
-		unlock_page(pages[faili]);
-		put_page(pages[faili]);
-		faili--;
-	}
-	return err;
-
-}
-
-/*
- * This function locks the extent and properly waits for data=ordered extents
- * to finish before allowing the pages to be modified if need.
- *
- * The return value:
- * 1 - the extent is locked
- * 0 - the extent is not locked, and everything is OK
- * -EAGAIN - need re-prepare the pages
- * the other < 0 number - Something wrong happens
- */
-static noinline int
-lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
-				size_t num_pages, loff_t pos,
-				size_t write_bytes,
-				u64 *lockstart, u64 *lockend,
-				struct extent_state **cached_state)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u64 start_pos;
-	u64 last_pos;
-	int i;
-	int ret = 0;
-
-	start_pos = round_down(pos, fs_info->sectorsize);
-	last_pos = start_pos
-		+ round_up(pos + write_bytes - start_pos,
-			   fs_info->sectorsize) - 1;
-
-	if (start_pos < inode->vfs_inode.i_size) {
-		struct btrfs_ordered_extent *ordered;
-
-		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
-				 cached_state);
-		ordered = btrfs_lookup_ordered_range(inode, start_pos,
-						     last_pos - start_pos + 1);
-		if (ordered &&
-		    ordered->file_offset + ordered->len > start_pos &&
-		    ordered->file_offset <= last_pos) {
-			unlock_extent_cached(&inode->io_tree, start_pos,
-					     last_pos, cached_state);
-			for (i = 0; i < num_pages; i++) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
-			btrfs_start_ordered_extent(&inode->vfs_inode,
-						   ordered, 1);
-			btrfs_put_ordered_extent(ordered);
-			return -EAGAIN;
-		}
-		if (ordered)
-			btrfs_put_ordered_extent(ordered);
-
-		*lockstart = start_pos;
-		*lockend = last_pos;
-		ret = 1;
-	}
-
-	/*
-	 * It's possible the pages are dirty right now, but we don't want
-	 * to clean them yet because copy_from_user may catch a page fault
-	 * and we might have to fall back to one page at a time.  If that
-	 * happens, we'll unlock these pages and we'd have a window where
-	 * reclaim could sneak in and drop the once-dirty page on the floor
-	 * without writing it.
-	 *
-	 * We have the pages locked and the extent range locked, so there's
-	 * no way someone can start IO on any dirty pages in this range.
-	 *
-	 * We'll call btrfs_dirty_pages() later on, and that will flip around
-	 * delalloc bits and dirty the pages as required.
-	 */
-	for (i = 0; i < num_pages; i++) {
-		set_page_extent_mapped(pages[i]);
-		WARN_ON(!PageLocked(pages[i]));
-	}
-
-	return ret;
-}
-
 static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 				    size_t *write_bytes)
 {
@@ -1591,238 +1359,6 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 	return ret;
 }
 
-static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
-					     struct iov_iter *i)
-{
-	struct file *file = iocb->ki_filp;
-	loff_t pos = iocb->ki_pos;
-	struct inode *inode = file_inode(file);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct page **pages = NULL;
-	struct extent_state *cached_state = NULL;
-	struct extent_changeset *data_reserved = NULL;
-	u64 release_bytes = 0;
-	u64 lockstart;
-	u64 lockend;
-	size_t num_written = 0;
-	int nrptrs;
-	int ret = 0;
-	bool only_release_metadata = false;
-	bool force_page_uptodate = false;
-
-	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
-			PAGE_SIZE / (sizeof(struct page *)));
-	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
-	nrptrs = max(nrptrs, 8);
-	pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
-
-	while (iov_iter_count(i) > 0) {
-		size_t offset = offset_in_page(pos);
-		size_t sector_offset;
-		size_t write_bytes = min(iov_iter_count(i),
-					 nrptrs * (size_t)PAGE_SIZE -
-					 offset);
-		size_t num_pages = DIV_ROUND_UP(write_bytes + offset,
-						PAGE_SIZE);
-		size_t reserve_bytes;
-		size_t dirty_pages;
-		size_t copied;
-		size_t dirty_sectors;
-		size_t num_sectors;
-		int extents_locked;
-
-		WARN_ON(num_pages > nrptrs);
-
-		/*
-		 * Fault pages before locking them in prepare_pages
-		 * to avoid recursive lock
-		 */
-		if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) {
-			ret = -EFAULT;
-			break;
-		}
-
-		sector_offset = pos & (fs_info->sectorsize - 1);
-		reserve_bytes = round_up(write_bytes + sector_offset,
-					 fs_info->sectorsize);
-
-		extent_changeset_release(data_reserved);
-		ret = btrfs_check_data_free_space(inode, &data_reserved, pos,
-						  write_bytes);
-		if (ret < 0) {
-			if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
-						      BTRFS_INODE_PREALLOC)) &&
-			    check_can_nocow(BTRFS_I(inode), pos,
-					    &write_bytes) > 0) {
-				/*
-				 * For nodata cow case, no need to reserve
-				 * data space.
-				 */
-				only_release_metadata = true;
-				/*
-				 * our prealloc extent may be smaller than
-				 * write_bytes, so scale down.
-				 */
-				num_pages = DIV_ROUND_UP(write_bytes + offset,
-							 PAGE_SIZE);
-				reserve_bytes = round_up(write_bytes +
-							 sector_offset,
-							 fs_info->sectorsize);
-			} else {
-				break;
-			}
-		}
-
-		WARN_ON(reserve_bytes == 0);
-		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
-						      reserve_bytes);
-		if (ret) {
-			if (!only_release_metadata)
-				btrfs_free_reserved_data_space(inode,
-						data_reserved, pos,
-						write_bytes);
-			else
-				btrfs_end_write_no_snapshotting(root);
-			break;
-		}
-
-		release_bytes = reserve_bytes;
-again:
-		/*
-		 * This is going to setup the pages array with the number of
-		 * pages we want, so we don't really need to worry about the
-		 * contents of pages from loop to loop
-		 */
-		ret = prepare_pages(inode, pages, num_pages,
-				    pos, write_bytes,
-				    force_page_uptodate);
-		if (ret) {
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes, true);
-			break;
-		}
-
-		extents_locked = lock_and_cleanup_extent_if_need(
-				BTRFS_I(inode), pages,
-				num_pages, pos, write_bytes, &lockstart,
-				&lockend, &cached_state);
-		if (extents_locked < 0) {
-			if (extents_locked == -EAGAIN)
-				goto again;
-			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes, true);
-			ret = extents_locked;
-			break;
-		}
-
-		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
-
-		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
-		dirty_sectors = round_up(copied + sector_offset,
-					 fs_info->sectorsize);
-		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
-
-		/*
-		 * if we have trouble faulting in the pages, fall
-		 * back to one page at a time
-		 */
-		if (copied < write_bytes)
-			nrptrs = 1;
-
-		if (copied == 0) {
-			force_page_uptodate = true;
-			dirty_sectors = 0;
-			dirty_pages = 0;
-		} else {
-			force_page_uptodate = false;
-			dirty_pages = DIV_ROUND_UP(copied + offset,
-						   PAGE_SIZE);
-		}
-
-		if (num_sectors > dirty_sectors) {
-			/* release everything except the sectors we dirtied */
-			release_bytes -= dirty_sectors <<
-						fs_info->sb->s_blocksize_bits;
-			if (only_release_metadata) {
-				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							release_bytes, true);
-			} else {
-				u64 __pos;
-
-				__pos = round_down(pos,
-						   fs_info->sectorsize) +
-					(dirty_pages << PAGE_SHIFT);
-				btrfs_delalloc_release_space(inode,
-						data_reserved, __pos,
-						release_bytes, true);
-			}
-		}
-
-		release_bytes = round_up(copied + sector_offset,
-					 fs_info->sectorsize);
-
-		if (copied > 0)
-			ret = btrfs_dirty_pages(inode, pages, dirty_pages,
-						pos, copied, &cached_state);
-		if (extents_locked)
-			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-					     lockstart, lockend,
-					     &cached_state);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes,
-					       true);
-		if (ret) {
-			btrfs_drop_pages(pages, num_pages);
-			break;
-		}
-
-		release_bytes = 0;
-		if (only_release_metadata)
-			btrfs_end_write_no_snapshotting(root);
-
-		if (only_release_metadata && copied > 0) {
-			lockstart = round_down(pos,
-					       fs_info->sectorsize);
-			lockend = round_up(pos + copied,
-					   fs_info->sectorsize) - 1;
-
-			set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
-				       lockend, EXTENT_NORESERVE, NULL,
-				       NULL, GFP_NOFS);
-			only_release_metadata = false;
-		}
-
-		btrfs_drop_pages(pages, num_pages);
-
-		cond_resched();
-
-		balance_dirty_pages_ratelimited(inode->i_mapping);
-		if (dirty_pages < (fs_info->nodesize >> PAGE_SHIFT) + 1)
-			btrfs_btree_balance_dirty(fs_info);
-
-		pos += copied;
-		num_written += copied;
-	}
-
-	kfree(pages);
-
-	if (release_bytes) {
-		if (only_release_metadata) {
-			btrfs_end_write_no_snapshotting(root);
-			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-						release_bytes, true);
-		} else {
-			btrfs_delalloc_release_space(inode, data_reserved,
-					round_down(pos, fs_info->sectorsize),
-					release_bytes, true);
-		}
-	}
-
-	extent_changeset_free(data_reserved);
-	return num_written ? num_written : ret;
-}
-
 static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;