From patchwork Thu Apr 8 12:04:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190859 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B30DC43470 for ; Thu, 8 Apr 2021 12:04:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7623E61159 for ; Thu, 8 Apr 2021 12:04:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231352AbhDHMEz (ORCPT ); Thu, 8 Apr 2021 08:04:55 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24811 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S231195AbhDHMEy (ORCPT ); Thu, 8 Apr 2021 08:04:54 -0400 IronPort-HdrOrdr: A9a23:7G2dj6v6h/XP8m9i1FC+MU3j7skDltV00zAX/kB9WHVpW+afkN2jm+le6A/shF8qKRUdsP2jGI3Fe3PT8pZp/ZIcVI3OYCDKsHalRbsN0aLMzzHsECX19Kp8+M5bGZRWJ8b3CTFB7PrSxCmdP5IezMKc8Kau7N2uqktFaQ1xcalv40NYJ2+gYy5LbTJLD5Y4C5aQj/Avz1WdUE4KZce2DGRtZZmgm/T3kvvdASIuNloO7QmiqXeS4qfmLh7w5HwjegIK7bA80WWtqWDE2pk= X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797244" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:04:39 +0800 Received: from G08CNEXMBPEKD06.g08.fujitsu.local (unknown [10.167.33.206]) by cn.fujitsu.com (Postfix) with ESMTP id E5F494CF2676; Thu, 8 Apr 2021 20:04:36 +0800 (CST) Received: from G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) by G08CNEXMBPEKD06.g08.fujitsu.local (10.167.33.206) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:04:36 +0800 Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:04:36 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:04:36 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , Subject: [PATCH v4 1/7] fsdax: Introduce dax_iomap_cow_copy() Date: Thu, 8 Apr 2021 20:04:26 +0800 Message-ID: <20210408120432.1063608-2-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: E5F494CF2676.A3511 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org In the case where the iomap is a write operation and iomap is not equal to srcmap after iomap_begin, we consider it is a CoW operation. The destance extent which iomap indicated is new allocated extent. So, it is needed to copy the data from srcmap to new allocated extent. In theory, it is better to copy the head and tail ranges which is outside of the non-aligned area instead of copying the whole aligned range. But in dax page fault, it will always be an aligned range. So, we have to copy the whole range in this case. Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig --- fs/dax.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 77 insertions(+), 5 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 8d7e4e2cc0fb..b4fd3813457a 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1038,6 +1038,61 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size, return rc; } +/** + * dax_iomap_cow_copy(): Copy the data from source to destination before write. + * @pos: address to do copy from. + * @length: size of copy operation. + * @align_size: aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE) + * @srcmap: iomap srcmap + * @daddr: destination address to copy to. + * + * This can be called from two places. Either during DAX write fault, to copy + * the length size data to daddr. Or, while doing normal DAX write operation, + * dax_iomap_actor() might call this to do the copy of either start or end + * unaligned address. In this case the rest of the copy of aligned ranges is + * taken care by dax_iomap_actor() itself. + * Also, note DAX fault will always result in aligned pos and pos + length. + */ +static int dax_iomap_cow_copy(loff_t pos, loff_t length, size_t align_size, + struct iomap *srcmap, void *daddr) +{ + loff_t head_off = pos & (align_size - 1); + size_t size = ALIGN(head_off + length, align_size); + loff_t end = pos + length; + loff_t pg_end = round_up(end, align_size); + bool copy_all = head_off == 0 && end == pg_end; + void *saddr = 0; + int ret = 0; + + ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL); + if (ret) + return ret; + + if (copy_all) { + ret = copy_mc_to_kernel(daddr, saddr, length); + return ret ? -EIO : 0; + } + + /* Copy the head part of the range. Note: we pass offset as length. */ + if (head_off) { + ret = copy_mc_to_kernel(daddr, saddr, head_off); + if (ret) + return -EIO; + } + + /* Copy the tail part of the range */ + if (end < pg_end) { + loff_t tail_off = head_off + length; + loff_t tail_len = pg_end - end; + + ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off, + tail_len); + if (ret) + return -EIO; + } + return 0; +} + /* * The user has performed a load from a hole in the file. Allocating a new * page in the file would cause excessive storage usage for workloads with @@ -1167,11 +1222,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, struct dax_device *dax_dev = iomap->dax_dev; struct iov_iter *iter = data; loff_t end = pos + length, done = 0; + bool write = iov_iter_rw(iter) == WRITE; ssize_t ret = 0; size_t xfer; int id; - if (iov_iter_rw(iter) == READ) { + if (!write) { end = min(end, i_size_read(inode)); if (pos >= end) return 0; @@ -1180,7 +1236,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, return iov_iter_zero(min(length, end - pos), iter); } - if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) + if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED && + !(iomap->flags & IOMAP_F_SHARED))) return -EIO; /* @@ -1219,6 +1276,13 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, break; } + if (write && srcmap->addr != iomap->addr) { + ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap, + kaddr); + if (ret) + break; + } + map_len = PFN_PHYS(map_len); kaddr += offset; map_len -= offset; @@ -1230,7 +1294,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, * validated via access_ok() in either vfs_read() or * vfs_write(), depending on which operation we are doing. */ - if (iov_iter_rw(iter) == WRITE) + if (write) xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr, map_len, iter); else @@ -1382,6 +1446,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp, unsigned long entry_flags = pmd ? DAX_PMD : 0; int err = 0; pfn_t pfn; + void *kaddr; /* if we are reading UNWRITTEN and HOLE, return a hole. */ if (!write && @@ -1392,18 +1457,25 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp, return dax_pmd_load_hole(xas, vmf, iomap, entry); } - if (iomap->type != IOMAP_MAPPED) { + if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) { WARN_ON_ONCE(1); return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS; } - err = dax_iomap_direct_access(iomap, pos, size, NULL, &pfn); + err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn); if (err) return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err); *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags, write && !sync); + if (write && + srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) { + err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr); + if (err) + return dax_fault_return(err); + } + if (sync) return dax_fault_synchronous_pfnp(pfnp, pfn); From patchwork Thu Apr 8 12:04:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190861 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84B0FC433ED for ; Thu, 8 Apr 2021 12:04:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 55B7E6115B for ; Thu, 8 Apr 2021 12:04:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231451AbhDHMFF (ORCPT ); Thu, 8 Apr 2021 08:05:05 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24833 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S231195AbhDHMFF (ORCPT ); Thu, 8 Apr 2021 08:05:05 -0400 IronPort-HdrOrdr: A9a23:5PmxU6DeRQQp96/lHelM55DYdL4zR+YMi2QD/UZ3VBBTb4ikh9mj9c5rsSPcpRQwfDUbmd6GMLSdWn+0z/NIyKQYILvKZmfbkUSlIIxo5YHhhx3McheVysdzzqFlGpIeNPTVLXxXyfn3+xO5FdFI+qjjzImNif3Fx3lgCSFGApsQjDtRMQqQHk1oSAQuP/NQe6a03NZNpDarZB0sH6aGL0QCNtKim/T70LriYTMjQyUs8RSyi1qTg4LSIly12Qg/Xlp0rYsfzQ== X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797270" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:04:52 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id C4A9D4CEA876; Thu, 8 Apr 2021 20:04:47 +0800 (CST) Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:04:37 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:04:37 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , , Goldwyn Rodrigues , Ritesh Harjani Subject: [PATCH v4 2/7] fsdax: Replace mmap entry in case of CoW Date: Thu, 8 Apr 2021 20:04:27 +0800 Message-ID: <20210408120432.1063608-3-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: C4A9D4CEA876.A55FE X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org We replace the existing entry to the newly allocated one in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE so writeback marks this entry as writeprotected. This helps us snapshots so new write pagefaults after snapshots trigger a CoW. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig Reviewed-by: Ritesh Harjani --- fs/dax.c | 39 ++++++++++++++++++++++++++++----------- 1 file changed, 28 insertions(+), 11 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index b4fd3813457a..e6c1354b27a8 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -722,6 +722,10 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d return 0; } +/* DAX Insert Flag for the entry we insert */ +#define DAX_IF_DIRTY (1 << 0) +#define DAX_IF_COW (1 << 1) + /* * By this point grab_mapping_entry() has ensured that we have a locked entry * of the appropriate size so we don't have to worry about downgrading PMDs to @@ -729,16 +733,19 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d * already in the tree, we will skip the insertion and just dirty the PMD as * appropriate. */ -static void *dax_insert_entry(struct xa_state *xas, - struct address_space *mapping, struct vm_fault *vmf, - void *entry, pfn_t pfn, unsigned long flags, bool dirty) +static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf, + void *entry, pfn_t pfn, unsigned long flags, + unsigned int insert_flags) { + struct address_space *mapping = vmf->vma->vm_file->f_mapping; void *new_entry = dax_make_entry(pfn, flags); + bool dirty = insert_flags & DAX_IF_DIRTY; + bool cow = insert_flags & DAX_IF_COW; if (dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) { + if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { unsigned long index = xas->xa_index; /* we are replacing a zero page with block mapping */ if (dax_is_pmd_entry(entry)) @@ -750,7 +757,7 @@ static void *dax_insert_entry(struct xa_state *xas, xas_reset(xas); xas_lock_irq(xas); - if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { + if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { void *old; dax_disassociate_entry(entry, mapping, false); @@ -774,6 +781,9 @@ static void *dax_insert_entry(struct xa_state *xas, if (dirty) xas_set_mark(xas, PAGECACHE_TAG_DIRTY); + if (cow) + xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); + xas_unlock_irq(xas); return entry; } @@ -1109,8 +1119,7 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr)); vm_fault_t ret; - *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, - DAX_ZERO_PAGE, false); + *entry = dax_insert_entry(xas, vmf, *entry, pfn, DAX_ZERO_PAGE, 0); ret = vmf_insert_mixed(vmf->vma, vaddr, pfn); trace_dax_load_hole(inode, vmf, ret); @@ -1137,8 +1146,8 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, goto fallback; pfn = page_to_pfn_t(zero_page); - *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, - DAX_PMD | DAX_ZERO_PAGE, false); + *entry = dax_insert_entry(xas, vmf, *entry, pfn, + DAX_PMD | DAX_ZERO_PAGE, 0); if (arch_needs_pgtable_deposit()) { pgtable = pte_alloc_one(vma->vm_mm); @@ -1444,6 +1453,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp, bool write = vmf->flags & FAULT_FLAG_WRITE; bool sync = dax_fault_is_synchronous(flags, vmf->vma, iomap); unsigned long entry_flags = pmd ? DAX_PMD : 0; + unsigned int insert_flags = 0; int err = 0; pfn_t pfn; void *kaddr; @@ -1466,8 +1476,15 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp, if (err) return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err); - *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags, - write && !sync); + if (write) { + if (!sync) + insert_flags |= DAX_IF_DIRTY; + if (iomap->flags & IOMAP_F_SHARED) + insert_flags |= DAX_IF_COW; + } + + *entry = dax_insert_entry(xas, vmf, *entry, pfn, entry_flags, + insert_flags); if (write && srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) { From patchwork Thu Apr 8 12:04:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190863 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4843C433B4 for ; Thu, 8 Apr 2021 12:05:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8935261164 for ; Thu, 8 Apr 2021 12:05:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231195AbhDHMFT (ORCPT ); Thu, 8 Apr 2021 08:05:19 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24855 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S231395AbhDHMFS (ORCPT ); Thu, 8 Apr 2021 08:05:18 -0400 IronPort-HdrOrdr: A9a23:WRPeqa/y1cKUU1EwtMxuk+A4I+orLtY04lQ7vn1ZYxpTb8CeioSSjO0WvCWE7Ao5dVMBvZS7OKeGSW7B7pId2+QsFJqrQQWOggWVBa5v4YboyzfjXw3Sn9Q26Y5OaK57YeeQMXFfreLXpDa1CMwhxt7vytHMuc77w212RQ9nL4FMhj0JaTqzKUF9SAlYCZdRLvP1ifZvnSaqengcc62Adxs4dtXEzueqqLvWJTYCBzMCrDKFlC6U7tfBeCSw71MzVCxuzN4ZnVT4rw== X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797286" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:05:04 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 3BCC14D0B8A9; Thu, 8 Apr 2021 20:04:59 +0800 (CST) Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:04:49 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:04:48 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , , Ritesh Harjani Subject: [PATCH v4 3/7] fsdax: Add dax_iomap_cow_copy() for dax_iomap_zero Date: Thu, 8 Apr 2021 20:04:28 +0800 Message-ID: <20210408120432.1063608-4-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 3BCC14D0B8A9.A4AFC X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Punch hole on a reflinked file needs dax_copy_edge() too. Otherwise, data in not aligned area will be not correct. So, add the srcmap to dax_iomap_zero() and replace memset() as dax_copy_edge(). Signed-off-by: Shiyang Ruan Reviewed-by: Ritesh Harjani --- fs/dax.c | 25 +++++++++++++++---------- fs/iomap/buffered-io.c | 2 +- include/linux/dax.h | 3 ++- 3 files changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index e6c1354b27a8..fcd1e932716e 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1186,7 +1186,8 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, } #endif /* CONFIG_FS_DAX_PMD */ -s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap) +s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap, + struct iomap *srcmap) { sector_t sector = iomap_sector(iomap, pos & PAGE_MASK); pgoff_t pgoff; @@ -1208,19 +1209,23 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap) if (page_aligned) rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1); - else + else { rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL); - if (rc < 0) { - dax_read_unlock(id); - return rc; - } - - if (!page_aligned) { - memset(kaddr + offset, 0, size); + if (rc < 0) + goto out; + if (iomap->addr != srcmap->addr) { + rc = dax_iomap_cow_copy(offset, size, PAGE_SIZE, srcmap, + kaddr); + if (rc < 0) + goto out; + } else + memset(kaddr + offset, 0, size); dax_flush(iomap->dax_dev, kaddr + offset, size); } + +out: dax_read_unlock(id); - return size; + return rc < 0 ? rc : size; } static loff_t diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 414769a6ad11..67936e9967b8 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -933,7 +933,7 @@ static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos, s64 bytes; if (IS_DAX(inode)) - bytes = dax_iomap_zero(pos, length, iomap); + bytes = dax_iomap_zero(pos, length, iomap, srcmap); else bytes = iomap_zero(inode, pos, length, iomap, srcmap); if (bytes < 0) diff --git a/include/linux/dax.h b/include/linux/dax.h index b52f084aa643..3275e01ed33d 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -237,7 +237,8 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); -s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap); +s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap, + struct iomap *srcmap); static inline bool dax_mapping(struct address_space *mapping) { return mapping->host && IS_DAX(mapping->host); From patchwork Thu Apr 8 12:04:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190865 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E357C43460 for ; Thu, 8 Apr 2021 12:05:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2D64161164 for ; Thu, 8 Apr 2021 12:05:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231519AbhDHMFZ (ORCPT ); Thu, 8 Apr 2021 08:05:25 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24855 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S231513AbhDHMFX (ORCPT ); Thu, 8 Apr 2021 08:05:23 -0400 IronPort-HdrOrdr: A9a23:Es1dYKNTetm8FsBcTiGjsMiAIKoaSvp033AA0UdtRRtJNvGJjszGpoV+6TbYqhY0HEshl9eJJbWaTRrnmKJdzIENMd6ZMjXOl2elIYpr54mn4xCIIUfD38FH06MISclDIfnRKXQ/ssrg+gm/FL8bsby62YSln/3XwXsobSwCUdAC0y5DBgyWElJ7SWB9bPJXKLOn+sFFqzC8EE5nDPiTO39tZYj+juyOvJfnTDpDPBQ/9TSJ5AnC1JfKVzSewTcSOgki/Ysf X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797297" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:05:11 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 969E04D0B8A3; Thu, 8 Apr 2021 20:05:05 +0800 (CST) Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:05:00 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:05:00 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , Subject: [PATCH v4 4/7] iomap: Introduce iomap_apply2() for operations on two files Date: Thu, 8 Apr 2021 20:04:29 +0800 Message-ID: <20210408120432.1063608-5-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 969E04D0B8A3.A1BFB X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Some operations, such as comparing a range of data in two files under fsdax mode, requires nested iomap_open()/iomap_end() on two file. Thus, we introduce iomap_apply2() to accept arguments from two files and iomap_actor2_t for actions on two files. Signed-off-by: Shiyang Ruan Reviewed-by: Darrick J. Wong --- fs/iomap/apply.c | 52 +++++++++++++++++++++++++++++++++++++++++++ include/linux/iomap.h | 7 +++++- 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index 26ab6563181f..0493da5286ad 100644 --- a/fs/iomap/apply.c +++ b/fs/iomap/apply.c @@ -97,3 +97,55 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, return written ? written : ret; } + +loff_t +iomap_apply2(struct inode *ino1, loff_t pos1, struct inode *ino2, loff_t pos2, + loff_t length, unsigned int flags, const struct iomap_ops *ops, + void *data, iomap_actor2_t actor) +{ + struct iomap smap = { .type = IOMAP_HOLE }; + struct iomap dmap = { .type = IOMAP_HOLE }; + loff_t written = 0, ret, ret2 = 0; + loff_t len1 = length, len2, min_len; + + ret = ops->iomap_begin(ino1, pos1, len1, flags, &smap, NULL); + if (ret) + goto out; + if (WARN_ON(smap.offset > pos1)) { + written = -EIO; + goto out_src; + } + if (WARN_ON(smap.length == 0)) { + written = -EIO; + goto out_src; + } + len2 = min_t(loff_t, len1, smap.length); + + ret = ops->iomap_begin(ino2, pos2, len2, flags, &dmap, NULL); + if (ret) + goto out_src; + if (WARN_ON(dmap.offset > pos2)) { + written = -EIO; + goto out_dest; + } + if (WARN_ON(dmap.length == 0)) { + written = -EIO; + goto out_dest; + } + min_len = min_t(loff_t, len2, dmap.length); + + written = actor(ino1, pos1, ino2, pos2, min_len, data, &smap, &dmap); + +out_dest: + if (ops->iomap_end) + ret2 = ops->iomap_end(ino2, pos2, len2, + written > 0 ? written : 0, flags, &dmap); +out_src: + if (ops->iomap_end) + ret = ops->iomap_end(ino1, pos1, len1, + written > 0 ? written : 0, flags, &smap); +out: + if (written) + return written; + return ret ?: ret2; +} diff --git a/include/linux/iomap.h b/include/linux/iomap.h index d202fd2d0f91..9493c48bcc9c 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -150,10 +150,15 @@ struct iomap_ops { */ typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len, void *data, struct iomap *iomap, struct iomap *srcmap); - +typedef loff_t (*iomap_actor2_t)(struct inode *ino1, loff_t pos1, + struct inode *ino2, loff_t pos2, loff_t len, void *data, + struct iomap *smap, struct iomap *dmap); loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, const struct iomap_ops *ops, void *data, iomap_actor_t actor); +loff_t iomap_apply2(struct inode *ino1, loff_t pos1, struct inode *ino2, + loff_t pos2, loff_t length, unsigned int flags, + const struct iomap_ops *ops, void *data, iomap_actor2_t actor); ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, const struct iomap_ops *ops); From patchwork Thu Apr 8 12:04:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D787CC43470 for ; Thu, 8 Apr 2021 12:05:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AB90B610C7 for ; Thu, 8 Apr 2021 12:05:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231395AbhDHMF0 (ORCPT ); Thu, 8 Apr 2021 08:05:26 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24868 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S231240AbhDHMFY (ORCPT ); Thu, 8 Apr 2021 08:05:24 -0400 IronPort-HdrOrdr: A9a23:69ZtNammCMmpXpSuerUrdzZ2WvLpDfLM3DAbvn1ZSRFFG/GwvcaogfgdyFvImC8cMUtQ/eyoFYuhZTfn9ZBz6ZQMJrvKZmTbkUahMY0K1+Xf6hLtFyD0/uRekYdMGpIVNPTeFl5/5Pya3CCdM/INhOaK67qpg+C29QYJcShPZ7t75wl0Tia3e3cGJzVuPpYyGJqC6scvnVPJFkg/VNixBXUOQoH41r/2va/hCCRnOzcXrCGKjR6NrIXxCgWk2H4lOA9n8PMP9nfknmXCipmejw== X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797296" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:05:11 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id BC09B4CEA876; Thu, 8 Apr 2021 20:05:06 +0800 (CST) Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:05:07 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:05:06 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , , Goldwyn Rodrigues Subject: [PATCH v4 5/7] fsdax: Dedup file range to use a compare function Date: Thu, 8 Apr 2021 20:04:30 +0800 Message-ID: <20210408120432.1063608-6-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: BC09B4CEA876.A4E3C X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org With dax we cannot deal with readpage() etc. So, we create a dax comparison funciton which is similar with vfs_dedupe_file_range_compare(). And introduce dax_remap_file_range_prep() for filesystem use. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/dax.c | 56 ++++++++++++++++++++++++++++++++++++++++++++ fs/remap_range.c | 45 ++++++++++++++++++++++++++++------- fs/xfs/xfs_reflink.c | 9 +++++-- include/linux/dax.h | 4 ++++ include/linux/fs.h | 12 ++++++---- 5 files changed, 112 insertions(+), 14 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index fcd1e932716e..ba924b6629a6 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1849,3 +1849,59 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, return dax_insert_pfn_mkwrite(vmf, pfn, order); } EXPORT_SYMBOL_GPL(dax_finish_sync_fault); + +static loff_t dax_range_compare_actor(struct inode *ino1, loff_t pos1, + struct inode *ino2, loff_t pos2, loff_t len, void *data, + struct iomap *smap, struct iomap *dmap) +{ + void *saddr, *daddr; + bool *same = data; + int ret; + + if (smap->type == IOMAP_HOLE && dmap->type == IOMAP_HOLE) { + *same = true; + return len; + } + + if (smap->type == IOMAP_HOLE || dmap->type == IOMAP_HOLE) { + *same = false; + return 0; + } + + ret = dax_iomap_direct_access(smap, pos1, ALIGN(pos1 + len, PAGE_SIZE), + &saddr, NULL); + if (ret < 0) + return -EIO; + + ret = dax_iomap_direct_access(dmap, pos2, ALIGN(pos2 + len, PAGE_SIZE), + &daddr, NULL); + if (ret < 0) + return -EIO; + + *same = !memcmp(saddr, daddr, len); + return len; +} + +int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, loff_t len, bool *is_same, + const struct iomap_ops *ops) +{ + int id, ret = 0; + + id = dax_read_lock(); + while (len) { + ret = iomap_apply2(src, srcoff, dest, destoff, len, 0, ops, + is_same, dax_range_compare_actor); + if (ret < 0 || !*is_same) + goto out; + + len -= ret; + srcoff += ret; + destoff += ret; + } + ret = 0; +out: + dax_read_unlock(id); + return ret; +} +EXPORT_SYMBOL_GPL(dax_dedupe_file_range_compare); diff --git a/fs/remap_range.c b/fs/remap_range.c index e4a5fdd7ad7b..1fab0db49c68 100644 --- a/fs/remap_range.c +++ b/fs/remap_range.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "internal.h" #include @@ -199,9 +200,9 @@ static void vfs_unlock_two_pages(struct page *page1, struct page *page2) * Compare extents of two files to see if they are the same. * Caller must have locked both inodes to prevent write races. */ -static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, - struct inode *dest, loff_t destoff, - loff_t len, bool *is_same) +int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same) { loff_t src_poff; loff_t dest_poff; @@ -280,6 +281,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, out_error: return error; } +EXPORT_SYMBOL(vfs_dedupe_file_range_compare); /* * Check that the two inodes are eligible for cloning, the ranges make @@ -289,9 +291,11 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, * If there's an error, then the usual negative error code is returned. * Otherwise returns 0 with *len set to the request length. */ -int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t *len, unsigned int remap_flags) +static int +__generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *len, unsigned int remap_flags, + const struct iomap_ops *ops) { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); @@ -351,8 +355,15 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, if (remap_flags & REMAP_FILE_DEDUP) { bool is_same = false; - ret = vfs_dedupe_file_range_compare(inode_in, pos_in, - inode_out, pos_out, *len, &is_same); + if (!IS_DAX(inode_in) && !IS_DAX(inode_out)) + ret = vfs_dedupe_file_range_compare(inode_in, pos_in, + inode_out, pos_out, *len, &is_same); + else if (IS_DAX(inode_in) && IS_DAX(inode_out) && ops) + ret = dax_dedupe_file_range_compare(inode_in, pos_in, + inode_out, pos_out, *len, &is_same, + ops); + else + return -EINVAL; if (ret) return ret; if (!is_same) @@ -370,6 +381,24 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, return ret; } + +int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *len, unsigned int remap_flags, + const struct iomap_ops *ops) +{ + return __generic_remap_file_range_prep(file_in, pos_in, file_out, + pos_out, len, remap_flags, ops); +} +EXPORT_SYMBOL(dax_remap_file_range_prep); + +int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *len, unsigned int remap_flags) +{ + return __generic_remap_file_range_prep(file_in, pos_in, file_out, + pos_out, len, remap_flags, NULL); +} EXPORT_SYMBOL(generic_remap_file_range_prep); loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 725c7d8e4438..9ef9f98725a2 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1329,8 +1329,13 @@ xfs_reflink_remap_prep( if (IS_DAX(inode_in) || IS_DAX(inode_out)) goto out_unlock; - ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - len, remap_flags); + if (!IS_DAX(inode_in)) + ret = generic_remap_file_range_prep(file_in, pos_in, file_out, + pos_out, len, remap_flags); + else + ret = dax_remap_file_range_prep(file_in, pos_in, file_out, + pos_out, len, remap_flags, + &xfs_read_iomap_ops); if (ret || *len == 0) goto out_unlock; diff --git a/include/linux/dax.h b/include/linux/dax.h index 3275e01ed33d..32e1c34349f2 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -239,6 +239,10 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap, struct iomap *srcmap); +int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same, + const struct iomap_ops *ops); static inline bool dax_mapping(struct address_space *mapping) { return mapping->host && IS_DAX(mapping->host); diff --git a/include/linux/fs.h b/include/linux/fs.h index ec8f3ddf4a6a..28861e334dac 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -70,6 +70,7 @@ struct fsverity_info; struct fsverity_operations; struct fs_context; struct fs_parameter_spec; +struct iomap_ops; extern void __init inode_init(void); extern void __init inode_init_early(void); @@ -1989,10 +1990,13 @@ extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *, extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len, unsigned int flags); -extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t *count, - unsigned int remap_flags); +int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *count, unsigned int remap_flags); +int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *len, unsigned int remap_flags, + const struct iomap_ops *ops); extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); From patchwork Thu Apr 8 12:04:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190869 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98121C433ED for ; Thu, 8 Apr 2021 12:05:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6BB566115B for ; Thu, 8 Apr 2021 12:05:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231572AbhDHMFk (ORCPT ); Thu, 8 Apr 2021 08:05:40 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24879 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S230467AbhDHMFg (ORCPT ); Thu, 8 Apr 2021 08:05:36 -0400 IronPort-HdrOrdr: A9a23:WSmrB64l1Mx8I0MRhwPXwCjXdLJzesId70hD6mlaTxtJfsuE0/2/hfhz726RtB89elEF3eqBNq6JXG/G+fdOjLU5EL++UGDd1leAA41v4IDryT+lOwCWzIRg/Ih6dawWMrzNJHxbqeq/3wWiCdYnx7C8gcWVrMPT1W1kQw0vS4wI1XYbNi+hHkd7RBZLCPMCffLy2uN8uzGidX4LB/7LZEUtYu6rnb32vaOjSRsHKjpi0wOWkA6vgYSQLzGomjsYTBNDqI1PzVT4 X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797311" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:05:23 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 2EBEE4CF2676; Thu, 8 Apr 2021 20:05:18 +0800 (CST) Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:05:08 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:05:07 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , Subject: [PATCH v4 6/7] fs/xfs: Handle CoW for fsdax write() path Date: Thu, 8 Apr 2021 20:04:31 +0800 Message-ID: <20210408120432.1063608-7-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 2EBEE4CF2676.A45D4 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org In fsdax mode, WRITE and ZERO on a shared extent need CoW performed. After CoW, new allocated extents needs to be remapped to the file. So, add an iomap_end for dax write ops to do the remapping work. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_bmap_util.c | 3 +-- fs/xfs/xfs_file.c | 9 +++---- fs/xfs/xfs_iomap.c | 58 +++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_iomap.h | 4 +++ fs/xfs/xfs_iops.c | 7 +++-- fs/xfs/xfs_reflink.c | 3 +-- 6 files changed, 69 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index e7d68318e6a5..9fcea33dd2c9 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -954,8 +954,7 @@ xfs_free_file_space( return 0; if (offset + len > XFS_ISIZE(ip)) len = XFS_ISIZE(ip) - offset; - error = iomap_zero_range(VFS_I(ip), offset, len, NULL, - &xfs_buffered_write_iomap_ops); + error = xfs_iomap_zero_range(VFS_I(ip), offset, len, NULL); if (error) return error; diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index a007ca0711d9..5795d5d6f869 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -684,11 +684,8 @@ xfs_file_dax_write( pos = iocb->ki_pos; trace_xfs_file_dax_write(iocb, from); - ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops); - if (ret > 0 && iocb->ki_pos > i_size_read(inode)) { - i_size_write(inode, iocb->ki_pos); - error = xfs_setfilesize(ip, pos, ret); - } + ret = dax_iomap_rw(iocb, from, &xfs_dax_write_iomap_ops); + out: if (iolock) xfs_iunlock(ip, iolock); @@ -1309,7 +1306,7 @@ __xfs_filemap_fault( ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, (write_fault && !vmf->cow_page) ? - &xfs_direct_write_iomap_ops : + &xfs_dax_write_iomap_ops : &xfs_read_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index e17ab7f42928..f818f989687b 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -760,7 +760,8 @@ xfs_direct_write_iomap_begin( /* may drop and re-acquire the ilock */ error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared, - &lockmode, flags & IOMAP_DIRECT); + &lockmode, + flags & IOMAP_DIRECT || IS_DAX(inode)); if (error) goto out_unlock; if (shared) @@ -853,6 +854,38 @@ const struct iomap_ops xfs_direct_write_iomap_ops = { .iomap_begin = xfs_direct_write_iomap_begin, }; +static int +xfs_dax_write_iomap_end( + struct inode *inode, + loff_t pos, + loff_t length, + ssize_t written, + unsigned int flags, + struct iomap *iomap) +{ + int error = 0; + xfs_inode_t *ip = XFS_I(inode); + bool cow = xfs_is_cow_inode(ip); + + if (pos + written > i_size_read(inode)) { + i_size_write(inode, pos + written); + error = xfs_setfilesize(ip, pos, written); + if (error && cow) { + xfs_reflink_cancel_cow_range(ip, pos, written, true); + return error; + } + } + if (cow) + error = xfs_reflink_end_cow(ip, pos, written); + + return error; +} + +const struct iomap_ops xfs_dax_write_iomap_ops = { + .iomap_begin = xfs_direct_write_iomap_begin, + .iomap_end = xfs_dax_write_iomap_end, +}; + static int xfs_buffered_write_iomap_begin( struct inode *inode, @@ -1314,3 +1347,26 @@ xfs_xattr_iomap_begin( const struct iomap_ops xfs_xattr_iomap_ops = { .iomap_begin = xfs_xattr_iomap_begin, }; + +int +xfs_iomap_zero_range( + struct inode *inode, + loff_t offset, + loff_t len, + bool *did_zero) +{ + return iomap_zero_range(inode, offset, len, did_zero, + IS_DAX(inode) ? &xfs_dax_write_iomap_ops : + &xfs_buffered_write_iomap_ops); +} + +int +xfs_iomap_truncate_page( + struct inode *inode, + loff_t pos, + bool *did_zero) +{ + return iomap_truncate_page(inode, pos, did_zero, + IS_DAX(inode) ? &xfs_dax_write_iomap_ops : + &xfs_buffered_write_iomap_ops); +} diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h index 7d3703556d0e..8adb2bf78a5a 100644 --- a/fs/xfs/xfs_iomap.h +++ b/fs/xfs/xfs_iomap.h @@ -14,6 +14,9 @@ struct xfs_bmbt_irec; int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb, xfs_fileoff_t count_fsb, struct xfs_bmbt_irec *imap); int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool); +int xfs_iomap_zero_range(struct inode *inode, loff_t offset, loff_t len, + bool *did_zero); +int xfs_iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero); xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip, xfs_fileoff_t end_fsb); @@ -42,6 +45,7 @@ xfs_aligned_fsb_count( extern const struct iomap_ops xfs_buffered_write_iomap_ops; extern const struct iomap_ops xfs_direct_write_iomap_ops; +extern const struct iomap_ops xfs_dax_write_iomap_ops; extern const struct iomap_ops xfs_read_iomap_ops; extern const struct iomap_ops xfs_seek_iomap_ops; extern const struct iomap_ops xfs_xattr_iomap_ops; diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 66ebccb5a6ff..db8eeaa8a773 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -879,8 +879,8 @@ xfs_setattr_size( */ if (newsize > oldsize) { trace_xfs_zero_eof(ip, oldsize, newsize - oldsize); - error = iomap_zero_range(inode, oldsize, newsize - oldsize, - &did_zeroing, &xfs_buffered_write_iomap_ops); + error = xfs_iomap_zero_range(inode, oldsize, newsize - oldsize, + &did_zeroing); } else { /* * iomap won't detect a dirty page over an unwritten block (or a @@ -892,8 +892,7 @@ xfs_setattr_size( newsize); if (error) return error; - error = iomap_truncate_page(inode, newsize, &did_zeroing, - &xfs_buffered_write_iomap_ops); + error = xfs_iomap_truncate_page(inode, newsize, &did_zeroing); } if (error) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 9ef9f98725a2..a4cd6e8a7aa0 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1266,8 +1266,7 @@ xfs_reflink_zero_posteof( return 0; trace_xfs_zero_eof(ip, isize, pos - isize); - return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL, - &xfs_buffered_write_iomap_ops); + return xfs_iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL); } /* From patchwork Thu Apr 8 12:04:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyang Ruan X-Patchwork-Id: 12190871 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B513C433B4 for ; Thu, 8 Apr 2021 12:05:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1E3CB61177 for ; Thu, 8 Apr 2021 12:05:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231591AbhDHMFo (ORCPT ); Thu, 8 Apr 2021 08:05:44 -0400 Received: from mail.cn.fujitsu.com ([183.91.158.132]:24893 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S231255AbhDHMFo (ORCPT ); Thu, 8 Apr 2021 08:05:44 -0400 IronPort-HdrOrdr: A9a23:77b/e68CuoxXbejE+Ehuk+A4I+orLtY04lQ7vn1ZYxpTb8CeioSSjO0WvCWE7Ao5dVMBvZS7OKeGSW7B7pId2+QsFJqrQQWOggWVBa5v4YboyzfjXw3Sn9Q26Y5OaK57YeeQMXFfreLXpDa1CMwhxt7vytHMuc77w212RQ9nL4FMhj0JaTqzKUF9SAlYCZdRLvP1ifZvnSaqengcc62Adxs4dtXEzueqqLvWJTYCBzMCrDKFlC6U7tfBeCSw71MzVCxuzN4ZnVT4rw== X-IronPort-AV: E=Sophos;i="5.82,206,1613404800"; d="scan'208";a="106797324" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 08 Apr 2021 20:05:29 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 87CFB4D0B8A6; Thu, 8 Apr 2021 20:05:24 +0800 (CST) Received: from G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 8 Apr 2021 20:05:19 +0800 Received: from irides.mr.mr.mr (10.167.225.141) by G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 8 Apr 2021 20:05:19 +0800 From: Shiyang Ruan To: , , , CC: , , , , , , , , Subject: [PATCH v4 7/7] fs/xfs: Add dedupe support for fsdax Date: Thu, 8 Apr 2021 20:04:32 +0800 Message-ID: <20210408120432.1063608-8-ruansy.fnst@fujitsu.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> References: <20210408120432.1063608-1-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 87CFB4D0B8A6.A41E9 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org Add xfs_break_two_dax_layouts() to break layout for tow dax files. Then call compare range function only when files are both DAX or not. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_file.c | 20 ++++++++++++++++++++ fs/xfs/xfs_inode.c | 8 +++++++- fs/xfs/xfs_inode.h | 1 + fs/xfs/xfs_reflink.c | 5 +++-- 4 files changed, 31 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 5795d5d6f869..1fd457167c12 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -842,6 +842,26 @@ xfs_break_dax_layouts( 0, 0, xfs_wait_dax_page(inode)); } +int +xfs_break_two_dax_layouts( + struct inode *src, + struct inode *dest) +{ + int error; + bool retry = false; + +retry: + error = xfs_break_dax_layouts(src, &retry); + if (error || retry) + goto retry; + + error = xfs_break_dax_layouts(dest, &retry); + if (error || retry) + goto retry; + + return error; +} + int xfs_break_layouts( struct inode *inode, diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f93370bd7b1e..c01786917eef 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3713,8 +3713,10 @@ xfs_ilock2_io_mmap( struct xfs_inode *ip2) { int ret; + struct inode *inode1 = VFS_I(ip1); + struct inode *inode2 = VFS_I(ip2); - ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2)); + ret = xfs_iolock_two_inodes_and_break_layout(inode1, inode2); if (ret) return ret; if (ip1 == ip2) @@ -3722,6 +3724,10 @@ xfs_ilock2_io_mmap( else xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL, ip2, XFS_MMAPLOCK_EXCL); + + if (IS_DAX(inode1) && IS_DAX(inode2)) + ret = xfs_break_two_dax_layouts(inode1, inode2); + return 0; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 88ee4c3930ae..5ef21924dddc 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -435,6 +435,7 @@ enum xfs_prealloc_flags { int xfs_update_prealloc_flags(struct xfs_inode *ip, enum xfs_prealloc_flags flags); +int xfs_break_two_dax_layouts(struct inode *inode1, struct inode *inode2); int xfs_break_layouts(struct inode *inode, uint *iolock, enum layout_break_reason reason); diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index a4cd6e8a7aa0..4426bcc8a985 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -29,6 +29,7 @@ #include "xfs_iomap.h" #include "xfs_sb.h" #include "xfs_ag_resv.h" +#include /* * Copy on Write of Shared Blocks @@ -1324,8 +1325,8 @@ xfs_reflink_remap_prep( if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest)) goto out_unlock; - /* Don't share DAX file data for now. */ - if (IS_DAX(inode_in) || IS_DAX(inode_out)) + /* Don't share DAX file data with non-DAX file. */ + if (IS_DAX(inode_in) != IS_DAX(inode_out)) goto out_unlock; if (!IS_DAX(inode_in))