From patchwork Mon Apr 27 08:47:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6FF1F14DD for ; Mon, 27 Apr 2020 08:48:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 38CB7206D4 for ; Mon, 27 Apr 2020 08:48:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 38CB7206D4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5B3A58E0008; Mon, 27 Apr 2020 04:48:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 514508E0001; Mon, 27 Apr 2020 04:48:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3B4118E0008; Mon, 27 Apr 2020 04:48:39 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0056.hostedemail.com [216.40.44.56]) by kanga.kvack.org (Postfix) with ESMTP id 21CA28E0001 for ; Mon, 27 Apr 2020 04:48:39 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id DEF226882 for ; Mon, 27 Apr 2020 08:48:38 +0000 (UTC) X-FDA: 76753009116.03.shoes01_2e3cf8b60514c X-Spam-Summary: 2,0,0,6e067c736ff8883b,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:2:41:355:379:541:960:966:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1431:1437:1515:1516:1518:1535:1605:1730:1747:1777:1792:2196:2198:2199:2200:2393:2553:2559:2562:2693:2731:3138:3139:3140:3141:3142:3622:3865:3866:3867:3868:3870:3871:3872:4050:4120:4250:4321:4385:4605:5007:6119:6120:6261:7903:10004:11026:11232:11233:11473:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12679:12683:12895:12986:13019:13161:13184:13229:13894:14096:14097:21080:21450:21451:21627:21990:30003:30054:30070:30090,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-64.201.201.201 62.16.2.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: shoes01_2e3cf8b60514c X-Filterd-Recvd-Size: 9889 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:37 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547652" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:31 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id D7A794BCC88C; Mon, 27 Apr 2020 16:37:49 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:32 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:31 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , Subject: [RFC PATCH 1/8] fs/dax: Introduce dax-rmap btree for reflink Date: Mon, 27 Apr 2020 16:47:43 +0800 Message-ID: <20200427084750.136031-2-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: D7A794BCC88C.A20EE X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Normally, when accessing a mmapped file, entering the page fault, the file's (->mapping, ->index) will be associated with dax entry(represents for one page or a couple of pages) to facilitate the reverse mapping search. But in the case of reflink, a dax entry may be shared by multiple files or offsets. In order to establish a reverse mapping relationship in this case, I introduce a rb-tree to track multiple files and offsets. The root of the rb-tree is stored in page->private, since I haven't found it be used in fsdax. We create the rb-tree and insert the (->mapping, ->index) tuple in the second time a dax entry is associated, which means this dax entry is shared. And delete this tuple from the rb-tree when disassociating. Signed-off-by: Shiyang Ruan --- fs/dax.c | 153 ++++++++++++++++++++++++++++++++++++++++---- include/linux/dax.h | 6 ++ 2 files changed, 147 insertions(+), 12 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 11b16729b86f..2f996c566103 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -310,6 +311,120 @@ static unsigned long dax_entry_size(void *entry) return PAGE_SIZE; } +static struct kmem_cache *dax_rmap_node_cachep; +static struct kmem_cache *dax_rmap_root_cachep; + +static int __init init_dax_rmap_cache(void) +{ + dax_rmap_root_cachep = KMEM_CACHE(rb_root_cached, SLAB_PANIC|SLAB_ACCOUNT); + dax_rmap_node_cachep = KMEM_CACHE(shared_file, SLAB_PANIC|SLAB_ACCOUNT); + return 0; +} +fs_initcall(init_dax_rmap_cache); + +struct rb_root_cached *dax_create_rbroot(void) +{ + struct rb_root_cached *root = kmem_cache_alloc(dax_rmap_root_cachep, + GFP_KERNEL); + + memset(root, 0, sizeof(struct rb_root_cached)); + return root; +} + +static bool dax_rmap_insert(struct page *page, struct address_space *mapping, + pgoff_t index) +{ + struct rb_root_cached *root = (struct rb_root_cached *)page_private(page); + struct rb_node **new, *parent = NULL; + struct shared_file *p; + bool leftmost = true; + + if (!root) { + root = dax_create_rbroot(); + set_page_private(page, (unsigned long)root); + dax_rmap_insert(page, page->mapping, page->index); + } + new = &root->rb_root.rb_node; + /* Figure out where to insert new node */ + while (*new) { + struct shared_file *this = container_of(*new, struct shared_file, node); + long result = (long)mapping - (long)this->mapping; + + if (result == 0) + result = (long)index - (long)this->index; + parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) { + new = &((*new)->rb_right); + leftmost = false; + } else + return false; + } + p = kmem_cache_alloc(dax_rmap_node_cachep, GFP_KERNEL); + p->mapping = mapping; + p->index = index; + + /* Add new node and rebalance tree. */ + rb_link_node(&p->node, parent, new); + rb_insert_color_cached(&p->node, root, leftmost); + + return true; +} + +static struct shared_file *dax_rmap_search(struct page *page, + struct address_space *mapping, + pgoff_t index) +{ + struct rb_root_cached *root = (struct rb_root_cached *)page_private(page); + struct rb_node *node = root->rb_root.rb_node; + + while (node) { + struct shared_file *this = container_of(node, struct shared_file, node); + long result = (long)mapping - (long)this->mapping; + + if (result == 0) + result = (long)index - (long)this->index; + if (result < 0) + node = node->rb_left; + else if (result > 0) + node = node->rb_right; + else + return this; + } + return NULL; +} + +static void dax_rmap_delete(struct page *page, struct address_space *mapping, + pgoff_t index) +{ + struct rb_root_cached *root = (struct rb_root_cached *)page_private(page); + struct shared_file *this; + + if (!root) { + page->mapping = NULL; + page->index = 0; + return; + } + + this = dax_rmap_search(page, mapping, index); + rb_erase_cached(&this->node, root); + kmem_cache_free(dax_rmap_node_cachep, this); + + if (!RB_EMPTY_ROOT(&root->rb_root)) { + if (page->mapping == mapping && page->index == index) { + this = container_of(rb_first_cached(root), struct shared_file, node); + page->mapping = this->mapping; + page->index = this->index; + } + } else { + kmem_cache_free(dax_rmap_root_cachep, root); + set_page_private(page, 0); + page->mapping = NULL; + page->index = 0; + } +} + static unsigned long dax_end_pfn(void *entry) { return dax_to_pfn(entry) + dax_entry_size(entry) / PAGE_SIZE; @@ -341,16 +456,20 @@ static void dax_associate_entry(void *entry, struct address_space *mapping, for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - WARN_ON_ONCE(page->mapping); - page->mapping = mapping; - page->index = index + i++; + if (!page->mapping) { + page->mapping = mapping; + page->index = index + i++; + } else { + dax_rmap_insert(page, mapping, index + i++); + } } } static void dax_disassociate_entry(void *entry, struct address_space *mapping, - bool trunc) + pgoff_t index, bool trunc) { unsigned long pfn; + int i = 0; if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) return; @@ -359,9 +478,19 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, struct page *page = pfn_to_page(pfn); WARN_ON_ONCE(trunc && page_ref_count(page) > 1); - WARN_ON_ONCE(page->mapping && page->mapping != mapping); - page->mapping = NULL; - page->index = 0; + WARN_ON_ONCE(!page->mapping); + dax_rmap_delete(page, mapping, index + i++); + } +} + +static void __dax_decrease_nrexceptional(void *entry, + struct address_space *mapping) +{ + if (dax_is_empty_entry(entry) || dax_is_zero_entry(entry) || + dax_is_pmd_entry(entry)) { + mapping->nrexceptional--; + } else { + mapping->nrexceptional -= PHYS_PFN(dax_entry_size(entry)); } } @@ -522,10 +651,10 @@ static void *grab_mapping_entry(struct xa_state *xas, xas_lock_irq(xas); } - dax_disassociate_entry(entry, mapping, false); + dax_disassociate_entry(entry, mapping, index, false); xas_store(xas, NULL); /* undo the PMD join */ dax_wake_entry(xas, entry, true); - mapping->nrexceptional--; + __dax_decrease_nrexceptional(entry, mapping); entry = NULL; xas_set(xas, index); } @@ -642,9 +771,9 @@ static int __dax_invalidate_entry(struct address_space *mapping, (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY) || xas_get_mark(&xas, PAGECACHE_TAG_TOWRITE))) goto out; - dax_disassociate_entry(entry, mapping, trunc); + dax_disassociate_entry(entry, mapping, index, trunc); xas_store(&xas, NULL); - mapping->nrexceptional--; + __dax_decrease_nrexceptional(entry, mapping); ret = 1; out: put_unlocked_entry(&xas, entry); @@ -737,7 +866,7 @@ static void *dax_insert_entry(struct xa_state *xas, if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { void *old; - dax_disassociate_entry(entry, mapping, false); + dax_disassociate_entry(entry, mapping, xas->xa_index, false); dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address); /* * Only swap our new entry into the page cache if the current diff --git a/include/linux/dax.h b/include/linux/dax.h index d7af5d243f24..1e2e81c701b6 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -39,6 +39,12 @@ struct dax_operations { int (*zero_page_range)(struct dax_device *, pgoff_t, size_t); }; +struct shared_file { + struct address_space *mapping; + pgoff_t index; + struct rb_node node; +}; + extern struct attribute_group dax_attribute_group; #if IS_ENABLED(CONFIG_DAX) From patchwork Mon Apr 27 08:47:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 01F4C92A for ; Mon, 27 Apr 2020 08:48:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C016E2098B for ; Mon, 27 Apr 2020 08:48:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C016E2098B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0F3258E0006; Mon, 27 Apr 2020 04:48:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0A55D8E0001; Mon, 27 Apr 2020 04:48:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F23A58E0006; Mon, 27 Apr 2020 04:48:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0061.hostedemail.com [216.40.44.61]) by kanga.kvack.org (Postfix) with ESMTP id DB0D18E0001 for ; Mon, 27 Apr 2020 04:48:35 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 8BC7C180AD804 for ; Mon, 27 Apr 2020 08:48:35 +0000 (UTC) X-FDA: 76753008990.03.glove59_2db3f2f2e0745 X-Spam-Summary: 2,0,0,b20d549e6d5bf0b2,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:2:41:69:355:379:541:960:973:981:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1535:1605:1730:1747:1777:1792:2393:2559:2562:2693:2898:2899:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4049:4119:4321:4605:5007:6120:6261:7903:8957:10004:10226:11026:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12683:12895:12986:13019:13184:13229:13894:14096:14097:14110:21080:21324:21451:21627:21939:30054:30069:30070:30083,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-64.201.201.201 62.16.2.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: glove59_2db3f2f2e0745 X-Filterd-Recvd-Size: 8258 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:33 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547650" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:31 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 7F8A84BCC89C; Mon, 27 Apr 2020 16:37:51 +0800 (CST) Received: from G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:34 +0800 Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:33 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:32 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , Subject: [RFC PATCH 2/8] mm: add dax-rmap for memory-failure and rmap Date: Mon, 27 Apr 2020 16:47:44 +0800 Message-ID: <20200427084750.136031-3-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 7F8A84BCC89C.A1BFC X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Memory-failure collects and kill processes who is accessing a posioned, file mmapped page. Add dax-rmap iteration to support reflink case. Also add it for rmap iteration. Signed-off-by: Shiyang Ruan --- mm/memory-failure.c | 63 +++++++++++++++++++++++++++++++++++---------- mm/rmap.c | 54 +++++++++++++++++++++++++++----------- 2 files changed, 88 insertions(+), 29 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index a96364be8ab4..6d7da1fd55fd 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -463,36 +463,71 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill, page_unlock_anon_vma_read(av); } +static void collect_each_procs_file(struct page *page, + struct task_struct *task, + struct list_head *to_kill) +{ + struct vm_area_struct *vma; + struct address_space *mapping = page->mapping; + struct rb_root_cached *root = (struct rb_root_cached *)page_private(page); + struct rb_node *node; + struct shared_file *shared; + pgoff_t pgoff; + + if (dax_mapping(mapping) && root) { + struct shared_file save = { + .mapping = mapping, + .index = page->index, + }; + for (node = rb_first_cached(root); node; node = rb_next(node)) { + shared = container_of(node, struct shared_file, node); + mapping = page->mapping = shared->mapping; + page->index = shared->index; + pgoff = page_to_pgoff(page); + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, + pgoff) { + if (vma->vm_mm == task->mm) { + // each vma is unique, so is the vaddr. + add_to_kill(task, page, vma, to_kill); + } + } + } + // restore the mapping and index. + page->mapping = save.mapping; + page->index = save.index; + } else { + pgoff = page_to_pgoff(page); + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) { + /* + * Send early kill signal to tasks where a vma covers + * the page but the corrupted page is not necessarily + * mapped it in its pte. + * Assume applications who requested early kill want + * to be informed of all such data corruptions. + */ + if (vma->vm_mm == task->mm) + add_to_kill(task, page, vma, to_kill); + } + } +} + /* * Collect processes when the error hit a file mapped page. */ static void collect_procs_file(struct page *page, struct list_head *to_kill, int force_early) { - struct vm_area_struct *vma; struct task_struct *tsk; struct address_space *mapping = page->mapping; i_mmap_lock_read(mapping); read_lock(&tasklist_lock); for_each_process(tsk) { - pgoff_t pgoff = page_to_pgoff(page); struct task_struct *t = task_early_kill(tsk, force_early); if (!t) continue; - vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, - pgoff) { - /* - * Send early kill signal to tasks where a vma covers - * the page but the corrupted page is not necessarily - * mapped it in its pte. - * Assume applications who requested early kill want - * to be informed of all such data corruptions. - */ - if (vma->vm_mm == t->mm) - add_to_kill(t, page, vma, to_kill); - } + collect_each_procs_file(page, t, to_kill); } read_unlock(&tasklist_lock); i_mmap_unlock_read(mapping); diff --git a/mm/rmap.c b/mm/rmap.c index f79a206b271a..69ea66f9e971 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1870,21 +1870,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, anon_vma_unlock_read(anon_vma); } -/* - * rmap_walk_file - do something to file page using the object-based rmap method - * @page: the page to be handled - * @rwc: control variable according to each walk type - * - * Find all the mappings of a page using the mapping pointer and the vma chains - * contained in the address_space struct it points to. - * - * When called from try_to_munlock(), the mmap_sem of the mm containing the vma - * where the page was found will be held for write. So, we won't recheck - * vm_flags for that VMA. That should be OK, because that vma shouldn't be - * LOCKED. - */ -static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, - bool locked) +static void rmap_walk_file_one(struct page *page, struct rmap_walk_control *rwc, bool locked) { struct address_space *mapping = page_mapping(page); pgoff_t pgoff_start, pgoff_end; @@ -1925,6 +1911,44 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, i_mmap_unlock_read(mapping); } +/* + * rmap_walk_file - do something to file page using the object-based rmap method + * @page: the page to be handled + * @rwc: control variable according to each walk type + * + * Find all the mappings of a page using the mapping pointer and the vma chains + * contained in the address_space struct it points to. + * + * When called from try_to_munlock(), the mmap_sem of the mm containing the vma + * where the page was found will be held for write. So, we won't recheck + * vm_flags for that VMA. That should be OK, because that vma shouldn't be + * LOCKED. + */ +static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, + bool locked) +{ + struct rb_root_cached *root = (struct rb_root_cached *)page_private(page); + struct rb_node *node; + struct shared_file *shared; + + if (dax_mapping(page->mapping) && root) { + struct shared_file save = { + .mapping = page->mapping, + .index = page->index, + }; + for (node = rb_first_cached(root); node; node = rb_next(node)) { + shared = container_of(node, struct shared_file, node); + page->mapping = shared->mapping; + page->index = shared->index; + rmap_walk_file_one(page, rwc, locked); + } + // restore the mapping and index. + page->mapping = save.mapping; + page->index = save.index; + } else + rmap_walk_file_one(page, rwc, locked); +} + void rmap_walk(struct page *page, struct rmap_walk_control *rwc) { if (unlikely(PageKsm(page))) From patchwork Mon Apr 27 08:47:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D55ED92A for ; Mon, 27 Apr 2020 08:48:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A1E422080C for ; Mon, 27 Apr 2020 08:48:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1E422080C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DC7088E000D; Mon, 27 Apr 2020 04:48:45 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D509E8E000A; Mon, 27 Apr 2020 04:48:45 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C65B78E000D; Mon, 27 Apr 2020 04:48:45 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0140.hostedemail.com [216.40.44.140]) by kanga.kvack.org (Postfix) with ESMTP id A80BA8E000A for ; Mon, 27 Apr 2020 04:48:45 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 7057F181AC9CB for ; Mon, 27 Apr 2020 08:48:45 +0000 (UTC) X-FDA: 76753009410.07.order30_2e996ff4f1b2d X-Spam-Summary: 2,0,0,956ad7582326cdfb,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:41:355:371:372:379:541:800:960:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1535:1543:1711:1730:1747:1777:1792:2194:2198:2199:2200:2393:2559:2562:2731:3138:3139:3140:3141:3142:3355:3865:3866:3867:3868:3870:3871:3872:4321:4605:5007:6119:6120:6261:7875:7903:8603:9036:10004:10400:11026:11473:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12679:12683:12895:12986:13019:13141:13161:13229:13230:13894:14096:14097:14181:14721:21080:21451:21627:21740:21795:21796:21990:30003:30036:30051:30054:30079:30080,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-62.16.2.100 64.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: order30_2e996ff4f1b2d X-Filterd-Recvd-Size: 5884 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf11.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:38 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547653" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:32 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 1A32F4BCC88C; Mon, 27 Apr 2020 16:37:52 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:34 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:34 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , , Goldwyn Rodrigues Subject: [RFC PATCH 3/8] fs/dax: Introduce dax_copy_edges() for COW Date: Mon, 27 Apr 2020 16:47:45 +0800 Message-ID: <20200427084750.136031-4-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 1A32F4BCC88C.AF766 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add address output in dax_iomap_pfn() in order to perform a memcpy() in CoW case. Since this function both output address and pfn, rename it to dax_iomap_direct_access(). dax_copy_edges() is a helper functions performs a copy from one part of the device to another for data not page aligned. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/dax.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 58 insertions(+), 6 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 2f996c566103..8107ed10c851 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1111,8 +1111,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos) return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9; } -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, - pfn_t *pfnp) +static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size, + pfn_t *pfnp, void **addr) { const sector_t sector = dax_iomap_sector(iomap, pos); pgoff_t pgoff; @@ -1123,12 +1123,14 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, if (rc) return rc; id = dax_read_lock(); - length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size), - NULL, pfnp); + length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size), addr, + pfnp); if (length < 0) { rc = length; goto out; } + if (!pfnp) + goto out_check_addr; rc = -EINVAL; if (PFN_PHYS(length) < size) goto out; @@ -1138,11 +1140,59 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, if (length > 1 && !pfn_t_devmap(*pfnp)) goto out; rc = 0; + +out_check_addr: + if (!addr) + goto out; + if (!*addr) + rc = -EFAULT; out: dax_read_unlock(id); return rc; } +/* + * dax_copy_edges - Copies the part of the pages not included in + * the write, but required for CoW because + * offset/offset+length are not page aligned. + */ +static int dax_copy_edges(loff_t pos, loff_t length, struct iomap *srcmap, + void *daddr, bool pmd) +{ + size_t page_size = pmd ? PMD_SIZE : PAGE_SIZE; + loff_t offset = pos & (page_size - 1); + size_t size = ALIGN(offset + length, page_size); + loff_t end = pos + length; + loff_t pg_end = round_up(end, page_size); + void *saddr = 0; + int ret = 0; + + ret = dax_iomap_direct_access(srcmap, pos, size, NULL, &saddr); + if (ret) + return ret; + /* + * Copy the first part of the page + * Note: we pass offset as length + */ + if (offset) { + if (saddr) + ret = memcpy_mcsafe(daddr, saddr, offset); + else + memset(daddr, 0, offset); + } + + /* Copy the last part of the range */ + if (end < pg_end) { + if (saddr) + ret = memcpy_mcsafe(daddr + offset + length, + saddr + offset + length, pg_end - end); + else + memset(daddr + offset + length, 0, + pg_end - end); + } + return ret; +} + /* * The user has performed a load from a hole in the file. Allocating a new * page in the file would cause excessive storage usage for workloads with @@ -1462,7 +1512,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); major = VM_FAULT_MAJOR; } - error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn); + error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE, &pfn, + NULL); if (error < 0) goto error_finish_iomap; @@ -1680,7 +1731,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, switch (iomap.type) { case IOMAP_MAPPED: - error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn); + error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE, &pfn, + NULL); if (error < 0) goto finish_iomap; From patchwork Mon Apr 27 08:47:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511639 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9084492A for ; Mon, 27 Apr 2020 08:48:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5D4B42080C for ; Mon, 27 Apr 2020 08:48:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5D4B42080C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6245F8E000B; Mon, 27 Apr 2020 04:48:44 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 600E88E000A; Mon, 27 Apr 2020 04:48:44 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4C42D8E000B; Mon, 27 Apr 2020 04:48:44 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0143.hostedemail.com [216.40.44.143]) by kanga.kvack.org (Postfix) with ESMTP id 2EE3E8E000A for ; Mon, 27 Apr 2020 04:48:44 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E4463824934B for ; Mon, 27 Apr 2020 08:48:43 +0000 (UTC) X-FDA: 76753009326.12.spy47_2f0256a258254 X-Spam-Summary: 2,0,0,97551af3821b58c9,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:41:355:379:541:960:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1535:1543:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3355:3866:3867:3868:3871:3874:4250:4321:4605:5007:6119:6120:6261:9592:10004:10400:11026:11233:11473:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12683:12895:13019:13894:14096:14097:14110:14181:14721:14877:21080:21451:21627:21990:30054:30070:30079,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-64.201.201.201 62.16.2.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:39,LUA_SUMMARY:none X-HE-Tag: spy47_2f0256a258254 X-Filterd-Recvd-Size: 5640 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:42 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547665" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:38 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 6332250A996F; Mon, 27 Apr 2020 16:37:53 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:35 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:35 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , Subject: [RFC PATCH 4/8] fs/dax: copy data before write Date: Mon, 27 Apr 2020 16:47:46 +0800 Message-ID: <20200427084750.136031-5-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 6332250A996F.AB3ED X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add dax_copy_edges() into each dax actor functions to perform CoW. Signed-off-by: Shiyang Ruan --- fs/dax.c | 37 ++++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 8107ed10c851..13a6a1d3c3b3 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1276,9 +1276,6 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, return iov_iter_zero(min(length, end - pos), iter); } - if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) - return -EIO; - /* * Write can allocate block for an area which has a hole page mapped * into page tables. We have to tear down these mappings so that data @@ -1315,6 +1312,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, break; } + if (iomap != srcmap) { + ret = dax_copy_edges(pos, length, srcmap, kaddr, false); + if (ret) + break; + } + map_len = PFN_PHYS(map_len); kaddr += offset; map_len -= offset; @@ -1426,6 +1429,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, vm_fault_t ret = 0; void *entry; pfn_t pfn; + void *kaddr; trace_dax_pte_fault(inode, vmf, ret); /* @@ -1507,19 +1511,27 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, switch (iomap.type) { case IOMAP_MAPPED: +cow: if (iomap.flags & IOMAP_F_NEW) { count_vm_event(PGMAJFAULT); count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); major = VM_FAULT_MAJOR; } error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE, &pfn, - NULL); + &kaddr); if (error < 0) goto error_finish_iomap; entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, 0, write && !sync); + if (srcmap.type != IOMAP_HOLE) { + error = dax_copy_edges(pos, PAGE_SIZE, &srcmap, kaddr, + false); + if (error) + goto error_finish_iomap; + } + /* * If we are doing synchronous page fault and inode needs fsync, * we can insert PTE into page tables only after that happens. @@ -1543,6 +1555,9 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, goto finish_iomap; case IOMAP_UNWRITTEN: + if (srcmap.type != IOMAP_HOLE) + goto cow; + /*FALLTHRU*/ case IOMAP_HOLE: if (!write) { ret = dax_load_hole(&xas, mapping, &entry, vmf); @@ -1650,6 +1665,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, loff_t pos; int error; pfn_t pfn; + void *kaddr; /* * Check whether offset isn't beyond end of file now. Caller is @@ -1731,14 +1747,22 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, switch (iomap.type) { case IOMAP_MAPPED: +cow: error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE, &pfn, - NULL); + &kaddr); if (error < 0) goto finish_iomap; entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, DAX_PMD, write && !sync); + if (srcmap.type != IOMAP_HOLE) { + error = dax_copy_edges(pos, PMD_SIZE, &srcmap, kaddr, + true); + if (error) + goto unlock_entry; + } + /* * If we are doing synchronous page fault and inode needs fsync, * we can insert PMD into page tables only after that happens. @@ -1757,6 +1781,9 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, result = vmf_insert_pfn_pmd(vmf, pfn, write); break; case IOMAP_UNWRITTEN: + if (srcmap.type != IOMAP_HOLE) + goto cow; + /*FALLTHRU*/ case IOMAP_HOLE: if (WARN_ON_ONCE(write)) break; From patchwork Mon Apr 27 08:47:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56B2F14DD for ; Mon, 27 Apr 2020 08:48:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 238912087E for ; Mon, 27 Apr 2020 08:48:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 238912087E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 479098E000A; Mon, 27 Apr 2020 04:48:46 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 400968E000E; Mon, 27 Apr 2020 04:48:46 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2A23C8E000A; Mon, 27 Apr 2020 04:48:46 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0059.hostedemail.com [216.40.44.59]) by kanga.kvack.org (Postfix) with ESMTP id 08C958E000E for ; Mon, 27 Apr 2020 04:48:46 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B09A58248047 for ; Mon, 27 Apr 2020 08:48:45 +0000 (UTC) X-FDA: 76753009410.17.rice51_2f09460fa0339 X-Spam-Summary: 2,0,0,e2eb5bbdb20f6a10,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1535:1544:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3354:3865:3866:3867:3870:3871:3874:4117:4321:5007:6117:6119:6120:6261:8603:10004:11026:11232:11473:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12683:12895:12986:13019:13894:14096:14097:14110:14181:14721:14877:21080:21325:21451:21627:21990:30054:30070,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-62.16.2.100 64.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: rice51_2f09460fa0339 X-Filterd-Recvd-Size: 6814 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:42 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547667" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:38 +0800 Received: from G08CNEXMBPEKD06.g08.fujitsu.local (unknown [10.167.33.206]) by cn.fujitsu.com (Postfix) with ESMTP id 5ACCE50A9991; Mon, 27 Apr 2020 16:37:54 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD06.g08.fujitsu.local (10.167.33.206) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:34 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:36 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , , Goldwyn Rodrigues Subject: [RFC PATCH 5/8] fs/dax: replace mmap entry in case of CoW Date: Mon, 27 Apr 2020 16:47:47 +0800 Message-ID: <20200427084750.136031-6-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 5ACCE50A9991.AF0B6 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We replace the existing entry to the newly allocated one in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE so writeback marks this entry as writeprotected. This helps us snapshots so new write pagefaults after snapshots trigger a CoW. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/dax.c | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 13a6a1d3c3b3..12096edb2569 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -835,6 +835,9 @@ static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev, return 0; } +#define DAX_IF_DIRTY (1ULL << 0) +#define DAX_IF_COW (1ULL << 1) + /* * By this point grab_mapping_entry() has ensured that we have a locked entry * of the appropriate size so we don't have to worry about downgrading PMDs to @@ -844,14 +847,17 @@ static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev, */ static void *dax_insert_entry(struct xa_state *xas, struct address_space *mapping, struct vm_fault *vmf, - void *entry, pfn_t pfn, unsigned long flags, bool dirty) + void *entry, pfn_t pfn, unsigned long flags, + unsigned int insert_flags) { void *new_entry = dax_make_entry(pfn, flags); + bool dirty = insert_flags & DAX_IF_DIRTY; + bool cow = insert_flags & DAX_IF_COW; if (dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) { + if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { unsigned long index = xas->xa_index; /* we are replacing a zero page with block mapping */ if (dax_is_pmd_entry(entry)) @@ -863,7 +869,7 @@ static void *dax_insert_entry(struct xa_state *xas, xas_reset(xas); xas_lock_irq(xas); - if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { + if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { void *old; dax_disassociate_entry(entry, mapping, xas->xa_index, false); @@ -887,6 +893,9 @@ static void *dax_insert_entry(struct xa_state *xas, if (dirty) xas_set_mark(xas, PAGECACHE_TAG_DIRTY); + if (cow) + xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); + xas_unlock_irq(xas); return entry; } @@ -1430,6 +1439,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, void *entry; pfn_t pfn; void *kaddr; + unsigned long insert_flags = 0; trace_dax_pte_fault(inode, vmf, ret); /* @@ -1522,8 +1532,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, if (error < 0) goto error_finish_iomap; - entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, - 0, write && !sync); + if (write && !sync) + insert_flags |= DAX_IF_DIRTY; + entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, 0, + insert_flags); if (srcmap.type != IOMAP_HOLE) { error = dax_copy_edges(pos, PAGE_SIZE, &srcmap, kaddr, @@ -1555,8 +1567,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, goto finish_iomap; case IOMAP_UNWRITTEN: - if (srcmap.type != IOMAP_HOLE) + if (srcmap.type != IOMAP_HOLE) { + insert_flags |= DAX_IF_COW; goto cow; + } /*FALLTHRU*/ case IOMAP_HOLE: if (!write) { @@ -1614,7 +1628,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, pfn = page_to_pfn_t(zero_page); *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, - DAX_PMD | DAX_ZERO_PAGE, false); + DAX_PMD | DAX_ZERO_PAGE, 0); if (arch_needs_pgtable_deposit()) { pgtable = pte_alloc_one(vma->vm_mm); @@ -1666,6 +1680,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, int error; pfn_t pfn; void *kaddr; + unsigned long insert_flags = 0; /* * Check whether offset isn't beyond end of file now. Caller is @@ -1753,8 +1768,10 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, if (error < 0) goto finish_iomap; + if (write && !sync) + insert_flags |= DAX_IF_DIRTY; entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, - DAX_PMD, write && !sync); + DAX_PMD, insert_flags); if (srcmap.type != IOMAP_HOLE) { error = dax_copy_edges(pos, PMD_SIZE, &srcmap, kaddr, @@ -1781,8 +1798,10 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, result = vmf_insert_pfn_pmd(vmf, pfn, write); break; case IOMAP_UNWRITTEN: - if (srcmap.type != IOMAP_HOLE) + if (srcmap.type != IOMAP_HOLE) { + insert_flags |= DAX_IF_COW; goto cow; + } /*FALLTHRU*/ case IOMAP_HOLE: if (WARN_ON_ONCE(write)) From patchwork Mon Apr 27 08:47:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511637 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7416D14DD for ; Mon, 27 Apr 2020 08:48:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3DCFA214D8 for ; Mon, 27 Apr 2020 08:48:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3DCFA214D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 032828E0001; Mon, 27 Apr 2020 04:48:43 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F25BC8E000A; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D78FE8E0001; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0101.hostedemail.com [216.40.44.101]) by kanga.kvack.org (Postfix) with ESMTP id A749C8E000A for ; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 601F8689B for ; Mon, 27 Apr 2020 08:48:42 +0000 (UTC) X-FDA: 76753009284.19.fan88_2ec0019aa6551 X-Spam-Summary: 2,0,0,c4243ef3de40ac26,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:2:41:355:379:541:800:960:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1434:1437:1515:1516:1518:1535:1730:1747:1777:1792:2194:2198:2199:2200:2393:2553:2559:2562:2731:2736:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3872:3874:4050:4120:4321:4605:5007:6120:6261:7875:8603:9108:9592:10004:11026:11233:11473:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12683:12895:12986:13019:13894:14096:14097:21080:21451:21627:21796:21990:30036:30051:30054:30070:30080:30090,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-64.201.201.201 62.16.2.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: fan88_2ec0019aa6551 X-Filterd-Recvd-Size: 9317 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:40 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547663" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:38 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id AB4104BCC88F; Mon, 27 Apr 2020 16:37:55 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:38 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:37 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , , Goldwyn Rodrigues Subject: [RFC PATCH 6/8] fs/dax: dedup file range to use a compare function Date: Mon, 27 Apr 2020 16:47:48 +0800 Message-ID: <20200427084750.136031-7-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: AB4104BCC88F.A0635 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: With dax we cannot deal with readpage() etc. So, we create a funciton callback to perform the file data comparison and pass it to generic_remap_file_range_prep() so it can use iomap-based functions. This may not be the best way to solve this. Suggestions welcome. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/dax.c | 60 ++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/file.c | 2 +- fs/read_write.c | 11 ++++---- fs/xfs/xfs_reflink.c | 2 +- include/linux/dax.h | 5 ++++ include/linux/fs.h | 9 ++++++- 6 files changed, 81 insertions(+), 8 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 12096edb2569..ab6be7749105 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -31,6 +31,8 @@ #define CREATE_TRACE_POINTS #include +#define MIN(a, b) (((a) < (b)) ? (a) : (b)) + static inline unsigned int pe_order(enum page_entry_size pe_size) { if (pe_size == PE_SIZE_PTE) @@ -1942,3 +1944,61 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, return dax_insert_pfn_mkwrite(vmf, pfn, order); } EXPORT_SYMBOL_GPL(dax_finish_sync_fault); + +int dax_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest, + loff_t destoff, loff_t len, bool *is_same, const struct iomap_ops *ops) +{ + void *saddr, *daddr; + struct iomap s_iomap = {0}; + struct iomap d_iomap = {0}; + bool same = true; + loff_t cmp_len; + int id, ret = 0; + + id = dax_read_lock(); + while (len) { + ret = ops->iomap_begin(src, srcoff, len, 0, &s_iomap, NULL); + if (ret < 0) + goto out_src; + cmp_len = MIN(len, s_iomap.offset + s_iomap.length - srcoff); + + ret = ops->iomap_begin(dest, destoff, cmp_len, 0, &d_iomap, NULL); + if (ret < 0) + goto out_dest; + cmp_len = MIN(cmp_len, d_iomap.offset + d_iomap.length - destoff); + + ret = dax_iomap_direct_access(&s_iomap, srcoff, + ALIGN(srcoff + cmp_len, PAGE_SIZE), + NULL, &saddr); + if (ret < 0) + goto out_dest; + + ret = dax_iomap_direct_access(&d_iomap, destoff, + ALIGN(destoff + cmp_len, PAGE_SIZE), + NULL, &daddr); + if (ret < 0) + goto out_dest; + + same = !memcmp(saddr, daddr, cmp_len); + if (!same) + break; + len -= cmp_len; + srcoff += cmp_len; + destoff += cmp_len; +out_dest: + if (ops->iomap_end) + ops->iomap_end(dest, destoff, len, 0, 0, &d_iomap); +out_src: + if (ops->iomap_end) + ops->iomap_end(src, srcoff, len, 0, 0, &s_iomap); + + if (ret < 0) + goto out; + + } + *is_same = same; +out: + dax_read_unlock(id); + return ret; +} +EXPORT_SYMBOL_GPL(dax_file_range_compare); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 6cd5e4924e4d..6fa6e1ac6357 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2591,7 +2591,7 @@ static loff_t ocfs2_remap_file_range(struct file *file_in, loff_t pos_in, goto out_unlock; ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - &len, remap_flags); + &len, remap_flags, vfs_dedupe_file_range_compare); if (ret < 0 || len == 0) goto out_unlock; diff --git a/fs/read_write.c b/fs/read_write.c index bbfa9b12b15e..e08875f35f10 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1908,7 +1908,7 @@ static void vfs_unlock_two_pages(struct page *page1, struct page *page2) * Compare extents of two files to see if they are the same. * Caller must have locked both inodes to prevent write races. */ -static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, +int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest, loff_t destoff, loff_t len, bool *is_same) { @@ -1989,6 +1989,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, out_error: return error; } +EXPORT_SYMBOL_GPL(vfs_dedupe_file_range_compare); /* * Check that the two inodes are eligible for cloning, the ranges make @@ -2000,7 +2001,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, */ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, - loff_t *len, unsigned int remap_flags) + loff_t *len, unsigned int remap_flags, + compare_range_t compare) { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); @@ -2059,9 +2061,8 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, */ if (remap_flags & REMAP_FILE_DEDUP) { bool is_same = false; - - ret = vfs_dedupe_file_range_compare(inode_in, pos_in, - inode_out, pos_out, *len, &is_same); + ret = (*compare)(inode_in, pos_in, + inode_out, pos_out, *len, &is_same); if (ret) return ret; if (!is_same) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 107bf2a2f344..792217cd1e64 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1338,7 +1338,7 @@ xfs_reflink_remap_prep( goto out_unlock; ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - len, remap_flags); + len, remap_flags, vfs_dedupe_file_range_compare); if (ret < 0 || *len == 0) goto out_unlock; diff --git a/include/linux/dax.h b/include/linux/dax.h index 1e2e81c701b6..f8143fe24bf5 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -223,6 +223,11 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size, struct iomap *iomap); +int dax_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same, + const struct iomap_ops *ops); + static inline bool dax_mapping(struct address_space *mapping) { return mapping->host && IS_DAX(mapping->host); diff --git a/include/linux/fs.h b/include/linux/fs.h index 4f6f59b4f22a..959173cdeb05 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1924,13 +1924,20 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *, unsigned long, loff_t *, rwf_t); extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *, loff_t, size_t, unsigned int); +extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same); +typedef int (*compare_range_t)(struct inode *src, loff_t srcpos, + struct inode *dest, loff_t destpos, + loff_t len, bool *is_same); extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len, unsigned int flags); extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *count, - unsigned int remap_flags); + unsigned int remap_flags, + compare_range_t cmp); extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); From patchwork Mon Apr 27 08:47:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511631 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECC9A1805 for ; Mon, 27 Apr 2020 08:48:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B8229206D4 for ; Mon, 27 Apr 2020 08:48:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B8229206D4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D7BD28E000C; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id CC7948E000B; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF04D8E0001; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0121.hostedemail.com [216.40.44.121]) by kanga.kvack.org (Postfix) with ESMTP id 900A48E0001 for ; Mon, 27 Apr 2020 04:48:42 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 53B19181AC9CB for ; Mon, 27 Apr 2020 08:48:42 +0000 (UTC) X-FDA: 76753009284.07.front90_2eaafd49c1135 X-Spam-Summary: 2,0,0,77c6cae6991ff97b,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:41:355:379:421:541:960:966:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1535:1544:1711:1730:1747:1777:1792:2196:2199:2393:2553:2559:2562:3138:3139:3140:3141:3142:3355:3622:3865:3866:3867:3868:3870:3871:3874:4250:4321:4385:4605:5007:6120:6261:7875:10004:11026:11473:11537:11658:11914:12043:12296:12297:12438:12521:12555:12679:12895:12986:13019:13894:14096:14097:14181:14721:21080:21433:21451:21627:21990:30054:30090,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-62.16.2.100 64.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:27,LUA_SUMMARY:none X-HE-Tag: front90_2eaafd49c1135 X-Filterd-Recvd-Size: 5999 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:40 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547664" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:38 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id B54804BCC892; Mon, 27 Apr 2020 16:37:56 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:39 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:38 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , Subject: [RFC PATCH 7/8] fs/xfs: handle CoW for fsdax write() path Date: Mon, 27 Apr 2020 16:47:49 +0800 Message-ID: <20200427084750.136031-8-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: B54804BCC892.AE0AE X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In fsdax mode, WRITE and ZERO on a shared extent need CoW mechanism performed. After CoW, new extents needs to be remapped to the file. Signed-off-by: Shiyang Ruan fs/dax.c: fix nrexceptional count error. --- fs/xfs/xfs_bmap_util.c | 6 +++++- fs/xfs/xfs_file.c | 10 +++++++--- fs/xfs/xfs_iomap.c | 3 ++- fs/xfs/xfs_iops.c | 11 ++++++++--- fs/xfs/xfs_reflink.c | 2 ++ 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 4f800f7fe888..8980120415ba 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -969,10 +969,14 @@ xfs_free_file_space( if (offset + len > XFS_ISIZE(ip)) len = XFS_ISIZE(ip) - offset; error = iomap_zero_range(VFS_I(ip), offset, len, NULL, - &xfs_buffered_write_iomap_ops); + IS_DAX(VFS_I(ip)) ? + &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops); if (error) return error; + if (xfs_is_reflink_inode(ip)) + xfs_reflink_end_cow(ip, offset, len); + /* * If we zeroed right up to EOF and EOF straddles a page boundary we * must make sure that the post-EOF area is also zeroed because the diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 4b8bdecc3863..2f8a83b2605d 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -588,9 +588,13 @@ xfs_file_dax_write( trace_xfs_file_dax_write(ip, count, pos); ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops); - if (ret > 0 && iocb->ki_pos > i_size_read(inode)) { - i_size_write(inode, iocb->ki_pos); - error = xfs_setfilesize(ip, pos, ret); + if (ret > 0) { + if (iocb->ki_pos > i_size_read(inode)) { + i_size_write(inode, iocb->ki_pos); + error = xfs_setfilesize(ip, pos, ret); + } + if (xfs_is_cow_inode(ip)) + xfs_reflink_end_cow(ip, pos, ret); } out: xfs_iunlock(ip, iolock); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index bb590a267a7f..ca92f76498c9 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -754,13 +754,14 @@ xfs_direct_write_iomap_begin( goto out_unlock; if (imap_needs_cow(ip, flags, &imap, nimaps)) { + bool need_convert = flags & IOMAP_DIRECT || IS_DAX(inode); error = -EAGAIN; if (flags & IOMAP_NOWAIT) goto out_unlock; /* may drop and re-acquire the ilock */ error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared, - &lockmode, flags & IOMAP_DIRECT); + &lockmode, need_convert); if (error) goto out_unlock; if (shared) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index f7a99b3bbcf7..92b58ae06c0e 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -864,6 +864,7 @@ xfs_setattr_size( int error; uint lock_flags = 0; bool did_zeroing = false; + const struct iomap_ops *ops; ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL)); @@ -910,13 +911,17 @@ xfs_setattr_size( * extension, or zeroing out the rest of the block on a downward * truncate. */ + if (IS_DAX(inode)) + ops = &xfs_direct_write_iomap_ops; + else + ops = &xfs_buffered_write_iomap_ops; + if (newsize > oldsize) { trace_xfs_zero_eof(ip, oldsize, newsize - oldsize); error = iomap_zero_range(inode, oldsize, newsize - oldsize, - &did_zeroing, &xfs_buffered_write_iomap_ops); + &did_zeroing, ops); } else { - error = iomap_truncate_page(inode, newsize, &did_zeroing, - &xfs_buffered_write_iomap_ops); + error = iomap_truncate_page(inode, newsize, &did_zeroing, ops); } if (error) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 792217cd1e64..f87ab78dd421 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1269,6 +1269,8 @@ xfs_reflink_zero_posteof( trace_xfs_zero_eof(ip, isize, pos - isize); return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL, + IS_DAX(VFS_I(ip)) ? + &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops); } From patchwork Mon Apr 27 08:47:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11511623 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 94DB71805 for ; Mon, 27 Apr 2020 08:48:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 628A12080C for ; Mon, 27 Apr 2020 08:48:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 628A12080C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D9FB58E0009; Mon, 27 Apr 2020 04:48:40 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D76A08E0001; Mon, 27 Apr 2020 04:48:40 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C8EF18E0009; Mon, 27 Apr 2020 04:48:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id AE6998E0001 for ; Mon, 27 Apr 2020 04:48:40 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 6F0E4180AD804 for ; Mon, 27 Apr 2020 08:48:40 +0000 (UTC) X-FDA: 76753009200.19.feast49_2e8344f49794f X-Spam-Summary: 2,0,0,e8a36bbf4ba5816f,d41d8cd98f00b204,ruansy.fnst@cn.fujitsu.com,,RULES_HIT:41:69:355:379:541:960:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1535:1544:1605:1711:1730:1747:1777:1792:2194:2199:2393:2559:2562:2693:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3872:4117:4250:4321:4605:5007:6119:6120:6261:8660:9592:10004:11026:11473:11537:11658:11914:12043:12291:12296:12297:12438:12521:12555:12683:12895:13019:13148:13230:13894:14096:14097:14110:14181:14721:21080:21222:21324:21451:21627:21990:30054,0,RBL:183.91.158.132:@cn.fujitsu.com:.lbl8.mailshell.net-64.201.201.201 62.16.2.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:16,LUA_SUMMARY:none X-HE-Tag: feast49_2e8344f49794f X-Filterd-Recvd-Size: 6372 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Mon, 27 Apr 2020 08:48:39 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.73,323,1583164800"; d="scan'208";a="90547660" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 27 Apr 2020 16:48:37 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id C716A4BCC89B; Mon, 27 Apr 2020 16:37:57 +0800 (CST) Received: from G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.203) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 27 Apr 2020 16:48:40 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD05.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 27 Apr 2020 16:48:39 +0800 From: Shiyang Ruan To: , , CC: , , , , , , , , Subject: [RFC PATCH 8/8] fs/xfs: support dedupe for fsdax Date: Mon, 27 Apr 2020 16:47:50 +0800 Message-ID: <20200427084750.136031-9-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: C716A4BCC89B.A09FE X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Use xfs_break_layouts() to break files' layouts when locking them. And call dax_file_range_compare() function to compare range for files both DAX. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_reflink.c | 77 ++++++++++++++++++++++++++------------------ 1 file changed, 45 insertions(+), 32 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index f87ab78dd421..efbe3464ae85 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1185,47 +1185,41 @@ xfs_reflink_remap_blocks( * back out both locks. */ static int -xfs_iolock_two_inodes_and_break_layout( - struct inode *src, - struct inode *dest) +xfs_reflink_remap_lock_and_break_layouts( + struct file *file_in, + struct file *file_out) { int error; + struct inode *inode_in = file_inode(file_in); + struct xfs_inode *src = XFS_I(inode_in); + struct inode *inode_out = file_inode(file_out); + struct xfs_inode *dest = XFS_I(inode_out); + uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; - if (src > dest) + if (inode_in > inode_out) { + swap(inode_in, inode_out); swap(src, dest); - -retry: - /* Wait to break both inodes' layouts before we start locking. */ - error = break_layout(src, true); - if (error) - return error; - if (src != dest) { - error = break_layout(dest, true); - if (error) - return error; } - /* Lock one inode and make sure nobody got in and leased it. */ - inode_lock(src); - error = break_layout(src, false); + inode_lock(inode_in); + xfs_ilock(src, XFS_MMAPLOCK_EXCL); + error = xfs_break_layouts(inode_in, &iolock, BREAK_UNMAP); + xfs_iunlock(src, XFS_MMAPLOCK_EXCL); if (error) { - inode_unlock(src); - if (error == -EWOULDBLOCK) - goto retry; + inode_unlock(inode_in); return error; } - if (src == dest) + if (inode_in == inode_out) return 0; - /* Lock the other inode and make sure nobody got in and leased it. */ - inode_lock_nested(dest, I_MUTEX_NONDIR2); - error = break_layout(dest, false); + inode_lock_nested(inode_out, I_MUTEX_NONDIR2); + xfs_ilock(dest, XFS_MMAPLOCK_EXCL); + error = xfs_break_layouts(inode_out, &iolock, BREAK_UNMAP); + xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); if (error) { - inode_unlock(src); - inode_unlock(dest); - if (error == -EWOULDBLOCK) - goto retry; + inode_unlock(inode_in); + inode_unlock(inode_out); return error; } @@ -1244,6 +1238,11 @@ xfs_reflink_remap_unlock( struct xfs_inode *dest = XFS_I(inode_out); bool same_inode = (inode_in == inode_out); + if (inode_in > inode_out) { + swap(inode_in, inode_out); + swap(src, dest); + } + xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); if (!same_inode) xfs_iunlock(src, XFS_MMAPLOCK_EXCL); @@ -1274,6 +1273,14 @@ xfs_reflink_zero_posteof( &xfs_buffered_write_iomap_ops); } +int xfs_reflink_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same) +{ + return dax_file_range_compare(src, srcoff, dest, destoff, len, is_same, + &xfs_read_iomap_ops); +} + /* * Prepare two files for range cloning. Upon a successful return both inodes * will have the iolock and mmaplock held, the page cache of the out file will @@ -1318,9 +1325,10 @@ xfs_reflink_remap_prep( struct xfs_inode *dest = XFS_I(inode_out); bool same_inode = (inode_in == inode_out); ssize_t ret; + compare_range_t cmp; /* Lock both files against IO */ - ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out); + ret = xfs_reflink_remap_lock_and_break_layouts(file_in, file_out); if (ret) return ret; if (same_inode) @@ -1335,12 +1343,17 @@ xfs_reflink_remap_prep( if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest)) goto out_unlock; - /* Don't share DAX file data for now. */ - if (IS_DAX(inode_in) || IS_DAX(inode_out)) + /* Don't share DAX file data with non-DAX file. */ + if (IS_DAX(inode_in) != IS_DAX(inode_out)) goto out_unlock; + if (IS_DAX(inode_in)) + cmp = xfs_reflink_dedupe_file_range_compare; + else + cmp = vfs_dedupe_file_range_compare; + ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - len, remap_flags, vfs_dedupe_file_range_compare); + len, remap_flags, cmp); if (ret < 0 || *len == 0) goto out_unlock;