From patchwork Wed Jul 12 08:49:59 2017
X-Patchwork-Submitter: Lu Fengqi
X-Patchwork-Id: 9836345
From: Lu Fengqi
To: linux-btrfs@vger.kernel.org
CC: Qu Wenruo, Wang Xiaoguang
Subject: [PATCH v14.4 12/15] btrfs: dedupe: Inband in-memory only de-duplication implement
Date: Wed, 12 Jul 2017 16:49:59 +0800
Message-ID: <20170712085002.23241-13-lufq.fnst@cn.fujitsu.com>
In-Reply-To: <20170712085002.23241-1-lufq.fnst@cn.fujitsu.com>
References: <20170712085002.23241-1-lufq.fnst@cn.fujitsu.com>

From: Qu Wenruo

Core implementation of inband de-duplication.

It reuses the async_cow_start() facility to calculate the dedupe hash,
and uses that hash to do inband de-duplication at the extent level.

The workflow is as follows:
1) Run the delalloc range for an inode
2) Calculate the hash for the delalloc range in units of dedupe_bs
3) For the hash match (duplicated) case, just increase the source
   extent ref and insert the file extent.
   For the hash mismatch case, fall back to the normal cow_file_range()
   path and add the hash into the dedupe tree.
   Compression for the hash miss case is not supported yet.

The current implementation keeps all dedupe hashes in an in-memory
rb-tree, with LRU behavior to control the memory limit.
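To illustrate the workflow above, here is a stand-alone user-space
sketch. It is illustrative only and not part of this patch: FNV-1a
stands in for the SHA-256 dedupe hash, a capped linked list stands in
for the LRU-limited in-memory rb-tree, and dedupe_search(),
dedupe_add() and run_delalloc_block() are hypothetical names that only
loosely mirror btrfs_dedupe_search()/btrfs_dedupe_add():

/*
 * Illustrative user-space model of the inband dedupe workflow.
 * Not kernel code: FNV-1a stands in for SHA-256, and a capped
 * linked list stands in for the LRU-limited in-memory rb-tree.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEDUPE_BS 4096 /* dedupe block size (dedupe_bs) */
#define LRU_LIMIT 4    /* max hashes kept, mimicking the LRU cap */

struct hash_entry {
        uint64_t hash;   /* stand-in for the SHA-256 digest */
        uint64_t bytenr; /* "extent" the data was first written to */
        struct hash_entry *next;
};

static struct hash_entry *hash_tree; /* most recently added first */
static int tree_size;
static uint64_t next_bytenr = 0x1000;

static uint64_t fnv1a(const void *data, size_t len)
{
        const unsigned char *p = data;
        uint64_t h = 0xcbf29ce484222325ULL;

        while (len--)
                h = (h ^ *p++) * 0x100000001b3ULL;
        return h;
}

/* Loosely mirrors btrfs_dedupe_search(): bytenr on hit, 0 on miss. */
static uint64_t dedupe_search(uint64_t hash)
{
        struct hash_entry *e;

        for (e = hash_tree; e; e = e->next)
                if (e->hash == hash)
                        return e->bytenr;
        return 0;
}

/* Loosely mirrors btrfs_dedupe_add(): insert, evict over the limit. */
static void dedupe_add(uint64_t hash, uint64_t bytenr)
{
        struct hash_entry *e = malloc(sizeof(*e));
        struct hash_entry **pp = &hash_tree;

        if (!e)
                exit(1);
        e->hash = hash;
        e->bytenr = bytenr;
        e->next = hash_tree;
        hash_tree = e;
        if (++tree_size <= LRU_LIMIT)
                return;
        /* Evict the oldest entry (a simplification of true LRU) */
        while ((*pp)->next)
                pp = &(*pp)->next;
        free(*pp);
        *pp = NULL;
        tree_size--;
}

/* One dedupe_bs block of a delalloc range: hash, search, reuse or cow. */
static void run_delalloc_block(const char *data)
{
        uint64_t hash = fnv1a(data, DEDUPE_BS);
        uint64_t bytenr = dedupe_search(hash);

        if (bytenr) { /* hash hit: inc the source extent ref, skip IO */
                printf("hit : reuse extent at %#llx\n",
                       (unsigned long long)bytenr);
                return;
        }
        /* hash miss: fall back to a normal COW write, then add the hash */
        bytenr = next_bytenr;
        next_bytenr += DEDUPE_BS;
        printf("miss: write new extent at %#llx\n",
               (unsigned long long)bytenr);
        dedupe_add(hash, bytenr);
}

int main(void)
{
        static char a[DEDUPE_BS], b[DEDUPE_BS];

        memset(a, 'A', sizeof(a));
        memset(b, 'B', sizeof(b));
        run_delalloc_block(a); /* miss */
        run_delalloc_block(b); /* miss */
        run_delalloc_block(a); /* hit  */
        return 0;
}

Running it prints two misses followed by a hit for the repeated block:
the duplicated dedupe_bs block reuses the extent recorded at its first
write instead of allocating new space, which is the hash hit path in
cow_file_range() below.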
Signed-off-by: Wang Xiaoguang Signed-off-by: Qu Wenruo Signed-off-by: Lu Fengqi --- fs/btrfs/ctree.h | 4 +- fs/btrfs/dedupe.h | 18 +++ fs/btrfs/extent-tree.c | 33 ++++- fs/btrfs/extent_io.c | 10 +- fs/btrfs/extent_io.h | 1 + fs/btrfs/file.c | 3 + fs/btrfs/inode.c | 329 ++++++++++++++++++++++++++++++++++++++++--------- fs/btrfs/ioctl.c | 1 + fs/btrfs/relocation.c | 17 +++ 9 files changed, 350 insertions(+), 66 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 965c6615d882..337d9b7cc4a3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -116,9 +116,11 @@ static inline u32 count_max_extents(u64 size, u64 max_extent_size) enum btrfs_metadata_reserve_type { BTRFS_RESERVE_NORMAL, BTRFS_RESERVE_COMPRESS, + BTRFS_RESERVE_DEDUPE, }; -u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type); +u64 btrfs_max_extent_size(struct btrfs_inode *inode, + enum btrfs_metadata_reserve_type reserve_type); int inode_need_compress(struct inode *inode); struct btrfs_mapping_tree { diff --git a/fs/btrfs/dedupe.h b/fs/btrfs/dedupe.h index 8311ee13ca83..3a15fc2069b9 100644 --- a/fs/btrfs/dedupe.h +++ b/fs/btrfs/dedupe.h @@ -22,6 +22,7 @@ #include #include #include +#include "btrfs_inode.h" static const int btrfs_hash_sizes[] = { 32 }; @@ -63,6 +64,23 @@ struct btrfs_dedupe_info { struct btrfs_trans_handle; +static inline u64 btrfs_dedupe_blocksize(struct btrfs_inode *inode) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + + return fs_info->dedupe_info->blocksize; +} + +static inline int inode_need_dedupe(struct inode *inode) +{ + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + + if (!fs_info->dedupe_enabled) + return 0; + + return 1; +} + static inline int btrfs_dedupe_hash_hit(struct btrfs_dedupe_hash *hash) { return (hash && hash->bytenr); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 892a47b13deb..05acf9bc11df 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -38,6 +38,7 @@ #include "math.h" #include "sysfs.h" #include "qgroup.h" +#include "dedupe.h" #undef SCRAMBLE_DELAYED_REFS @@ -2429,6 +2430,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, if (btrfs_delayed_ref_is_head(node)) { struct btrfs_delayed_ref_head *head; + /* * we've hit the end of the chain and we were supposed * to insert this extent into the tree. But, it got @@ -2453,6 +2455,18 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, btrfs_pin_extent(fs_info, node->bytenr, node->num_bytes, 1); if (head->is_data) { + /* + * If insert_reserved is given, it means + * a new extent is reserved, then deleted + * in one transaction, and inc/dec get merged to 0. + * + * In this case, we need to remove its dedupe + * hash. 
+ */ + ret = btrfs_dedupe_del(trans, fs_info, + node->bytenr); + if (ret < 0) + return ret; ret = btrfs_del_csums(trans, fs_info, node->bytenr, node->num_bytes); @@ -5916,7 +5930,7 @@ static unsigned drop_outstanding_extent(struct btrfs_inode *inode, unsigned drop_inode_space = 0; unsigned dropped_extents = 0; unsigned num_extents; - u64 max_extent_size = btrfs_max_extent_size(reserve_type); + u64 max_extent_size = btrfs_max_extent_size(inode, reserve_type); num_extents = count_max_extents(num_bytes, max_extent_size); ASSERT(num_extents); @@ -5985,15 +5999,17 @@ static u64 calc_csum_metadata_size(struct btrfs_inode *inode, u64 num_bytes, return btrfs_calc_trans_metadata_size(fs_info, old_csums - num_csums); } -u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type) +u64 btrfs_max_extent_size(struct btrfs_inode *inode, + enum btrfs_metadata_reserve_type reserve_type) { if (reserve_type == BTRFS_RESERVE_NORMAL) return BTRFS_MAX_EXTENT_SIZE; else if (reserve_type == BTRFS_RESERVE_COMPRESS) return SZ_128K; - - ASSERT(0); - return BTRFS_MAX_EXTENT_SIZE; + else if (reserve_type == BTRFS_RESERVE_DEDUPE) + return btrfs_dedupe_blocksize(inode); + else + return BTRFS_MAX_EXTENT_SIZE; } int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes, @@ -6009,7 +6025,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes, int ret = 0; bool delalloc_lock = true; u64 to_free = 0; - u64 max_extent_size = btrfs_max_extent_size(reserve_type); + u64 max_extent_size = btrfs_max_extent_size(inode, reserve_type); unsigned dropped; bool release_extra = false; @@ -7122,6 +7138,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, btrfs_release_path(path); if (is_data) { + ret = btrfs_dedupe_del(trans, info, bytenr); + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + goto out; + } ret = btrfs_del_csums(trans, info, bytenr, num_bytes); if (ret) { btrfs_abort_transaction(trans, ret); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c3601069e5b3..be97d679aaac 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -597,7 +597,7 @@ static int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, btrfs_debug_check_extent_io_range(tree, start, end); if (bits & EXTENT_DELALLOC) - bits |= EXTENT_NORESERVE | EXTENT_COMPRESS; + bits |= EXTENT_NORESERVE | EXTENT_COMPRESS | EXTENT_DEDUPE; if (delete) bits |= ~EXTENT_CTLBITS; @@ -740,7 +740,8 @@ static void adjust_one_outstanding_extent(struct inode *inode, u64 len, enum btrfs_metadata_reserve_type reserve_type) { unsigned int old_extents, new_extents; - u64 max_extent_size = btrfs_max_extent_size(reserve_type); + u64 max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), + reserve_type); old_extents = div64_u64(len + max_extent_size - 1, max_extent_size); new_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1, @@ -779,7 +780,7 @@ void adjust_outstanding_extents(struct inode *inode, u64 start, u64 end, * The whole range is locked, so we can safely clear * EXTENT_COMPRESS flag. 
*/ - state->state &= ~EXTENT_COMPRESS; + state->state &= ~(EXTENT_COMPRESS | EXTENT_DEDUPE); adjust_one_outstanding_extent(inode, state->end - state->start + 1, reserve_type); node = rb_next(node); @@ -1561,7 +1562,8 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree, state = rb_entry(node, struct extent_state, rb_node); if (found && (state->start != cur_start || (state->state & EXTENT_BOUNDARY) || - (state->state ^ pre_state) & EXTENT_COMPRESS)) { + (state->state ^ pre_state) & (EXTENT_COMPRESS | + EXTENT_DEDUPE))) { goto out; } if (!(state->state & EXTENT_DELALLOC)) { diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 397d8958fcbc..808c90db090c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -24,6 +24,7 @@ #define EXTENT_CLEAR_DATA_RESV (1U << 17) #define EXTENT_DELALLOC_NEW (1U << 18) #define EXTENT_COMPRESS (1U << 19) +#define EXTENT_DEDUPE (1U << 20) #define EXTENT_IOBITS (EXTENT_LOCKED | EXTENT_WRITEBACK) #define EXTENT_DO_ACCOUNTING (EXTENT_CLEAR_META_RESV | \ EXTENT_CLEAR_DATA_RESV) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index ea5dd2d12bb0..9ab6910af6df 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -41,6 +41,7 @@ #include "volumes.h" #include "qgroup.h" #include "compression.h" +#include "dedupe.h" static struct kmem_cache *btrfs_inode_defrag_cachep; /* @@ -1604,6 +1605,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, if (inode_need_compress(inode)) reserve_type = BTRFS_RESERVE_COMPRESS; + else if (inode_need_dedupe(inode)) + reserve_type = BTRFS_RESERVE_DEDUPE; while (iov_iter_count(i) > 0) { size_t offset = pos & (PAGE_SIZE - 1); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3df13530cf3f..74f6dd598a88 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -359,6 +359,8 @@ struct async_extent { struct page **pages; unsigned long nr_pages; int compress_type; + int dedupe; + struct btrfs_dedupe_hash *hash; struct list_head list; }; @@ -370,6 +372,7 @@ struct async_cow { u64 end; struct list_head extents; struct btrfs_work work; + enum btrfs_metadata_reserve_type reserve_type; }; static noinline int add_async_extent(struct async_cow *cow, @@ -377,7 +380,8 @@ static noinline int add_async_extent(struct async_cow *cow, u64 compressed_size, struct page **pages, unsigned long nr_pages, - int compress_type) + int compress_type, int dedupe, + struct btrfs_dedupe_hash *hash) { struct async_extent *async_extent; @@ -389,6 +393,8 @@ static noinline int add_async_extent(struct async_cow *cow, async_extent->pages = pages; async_extent->nr_pages = nr_pages; async_extent->compress_type = compress_type; + async_extent->dedupe = dedupe; + async_extent->hash = hash; list_add_tail(&async_extent->list, &cow->extents); return 0; } @@ -619,7 +625,7 @@ static noinline void compress_file_range(struct inode *inode, */ add_async_extent(async_cow, start, num_bytes, total_compressed, pages, nr_pages, - compress_type); + compress_type, 0, NULL); if (start + num_bytes < end) { start += num_bytes; @@ -665,7 +671,7 @@ static noinline void compress_file_range(struct inode *inode, if (redirty) extent_range_redirty_for_io(inode, start, end); add_async_extent(async_cow, start, end - start + 1, 0, NULL, 0, - BTRFS_COMPRESS_NONE); + BTRFS_COMPRESS_NONE, 0, NULL); *num_added += 1; return; @@ -694,6 +700,38 @@ static void free_async_extent_pages(struct async_extent *async_extent) async_extent->pages = NULL; } +static void end_dedupe_extent(struct inode *inode, u64 start, + u32 len, unsigned long page_ops) +{ + int i; + 
unsigned int nr_pages = len / PAGE_SIZE; + struct page *page; + + for (i = 0; i < nr_pages; i++) { + page = find_get_page(inode->i_mapping, + start >> PAGE_SHIFT); + /* page should be already locked by caller */ + if (WARN_ON(!page)) + continue; + + /* We need to do this by ourselves as we skipped IO */ + if (page_ops & PAGE_CLEAR_DIRTY) + clear_page_dirty_for_io(page); + if (page_ops & PAGE_SET_WRITEBACK) + set_page_writeback(page); + + end_extent_writepage(page, 0, start, + start + PAGE_SIZE - 1); + if (page_ops & PAGE_END_WRITEBACK) + end_page_writeback(page); + if (page_ops & PAGE_UNLOCK) + unlock_page(page); + + start += PAGE_SIZE; + put_page(page); + } +} + /* * phase two of compressed writeback. This is the ordered portion * of the code, which only gets called in the order the work was @@ -710,6 +748,7 @@ static noinline void submit_compressed_extents(struct inode *inode, struct extent_map *em; struct btrfs_root *root = BTRFS_I(inode)->root; struct extent_io_tree *io_tree; + struct btrfs_dedupe_hash *hash; int ret = 0; again: @@ -719,6 +758,7 @@ static noinline void submit_compressed_extents(struct inode *inode, list_del(&async_extent->list); io_tree = &BTRFS_I(inode)->io_tree; + hash = async_extent->hash; retry: /* did the compression code fall back to uncompressed IO? */ @@ -737,10 +777,12 @@ static noinline void submit_compressed_extents(struct inode *inode, * 128MB as max extent size to re-calculate number of * outstanding extents for this extent. */ - adjust_outstanding_extents(inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, - BTRFS_RESERVE_COMPRESS); + if (!async_extent->dedupe) + adjust_outstanding_extents(inode, + async_extent->start, + async_extent->start + + async_extent->ram_size - 1, + BTRFS_RESERVE_COMPRESS); /* allocate blocks */ ret = cow_file_range(inode, async_cow->locked_page, async_extent->start, @@ -749,7 +791,7 @@ static noinline void submit_compressed_extents(struct inode *inode, async_extent->start + async_extent->ram_size - 1, &page_started, &nr_written, 0, - NULL); + hash); /* JDM XXX */ @@ -759,15 +801,26 @@ static noinline void submit_compressed_extents(struct inode *inode, * and IO for us. Otherwise, we need to submit * all those pages down to the drive. 
*/ - if (!page_started && !ret) - extent_write_locked_range(io_tree, - inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, - btrfs_get_extent, - WB_SYNC_ALL); - else if (ret) + if (!page_started && !ret) { + /* Skip IO for dedupe async_extent */ + if (btrfs_dedupe_hash_hit(hash)) + end_dedupe_extent(inode, + async_extent->start, + async_extent->ram_size, + PAGE_CLEAR_DIRTY | + PAGE_SET_WRITEBACK | + PAGE_END_WRITEBACK | + PAGE_UNLOCK); + else + extent_write_locked_range(io_tree, + inode, async_extent->start, + async_extent->start + + async_extent->ram_size - 1, + btrfs_get_extent, + WB_SYNC_ALL); + } else if (ret) unlock_page(async_cow->locked_page); + kfree(hash); kfree(async_extent); cond_resched(); continue; @@ -871,6 +924,7 @@ static noinline void submit_compressed_extents(struct inode *inode, free_async_extent_pages(async_extent); } alloc_hint = ins.objectid + ins.offset; + kfree(hash); kfree(async_extent); cond_resched(); } @@ -891,6 +945,7 @@ static noinline void submit_compressed_extents(struct inode *inode, PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK | PAGE_SET_ERROR); free_async_extent_pages(async_extent); + kfree(hash); kfree(async_extent); goto again; } @@ -1005,13 +1060,19 @@ static noinline int cow_file_range(struct inode *inode, while (disk_num_bytes > 0) { cur_alloc_size = disk_num_bytes; - ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size, + if (btrfs_dedupe_hash_hit(hash)) { + ins.objectid = hash->bytenr; + ins.offset = hash->num_bytes; + } else { + ret = btrfs_reserve_extent(root, cur_alloc_size, + cur_alloc_size, fs_info->sectorsize, 0, alloc_hint, &ins, 1, 1); - if (ret < 0) - goto out_unlock; + if (ret < 0) + goto out_unlock; + extent_reserved = true; + } cur_alloc_size = ins.offset; - extent_reserved = true; ram_size = ins.offset; em = create_io_em(inode, start, ins.offset, /* len */ @@ -1026,8 +1087,9 @@ static noinline int cow_file_range(struct inode *inode, goto out_reserve; free_extent_map(em); - ret = btrfs_add_ordered_extent(inode, start, ins.objectid, - ram_size, cur_alloc_size, 0); + ret = btrfs_add_ordered_extent_dedupe(inode, start, + ins.objectid, cur_alloc_size, ins.offset, + 0, hash); if (ret) goto out_drop_extent_cache; @@ -1051,7 +1113,14 @@ static noinline int cow_file_range(struct inode *inode, start + ram_size - 1, 0); } - btrfs_dec_block_group_reservations(fs_info, ins.objectid); + /* + * Hash hit didn't allocate extent, no need to dec bg + * reservation. + * Or we will underflow reservations and block balance. 
+ */ + if (!btrfs_dedupe_hash_hit(hash)) + btrfs_dec_block_group_reservations(fs_info, + ins.objectid); /* we're not doing compressed IO, don't unlock the first * page (which the caller expects to stay locked), don't @@ -1126,6 +1195,79 @@ static noinline int cow_file_range(struct inode *inode, goto out; } +static int hash_file_ranges(struct inode *inode, u64 start, u64 end, + struct async_cow *async_cow, int *num_added) +{ + struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info; + struct page *locked_page = async_cow->locked_page; + u16 hash_algo; + u64 dedupe_bs; + u64 cur_offset = start; + int ret = 0; + + /* If dedupe is not enabled, don't split extent into dedupe_bs */ + if (fs_info->dedupe_enabled && dedupe_info) { + dedupe_bs = dedupe_info->blocksize; + hash_algo = dedupe_info->hash_algo; + } else { + dedupe_bs = SZ_128M; + /* Just a dummy value, to avoid accessing a NULL pointer */ + hash_algo = BTRFS_DEDUPE_HASH_SHA256; + } + + while (cur_offset < end) { + struct btrfs_dedupe_hash *hash = NULL; + u64 len; + + len = min(end + 1 - cur_offset, dedupe_bs); + if (len < dedupe_bs) + goto next; + + hash = btrfs_dedupe_alloc_hash(hash_algo); + if (!hash) { + ret = -ENOMEM; + goto out; + } + ret = btrfs_dedupe_calc_hash(fs_info, inode, cur_offset, hash); + if (ret < 0) { + kfree(hash); + goto out; + } + + ret = btrfs_dedupe_search(fs_info, inode, cur_offset, hash); + if (ret < 0) { + kfree(hash); + goto out; + } + ret = 0; + +next: + /* Redirty the locked page if it corresponds to our extent */ + if (page_offset(locked_page) >= start && + page_offset(locked_page) <= end) + __set_page_dirty_nobuffers(locked_page); + + add_async_extent(async_cow, cur_offset, len, 0, NULL, 0, + BTRFS_COMPRESS_NONE, 1, hash); + cur_offset += len; + (*num_added)++; + } +out: + /* + * The caller won't unlock pages, so if an error happens, we must + * unlock pages ourselves. 
+ */ + if (ret) + extent_clear_unlock_delalloc(inode, cur_offset, + end, end, NULL, EXTENT_LOCKED | EXTENT_DO_ACCOUNTING | + EXTENT_DELALLOC | EXTENT_DEFRAG, PAGE_UNLOCK | + PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK | + PAGE_END_WRITEBACK | PAGE_SET_ERROR); + return ret; +} + /* * work queue call back to started compression on a file and pages */ @@ -1133,11 +1275,17 @@ static noinline void async_cow_start(struct btrfs_work *work) { struct async_cow *async_cow; int num_added = 0; + int ret = 0; async_cow = container_of(work, struct async_cow, work); - compress_file_range(async_cow->inode, async_cow->locked_page, - async_cow->start, async_cow->end, async_cow, - &num_added); + if (async_cow->reserve_type == BTRFS_RESERVE_COMPRESS) + compress_file_range(async_cow->inode, async_cow->locked_page, + async_cow->start, async_cow->end, async_cow, + &num_added); + else + ret = hash_file_ranges(async_cow->inode, async_cow->start, + async_cow->end, async_cow, &num_added); + if (num_added == 0) { btrfs_add_delayed_iput(async_cow->inode); async_cow->inode = NULL; @@ -1190,6 +1338,7 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct async_cow *async_cow; struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info; unsigned long nr_pages; u64 cur_end; @@ -1202,10 +1351,17 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, async_cow->root = root; async_cow->locked_page = locked_page; async_cow->start = start; + async_cow->reserve_type = reserve_type; cur_end = end; if (reserve_type == BTRFS_RESERVE_COMPRESS) cur_end = min(end, start + SZ_512K - 1); + else if (reserve_type == BTRFS_RESERVE_DEDUPE) { + u64 len = max_t(u64, SZ_512K, dedupe_info->blocksize); + + cur_end = min(end, start + len - 1); + } else + ASSERT(0); async_cow->end = cur_end; INIT_LIST_HEAD(&async_cow->extents); @@ -1582,29 +1738,33 @@ static int run_delalloc_range(void *private_data, struct page *locked_page, int ret; int force_cow = need_force_cow(inode, start, end); struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; - int need_compress; enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL; + int need_compress, need_dedupe; need_compress = test_range_bit(io_tree, start, end, EXTENT_COMPRESS, 1, NULL); + need_dedupe = test_range_bit(io_tree, start, end, + EXTENT_DEDUPE, 1, NULL); if (need_compress) reserve_type = BTRFS_RESERVE_COMPRESS; + else if (need_dedupe) + reserve_type = BTRFS_RESERVE_DEDUPE; if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) { - if (need_compress) + if (need_compress || need_dedupe) adjust_outstanding_extents(inode, start, end, reserve_type); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1, nr_written); } else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) { - if (need_compress) + if (need_compress || need_dedupe) adjust_outstanding_extents(inode, start, end, reserve_type); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); - } else if (!need_compress) { + } else if (!need_compress && !need_dedupe) { ret = cow_file_range(inode, locked_page, start, end, end, page_started, nr_written, 1, NULL); } else { @@ -1635,7 +1795,9 @@ static void btrfs_split_extent_hook(void *private_data, if (orig->state & EXTENT_COMPRESS) reserve_type = BTRFS_RESERVE_COMPRESS; - max_extent_size = btrfs_max_extent_size(reserve_type); + else if (orig->state & 
EXTENT_DEDUPE) + reserve_type = BTRFS_RESERVE_DEDUPE; + max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), reserve_type); size = orig->end - orig->start + 1; if (size > max_extent_size) { @@ -1684,7 +1846,9 @@ static void btrfs_merge_extent_hook(void *private_data, if (other->state & EXTENT_COMPRESS) reserve_type = BTRFS_RESERVE_COMPRESS; - max_extent_size = btrfs_max_extent_size(reserve_type); + else if (other->state & EXTENT_DEDUPE) + reserve_type = BTRFS_RESERVE_DEDUPE; + max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), reserve_type); if (new->start > other->start) new_size = new->end - other->start + 1; @@ -1802,7 +1966,10 @@ static void btrfs_set_bit_hook(void *private_data, if (*bits & EXTENT_COMPRESS) reserve_type = BTRFS_RESERVE_COMPRESS; - max_extent_size = btrfs_max_extent_size(reserve_type); + else if (*bits & EXTENT_DEDUPE) + reserve_type = BTRFS_RESERVE_DEDUPE; + max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), + reserve_type); num_extents = count_max_extents(len, max_extent_size); if (*bits & EXTENT_FIRST_DELALLOC) @@ -1869,7 +2036,9 @@ static void btrfs_clear_bit_hook(void *private_data, if (state->state & EXTENT_COMPRESS) reserve_type = BTRFS_RESERVE_COMPRESS; - max_extent_size = btrfs_max_extent_size(reserve_type); + else if (state->state & EXTENT_DEDUPE) + reserve_type = BTRFS_RESERVE_DEDUPE; + max_extent_size = btrfs_max_extent_size(inode, reserve_type); num_extents = count_max_extents(len, max_extent_size); if (*bits & EXTENT_FIRST_DELALLOC) { @@ -2091,13 +2260,16 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, { int ret; unsigned int bits; - u64 max_extent_size = btrfs_max_extent_size(reserve_type); + u64 max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), + reserve_type); u64 num_extents = div64_u64(end - start + max_extent_size, max_extent_size); /* compression path */ if (reserve_type == BTRFS_RESERVE_COMPRESS) bits = EXTENT_DELALLOC | EXTENT_COMPRESS | EXTENT_UPTODATE; + else if (reserve_type == BTRFS_RESERVE_DEDUPE) + bits = EXTENT_DELALLOC | EXTENT_DEDUPE | EXTENT_UPTODATE; else bits = EXTENT_DELALLOC | EXTENT_UPTODATE; @@ -2131,7 +2303,8 @@ int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end, { int ret; unsigned int bits; - u64 max_extent_size = btrfs_max_extent_size(reserve_type); + u64 max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), + reserve_type); u64 num_extents = div64_u64(end - start + max_extent_size, max_extent_size); @@ -2205,6 +2378,9 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) if (inode_need_compress(inode)) reserve_type = BTRFS_RESERVE_COMPRESS; + else if (inode_need_dedupe(inode)) + reserve_type = BTRFS_RESERVE_DEDUPE; + ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start, PAGE_SIZE, reserve_type); if (ret) { @@ -2270,7 +2446,8 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, u64 disk_bytenr, u64 disk_num_bytes, u64 num_bytes, u64 ram_bytes, u8 compression, u8 encryption, - u16 other_encoding, int extent_type) + u16 other_encoding, int extent_type, + struct btrfs_dedupe_hash *hash) { struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_file_extent_item *fi; @@ -2334,16 +2511,44 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, ins.offset = disk_num_bytes; ins.type = BTRFS_EXTENT_ITEM_KEY; - /* - * Release the reserved range from inode dirty range map, as it is - * already moved into delayed_ref_head - */ - ret = btrfs_qgroup_release_data(inode, file_pos, ram_bytes); - if (ret 
< 0) - goto out; - qg_released = ret; - ret = btrfs_alloc_reserved_file_extent(trans, root->root_key.objectid, - btrfs_ino(BTRFS_I(inode)), file_pos, qg_released, &ins); + if (btrfs_dedupe_hash_hit(hash)) { + /* + * Hash hit won't create a new data extent, so its reserved + * space won't be freed by new delayed_ref_head. + * Manually free it. + */ + btrfs_free_reserved_data_space(inode, NULL, file_pos, + ram_bytes); + } else { + /* + * Hash miss or non-dedupe write will create a new data + * extent, so we need to release the qgroup reserved data space. + */ + ret = btrfs_qgroup_release_data(inode, file_pos, ram_bytes); + if (ret < 0) + goto out; + qg_released = ret; + ret = btrfs_alloc_reserved_file_extent(trans, + root->root_key.objectid, + btrfs_ino(BTRFS_I(inode)), file_pos, + qg_released, &ins); + if (ret < 0) + goto out; + } + + /* Add missed hash into dedupe tree */ + if (hash && hash->bytenr == 0) { + hash->bytenr = ins.objectid; + hash->num_bytes = ins.offset; + + /* + * Here we ignore the dedupe_add error, as even if it fails, + * it won't corrupt the filesystem. It will only slightly + * reduce the dedupe rate. + */ + btrfs_dedupe_add(trans, root->fs_info, hash); + } + out: btrfs_free_path(path); @@ -3032,6 +3237,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) bool range_locked = false; bool clear_new_delalloc_bytes = false; enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL; + int hash_hit = btrfs_dedupe_hash_hit(ordered_extent->hash); if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) && !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags) && @@ -3119,7 +3325,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered_extent->flags)) { compress_type = ordered_extent->compress_type; reserve_type = BTRFS_RESERVE_COMPRESS; - } + } else if (ordered_extent->hash) + reserve_type = BTRFS_RESERVE_DEDUPE; if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) { BUG_ON(compress_type); @@ -3135,8 +3342,10 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) ordered_extent->disk_len, logical_len, logical_len, compress_type, 0, 0, - BTRFS_FILE_EXTENT_REG); - if (!ret) + BTRFS_FILE_EXTENT_REG, + ordered_extent->hash); + /* Hash hit case doesn't reserve delalloc bytes */ + if (!ret && !hash_hit) btrfs_release_delalloc_bytes(fs_info, ordered_extent->start, ordered_extent->disk_len); @@ -3199,8 +3408,11 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) * wrong we need to return the space for this ordered extent * back to the allocator. We only free the extent in the * truncated case if we didn't write out the extent at all. + * + * For the hash hit case, never free that extent, as it's being + * used by others. */ - if ((ret || !logical_len) && + if ((ret || !logical_len) && !hash_hit && !test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) && !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) btrfs_free_reserved_extent(fs_info, @@ -3208,7 +3420,6 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) ordered_extent->disk_len, 1); } - /* * This needs to be done to make sure anybody waiting knows we are done * updating everything for this ordered extent. 
@@ -4919,6 +5130,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, if (inode_need_compress(inode)) reserve_type = BTRFS_RESERVE_COMPRESS; + else if (inode_need_dedupe(inode)) + reserve_type = BTRFS_RESERVE_DEDUPE; if ((offset & (blocksize - 1)) == 0 && (!len || ((len & (blocksize - 1)) == 0))) @@ -7838,7 +8051,8 @@ static void adjust_dio_outstanding_extents(struct inode *inode, struct btrfs_dio_data *dio_data, const u64 len) { - u64 max_extent_size = btrfs_max_extent_size(BTRFS_RESERVE_NORMAL); + u64 max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), + BTRFS_RESERVE_NORMAL); unsigned int num_extents = count_max_extents(len, max_extent_size); /* @@ -8868,7 +9082,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) bool wakeup = true; bool relock = false; ssize_t ret; - u64 max_extent_size = btrfs_max_extent_size(BTRFS_RESERVE_NORMAL); + u64 max_extent_size = btrfs_max_extent_size(BTRFS_I(inode), + BTRFS_RESERVE_NORMAL); if (check_direct_IO(fs_info, iocb, iter, offset)) return 0; @@ -9216,6 +9431,9 @@ int btrfs_page_mkwrite(struct vm_fault *vmf) if (inode_need_compress(inode)) reserve_type = BTRFS_RESERVE_COMPRESS; + else if (inode_need_dedupe(inode)) + reserve_type = BTRFS_RESERVE_DEDUPE; + /* * Reserving delalloc space after obtaining the page lock can lead to * deadlock. For example, if a dirty page is locked by this function @@ -10633,7 +10851,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode, cur_offset, ins.objectid, ins.offset, ins.offset, ins.offset, 0, 0, 0, - BTRFS_FILE_EXTENT_PREALLOC); + BTRFS_FILE_EXTENT_PREALLOC, + NULL); if (ret) { btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 0); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index e0c3f8829e52..e745052eae04 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -60,6 +60,7 @@ #include "qgroup.h" #include "tree-log.h" #include "compression.h" +#include "dedupe.h" #ifdef CONFIG_64BIT /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 366de2c2f86c..02a4b989a577 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -32,6 +32,7 @@ #include "free-space-cache.h" #include "inode-map.h" #include "qgroup.h" +#include "dedupe.h" /* * backref_node, mapping_node and tree_block start with this @@ -3190,6 +3191,8 @@ static int relocate_file_extent_cluster(struct inode *inode, if (inode_need_compress(inode)) reserve_type = BTRFS_RESERVE_COMPRESS; + else if (inode_need_dedupe(inode)) + reserve_type = BTRFS_RESERVE_DEDUPE; ra = kzalloc(sizeof(*ra), GFP_NOFS); if (!ra) @@ -4118,6 +4121,20 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) rc->search_start = key.objectid; } } + /* + * This data extent will be replaced, but normal dedupe_del() + * will only happen at run_delayed_ref() time, which is too + * late, so delete the dedupe_hash early to prevent its ref + * from being increased during relocation + */ + if (rc->stage == MOVE_DATA_EXTENTS && + (flags & BTRFS_EXTENT_FLAG_DATA)) { + ret = btrfs_dedupe_del(trans, fs_info, key.objectid); + if (ret < 0) { + err = ret; + break; + } + } btrfs_end_transaction_throttle(trans); btrfs_btree_balance_dirty(fs_info);