From patchwork Mon May 18 11:14:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 11555263 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CC0B6913 for ; Mon, 18 May 2020 11:14:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A9B9A2083E for ; Mon, 18 May 2020 11:14:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800496; bh=kM5DHvfPo9fihFU2HkaivVbQZpFnXzMT0op6YuWsh0w=; h=From:To:Subject:Date:List-ID:From; b=e2IFaodwZ8h7jfYoAAn9duWco3NoHfYg3WBBKplCS6jRqTBYzMkVVOXQS7u4Or+Pb UbrjceIZ1M4wHVZQFgwJN8pQrWPXAVzZg3m4ettMKOyeGIWvD1hVVO2fMiypY47yD5 SXtIGLVBVtRH9T2BQiffPGwf7jpMNugMfAAYAZS0= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727799AbgERLO4 (ORCPT ); Mon, 18 May 2020 07:14:56 -0400 Received: from mail.kernel.org ([198.145.29.99]:38100 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726724AbgERLOz (ORCPT ); Mon, 18 May 2020 07:14:55 -0400 Received: from debian6.Home (bl8-197-74.dsl.telepac.pt [85.241.197.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C42412083E for ; Mon, 18 May 2020 11:14:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800494; bh=kM5DHvfPo9fihFU2HkaivVbQZpFnXzMT0op6YuWsh0w=; h=From:To:Subject:Date:From; b=KQ5S2PvysuvugSwSEf4ny8ucyewn6FXsTn+mrjYzLi1qADr/GiQRzTDMceilX0k85 YmhFWwA8PnJhnK+RzTaPUk2NycvvONLDCF2EC+WsRwKefkK4mgk8+yZHXaiso9DgUS YfA7peLzpUB83+kpK4EpTiwD50K0WSy0gcsP7YNk= From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/4] Btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents Date: Mon, 18 May 2020 12:14:50 +0100 Message-Id: <20200518111450.30771-1-fdmanana@kernel.org> X-Mailer: git-send-email 2.11.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When we have extents shared amongst different inodes in the same subvolume, if we fsync them in parallel we can end up with checksum items in the log tree that represent ranges which overlap. For example, consider we have inodes A and B, both sharing an extent that covers the logical range from X to X + 64Kb: 1) Task A starts an fsync on inode A; 2) Task B starts an fsync on inode B; 3) Task A calls btrfs_csum_file_blocks(), and the first search in the log tree, through btrfs_lookup_csum(), returns -EFBIG because it finds an existing checksum item that covers the range from X - 64Kb to X; 4) Task A checks that the checksum item has not reached the maximum possible size (MAX_CSUM_ITEMS) and then releases the search path before it does another path search for insertion (through a direct call to btrfs_search_slot()); 5) As soon as task A releases the path and before it does the search for insertion, task B calls btrfs_csum_file_blocks() and gets -EFBIG too, because there is an existing checksum item that has an end offset that matches the start offset (X) of the checksum range we want to log; 6) Task B releases the path; 7) Task A does the path search for insertion (through btrfs_search_slot()) and then verifies that the checksum item that ends at offset X still exists and extends its size to insert the checksums for the range from X to X + 64Kb; 8) Task A releases the path and returns from btrfs_csum_file_blocks(), having inserted the checksums into an existing checksum item that got its size extended. At this point we have one checksum item in the log tree that covers the logical range from X - 64Kb to X + 64Kb; 9) Task B now does a search for insertion using btrfs_search_slot() too, but it finds that the previous checksum item no longer ends at the offset X, it now ends at an of offset X + 64Kb, so it leaves that item untouched. Then it releases the path and calls btrfs_insert_empty_item() that inserts a checksum item with a key offset corresponding to X and a size for inserting a single checksum (4 bytes in case of crc32c). Subsequent iterations end up extending this new checksum item so that it contains the checksums for the range from X to X + 64Kb. So after task B returns from btrfs_csum_file_blocks() we end up with two checksum items in the log tree that have overlapping ranges, one for the range from X - 64Kb to X + 64Kb, and another for the range from X to X + 64Kb. Having checksum items that represent ranges which overlap, regardless of being in the log tree or in the chekcsums tree, can lead to problems where checksums for a file range end up not being found. This type of problem has happened a few times in the past and the following commits fixed them and explain in detail why having checksum items with overlapping ranges is problematic: commit 27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and outdated checksums") commit b84b8390d6009c ("Btrfs: fix file read corruption after extent cloning and fsync") commit 40e046acbd2f36 ("Btrfs: fix missing data checksums after replaying a log tree") Since this specific instance of the problem can only happen when logging inodes, because it is the only case where concurrent attempts to insert checksums for the same range can happen, fix the issue by using an extent io tree as a range lock to serialize checksum insertion during inode logging. This issue could often be reproduced by the test case generic/457 from fstests. When it happens it produces the following trace: BTRFS critical (device dm-0): corrupt leaf: root=18446744073709551610 block=30625792 slot=42, csum end range (15020032) goes beyond the start range (15015936) of the next csum item BTRFS info (device dm-0): leaf 30625792 gen 7 total ptrs 49 free space 2402 owner 18446744073709551610 BTRFS info (device dm-0): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 15884 item 0 key (18446744073709551606 128 13979648) itemoff 3991 itemsize 4 item 1 key (18446744073709551606 128 13983744) itemoff 3987 itemsize 4 item 2 key (18446744073709551606 128 13987840) itemoff 3983 itemsize 4 item 3 key (18446744073709551606 128 13991936) itemoff 3979 itemsize 4 item 4 key (18446744073709551606 128 13996032) itemoff 3975 itemsize 4 item 5 key (18446744073709551606 128 14000128) itemoff 3971 itemsize 4 (...) BTRFS error (device dm-0): block=30625792 write time tree block corruption detected ------------[ cut here ]------------ WARNING: CPU: 1 PID: 15884 at fs/btrfs/disk-io.c:539 btree_csum_one_bio+0x268/0x2d0 [btrfs] Modules linked in: btrfs dm_thin_pool ... CPU: 1 PID: 15884 Comm: fsx Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btree_csum_one_bio+0x268/0x2d0 [btrfs] Code: c7 c7 ... RSP: 0018:ffffbb0109e6f8e0 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffe1c0847b6080 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffffaa963988 RDI: 0000000000000001 RBP: ffff956a4f4d2000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000526 R11: 0000000000000000 R12: ffff956a5cd28bb0 R13: 0000000000000000 R14: ffff956a649c9388 R15: 000000011ed82000 FS: 00007fb419959e80(0000) GS:ffff956a7aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000fe6d54 CR3: 0000000138696005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btree_submit_bio_hook+0x67/0xc0 [btrfs] submit_one_bio+0x31/0x50 [btrfs] btree_write_cache_pages+0x2db/0x4b0 [btrfs] ? __filemap_fdatawrite_range+0xb1/0x110 do_writepages+0x23/0x80 __filemap_fdatawrite_range+0xd2/0x110 btrfs_write_marked_extents+0x15e/0x180 [btrfs] btrfs_sync_log+0x206/0x10a0 [btrfs] ? kmem_cache_free+0x315/0x3b0 ? btrfs_log_inode+0x1e8/0xf90 [btrfs] ? __mutex_unlock_slowpath+0x45/0x2a0 ? lockref_put_or_lock+0x9/0x30 ? dput+0x2d/0x580 ? dput+0xb5/0x580 ? btrfs_sync_file+0x464/0x4d0 [btrfs] btrfs_sync_file+0x464/0x4d0 [btrfs] do_fsync+0x38/0x60 __x64_sys_fsync+0x10/0x20 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fb41953a6d0 Code: 48 3d ... RSP: 002b:00007ffcc86bd218 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fb41953a6d0 RDX: 0000000000000009 RSI: 0000000000040000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000001 R09: 0000000000000009 R10: 0000000000000064 R11: 0000000000000246 R12: 0000556cf4b2c060 R13: 0000000000000100 R14: 0000000000000000 R15: 0000556cf322b420 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [] copy_process+0x74f/0x2020 softirqs last enabled at (0): [] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace d543fc76f5ad7fd8 ]--- In that trace the tree checker detected the overlapping checksum items at the time when we triggered writeback for the log tree when syncing the log. Another trace that can happen is due to BUG_ON() when deleting checksum items while logging an inode: BTRFS critical (device dm-0): slot 81 key (18446744073709551606 128 13635584) new key (18446744073709551606 128 13635584) BTRFS info (device dm-0): leaf 30949376 gen 7 total ptrs 98 free space 8527 owner 18446744073709551610 BTRFS info (device dm-0): refs 4 lock (w:1 r:0 bw:0 br:0 sw:1 sr:0) lock_owner 13473 current 13473 item 0 key (257 1 0) itemoff 16123 itemsize 160 inode generation 7 size 262144 mode 100600 item 1 key (257 12 256) itemoff 16103 itemsize 20 item 2 key (257 108 0) itemoff 16050 itemsize 53 extent data disk bytenr 13631488 nr 4096 extent data offset 0 nr 131072 ram 131072 (...) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:3153! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 1 PID: 13473 Comm: fsx Not tainted 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x1ea/0x270 [btrfs] Code: 0f b6 ... RSP: 0018:ffff95e3889179d0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000051 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffffb7763988 RDI: 0000000000000001 RBP: fffffffffffffff6 R08: 0000000000000000 R09: 0000000000000001 R10: 00000000000009ef R11: 0000000000000000 R12: ffff8912a8ba5a08 R13: ffff95e388917a06 R14: ffff89138dcf68c8 R15: ffff95e388917ace FS: 00007fe587084e80(0000) GS:ffff8913baa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe587091000 CR3: 0000000126dac005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_del_csums+0x2f4/0x540 [btrfs] copy_items+0x4b5/0x560 [btrfs] btrfs_log_inode+0x910/0xf90 [btrfs] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs] ? dget_parent+0x5/0x370 btrfs_log_dentry_safe+0x4a/0x70 [btrfs] btrfs_sync_file+0x42b/0x4d0 [btrfs] __x64_sys_msync+0x199/0x200 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fe586c65760 Code: 00 f7 ... RSP: 002b:00007ffe250f98b8 EFLAGS: 00000246 ORIG_RAX: 000000000000001a RAX: ffffffffffffffda RBX: 00000000000040e1 RCX: 00007fe586c65760 RDX: 0000000000000004 RSI: 0000000000006b51 RDI: 00007fe58708b000 RBP: 0000000000006a70 R08: 0000000000000003 R09: 00007fe58700cb61 R10: 0000000000000100 R11: 0000000000000246 R12: 00000000000000e1 R13: 00007fe58708b000 R14: 0000000000006b51 R15: 0000558de021a420 Modules linked in: dm_log_writes ... ---[ end trace c92a7f447a8515f5 ]--- CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c | 5 ++++- fs/btrfs/extent-io-tree.h | 1 + fs/btrfs/tree-log.c | 22 +++++++++++++++++++--- include/trace/events/btrfs.h | 1 + 5 files changed, 28 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e2ae26f6b9d0..69ee6400adc0 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1146,6 +1146,9 @@ struct btrfs_root { /* Record pairs of swapped blocks for qgroup */ struct btrfs_qgroup_swapped_blocks swapped_blocks; + /* Used only by log trees, when logging csum items. */ + struct extent_io_tree log_csum_range; + #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS u64 alloc_bytenr; #endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index d10c7be10f3b..91def9fd9456 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1137,9 +1137,12 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, root->log_transid = 0; root->log_transid_committed = -1; root->last_log_commit = 0; - if (!dummy) + if (!dummy) { extent_io_tree_init(fs_info, &root->dirty_log_pages, IO_TREE_ROOT_DIRTY_LOG_PAGES, NULL); + extent_io_tree_init(fs_info, &root->log_csum_range, + IO_TREE_LOG_CSUM_RANGE, NULL); + } memset(&root->root_key, 0, sizeof(root->root_key)); memset(&root->root_item, 0, sizeof(root->root_item)); diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index b4a7bad3e82e..b6561455b3c4 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -44,6 +44,7 @@ enum { IO_TREE_TRANS_DIRTY_PAGES, IO_TREE_ROOT_DIRTY_LOG_PAGES, IO_TREE_INODE_FILE_EXTENT, + IO_TREE_LOG_CSUM_RANGE, IO_TREE_SELFTEST, }; diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 1143f557141b..9e34d9fac6df 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3299,6 +3299,7 @@ static void free_log_tree(struct btrfs_trans_handle *trans, clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); + extent_io_tree_release(&log->log_csum_range); btrfs_put_root(log); } @@ -3916,9 +3917,21 @@ static int log_csums(struct btrfs_trans_handle *trans, struct btrfs_root *log_root, struct btrfs_ordered_sum *sums) { + const u64 lock_end = sums->bytenr + sums->len - 1; + struct extent_state *cached_state = NULL; int ret; /* + * Serialize logging for checksums. This is to avoid racing with the + * same checksum being logged by another task that is logging another + * file which happens to refer to the same extent as well. Such races + * can leave checksum items in the log with overlapping ranges. + */ + ret = lock_extent_bits(&log_root->log_csum_range, sums->bytenr, + lock_end, &cached_state); + if (ret) + return ret; + /* * Due to extent cloning, we might have logged a csum item that covers a * subrange of a cloned extent, and later we can end up logging a csum * item for a larger subrange of the same extent or the entire range. @@ -3928,10 +3941,13 @@ static int log_csums(struct btrfs_trans_handle *trans, * trim and adjust) any existing csum items in the log for this range. */ ret = btrfs_del_csums(trans, log_root, sums->bytenr, sums->len); - if (ret) - return ret; + if (!ret) + ret = btrfs_csum_file_blocks(trans, log_root, sums); - return btrfs_csum_file_blocks(trans, log_root, sums); + unlock_extent_cached(&log_root->log_csum_range, sums->bytenr, lock_end, + &cached_state); + + return ret; } static noinline int copy_items(struct btrfs_trans_handle *trans, diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index bcbc763b8814..360b0f9d2220 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -89,6 +89,7 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS); { IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES" }, \ { IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES" }, \ { IO_TREE_INODE_FILE_EXTENT, "INODE_FILE_EXTENT" }, \ + { IO_TREE_LOG_CSUM_RANGE, "LOG_CSUM_RANGE" }, \ { IO_TREE_SELFTEST, "SELFTEST" }) #define BTRFS_GROUP_FLAGS \ From patchwork Mon May 18 11:15:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 11555265 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36CCB618 for ; Mon, 18 May 2020 11:15:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1F5502084C for ; Mon, 18 May 2020 11:15:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800505; bh=cQZbP4XHGBvAhpTFkKPZwl0JxUF6acU7bhugkDOzk5s=; h=From:To:Subject:Date:List-ID:From; b=nG/Gubd1LLn0alOzwKfpe8EtIVHsSDrRXjeee9KF9c+HbXHl1eC5GXUawcD++t99Z lznW4pjz4AjwXh5Qe2m0ibIRVZhPObCwJz/jwabJkhzK/FZpNk940niqQu35wdcXWe yOnHGqWZ+cweutCYDoKv3omOWDCeH4cFm13IE+YM= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726847AbgERLPE (ORCPT ); Mon, 18 May 2020 07:15:04 -0400 Received: from mail.kernel.org ([198.145.29.99]:38258 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726526AbgERLPE (ORCPT ); Mon, 18 May 2020 07:15:04 -0400 Received: from debian6.Home (bl8-197-74.dsl.telepac.pt [85.241.197.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B6D142084C for ; Mon, 18 May 2020 11:15:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800503; bh=cQZbP4XHGBvAhpTFkKPZwl0JxUF6acU7bhugkDOzk5s=; h=From:To:Subject:Date:From; b=OhWQNHsQeZ6P8tP9azrItLI+4rbzIvBWZpl18Fefc4Mn+bJSQ7cmO+gzou4F+wFb8 jFjJ71Eqkgfa9f0HdFEAEs7ffQd/cwc7D1NJ/SQmI8pQR/gCsyzYLJCm0G96Qm2C+z dOjQb+n7sixC1+2sbcz0e6c9Mw4J4ttuCMpDkxdQ= From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/4] Btrfs: make checksum item extension more efficient Date: Mon, 18 May 2020 12:15:00 +0100 Message-Id: <20200518111500.7268-1-fdmanana@kernel.org> X-Mailer: git-send-email 2.11.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana When we want to add checksums into the checksums tree, or a log tree, we try whenever possible to extend existing checksum items, as this helps reduce amount of metadata space used, since adding a new item uses extra metadata space for a btrfs_item structure (25 bytes). However we have two inneficiencies in the current approach: 1) After finding a checksum item that covers a range with an end offset that matches the start offset of the checksum range we want to insert, we release the search path populated by btrfs_lookup_csum() and then do another COW search on tree with the goal of getting additional space for at least one checksum. Doing this path release and then searching again is a waste of time because very often the leaf already has enough free space for at least one more checksum; 2) After the COW search that guarantees we get free space in the leaf for at least one more checksum, we end up not doing the extension of the previous checksum item, and fallback to insertion of a new checksum item, if the leaf doesn't have an amount of free space larger then the space required for 2 checksums plus one btrfs_item structure - this is pointless for two reasons: a) We want to extend an existing item, so we don't need to account for a btrfs_item structure (25 bytes); b) We made the COW search with an insertion size for 1 single checksum, so if the leaf ends up with a free space amount smaller then 2 checksums plus the size of a btrfs_item structure, we give up on the extension of the existing item and jump to the 'insert' label, where we end up releasing the path and then doing yet another search to insert a new checksum item for a single checksum. Fix these inefficiencies by doing the following: - For case 1), before releasing the path just check if the leaf already has enough space for at least 1 more checksum, and if it does, jump directly to the item extension code, with releasing our current path, which was already COWed by btrfs_lookup_csum(); - For case 2), fix the logic so that for item extension we require only that the leaf has enough free space for 1 checksum, and not a minimum of 2 checksums plus space for a btrfs_item structure. Signed-off-by: Filipe Manana --- fs/btrfs/file-item.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index b618ad5339ba..104858e2a836 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -905,9 +905,22 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, } /* - * at this point, we know the tree has an item, but it isn't big - * enough yet to put our csum in. Grow it + * At this point, we know the tree has a checksum item that ends at an + * offset matching the start of the checksum range we want to insert. + * We try to extend that item as much as possible and then add as many + * checksums to it as they fit. + * + * First check if the leaf has enough free space for at least one + * checksum. If it has go directly to the item extension code, otherwise + * release the path and do a search for insertion before the extension. */ + if (btrfs_leaf_free_space(leaf) >= csum_size) { + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); + csum_offset = (bytenr - found_key.offset) >> + fs_info->sb->s_blocksize_bits; + goto extend_csum; + } + btrfs_release_path(path); ret = btrfs_search_slot(trans, root, &file_key, path, csum_size, 1); @@ -931,19 +944,13 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, goto insert; } +extend_csum: if (csum_offset == btrfs_item_size_nr(leaf, path->slots[0]) / csum_size) { int extend_nr; u64 tmp; u32 diff; - u32 free_space; - - if (btrfs_leaf_free_space(leaf) < - sizeof(struct btrfs_item) + csum_size * 2) - goto insert; - free_space = btrfs_leaf_free_space(leaf) - - sizeof(struct btrfs_item) - csum_size; tmp = sums->len - total_bytes; tmp >>= fs_info->sb->s_blocksize_bits; WARN_ON(tmp < 1); @@ -954,7 +961,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, MAX_CSUM_ITEMS(fs_info, csum_size) * csum_size); diff = diff - btrfs_item_size_nr(leaf, path->slots[0]); - diff = min(free_space, diff); + diff = min_t(u32, btrfs_leaf_free_space(leaf), diff); diff /= csum_size; diff *= csum_size; From patchwork Mon May 18 11:15:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 11555267 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D55E4618 for ; Mon, 18 May 2020 11:15:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B296E2083E for ; Mon, 18 May 2020 11:15:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800512; bh=Akqbndr75LaEW0M3DBNI+KTbH8yvVBmKDrh573VcEYE=; h=From:To:Subject:Date:List-ID:From; b=1XiQ6Lg7wzIDjwr3C2kB8oZsQRlMdOfqtmyTklUAm0mHkCUPrYIsbqrUlEqYsoBpl 797lYeZ8ZH95padKxXvqWAcVtk7BZt9ee94P4XiXVERkXV9hqc+SvAHBdCFlcQ3bNI nHO05pijtBF4oafuW/Q8uczmcdks8KKU7JFm3FVg= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727005AbgERLPM (ORCPT ); Mon, 18 May 2020 07:15:12 -0400 Received: from mail.kernel.org ([198.145.29.99]:38360 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726726AbgERLPL (ORCPT ); Mon, 18 May 2020 07:15:11 -0400 Received: from debian6.Home (bl8-197-74.dsl.telepac.pt [85.241.197.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0F5952083E for ; Mon, 18 May 2020 11:15:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800511; bh=Akqbndr75LaEW0M3DBNI+KTbH8yvVBmKDrh573VcEYE=; h=From:To:Subject:Date:From; b=zL4NfIHXF7mFsTeF/LA8P+O0FFr1+TtJNDbpv8Pqu1WAZiR072HNGUaf/JCsR4lW1 WgoNNi6PWl/0XLZU/k9cUNqOV8Pch0r7roE7u21poAFNVhIYLUP39OTMmUDYv6vyx+ RS3wguNfTCzkmv9qTUKC4J/Sf0Xe3I1cKkugr6L8= From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/4] Btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums Date: Mon, 18 May 2020 12:15:09 +0100 Message-Id: <20200518111509.12935-1-fdmanana@kernel.org> X-Mailer: git-send-email 2.11.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana We are currently treating any non-zero return value from btrfs_next_leaf() the same way, by going to the code that inserts a new checksum item in the tree. However if btrfs_next_leaf() returns an error (a value < 0), we should just stop and return the error, and not behave as if nothing has happened, since in that case we do not have a way to know if there is a next leaf or we are currently at the last leaf already. So fix that by returning the error from btrfs_next_leaf(). Signed-off-by: Filipe Manana --- fs/btrfs/file-item.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 104858e2a836..4f563469c97a 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -887,10 +887,12 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, nritems = btrfs_header_nritems(path->nodes[0]); if (!nritems || (path->slots[0] >= nritems - 1)) { ret = btrfs_next_leaf(root, path); - if (ret == 1) + if (ret < 0) { + goto out; + } else if (ret > 0) { found_next = 1; - if (ret != 0) goto insert; + } slot = path->slots[0]; } btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot); From patchwork Mon May 18 11:15:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 11555269 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 28651618 for ; Mon, 18 May 2020 11:15:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 06EBB20829 for ; Mon, 18 May 2020 11:15:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800522; bh=yGWutGgACSorq2W/OXenDP8sAK8+8S5WoQNVhYKNv2E=; h=From:To:Subject:Date:List-ID:From; b=xDpNoaNWK0UdlterXKzhs/sk1ebHVyILdrBG3yeDi+h5oy7rOAaIp3tOkBqmZF94M CEa4rrE3fjdx2qPswqnumPHvR6OYnlXQDMVnf8ezS/YClkoQbSwRDHLRkaaoPFikbM /5y/4ZlDZyTrDbCe56roiSTG3KAKukEz3amaJU1A= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726682AbgERLPV (ORCPT ); Mon, 18 May 2020 07:15:21 -0400 Received: from mail.kernel.org ([198.145.29.99]:38468 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726428AbgERLPV (ORCPT ); Mon, 18 May 2020 07:15:21 -0400 Received: from debian6.Home (bl8-197-74.dsl.telepac.pt [85.241.197.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 718B9207F5 for ; Mon, 18 May 2020 11:15:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589800520; bh=yGWutGgACSorq2W/OXenDP8sAK8+8S5WoQNVhYKNv2E=; h=From:To:Subject:Date:From; b=fEiuwT3aPBagX9eVrbwfr7ului3IbxRhHNH0XhWwjSULDd/+VfZBovLTu30Y97+02 GXSsuKg1IvhhFftHqzEOO3jVprEA3XWdXKAixYMwpzCnNwcfrFpZTUYcc8XvTNNpUl DFB5I53ISI8il8d07wCa5uTI3Zd1JeSZZ+bIzSdU= From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/4] Btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks() Date: Mon, 18 May 2020 12:15:18 +0100 Message-Id: <20200518111518.18334-1-fdmanana@kernel.org> X-Mailer: git-send-email 2.11.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Filipe Manana The label 'fail_unlock' is pointless, all it does is to jump to the label 'out', so just remove it. Signed-off-by: Filipe Manana --- fs/btrfs/file-item.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 4f563469c97a..80a4651b72f5 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -869,7 +869,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, } ret = PTR_ERR(item); if (ret != -EFBIG && ret != -ENOENT) - goto fail_unlock; + goto out; if (ret == -EFBIG) { u32 item_size; @@ -927,7 +927,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, ret = btrfs_search_slot(trans, root, &file_key, path, csum_size, 1); if (ret < 0) - goto fail_unlock; + goto out; if (ret > 0) { if (path->slots[0] == 0) @@ -994,9 +994,9 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, ins_size); path->leave_spinning = 0; if (ret < 0) - goto fail_unlock; + goto out; if (WARN_ON(ret != 0)) - goto fail_unlock; + goto out; leaf = path->nodes[0]; csum: item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_csum_item); @@ -1026,9 +1026,6 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, out: btrfs_free_path(path); return ret; - -fail_unlock: - goto out; } void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,