btrfs: unlock the cow block on error

Message ID	7706fb8d62576840c3e7dd69b326b1ae9e6d3ab7.1601304541.git.josef@toxicpanda.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=cyQG=DF=vger.kernel.org=linux-btrfs-owner@kernel.org> From: Josef Bacik <josef@toxicpanda.com> To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH] btrfs: unlock the cow block on error Date: Mon, 28 Sep 2020 10:49:13 -0400 Message-Id: <7706fb8d62576840c3e7dd69b326b1ae9e6d3ab7.1601304541.git.josef@toxicpanda.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	btrfs: unlock the cow block on error \| expand btrfs: unlock the cow block on error

Message ID

7706fb8d62576840c3e7dd69b326b1ae9e6d3ab7.1601304541.git.josef@toxicpanda.com (mailing list archive)

State

New, archived

Headers

From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH] btrfs: unlock the cow block on error
Date: Mon, 28 Sep 2020 10:49:13 -0400
Message-Id: 
 <7706fb8d62576840c3e7dd69b326b1ae9e6d3ab7.1601304541.git.josef@toxicpanda.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

btrfs: unlock the cow block on error | expand

Commit Message

Josef Bacik Sept. 28, 2020, 2:49 p.m. UTC

With my automated fstest runs I noticed one of my workers didn't report
results.  Turns out it was hung trying to write back inodes, and the
write back thread was stuck trying to lock an extent buffer

[root@xfstests2 xfstests-dev]# cat /proc/2143497/stack
[<0>] __btrfs_tree_lock+0x108/0x250
[<0>] lock_extent_buffer_for_io+0x35e/0x3a0
[<0>] btree_write_cache_pages+0x15a/0x3b0
[<0>] do_writepages+0x28/0xb0
[<0>] __writeback_single_inode+0x54/0x5c0
[<0>] writeback_sb_inodes+0x1e8/0x510
[<0>] wb_writeback+0xcc/0x440
[<0>] wb_workfn+0xd7/0x650
[<0>] process_one_work+0x236/0x560
[<0>] worker_thread+0x55/0x3c0
[<0>] kthread+0x13a/0x150
[<0>] ret_from_fork+0x1f/0x30

This is because we got an error while cow'ing a block, specifically here

        if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
                ret = btrfs_reloc_cow_block(trans, root, buf, cow);
                if (ret) {
                        btrfs_abort_transaction(trans, ret);
                        return ret;
                }
        }

The problem here is that as soon as we allocate the new block it is
locked and marked dirty in the btree inode.  This means that we could
attempt to writeback this block and need to lock the extent buffer.
However we're not unlocking it here and thus we deadlock.

Fix this by unlocking the cow block if we have any errors inside of
__btrfs_cow_block.

Fixes: 65b51a009e29 ("btrfs_search_slot: reduce lock contention by cowing in two stages")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Filipe Manana Sept. 29, 2020, 12:38 p.m. UTC | #1

On Mon, Sep 28, 2020 at 3:52 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> With my automated fstest runs I noticed one of my workers didn't report
> results.  Turns out it was hung trying to write back inodes, and the
> write back thread was stuck trying to lock an extent buffer
>
> [root@xfstests2 xfstests-dev]# cat /proc/2143497/stack
> [<0>] __btrfs_tree_lock+0x108/0x250
> [<0>] lock_extent_buffer_for_io+0x35e/0x3a0
> [<0>] btree_write_cache_pages+0x15a/0x3b0
> [<0>] do_writepages+0x28/0xb0
> [<0>] __writeback_single_inode+0x54/0x5c0
> [<0>] writeback_sb_inodes+0x1e8/0x510
> [<0>] wb_writeback+0xcc/0x440
> [<0>] wb_workfn+0xd7/0x650
> [<0>] process_one_work+0x236/0x560
> [<0>] worker_thread+0x55/0x3c0
> [<0>] kthread+0x13a/0x150
> [<0>] ret_from_fork+0x1f/0x30
>
> This is because we got an error while cow'ing a block, specifically here
>
>         if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
>                 ret = btrfs_reloc_cow_block(trans, root, buf, cow);
>                 if (ret) {
>                         btrfs_abort_transaction(trans, ret);
>                         return ret;
>                 }
>         }
>
> The problem here is that as soon as we allocate the new block it is
> locked and marked dirty in the btree inode.  This means that we could
> attempt to writeback this block and need to lock the extent buffer.
> However we're not unlocking it here and thus we deadlock.
>
> Fix this by unlocking the cow block if we have any errors inside of
> __btrfs_cow_block.
>
> Fixes: 65b51a009e29 ("btrfs_search_slot: reduce lock contention by cowing in two stages")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/ctree.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a165093739c4..a6b6d1f74f23 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1064,6 +1064,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>
>         ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
>         if (ret) {
> +               btrfs_tree_unlock(cow);
>                 btrfs_abort_transaction(trans, ret);
>                 return ret;
>         }
> @@ -1071,6 +1072,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>         if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
>                 ret = btrfs_reloc_cow_block(trans, root, buf, cow);
>                 if (ret) {
> +                       btrfs_tree_unlock(cow);
>                         btrfs_abort_transaction(trans, ret);
>                         return ret;
>                 }
> @@ -1103,6 +1105,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>                 if (last_ref) {
>                         ret = tree_mod_log_free_eb(buf);
>                         if (ret) {
> +                               btrfs_tree_unlock(cow);
>                                 btrfs_abort_transaction(trans, ret);
>                                 return ret;
>                         }
> --
> 2.26.2
>

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a165093739c4..a6b6d1f74f23 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1064,6 +1064,7 @@  static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 
 	ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
 	if (ret) {
+		btrfs_tree_unlock(cow);
 		btrfs_abort_transaction(trans, ret);
 		return ret;
 	}
@@ -1071,6 +1072,7 @@  static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
 		ret = btrfs_reloc_cow_block(trans, root, buf, cow);
 		if (ret) {
+			btrfs_tree_unlock(cow);
 			btrfs_abort_transaction(trans, ret);
 			return ret;
 		}
@@ -1103,6 +1105,7 @@  static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 		if (last_ref) {
 			ret = tree_mod_log_free_eb(buf);
 			if (ret) {
+				btrfs_tree_unlock(cow);
 				btrfs_abort_transaction(trans, ret);
 				return ret;
 			}

btrfs: unlock the cow block on error

Commit Message

Comments

Patch