diff mbox series

[RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots()

Message ID 20190926063545.20403-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots() | expand

Commit Message

Qu Wenruo Sept. 26, 2019, 6:35 a.m. UTC
[BUG]
There is one BUG_ON() report where a transaction is aborted during
balance, then kernel BUG_ON() in merge_reloc_roots():

  void merge_reloc_roots(struct reloc_control *rc)
  {
	...
	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
  }

[CAUSE]
It's still uncertain why we can get to such situation.
As all __add_reloc_root() calls will also link that reloc root to
rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.

So the root cause is still uncertain.

[FIX]
But we can still hunt down all the BUG_ON() in merge_reloc_roots().

There are 3 BUG_ON()s in it:
- BUG_ON() for read_fs_root() result
- BUG_ON() for root->reloc_root != reloc_root case
- BUG_ON() for the non-empty reloc_root_tree

For the first two, just grab the return value and goto out tag for
cleanup.

For the last one, make it more graceful by:
- grab the lock before doing read/write
- warn instead of panic
- cleanup the nodes in rc->reloc_root_tree

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Reason for RFC:
The root cause to leak nodes in reloc_root_tree is still unknown.
---
 fs/btrfs/relocation.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

Comments

Josef Bacik Sept. 26, 2019, 6:33 p.m. UTC | #1
On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
> [BUG]
> There is one BUG_ON() report where a transaction is aborted during
> balance, then kernel BUG_ON() in merge_reloc_roots():
> 
>   void merge_reloc_roots(struct reloc_control *rc)
>   {
> 	...
> 	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>   }
> 
> [CAUSE]
> It's still uncertain why we can get to such situation.
> As all __add_reloc_root() calls will also link that reloc root to
> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
> 
> So the root cause is still uncertain.
> 
> [FIX]
> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
> 
> There are 3 BUG_ON()s in it:
> - BUG_ON() for read_fs_root() result
> - BUG_ON() for root->reloc_root != reloc_root case
> - BUG_ON() for the non-empty reloc_root_tree
> 
> For the first two, just grab the return value and goto out tag for
> cleanup.
> 
> For the last one, make it more graceful by:
> - grab the lock before doing read/write
> - warn instead of panic
> - cleanup the nodes in rc->reloc_root_tree
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

If we're going to do these things we might as well add the ability to error
inject and add an xfstest to verify they don't break anything.  The
BUG_ON(root->reloc_root != reloc_root) isn't able to be tested, but you can at
least make read_fs_root() and merge_reloc_root() and then test this patch by
triggering errors in these paths.

If you do that I'm ok with not being able to explain the leaking nodes, getting
rid of the BUG_ON()'s is good enough reason.  Thanks,

Josef
Qu Wenruo Sept. 27, 2019, 12:25 a.m. UTC | #2
On 2019/9/27 上午2:33, Josef Bacik wrote:
> On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is one BUG_ON() report where a transaction is aborted during
>> balance, then kernel BUG_ON() in merge_reloc_roots():
>>
>>   void merge_reloc_roots(struct reloc_control *rc)
>>   {
>> 	...
>> 	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>>   }
>>
>> [CAUSE]
>> It's still uncertain why we can get to such situation.
>> As all __add_reloc_root() calls will also link that reloc root to
>> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
>>
>> So the root cause is still uncertain.
>>
>> [FIX]
>> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
>>
>> There are 3 BUG_ON()s in it:
>> - BUG_ON() for read_fs_root() result
>> - BUG_ON() for root->reloc_root != reloc_root case
>> - BUG_ON() for the non-empty reloc_root_tree
>>
>> For the first two, just grab the return value and goto out tag for
>> cleanup.
>>
>> For the last one, make it more graceful by:
>> - grab the lock before doing read/write
>> - warn instead of panic
>> - cleanup the nodes in rc->reloc_root_tree
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> If we're going to do these things we might as well add the ability to error
> inject and add an xfstest to verify they don't break anything.

BCC has tool to inject certain errors to memory allocation.
And I have btrfs-profiler tool to inject any overrideable function with
similiar callchain condition.

The problem is, it's not something can be provided through xfstest itself.

Furthermore, I see no reason why merge_reloc_root() failure could lead
to the final BUG_ON(), thus I'm not sure even with error injection we
could hit the final BUG_ON(), meaning even with error injection, we may
not get full coverage.

But anyway, I'll add the error injection test result in the next update.

Thanks,
Qu

>  The
> BUG_ON(root->reloc_root != reloc_root) isn't able to be tested, but you can at
> least make read_fs_root() and merge_reloc_root() and then test this patch by
> triggering errors in these paths.
> 
> If you do that I'm ok with not being able to explain the leaking nodes, getting
> rid of the BUG_ON()'s is good enough reason.  Thanks,
> 
> Josef
>
David Sterba Sept. 30, 2019, 5:51 p.m. UTC | #3
On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
> [BUG]
> There is one BUG_ON() report where a transaction is aborted during
> balance, then kernel BUG_ON() in merge_reloc_roots():

Do you have details from the report, eg. what's the error code?

>   void merge_reloc_roots(struct reloc_control *rc)
>   {
> 	...
> 	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>   }
> 
> [CAUSE]
> It's still uncertain why we can get to such situation.
> As all __add_reloc_root() calls will also link that reloc root to
> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
> 
> So the root cause is still uncertain.
> 
> [FIX]
> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
> 
> There are 3 BUG_ON()s in it:
> - BUG_ON() for read_fs_root() result
> - BUG_ON() for root->reloc_root != reloc_root case
> - BUG_ON() for the non-empty reloc_root_tree

relocation.c is worst regarding number of BUG_ONs, some of them look
like runtime assertions of the invariants but some of them are
substituting for error handling.

The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are
assertions

> 
> For the first two, just grab the return value and goto out tag for
> cleanup.
> 
> For the last one, make it more graceful by:
> - grab the lock before doing read/write
> - warn instead of panic
> - cleanup the nodes in rc->reloc_root_tree
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Reason for RFC:
> The root cause to leak nodes in reloc_root_tree is still unknown.
> ---
>  fs/btrfs/relocation.c | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 655f1d5a8c27..d562b5c52a40 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc)
>  		if (btrfs_root_refs(&reloc_root->root_item) > 0) {
>  			root = read_fs_root(fs_info,
>  					    reloc_root->root_key.offset);
> -			BUG_ON(IS_ERR(root));
> -			BUG_ON(root->reloc_root != reloc_root);
> +			if (IS_ERR(root)) {

This is bug_on -> error handling, ok

> +				ret = PTR_ERR(root);
> +				btrfs_err(fs_info,
> +					  "failed to read root %llu: %d",
> +					  reloc_root->root_key.offset, ret);
> +				goto out;
> +			}
> +			if (root->reloc_root != reloc_root) {

With this one I'm not sure it could happen but replacing the bug on is
still good.

> +				btrfs_err(fs_info,
> +					"reloc root mismatch for root %llu",

Would be good to print the number of the other root as well.

> +					root->root_key.objectid);
> +				ret = -EINVAL;
> +				goto out;
> +			}
>  
>  			ret = merge_reloc_root(rc, root);
>  			if (ret) {
> +				btrfs_err(fs_info,
> +			"failed to merge reloc tree for root %llu: %d",
> +					  root->root_key.objectid, ret);
>  				if (list_empty(&reloc_root->root_list))
>  					list_add_tail(&reloc_root->root_list,
>  						      &reloc_roots);
> @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc)
>  			free_reloc_roots(&reloc_roots);
>  	}
>  
> -	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));

This one looks more like the invariant, the tree should be really empty
here. While the cleanup is trying to make things work despite there's a
problem, I think the warning should not be debugging only.

> +	spin_lock(&rc->reloc_root_tree.lock);
> +	/* Cleanup dirty reloc tree nodes */
> +	if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
> +		struct mapping_node *node;
> +		struct mapping_node *next;
> +
> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));

...
Qu Wenruo Sept. 30, 2019, 11:45 p.m. UTC | #4
On 2019/10/1 上午1:51, David Sterba wrote:
> On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is one BUG_ON() report where a transaction is aborted during
>> balance, then kernel BUG_ON() in merge_reloc_roots():
> 
> Do you have details from the report, eg. what's the error code?

Nope, what I get is only kernel message.

Not enough to get the error code.

> 
>>   void merge_reloc_roots(struct reloc_control *rc)
>>   {
>> 	...
>> 	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>>   }
>>
>> [CAUSE]
>> It's still uncertain why we can get to such situation.
>> As all __add_reloc_root() calls will also link that reloc root to
>> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
>>
>> So the root cause is still uncertain.
>>
>> [FIX]
>> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
>>
>> There are 3 BUG_ON()s in it:
>> - BUG_ON() for read_fs_root() result
>> - BUG_ON() for root->reloc_root != reloc_root case
>> - BUG_ON() for the non-empty reloc_root_tree
> 
> relocation.c is worst regarding number of BUG_ONs, some of them look
> like runtime assertions of the invariants but some of them are
> substituting for error handling.

Yeah, if we inject kmalloc error into btrfs_relocate_block_group(), it's
too easy to hit BUG_ON().

> 
> The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are
> assertions
> 
>>
>> For the first two, just grab the return value and goto out tag for
>> cleanup.
>>
>> For the last one, make it more graceful by:
>> - grab the lock before doing read/write
>> - warn instead of panic
>> - cleanup the nodes in rc->reloc_root_tree
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Reason for RFC:
>> The root cause to leak nodes in reloc_root_tree is still unknown.
>> ---
>>  fs/btrfs/relocation.c | 39 ++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 655f1d5a8c27..d562b5c52a40 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc)
>>  		if (btrfs_root_refs(&reloc_root->root_item) > 0) {
>>  			root = read_fs_root(fs_info,
>>  					    reloc_root->root_key.offset);
>> -			BUG_ON(IS_ERR(root));
>> -			BUG_ON(root->reloc_root != reloc_root);
>> +			if (IS_ERR(root)) {
> 
> This is bug_on -> error handling, ok
> 
>> +				ret = PTR_ERR(root);
>> +				btrfs_err(fs_info,
>> +					  "failed to read root %llu: %d",
>> +					  reloc_root->root_key.offset, ret);
>> +				goto out;
>> +			}
>> +			if (root->reloc_root != reloc_root) {
> 
> With this one I'm not sure it could happen but replacing the bug on is
> still good.
> 
>> +				btrfs_err(fs_info,
>> +					"reloc root mismatch for root %llu",
> 
> Would be good to print the number of the other root as well.
> 
>> +					root->root_key.objectid);
>> +				ret = -EINVAL;
>> +				goto out;
>> +			}
>>  
>>  			ret = merge_reloc_root(rc, root);
>>  			if (ret) {
>> +				btrfs_err(fs_info,
>> +			"failed to merge reloc tree for root %llu: %d",
>> +					  root->root_key.objectid, ret);
>>  				if (list_empty(&reloc_root->root_list))
>>  					list_add_tail(&reloc_root->root_list,
>>  						      &reloc_roots);
>> @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc)
>>  			free_reloc_roots(&reloc_roots);
>>  	}
>>  
>> -	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
> 
> This one looks more like the invariant, the tree should be really empty
> here. While the cleanup is trying to make things work despite there's a
> problem, I think the warning should not be debugging only.

Indeed, we should output the warning no matter whatever.

Thanks,
Qu
> 
>> +	spin_lock(&rc->reloc_root_tree.lock);
>> +	/* Cleanup dirty reloc tree nodes */
>> +	if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
>> +		struct mapping_node *node;
>> +		struct mapping_node *next;
>> +
>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> 
> ...
>
diff mbox series

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 655f1d5a8c27..d562b5c52a40 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2484,11 +2484,26 @@  void merge_reloc_roots(struct reloc_control *rc)
 		if (btrfs_root_refs(&reloc_root->root_item) > 0) {
 			root = read_fs_root(fs_info,
 					    reloc_root->root_key.offset);
-			BUG_ON(IS_ERR(root));
-			BUG_ON(root->reloc_root != reloc_root);
+			if (IS_ERR(root)) {
+				ret = PTR_ERR(root);
+				btrfs_err(fs_info,
+					  "failed to read root %llu: %d",
+					  reloc_root->root_key.offset, ret);
+				goto out;
+			}
+			if (root->reloc_root != reloc_root) {
+				btrfs_err(fs_info,
+					"reloc root mismatch for root %llu",
+					root->root_key.objectid);
+				ret = -EINVAL;
+				goto out;
+			}
 
 			ret = merge_reloc_root(rc, root);
 			if (ret) {
+				btrfs_err(fs_info,
+			"failed to merge reloc tree for root %llu: %d",
+					  root->root_key.objectid, ret);
 				if (list_empty(&reloc_root->root_list))
 					list_add_tail(&reloc_root->root_list,
 						      &reloc_roots);
@@ -2520,7 +2535,25 @@  void merge_reloc_roots(struct reloc_control *rc)
 			free_reloc_roots(&reloc_roots);
 	}
 
-	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
+	spin_lock(&rc->reloc_root_tree.lock);
+	/* Cleanup dirty reloc tree nodes */
+	if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
+		struct mapping_node *node;
+		struct mapping_node *next;
+
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		rbtree_postorder_for_each_entry_safe(node, next,
+				&rc->reloc_root_tree.rb_root, rb_node) {
+			struct btrfs_root *root = node->data;
+
+			if (root)
+				btrfs_debug(fs_info,
+				"root %llu still in rc->reloc_root_tree",
+					    root->root_key.offset);
+			kfree(node);
+		}
+	}
+	spin_unlock(&rc->reloc_root_tree.lock);
 }
 
 static void free_block_list(struct rb_root *blocks)