[5/5] btrfs: restart snapshot delete if we have to end the transaction

Message ID 20200320183436.16908-6-josef@toxicpanda.com (mailing list archive)
State New, archived
Series Relocation and backref resolution fixes

Commit Message

Josef Bacik March 20, 2020, 6:34 p.m. UTC
This is to fully fix the deadlock described in

btrfs: do not resolve backrefs for roots that are being deleted

Holding write locks on our deleted snapshot across trans handles will
just lead to sadness, and our backref lookup code is going to want to
still process dropped snapshots for things like qgroup accounting.

Fix this by simply dropping our path before we restart our transaction,
and picking back up from our drop_progress key.  This is less efficient
obviously, but it also doesn't deadlock, so it feels like a reasonable
trade off.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
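
The restart pattern the commit message describes (save progress, drop everything held through the path, end the transaction, then pick up again from the saved key) can be modeled in user space. This is only a sketch under stated assumptions, not btrfs code: `struct demo_root`, `demo_drop_snapshot()`, `DEMO_BATCH` and the `locked` flag are hypothetical stand-ins, with an integer index playing the role of the `drop_progress` key.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the root item: only the persisted checkpoint. */
struct demo_root {
	size_t drop_progress;	/* first item not yet dropped; 0 = fresh start */
	int locked;		/* models tree locks held through the path */
	int max_lock_span;	/* most items ever processed under one lock hold */
};

#define DEMO_BATCH 4	/* items per "transaction"; arbitrary for this sketch */

/*
 * Drop nr_items items, releasing the lock at every transaction boundary
 * and resuming from the saved checkpoint. Returns the number of restarts.
 */
static size_t demo_drop_snapshot(struct demo_root *root, size_t nr_items)
{
	size_t restarts = 0;
	size_t pos;
	int span;

again:
	/* Re-derive the position from the checkpoint, as after a remount. */
	pos = root->drop_progress;
	root->locked = 1;
	span = 0;

	while (pos < nr_items) {
		pos++;		/* "drop" one item */
		span++;
		if (span > root->max_lock_span)
			root->max_lock_span = span;

		if (span == DEMO_BATCH && pos < nr_items) {
			root->drop_progress = pos;	/* save the checkpoint */
			root->locked = 0;		/* release before the boundary */
			/* the transaction would be ended and restarted here */
			restarts++;
			goto again;
		}
	}

	root->drop_progress = pos;
	root->locked = 0;
	assert(root->max_lock_span <= DEMO_BATCH);
	return restarts;
}
```

With 10 items and a batch of 4, the sketch restarts twice and never holds the lock for more than one batch. The price is the repeated lookup from the checkpoint, which is the efficiency trade-off the commit message accepts.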

Comments

David Sterba March 20, 2020, 7:39 p.m. UTC | #1
On Fri, Mar 20, 2020 at 02:34:36PM -0400, Josef Bacik wrote:
> @@ -5377,6 +5392,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>  			}
>  			if (block_rsv)
>  				trans->block_rsv = block_rsv;
> +			goto again;

This hunk does not apply cleanly, and there's no block_rsv around in
current misc-next + this series. I get the following as the automatic merge
resolution, but that's obviously wrong and I don't see how I can fix it.

                  ret = add_to_free_space_tree(trans, bytenr, num_bytes);
                  if (ret) {
                          btrfs_abort_transaction(trans, ret);
                          goto out;
                  }

                  ret = btrfs_update_block_group(trans, bytenr, num_bytes, 0);
                  if (ret) {
                          btrfs_abort_transaction(trans, ret);
                          goto out;
+                         goto again;
                  }
          }
          btrfs_release_path(path);

  out:
          btrfs_free_path(path);
          return ret;
  }

So I guess it's because of some other patches in your tree. I'm about to
push misc-next with patches 1-4, so you can have a look.

Holger Hoffstätte Oct. 28, 2020, 10:51 p.m. UTC | #2
On 2020-03-20 19:34, Josef Bacik wrote:
> This is to fully fix the deadlock described in
> 
> btrfs: do not resolve backrefs for roots that are being deleted
> 
> Holding write locks on our deleted snapshot across trans handles will
> just lead to sadness, and our backref lookup code is going to want to
> still process dropped snapshots for things like qgroup accounting.
> 
> Fix this by simply dropping our path before we restart our transaction,
> and picking back up from our drop_progress key.  This is less efficient
> obviously, but it also doesn't deadlock, so it feels like a reasonable
> trade off.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/extent-tree.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 2925b3ad77a1..bfb413747283 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5257,6 +5257,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   	 * already dropped.
>   	 */
>   	set_bit(BTRFS_ROOT_DELETING, &root->state);
> +again:
>   	if (btrfs_disk_key_objectid(&root_item->drop_progress) == 0) {
>   		level = btrfs_header_level(root->node);
>   		path->nodes[level] = btrfs_lock_root_node(root);
> @@ -5269,7 +5270,9 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   		btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
>   		memcpy(&wc->update_progress, &key,
>   		       sizeof(wc->update_progress));
> +		memcpy(&wc->drop_progress, &key, sizeof(key));
>   
> +		wc->drop_level = root_item->drop_level;
>   		level = root_item->drop_level;
>   		BUG_ON(level == 0);
>   		path->lowest_level = level;
> @@ -5362,6 +5365,18 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   				goto out_end_trans;
>   			}
>   
> +			/*
> +			 * We used to keep the path open until we completed the
> +			 * snapshot delete.  However this can deadlock with
> +			 * things like backref walking that may want to resolve
> +			 * references that still point to this deleted root.  We
> +			 * already have the ability to restart snapshot
> +			 * deletions on mount, so just clear our walk_control,
> +			 * drop the path, and go to the beginning and re-lookup
> +			 * our drop_progress key and continue from there.
> +			 */
> +			memset(wc, 0, sizeof(*wc));
> +			btrfs_release_path(path);
>   			btrfs_end_transaction_throttle(trans);
>   			if (!for_reloc && btrfs_need_cleaner_sleep(fs_info)) {
>   				btrfs_debug(fs_info,
> @@ -5377,6 +5392,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   			}
>   			if (block_rsv)
>   				trans->block_rsv = block_rsv;
> +			goto again;
>   		}
>   	}
>   	btrfs_release_path(path);
> 

Josef,

The above fix still seems to be missing, apparently because Dave couldn't merge it
properly at the time (see [1]). Is it still needed? There were several long
discussions about balance loops, and it would be great to get this fixed once and
for all. It applies and (seems to?) work fine on 5.9 (at least it hasn't eaten
anything here so far), but if it's not needed anymore then all the better.

thanks
Holger

[1] https://lore.kernel.org/linux-btrfs/20200320193927.GH12659@twin.jikos.cz/

Holger Hoffstätte Nov. 20, 2020, 8:48 a.m. UTC | #3
Trying again out-of-thread, since I haven't seen any answers to this yet.

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2925b3ad77a1..bfb413747283 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5257,6 +5257,7 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 	 * already dropped.
 	 */
 	set_bit(BTRFS_ROOT_DELETING, &root->state);
+again:
 	if (btrfs_disk_key_objectid(&root_item->drop_progress) == 0) {
 		level = btrfs_header_level(root->node);
 		path->nodes[level] = btrfs_lock_root_node(root);
@@ -5269,7 +5270,9 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 		btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
 		memcpy(&wc->update_progress, &key,
 		       sizeof(wc->update_progress));
+		memcpy(&wc->drop_progress, &key, sizeof(key));
 
+		wc->drop_level = root_item->drop_level;
 		level = root_item->drop_level;
 		BUG_ON(level == 0);
 		path->lowest_level = level;
@@ -5362,6 +5365,18 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 				goto out_end_trans;
 			}
 
+			/*
+			 * We used to keep the path open until we completed the
+			 * snapshot delete.  However this can deadlock with
+			 * things like backref walking that may want to resolve
+			 * references that still point to this deleted root.  We
+			 * already have the ability to restart snapshot
+			 * deletions on mount, so just clear our walk_control,
+			 * drop the path, and go to the beginning and re-lookup
+			 * our drop_progress key and continue from there.
+			 */
+			memset(wc, 0, sizeof(*wc));
+			btrfs_release_path(path);
 			btrfs_end_transaction_throttle(trans);
 			if (!for_reloc && btrfs_need_cleaner_sleep(fs_info)) {
 				btrfs_debug(fs_info,
@@ -5377,6 +5392,7 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 			}
 			if (block_rsv)
 				trans->block_rsv = block_rsv;
+			goto again;
 		}
 	}
 	btrfs_release_path(path);
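
The reset-and-reseed step in the patch (clear the walk control, then refill its progress fields from the root item after jumping back to `again`) can be sketched in isolation. This is a simplified user-space model, not the kernel's definitions: the `demo_*` structs and `demo_restart_walk()` are hypothetical, and the real `walk_control` carries more state than is shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_key {
	unsigned long long objectid;
	unsigned char type;
	unsigned long long offset;
};

struct demo_root_item {
	struct demo_key drop_progress;	/* persisted checkpoint key */
	int drop_level;			/* persisted checkpoint level */
};

struct demo_walk_control {
	struct demo_key update_progress;
	struct demo_key drop_progress;
	int drop_level;
	int stage;	/* stand-in for the remaining in-memory walk state */
};

/*
 * Clear all in-memory walk state, then reseed it from the persisted
 * checkpoint. Returns 1 when resuming from a checkpoint and 0 when
 * starting from the root (no progress recorded yet).
 */
static int demo_restart_walk(struct demo_walk_control *wc,
			     const struct demo_root_item *item)
{
	memset(wc, 0, sizeof(*wc));

	if (item->drop_progress.objectid == 0)
		return 0;

	memcpy(&wc->update_progress, &item->drop_progress,
	       sizeof(wc->update_progress));
	memcpy(&wc->drop_progress, &item->drop_progress,
	       sizeof(wc->drop_progress));
	wc->drop_level = item->drop_level;

	/* Mirrors the BUG_ON(level == 0) check in the patch context. */
	assert(wc->drop_level > 0);
	return 1;
}
```

Because the `memset()` wipes `drop_progress` and `drop_level` along with everything else, the resume branch has to copy them back from the root item, which appears to be why the patch adds those two assignments.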