diff mbox

[v2,2/3] btrfs: relocation: Fix leaking qgroups numbers on data extents

Message ID 20160705093207.14983-3-quwenruo@cn.fujitsu.com (mailing list archive)
State Superseded
Headers show

Commit Message

Qu Wenruo July 5, 2016, 9:32 a.m. UTC
When balancing data extents, qgroup will leak all its numbers for
balanced data extents.

The root cause is that balancing is doing non-standard and almost insane
tree block swap hack.

The problem happens in the following steps:
(Use 4M as original data extent size, and 257 as src root objectid)

1) Balance create new data extents and increase its refs

Balance will alloc new data extent and create new EXTENT_DATA in data
reloc tree, while its refernece is increased with 2 different
referencer: 257 and data reloc tree.

While at that time, file tree is still referring to old extents.

So, the backref walker will only find data reloc tree as referencer for
the new data extent, which doesn't count as file tree.

Extent bytenr   |  Real referencer  |     backrefs     | nr_old_roots |
-----------------------------------------------------------------------
New             | Data reloc        | Data reloc + 257 | 0            |
Old             | 257               | 257              | 1            |

Qgroup number: 4M + metadata

2) Commit trans before merging reloc tree
Then we goes to prepare_to_merge(), which will commit transacation.

In the qgroup update codes inside commit_transaction, although backref
walk codes find the new data extents has 2 backref, but file tree
backref can't find referencer(file tree is still referring to old
extents), and data reloc doesn't count as file tree.

So for nr_old_roots and nr_new_roots of that 2 extents:
Extent bytenr   |   nr_old_roots   | nr_new_roots       | qgroup change |
------------------------------------------------------------------------|
New             | 0(doesn't exist) | 0(data reloc tree) | 0             |
Old             | 1(root 257)      | 1(257)             | 0             |

Qgroup number: 4M + metadata +-0 = 4M + metadata

3) Swap tree blocks and free old tree blocks
Then we goes to merge_reloc_roots(), which swaps the tree blocks
directly, and free the old tree blocks.
Freeing tree blocks will also free its data extents, this goes through
normal routine, and qgroup handles it well, decreasing the numbers.

And since new data extent is not updated here (updated in step 1), so
qgroup won't scan new data extent.

Extent bytenr   | nr_old_roots |   nr_new_roots   | qgroup change |
------------------------------------------------------------------|
New             |---No modification, doesn't go through qgroup----|Error
Old             | 1(root 257)  | 0(doesn't exist) | -4M           |

In fact, New extent's owner changed from data reloc to root 257, which
must go through qgroup accounting.
But the hideous hack of replacing path doesn't do it.

Qgroup number: 4M + metadata -4M = metadata

This patch will fix it by re-dirtying new extent at step 3), so backref
walk and qgroup can get correct result.

After this patch, step 3) will be:

Extent bytenr | nr_old_roots       |   nr_new_roots   | qgroup change |
----------------------------------------------------------------------|
New           | 0(data reloc tree) | 1(root 257)      | +4M           |
Old           | 1(root 257)        | 0(doesn't exist) | -4M           |

And Qgroup number will remain correct:
4M + metadata +4M -4M = 4M + metadata

So we only need to ensure we don't miss some extents.

Reported-by: Mark Fasheh <mfasheh@suse.de>
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Use refactored interface.
  Update comment and commit message
---
 fs/btrfs/relocation.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
diff mbox

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..0a7ddf1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -31,6 +31,7 @@ 
 #include "async-thread.h"
 #include "free-space-cache.h"
 #include "inode-map.h"
+#include "qgroup.h"
 
 /*
  * backref_node, mapping_node and tree_block start with this
@@ -1750,6 +1751,68 @@  int memcmp_node_keys(struct extent_buffer *eb, int slot,
 }
 
 /*
+ * Helper function to fixup screwed qgroups caused by increased extent ref,
+ * which doesn't follow normal extent ref update behavior.
+ * (Correct behavior is, increase extent ref and modify source root in
+ *  one trans)
+ * No better solution as long as we're doing swapping trick to do balance.
+ */
+static int qgroup_redirty_data_extents(struct btrfs_trans_handle *trans,
+				       struct btrfs_root *root, u64 bytenr,
+				       u64 gen)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct extent_buffer *leaf;
+	struct btrfs_delayed_ref_root *delayed_refs;
+	int slot;
+	int ret = 0;
+
+	if (!fs_info->quota_enabled || !is_fstree(root->objectid))
+		return 0;
+	if (WARN_ON(!trans))
+		return -EINVAL;
+
+	delayed_refs = &trans->transaction->delayed_refs;
+
+	leaf = read_tree_block(root, bytenr, gen);
+	if (IS_ERR(leaf)) {
+		return PTR_ERR(leaf);
+	} else if (!extent_buffer_uptodate(leaf)) {
+		ret = -EIO;
+		goto out;
+	}
+
+	/* We only care leaf, which may contains EXTENT_DATA */
+	if (btrfs_header_level(leaf) != 0)
+		goto out;
+
+	for (slot = 0; slot < btrfs_header_nritems(leaf); slot++) {
+		struct btrfs_key key;
+		struct btrfs_file_extent_item *fi;
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.type != BTRFS_EXTENT_DATA_KEY)
+			continue;
+		fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+		if (btrfs_file_extent_type(leaf, fi) ==
+		    BTRFS_FILE_EXTENT_INLINE ||
+		    btrfs_file_extent_disk_bytenr(leaf, fi) == 0)
+			continue;
+
+		ret = btrfs_qgroup_record_dirty_extent(trans, fs_info,
+				btrfs_file_extent_disk_bytenr(leaf, fi),
+				btrfs_file_extent_disk_num_bytes(leaf, fi),
+				GFP_NOFS);
+		if (ret < 0)
+			break;
+	}
+out:
+	free_extent_buffer(leaf);
+	return ret;
+
+}
+
+/*
  * try to replace tree blocks in fs tree with the new blocks
  * in reloc tree. tree blocks haven't been modified since the
  * reloc tree was create can be replaced.
@@ -1919,7 +1982,25 @@  again:
 					0);
 		BUG_ON(ret);
 
+		/*
+		 * The relocated EXTENT_DATA inside new tree block is allocated
+		 * from data reloc tree, which doesn't affect qgroup.
+		 *
+		 * While after path replacement, EXTENT_DATA in the tree block
+		 * is accounted as fs/file tree, which does affect qgroup.
+		 *
+		 * So here we need to info qgroup to re-account all these
+		 * possible owner-changed EXTENT_DATA, to make qgroup
+		 * correct again.
+		 *
+		 * XXX: Why balance is always using such hacks? Isn't there any
+		 * more sane method?
+		 */
+		ret = qgroup_redirty_data_extents(trans, dest, new_bytenr,
+						  new_ptr_gen);
 		btrfs_unlock_up_safe(path, 0);
+		if (ret < 0)
+			break;
 
 		ret = level;
 		break;