[v3] btrfs: qgroup: Remove qgroup items along with subvolume deletion
diff mbox series

Message ID 3ea480fc-46ad-4072-198d-1e8f49121be0@jp.fujitsu.com
State New
Headers show
Series
  • [v3] btrfs: qgroup: Remove qgroup items along with subvolume deletion
Related show

Commit Message

Misono Tomohiro Aug. 6, 2018, 4:53 a.m. UTC
When qgroup is on, subvolume deletion does not remove qgroup items
of the subvolume (qgroup info, limit, relation) from quota tree and
they need to get removed manually by "btrfs qgroup destroy".

Since level 0 qgroup cannot be used/inherited by any other subvolume,
let's remove them automatically when subvolume is deleted
(to be precise, when the subvolume root is dropped).

Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
---
v2 -> v3:
  Use root->root_key.objectid instead of root->objectid
  Add Reviewed-by tag

v1 -> v2:
  Move call of btrfs_remove_qgroup() from btrfs_delete_subvolume()
  to btrfs_snapshot_destroy() so that it will be called after the
  subvolume root is really dropped

 fs/btrfs/extent-tree.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Lu Fengqi Aug. 7, 2018, 4:23 p.m. UTC | #1
On Mon, Aug 06, 2018 at 01:53:28PM +0900, Misono Tomohiro wrote:
>When qgroup is on, subvolume deletion does not remove qgroup items
>of the subvolume (qgroup info, limit, relation) from quota tree and
>they need to get removed manually by "btrfs qgroup destroy".
>
>Since level 0 qgroup cannot be used/inherited by any other subvolume,
>let's remove them automatically when subvolume is deleted
>(to be precise, when the subvolume root is dropped).
>
>Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
>Reviewed-by: Qu Wenruo <wqu@suse.com>
>Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>---
>v2 -> v3:
>  Use root->root_key.objectid instead of root->objectid
>  Add Reviewed-by tag
>
>v1 -> v2:
>  Move call of btrfs_remove_qgroup() from btrfs_delete_subvolume()
>  to btrfs_snapshot_destroy() so that it will be called after the
>  subvolume root is really dropped
>
> fs/btrfs/extent-tree.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
>diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>index 9e7b237b9547..48edf839ed2c 100644
>--- a/fs/btrfs/extent-tree.c
>+++ b/fs/btrfs/extent-tree.c
>@@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> 	struct btrfs_root_item *root_item = &root->root_item;
> 	struct walk_control *wc;
> 	struct btrfs_key key;
>+	u64 objectid = root->root_key.objectid;
> 	int err = 0;
> 	int ret;
> 	int level;
> 	bool root_dropped = false;
> 
>-	btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
>+	btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
> 
> 	path = btrfs_alloc_path();
> 	if (!path) {
>@@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> 		goto out_end_trans;
> 	}
> 
>-	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
>+	if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
> 		ret = btrfs_find_root(tree_root, &root->root_key, path,
> 				      NULL, NULL);
> 		if (ret < 0) {
>@@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> 			 *
> 			 * The most common failure here is just -ENOENT.
> 			 */
>-			btrfs_del_orphan_item(trans, tree_root,
>-					      root->root_key.objectid);
>+			btrfs_del_orphan_item(trans, tree_root, objectid);
> 		}
> 	}
> 
>@@ -9056,6 +9056,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> 		btrfs_put_fs_root(root);
> 	}
> 	root_dropped = true;
>+
>+	 /* Remove level-0 qgroup items since no other subvolume can use them */
>+	ret = btrfs_remove_qgroup(trans, objectid);
>+	if (ret && ret != -EINVAL && ret != -ENOENT) {

I'm sorry for missing the snapshot case. If it is a snapshot, then when
we remove the relevant qgroup, we will not be able to perform
quick_update_accounting(), and it will return 1. So we shouldn't abort
the transaction when the return value = 1.

btrfs_remove_qgroup
-> __del_qgroup_relation
   -> quick_update_accounting << if qgroup->excl != qgroup->rfer; return 1
Misono Tomohiro Aug. 8, 2018, 2:56 a.m. UTC | #2
On 2018/08/08 1:23, Lu Fengqi wrote:
> On Mon, Aug 06, 2018 at 01:53:28PM +0900, Misono Tomohiro wrote:
>> When qgroup is on, subvolume deletion does not remove qgroup items
>> of the subvolume (qgroup info, limit, relation) from quota tree and
>> they need to get removed manually by "btrfs qgroup destroy".
>>
>> Since level 0 qgroup cannot be used/inherited by any other subvolume,
>> let's remove them automatically when subvolume is deleted
>> (to be precise, when the subvolume root is dropped).
>>
>> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>> ---
>> v2 -> v3:
>>  Use root->root_key.objectid instead of root->objectid
>>  Add Reviewed-by tag
>>
>> v1 -> v2:
>>  Move call of btrfs_remove_qgroup() from btrfs_delete_subvolume()
>>  to btrfs_snapshot_destroy() so that it will be called after the
>>  subvolume root is really dropped
>>
>> fs/btrfs/extent-tree.c | 16 ++++++++++++----
>> 1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 9e7b237b9547..48edf839ed2c 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>> 	struct btrfs_root_item *root_item = &root->root_item;
>> 	struct walk_control *wc;
>> 	struct btrfs_key key;
>> +	u64 objectid = root->root_key.objectid;
>> 	int err = 0;
>> 	int ret;
>> 	int level;
>> 	bool root_dropped = false;
>>
>> -	btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
>> +	btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
>>
>> 	path = btrfs_alloc_path();
>> 	if (!path) {
>> @@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>> 		goto out_end_trans;
>> 	}
>>
>> -	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
>> +	if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
>> 		ret = btrfs_find_root(tree_root, &root->root_key, path,
>> 				      NULL, NULL);
>> 		if (ret < 0) {
>> @@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>> 			 *
>> 			 * The most common failure here is just -ENOENT.
>> 			 */
>> -			btrfs_del_orphan_item(trans, tree_root,
>> -					      root->root_key.objectid);
>> +			btrfs_del_orphan_item(trans, tree_root, objectid);
>> 		}
>> 	}
>>
>> @@ -9056,6 +9056,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>> 		btrfs_put_fs_root(root);
>> 	}
>> 	root_dropped = true;
>> +
>> +	 /* Remove level-0 qgroup items since no other subvolume can use them */
>> +	ret = btrfs_remove_qgroup(trans, objectid);
>> +	if (ret && ret != -EINVAL && ret != -ENOENT) {
> 
> I'm sorry for missing the snapshot case. If it is a snapshot, then when
> we remove the relevant qgroup, we will not be able to perform
> quick_update_accounting(), and it will return 1. So we shouldn't abort
> the transaction when the return value = 1.
> 
> btrfs_remove_qgroup
> -> __del_qgroup_relation
>    -> quick_update_accounting << if qgroup->excl != qgroup->rfer; return 1
> 

oh, thanks for pointing out, I missed that case...
I wonder if there is a way to keep qgroup consistency without enforcing rescan.

btw, I noticed that removing qgroup relation when excl != rfer causes
some problem without this patch:

(4.18-rc7)
$ mkfs.btrfs -fq $DEV
$ mount $DEV /mnt

$ btrfs quota enable /mnt
$ btrfs qgroup create 1/0 /mnt
$ btrfs sub create /mnt/sub
$ btrfs qgroup assign 0/257 1/0 /mnt

$ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
$ btrfs sub snap /mnt/sub /mnt/snap
$ dd if=/dev/urandom of=/mnt/sub/file2 bs=1k count=1000

$ btrfs fi sync /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257         1.97MiB   1016.00KiB         none         none 1/0     ---
0/258      1016.00KiB     16.00KiB         none         none ---     ---
1/0           1.97MiB      1.97MiB         none         none ---     0/257
                           ^^^^^^^ should be 1016Kib

$ btrfs qgroup remove 0/257 1/0 /mnt
WARNING: quotas may be inconsistent, rescan needed
$ btrfs qgroup show -pcre  /mnt
WARNING: qgroup data inconsistent, rescan recommended
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/258      1016.00KiB   1016.00KiB         none         none ---     ---
1/0           1.97MiB      1.97MiB         none         none ---     ---

$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/258      1016.00KiB   1016.00KiB         none         none ---     ---
1/0           1.97MiB      1.97MiB         none         none ---     ---
               ^^^^^^      ^^^^^^^ not changed, although inconsistency flag is cleared

I will look at the case to better understand qgroup.

Thanks,
Misono

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Aug. 8, 2018, 5:15 a.m. UTC | #3
On 2018年08月08日 10:56, Misono Tomohiro wrote:
> On 2018/08/08 1:23, Lu Fengqi wrote:
>> On Mon, Aug 06, 2018 at 01:53:28PM +0900, Misono Tomohiro wrote:
>>> When qgroup is on, subvolume deletion does not remove qgroup items
>>> of the subvolume (qgroup info, limit, relation) from quota tree and
>>> they need to get removed manually by "btrfs qgroup destroy".
>>>
>>> Since level 0 qgroup cannot be used/inherited by any other subvolume,
>>> let's remove them automatically when subvolume is deleted
>>> (to be precise, when the subvolume root is dropped).
>>>
>>> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
>>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>>> ---
>>> v2 -> v3:
>>>  Use root->root_key.objectid instead of root->objectid
>>>  Add Reviewed-by tag
>>>
>>> v1 -> v2:
>>>  Move call of btrfs_remove_qgroup() from btrfs_delete_subvolume()
>>>  to btrfs_snapshot_destroy() so that it will be called after the
>>>  subvolume root is really dropped
>>>
>>> fs/btrfs/extent-tree.c | 16 ++++++++++++----
>>> 1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 9e7b237b9547..48edf839ed2c 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> 	struct btrfs_root_item *root_item = &root->root_item;
>>> 	struct walk_control *wc;
>>> 	struct btrfs_key key;
>>> +	u64 objectid = root->root_key.objectid;
>>> 	int err = 0;
>>> 	int ret;
>>> 	int level;
>>> 	bool root_dropped = false;
>>>
>>> -	btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
>>> +	btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
>>>
>>> 	path = btrfs_alloc_path();
>>> 	if (!path) {
>>> @@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> 		goto out_end_trans;
>>> 	}
>>>
>>> -	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
>>> +	if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
>>> 		ret = btrfs_find_root(tree_root, &root->root_key, path,
>>> 				      NULL, NULL);
>>> 		if (ret < 0) {
>>> @@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> 			 *
>>> 			 * The most common failure here is just -ENOENT.
>>> 			 */
>>> -			btrfs_del_orphan_item(trans, tree_root,
>>> -					      root->root_key.objectid);
>>> +			btrfs_del_orphan_item(trans, tree_root, objectid);
>>> 		}
>>> 	}
>>>
>>> @@ -9056,6 +9056,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> 		btrfs_put_fs_root(root);
>>> 	}
>>> 	root_dropped = true;
>>> +
>>> +	 /* Remove level-0 qgroup items since no other subvolume can use them */
>>> +	ret = btrfs_remove_qgroup(trans, objectid);
>>> +	if (ret && ret != -EINVAL && ret != -ENOENT) {
>>
>> I'm sorry for missing the snapshot case. If it is a snapshot, then when
>> we remove the relevant qgroup, we will not be able to perform
>> quick_update_accounting(), and it will return 1. So we shouldn't abort
>> the transaction when the return value = 1.
>>
>> btrfs_remove_qgroup
>> -> __del_qgroup_relation
>>    -> quick_update_accounting << if qgroup->excl != qgroup->rfer; return 1
>>
> 
> oh, thanks for pointing out, I missed that case...
> I wonder if there is a way to keep qgroup consistency without enforcing rescan. 

In theory, it's possible by just dirtying all extents of related level 0
qgroups.

In practical, not really possible.
As above method will hang current transaction almost forever.

> 
> btw, I noticed that removing qgroup relation when excl != rfer causes
> some problem without this patch:
> 
> (4.18-rc7)
> $ mkfs.btrfs -fq $DEV
> $ mount $DEV /mnt
> 
> $ btrfs quota enable /mnt
> $ btrfs qgroup create 1/0 /mnt
> $ btrfs sub create /mnt/sub
> $ btrfs qgroup assign 0/257 1/0 /mnt
> 
> $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
> $ btrfs sub snap /mnt/sub /mnt/snap

In fact, at this point we should already mark btrfs quota inconsistent.

When creating qgroup 0/258 we're already doing quick hack for its
number, but since 0/257 is already belong to 1/0, the quick update
itself is already making quota inconsistent.

(Yes, snapshot creation under multi-level qgroups is the biggest
challenge for btrfs quota)

> $ dd if=/dev/urandom of=/mnt/sub/file2 bs=1k count=1000
> 
> $ btrfs fi sync /mnt
> $ btrfs qgroup show -pcre /mnt

At this point, qgroups is already inconsistent.
I'll try to fix it.

Thanks,
Qu

> qgroupid         rfer         excl     max_rfer     max_excl parent  child
> --------         ----         ----     --------     -------- ------  -----
> 0/5          16.00KiB     16.00KiB         none         none ---     ---
> 0/257         1.97MiB   1016.00KiB         none         none 1/0     ---
> 0/258      1016.00KiB     16.00KiB         none         none ---     ---
> 1/0           1.97MiB      1.97MiB         none         none ---     0/257
>                            ^^^^^^^ should be 1016Kib
> 
> $ btrfs qgroup remove 0/257 1/0 /mnt
> WARNING: quotas may be inconsistent, rescan needed
> $ btrfs qgroup show -pcre  /mnt
> WARNING: qgroup data inconsistent, rescan recommended
> qgroupid         rfer         excl     max_rfer     max_excl parent  child
> --------         ----         ----     --------     -------- ------  -----
> 0/5          16.00KiB     16.00KiB         none         none ---     ---
> 0/258      1016.00KiB   1016.00KiB         none         none ---     ---
> 1/0           1.97MiB      1.97MiB         none         none ---     ---
> 
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre /mnt
> qgroupid         rfer         excl     max_rfer     max_excl parent  child
> --------         ----         ----     --------     -------- ------  -----
> 0/5          16.00KiB     16.00KiB         none         none ---     ---
> 0/258      1016.00KiB   1016.00KiB         none         none ---     ---
> 1/0           1.97MiB      1.97MiB         none         none ---     ---
>                ^^^^^^      ^^^^^^^ not changed, although inconsistency flag is cleared
> 
> I will look at the case to better understand qgroup.
> 
> Thanks,
> Misono
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Patch
diff mbox series

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9e7b237b9547..48edf839ed2c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8871,12 +8871,13 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 	struct btrfs_root_item *root_item = &root->root_item;
 	struct walk_control *wc;
 	struct btrfs_key key;
+	u64 objectid = root->root_key.objectid;
 	int err = 0;
 	int ret;
 	int level;
 	bool root_dropped = false;
 
-	btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
+	btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
 
 	path = btrfs_alloc_path();
 	if (!path) {
@@ -9030,7 +9031,7 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 		goto out_end_trans;
 	}
 
-	if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
+	if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
 		ret = btrfs_find_root(tree_root, &root->root_key, path,
 				      NULL, NULL);
 		if (ret < 0) {
@@ -9043,8 +9044,7 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 			 *
 			 * The most common failure here is just -ENOENT.
 			 */
-			btrfs_del_orphan_item(trans, tree_root,
-					      root->root_key.objectid);
+			btrfs_del_orphan_item(trans, tree_root, objectid);
 		}
 	}
 
@@ -9056,6 +9056,14 @@  int btrfs_drop_snapshot(struct btrfs_root *root,
 		btrfs_put_fs_root(root);
 	}
 	root_dropped = true;
+
+	 /* Remove level-0 qgroup items since no other subvolume can use them */
+	ret = btrfs_remove_qgroup(trans, objectid);
+	if (ret && ret != -EINVAL && ret != -ENOENT) {
+		btrfs_abort_transaction(trans, ret);
+		err = ret;
+	}
+
 out_end_trans:
 	btrfs_end_transaction_throttle(trans);
 out_free: