diff mbox series

btrfs: qgroup: Mark qgroup inconsistent if we're inherting snapshot to a new qgroup

Message ID 20200402063735.32808-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: qgroup: Mark qgroup inconsistent if we're inherting snapshot to a new qgroup | expand

Commit Message

Qu Wenruo April 2, 2020, 6:37 a.m. UTC
[BUG]
For the following operation, qgroup is guaranteed to be screwed up due
to snapshot adding to a new qgroup:

  # mkfs.btrfs -f $dev
  # mount $dev $mnt
  # btrfs qgroup en $mnt
  # btrfs subv create $mnt/src
  # xfs_io -f -c "pwrite 0 1m" $mnt/src/file
  # sync
  # btrfs qgroup create 1/0 $mnt/src
  # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot
  # btrfs qgroup show -prce $mnt/src
  qgroupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5          16.00KiB     16.00KiB         none         none ---     ---
  0/257         1.02MiB     16.00KiB         none         none ---     ---
  0/258         1.02MiB     16.00KiB         none         none 1/0     ---
  1/0             0.00B        0.00B         none         none ---     0/258
	        ^^^^^^^^^^^^^^^^^^^^

[CAUSE]
The problem is in btrfs_qgroup_inherit(), we don't have good enough
check to determine if the new relation ship would break the existing
accounting.

Unlike btrfs_add_qgroup_relation(), which has proper check to determine
if we can do quick update without a rescan, in btrfs_qgroup_inherit() we
can even assign a snapshot to multiple qgroups.

[FIX]
Fix the problem by manually marking qgroup inconsistent for snapshot
inheritance.

For subvolume creation, since all its extents are exclusively owned by
itself, we don't need to rescan.

In theory, we should call relationship check like quick_update_accounting()
when doing qgroup inheritance and inform user about qgroup inconsistent.

But we don't have good enough mechanism to inform user in the snapshot
creation context, thus we can only silently mark the qgroup
inconsistent.

Anyway, user shouldn't use qgroup inheritance during snapshot creation,
and should add qgroup relationship after snapshot creation by 'btrfs
qgroup assign', which has a much better UI to inform user about qgroup
inconsistent and kick in rescan automatically.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Qu Wenruo April 8, 2020, 12:36 p.m. UTC | #1
Forgot to mention, although this doesn't cause any data corruption, it
breaks snapper, which has some kind of "space aware cleaner algorithm",
which put all newly created snapshots into 1/0, but not the current root
subvolume.

And since snapper uses snapshot ioctl to assign qgroup relationship
directly, without using qgrou assign ioctl, it has no way to detect such
problem.

Hopes we can get this patch into current release cycle.

Thanks,
Qu

On 2020/4/2 下午2:37, Qu Wenruo wrote:
> [BUG]
> For the following operation, qgroup is guaranteed to be screwed up due
> to snapshot adding to a new qgroup:
> 
>   # mkfs.btrfs -f $dev
>   # mount $dev $mnt
>   # btrfs qgroup en $mnt
>   # btrfs subv create $mnt/src
>   # xfs_io -f -c "pwrite 0 1m" $mnt/src/file
>   # sync
>   # btrfs qgroup create 1/0 $mnt/src
>   # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot
>   # btrfs qgroup show -prce $mnt/src
>   qgroupid         rfer         excl     max_rfer     max_excl parent  child
>   --------         ----         ----     --------     -------- ------  -----
>   0/5          16.00KiB     16.00KiB         none         none ---     ---
>   0/257         1.02MiB     16.00KiB         none         none ---     ---
>   0/258         1.02MiB     16.00KiB         none         none 1/0     ---
>   1/0             0.00B        0.00B         none         none ---     0/258
> 	        ^^^^^^^^^^^^^^^^^^^^
> 
> [CAUSE]
> The problem is in btrfs_qgroup_inherit(), we don't have good enough
> check to determine if the new relation ship would break the existing
> accounting.
> 
> Unlike btrfs_add_qgroup_relation(), which has proper check to determine
> if we can do quick update without a rescan, in btrfs_qgroup_inherit() we
> can even assign a snapshot to multiple qgroups.
> 
> [FIX]
> Fix the problem by manually marking qgroup inconsistent for snapshot
> inheritance.
> 
> For subvolume creation, since all its extents are exclusively owned by
> itself, we don't need to rescan.
> 
> In theory, we should call relationship check like quick_update_accounting()
> when doing qgroup inheritance and inform user about qgroup inconsistent.
> 
> But we don't have good enough mechanism to inform user in the snapshot
> creation context, thus we can only silently mark the qgroup
> inconsistent.
> 
> Anyway, user shouldn't use qgroup inheritance during snapshot creation,
> and should add qgroup relationship after snapshot creation by 'btrfs
> qgroup assign', which has a much better UI to inform user about qgroup
> inconsistent and kick in rescan automatically.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/qgroup.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index c3888fb367e7..81b2efca48b4 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>  	struct btrfs_root *quota_root;
>  	struct btrfs_qgroup *srcgroup;
>  	struct btrfs_qgroup *dstgroup;
> +	bool need_rescan = false;
>  	u32 level_size = 0;
>  	u64 nums;
>  
> @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>  				goto unlock;
>  		}
>  		++i_qgroups;
> +
> +		/*
> +		 * If we're doing a snapshot, and adding the snapshot to a new
> +		 * qgroup, the numbers are guaranteed to be incorrect.
> +		 */
> +		if (srcid)
> +			need_rescan = true;
>  	}
>  
>  	for (i = 0; i <  inherit->num_ref_copies; ++i, i_qgroups += 2) {
> @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>  
>  		dst->rfer = src->rfer - level_size;
>  		dst->rfer_cmpr = src->rfer_cmpr - level_size;
> +
> +		/* Manually tweaking numbers? No way to keep qgroup sane */
> +		need_rescan = true;
>  	}
>  	for (i = 0; i <  inherit->num_excl_copies; ++i, i_qgroups += 2) {
>  		struct btrfs_qgroup *src;
> @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>  
>  		dst->excl = src->excl + level_size;
>  		dst->excl_cmpr = src->excl_cmpr + level_size;
> +		need_rescan = true;
>  	}
>  
>  unlock:
> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>  out:
>  	if (!committing)
>  		mutex_unlock(&fs_info->qgroup_ioctl_lock);
> +	if (need_rescan)
> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>  	return ret;
>  }
>  
>
Josef Bacik April 10, 2020, 7:19 p.m. UTC | #2
On 4/2/20 2:37 AM, Qu Wenruo wrote:
> [BUG]
> For the following operation, qgroup is guaranteed to be screwed up due
> to snapshot adding to a new qgroup:
> 
>    # mkfs.btrfs -f $dev
>    # mount $dev $mnt
>    # btrfs qgroup en $mnt
>    # btrfs subv create $mnt/src
>    # xfs_io -f -c "pwrite 0 1m" $mnt/src/file
>    # sync
>    # btrfs qgroup create 1/0 $mnt/src
>    # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot
>    # btrfs qgroup show -prce $mnt/src
>    qgroupid         rfer         excl     max_rfer     max_excl parent  child
>    --------         ----         ----     --------     -------- ------  -----
>    0/5          16.00KiB     16.00KiB         none         none ---     ---
>    0/257         1.02MiB     16.00KiB         none         none ---     ---
>    0/258         1.02MiB     16.00KiB         none         none 1/0     ---
>    1/0             0.00B        0.00B         none         none ---     0/258
> 	        ^^^^^^^^^^^^^^^^^^^^
> 

Can we get an xfstest for this?  When I go through qgroups in a few months I 
want to have as many regression tests as possible to rely on.

> [CAUSE]
> The problem is in btrfs_qgroup_inherit(), we don't have good enough
> check to determine if the new relation ship would break the existing
> accounting.
> 
> Unlike btrfs_add_qgroup_relation(), which has proper check to determine
> if we can do quick update without a rescan, in btrfs_qgroup_inherit() we
> can even assign a snapshot to multiple qgroups.
> 
> [FIX]
> Fix the problem by manually marking qgroup inconsistent for snapshot
> inheritance.
> 
> For subvolume creation, since all its extents are exclusively owned by
> itself, we don't need to rescan.
> 
> In theory, we should call relationship check like quick_update_accounting()
> when doing qgroup inheritance and inform user about qgroup inconsistent.
> 
> But we don't have good enough mechanism to inform user in the snapshot
> creation context, thus we can only silently mark the qgroup
> inconsistent.
> 
> Anyway, user shouldn't use qgroup inheritance during snapshot creation,
> and should add qgroup relationship after snapshot creation by 'btrfs
> qgroup assign', which has a much better UI to inform user about qgroup
> inconsistent and kick in rescan automatically.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
David Sterba May 6, 2020, 3:04 p.m. UTC | #3
On Wed, Apr 08, 2020 at 08:36:27PM +0800, Qu Wenruo wrote:
> Forgot to mention, although this doesn't cause any data corruption, it
> breaks snapper, which has some kind of "space aware cleaner algorithm",
> which put all newly created snapshots into 1/0, but not the current root
> subvolume.
> 
> And since snapper uses snapshot ioctl to assign qgroup relationship
> directly, without using qgrou assign ioctl, it has no way to detect such
> problem.
> 
> Hopes we can get this patch into current release cycle.

It's still time to get it to 5.8, with CC: stable it could get
propagated to other versions. For 5.7 it's not clear at this point as
the bug does not seem to be urgent and as far as I understand it,
there's a workaround.

Also with an application (snapper) using some semantics of the ioctls,
we need to actually test it with patched and unpatched kernel, or maybe
snapper needs some fixup first.

> > --- a/fs/btrfs/qgroup.c
> > +++ b/fs/btrfs/qgroup.c
> > @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> >  	struct btrfs_root *quota_root;
> >  	struct btrfs_qgroup *srcgroup;
> >  	struct btrfs_qgroup *dstgroup;
> > +	bool need_rescan = false;
> >  	u32 level_size = 0;
> >  	u64 nums;
> >  
> > @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> >  				goto unlock;
> >  		}
> >  		++i_qgroups;
> > +
> > +		/*
> > +		 * If we're doing a snapshot, and adding the snapshot to a new
> > +		 * qgroup, the numbers are guaranteed to be incorrect.
> > +		 */
> > +		if (srcid)
> > +			need_rescan = true;
> >  	}
> >  
> >  	for (i = 0; i <  inherit->num_ref_copies; ++i, i_qgroups += 2) {
> > @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> >  
> >  		dst->rfer = src->rfer - level_size;
> >  		dst->rfer_cmpr = src->rfer_cmpr - level_size;
> > +
> > +		/* Manually tweaking numbers? No way to keep qgroup sane */
> > +		need_rescan = true;
> >  	}
> >  	for (i = 0; i <  inherit->num_excl_copies; ++i, i_qgroups += 2) {
> >  		struct btrfs_qgroup *src;
> > @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> >  
> >  		dst->excl = src->excl + level_size;
> >  		dst->excl_cmpr = src->excl_cmpr + level_size;
> > +		need_rescan = true;
> >  	}
> >  
> >  unlock:
> > @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> >  out:
> >  	if (!committing)
> >  		mutex_unlock(&fs_info->qgroup_ioctl_lock);
> > +	if (need_rescan)
> > +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;

This got me curious, a non-atomic change to qgroup_flags and without
any protection. The function is not running in a safe context (like
quota enable or disable) so lack of synchronization seems suspicious. I
grepped for other changes to the qgroup_flags and it's very
inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all
or no obvious lock but likely fs_info::qgroup_ioctl_lock or
qgroup_rescan_lock.

I was considering using atomic bit updates but that would be another
patch.
Qu Wenruo May 6, 2020, 10:48 p.m. UTC | #4
On 2020/5/6 下午11:04, David Sterba wrote:
> On Wed, Apr 08, 2020 at 08:36:27PM +0800, Qu Wenruo wrote:
>> Forgot to mention, although this doesn't cause any data corruption, it
>> breaks snapper, which has some kind of "space aware cleaner algorithm",
>> which put all newly created snapshots into 1/0, but not the current root
>> subvolume.
>>
>> And since snapper uses snapshot ioctl to assign qgroup relationship
>> directly, without using qgrou assign ioctl, it has no way to detect such
>> problem.
>>
>> Hopes we can get this patch into current release cycle.
> 
> It's still time to get it to 5.8, with CC: stable it could get
> propagated to other versions. For 5.7 it's not clear at this point as
> the bug does not seem to be urgent and as far as I understand it,
> there's a workaround.
> 
> Also with an application (snapper) using some semantics of the ioctls,
> we need to actually test it with patched and unpatched kernel, or maybe
> snapper needs some fixup first.
> 
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>>>  	struct btrfs_root *quota_root;
>>>  	struct btrfs_qgroup *srcgroup;
>>>  	struct btrfs_qgroup *dstgroup;
>>> +	bool need_rescan = false;
>>>  	u32 level_size = 0;
>>>  	u64 nums;
>>>  
>>> @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>>>  				goto unlock;
>>>  		}
>>>  		++i_qgroups;
>>> +
>>> +		/*
>>> +		 * If we're doing a snapshot, and adding the snapshot to a new
>>> +		 * qgroup, the numbers are guaranteed to be incorrect.
>>> +		 */
>>> +		if (srcid)
>>> +			need_rescan = true;
>>>  	}
>>>  
>>>  	for (i = 0; i <  inherit->num_ref_copies; ++i, i_qgroups += 2) {
>>> @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>>>  
>>>  		dst->rfer = src->rfer - level_size;
>>>  		dst->rfer_cmpr = src->rfer_cmpr - level_size;
>>> +
>>> +		/* Manually tweaking numbers? No way to keep qgroup sane */
>>> +		need_rescan = true;
>>>  	}
>>>  	for (i = 0; i <  inherit->num_excl_copies; ++i, i_qgroups += 2) {
>>>  		struct btrfs_qgroup *src;
>>> @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>>>  
>>>  		dst->excl = src->excl + level_size;
>>>  		dst->excl_cmpr = src->excl_cmpr + level_size;
>>> +		need_rescan = true;
>>>  	}
>>>  
>>>  unlock:
>>> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>>>  out:
>>>  	if (!committing)
>>>  		mutex_unlock(&fs_info->qgroup_ioctl_lock);
>>> +	if (need_rescan)
>>> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> 
> This got me curious, a non-atomic change to qgroup_flags and without
> any protection. The function is not running in a safe context (like
> quota enable or disable) so lack of synchronization seems suspicious. I
> grepped for other changes to the qgroup_flags and it's very
> inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all
> or no obvious lock but likely fs_info::qgroup_ioctl_lock or
> qgroup_rescan_lock.
> 
> I was considering using atomic bit updates but that would be another
> patch.
> 
Isn't the context safe because we're at the commit transaction critical
section?

Thanks,
Qu
David Sterba May 7, 2020, 8:51 p.m. UTC | #5
On Thu, May 07, 2020 at 06:48:24AM +0800, Qu Wenruo wrote:
> >>> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> >>>  out:
> >>>  	if (!committing)
> >>>  		mutex_unlock(&fs_info->qgroup_ioctl_lock);
> >>> +	if (need_rescan)
> >>> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> > 
> > This got me curious, a non-atomic change to qgroup_flags and without
> > any protection. The function is not running in a safe context (like
> > quota enable or disable) so lack of synchronization seems suspicious. I
> > grepped for other changes to the qgroup_flags and it's very
> > inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all
> > or no obvious lock but likely fs_info::qgroup_ioctl_lock or
> > qgroup_rescan_lock.
> > 
> > I was considering using atomic bit updates but that would be another
> > patch.
> > 
> Isn't the context safe because we're at the commit transaction critical
> section?

I don't see why, the qgroup_flags could be changed from ioctls, eg.
adding a group relation and other places. Without protection the update
can race and some of the changes lost.
David Sterba May 7, 2020, 8:57 p.m. UTC | #6
On Thu, Apr 02, 2020 at 02:37:35PM +0800, Qu Wenruo wrote:
> Anyway, user shouldn't use qgroup inheritance during snapshot creation,
> and should add qgroup relationship after snapshot creation by 'btrfs
> qgroup assign', which has a much better UI to inform user about qgroup
> inconsistent and kick in rescan automatically.

If users shouldn't do something, there needs to be a documentation,
warning and deprecation of the interface at least. If it's there,
somebody will use it.

I'll apply the patch despite the qgroup_flags concerns as it's not
critical.
Qu Wenruo May 8, 2020, 12:08 a.m. UTC | #7
On 2020/5/8 上午4:51, David Sterba wrote:
> On Thu, May 07, 2020 at 06:48:24AM +0800, Qu Wenruo wrote:
>>>>> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
>>>>>  out:
>>>>>  	if (!committing)
>>>>>  		mutex_unlock(&fs_info->qgroup_ioctl_lock);
>>>>> +	if (need_rescan)
>>>>> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>>>
>>> This got me curious, a non-atomic change to qgroup_flags and without
>>> any protection. The function is not running in a safe context (like
>>> quota enable or disable) so lack of synchronization seems suspicious. I
>>> grepped for other changes to the qgroup_flags and it's very
>>> inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all
>>> or no obvious lock but likely fs_info::qgroup_ioctl_lock or
>>> qgroup_rescan_lock.
>>>
>>> I was considering using atomic bit updates but that would be another
>>> patch.
>>>
>> Isn't the context safe because we're at the commit transaction critical
>> section?
> 
> I don't see why, the qgroup_flags could be changed from ioctls, eg.
> adding a group relation and other places. Without protection the update
> can race and some of the changes lost.
> 
Oh, you're right.

Do I need another patch to address all the inconsistence?

Thanks,
Qu
diff mbox series

Patch

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c3888fb367e7..81b2efca48b4 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2622,6 +2622,7 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 	struct btrfs_root *quota_root;
 	struct btrfs_qgroup *srcgroup;
 	struct btrfs_qgroup *dstgroup;
+	bool need_rescan = false;
 	u32 level_size = 0;
 	u64 nums;
 
@@ -2765,6 +2766,13 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 				goto unlock;
 		}
 		++i_qgroups;
+
+		/*
+		 * If we're doing a snapshot, and adding the snapshot to a new
+		 * qgroup, the numbers are guaranteed to be incorrect.
+		 */
+		if (srcid)
+			need_rescan = true;
 	}
 
 	for (i = 0; i <  inherit->num_ref_copies; ++i, i_qgroups += 2) {
@@ -2784,6 +2792,9 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 
 		dst->rfer = src->rfer - level_size;
 		dst->rfer_cmpr = src->rfer_cmpr - level_size;
+
+		/* Manually tweaking numbers? No way to keep qgroup sane */
+		need_rescan = true;
 	}
 	for (i = 0; i <  inherit->num_excl_copies; ++i, i_qgroups += 2) {
 		struct btrfs_qgroup *src;
@@ -2802,6 +2813,7 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 
 		dst->excl = src->excl + level_size;
 		dst->excl_cmpr = src->excl_cmpr + level_size;
+		need_rescan = true;
 	}
 
 unlock:
@@ -2809,6 +2821,8 @@  int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 out:
 	if (!committing)
 		mutex_unlock(&fs_info->qgroup_ioctl_lock);
+	if (need_rescan)
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	return ret;
 }