Message ID | 20200402063735.32808-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: qgroup: Mark qgroup inconsistent if we're inherting snapshot to a new qgroup | expand |
Forgot to mention, although this doesn't cause any data corruption, it breaks snapper, which has some kind of "space aware cleaner algorithm", which put all newly created snapshots into 1/0, but not the current root subvolume. And since snapper uses snapshot ioctl to assign qgroup relationship directly, without using qgrou assign ioctl, it has no way to detect such problem. Hopes we can get this patch into current release cycle. Thanks, Qu On 2020/4/2 下午2:37, Qu Wenruo wrote: > [BUG] > For the following operation, qgroup is guaranteed to be screwed up due > to snapshot adding to a new qgroup: > > # mkfs.btrfs -f $dev > # mount $dev $mnt > # btrfs qgroup en $mnt > # btrfs subv create $mnt/src > # xfs_io -f -c "pwrite 0 1m" $mnt/src/file > # sync > # btrfs qgroup create 1/0 $mnt/src > # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot > # btrfs qgroup show -prce $mnt/src > qgroupid rfer excl max_rfer max_excl parent child > -------- ---- ---- -------- -------- ------ ----- > 0/5 16.00KiB 16.00KiB none none --- --- > 0/257 1.02MiB 16.00KiB none none --- --- > 0/258 1.02MiB 16.00KiB none none 1/0 --- > 1/0 0.00B 0.00B none none --- 0/258 > ^^^^^^^^^^^^^^^^^^^^ > > [CAUSE] > The problem is in btrfs_qgroup_inherit(), we don't have good enough > check to determine if the new relation ship would break the existing > accounting. > > Unlike btrfs_add_qgroup_relation(), which has proper check to determine > if we can do quick update without a rescan, in btrfs_qgroup_inherit() we > can even assign a snapshot to multiple qgroups. > > [FIX] > Fix the problem by manually marking qgroup inconsistent for snapshot > inheritance. > > For subvolume creation, since all its extents are exclusively owned by > itself, we don't need to rescan. > > In theory, we should call relationship check like quick_update_accounting() > when doing qgroup inheritance and inform user about qgroup inconsistent. > > But we don't have good enough mechanism to inform user in the snapshot > creation context, thus we can only silently mark the qgroup > inconsistent. > > Anyway, user shouldn't use qgroup inheritance during snapshot creation, > and should add qgroup relationship after snapshot creation by 'btrfs > qgroup assign', which has a much better UI to inform user about qgroup > inconsistent and kick in rescan automatically. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/qgroup.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index c3888fb367e7..81b2efca48b4 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > struct btrfs_root *quota_root; > struct btrfs_qgroup *srcgroup; > struct btrfs_qgroup *dstgroup; > + bool need_rescan = false; > u32 level_size = 0; > u64 nums; > > @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > goto unlock; > } > ++i_qgroups; > + > + /* > + * If we're doing a snapshot, and adding the snapshot to a new > + * qgroup, the numbers are guaranteed to be incorrect. > + */ > + if (srcid) > + need_rescan = true; > } > > for (i = 0; i < inherit->num_ref_copies; ++i, i_qgroups += 2) { > @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > dst->rfer = src->rfer - level_size; > dst->rfer_cmpr = src->rfer_cmpr - level_size; > + > + /* Manually tweaking numbers? No way to keep qgroup sane */ > + need_rescan = true; > } > for (i = 0; i < inherit->num_excl_copies; ++i, i_qgroups += 2) { > struct btrfs_qgroup *src; > @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > dst->excl = src->excl + level_size; > dst->excl_cmpr = src->excl_cmpr + level_size; > + need_rescan = true; > } > > unlock: > @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > out: > if (!committing) > mutex_unlock(&fs_info->qgroup_ioctl_lock); > + if (need_rescan) > + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; > return ret; > } > >
On 4/2/20 2:37 AM, Qu Wenruo wrote: > [BUG] > For the following operation, qgroup is guaranteed to be screwed up due > to snapshot adding to a new qgroup: > > # mkfs.btrfs -f $dev > # mount $dev $mnt > # btrfs qgroup en $mnt > # btrfs subv create $mnt/src > # xfs_io -f -c "pwrite 0 1m" $mnt/src/file > # sync > # btrfs qgroup create 1/0 $mnt/src > # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot > # btrfs qgroup show -prce $mnt/src > qgroupid rfer excl max_rfer max_excl parent child > -------- ---- ---- -------- -------- ------ ----- > 0/5 16.00KiB 16.00KiB none none --- --- > 0/257 1.02MiB 16.00KiB none none --- --- > 0/258 1.02MiB 16.00KiB none none 1/0 --- > 1/0 0.00B 0.00B none none --- 0/258 > ^^^^^^^^^^^^^^^^^^^^ > Can we get an xfstest for this? When I go through qgroups in a few months I want to have as many regression tests as possible to rely on. > [CAUSE] > The problem is in btrfs_qgroup_inherit(), we don't have good enough > check to determine if the new relation ship would break the existing > accounting. > > Unlike btrfs_add_qgroup_relation(), which has proper check to determine > if we can do quick update without a rescan, in btrfs_qgroup_inherit() we > can even assign a snapshot to multiple qgroups. > > [FIX] > Fix the problem by manually marking qgroup inconsistent for snapshot > inheritance. > > For subvolume creation, since all its extents are exclusively owned by > itself, we don't need to rescan. > > In theory, we should call relationship check like quick_update_accounting() > when doing qgroup inheritance and inform user about qgroup inconsistent. > > But we don't have good enough mechanism to inform user in the snapshot > creation context, thus we can only silently mark the qgroup > inconsistent. > > Anyway, user shouldn't use qgroup inheritance during snapshot creation, > and should add qgroup relationship after snapshot creation by 'btrfs > qgroup assign', which has a much better UI to inform user about qgroup > inconsistent and kick in rescan automatically. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Wed, Apr 08, 2020 at 08:36:27PM +0800, Qu Wenruo wrote: > Forgot to mention, although this doesn't cause any data corruption, it > breaks snapper, which has some kind of "space aware cleaner algorithm", > which put all newly created snapshots into 1/0, but not the current root > subvolume. > > And since snapper uses snapshot ioctl to assign qgroup relationship > directly, without using qgrou assign ioctl, it has no way to detect such > problem. > > Hopes we can get this patch into current release cycle. It's still time to get it to 5.8, with CC: stable it could get propagated to other versions. For 5.7 it's not clear at this point as the bug does not seem to be urgent and as far as I understand it, there's a workaround. Also with an application (snapper) using some semantics of the ioctls, we need to actually test it with patched and unpatched kernel, or maybe snapper needs some fixup first. > > --- a/fs/btrfs/qgroup.c > > +++ b/fs/btrfs/qgroup.c > > @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > struct btrfs_root *quota_root; > > struct btrfs_qgroup *srcgroup; > > struct btrfs_qgroup *dstgroup; > > + bool need_rescan = false; > > u32 level_size = 0; > > u64 nums; > > > > @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > goto unlock; > > } > > ++i_qgroups; > > + > > + /* > > + * If we're doing a snapshot, and adding the snapshot to a new > > + * qgroup, the numbers are guaranteed to be incorrect. > > + */ > > + if (srcid) > > + need_rescan = true; > > } > > > > for (i = 0; i < inherit->num_ref_copies; ++i, i_qgroups += 2) { > > @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > > > dst->rfer = src->rfer - level_size; > > dst->rfer_cmpr = src->rfer_cmpr - level_size; > > + > > + /* Manually tweaking numbers? No way to keep qgroup sane */ > > + need_rescan = true; > > } > > for (i = 0; i < inherit->num_excl_copies; ++i, i_qgroups += 2) { > > struct btrfs_qgroup *src; > > @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > > > dst->excl = src->excl + level_size; > > dst->excl_cmpr = src->excl_cmpr + level_size; > > + need_rescan = true; > > } > > > > unlock: > > @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > > out: > > if (!committing) > > mutex_unlock(&fs_info->qgroup_ioctl_lock); > > + if (need_rescan) > > + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; This got me curious, a non-atomic change to qgroup_flags and without any protection. The function is not running in a safe context (like quota enable or disable) so lack of synchronization seems suspicious. I grepped for other changes to the qgroup_flags and it's very inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all or no obvious lock but likely fs_info::qgroup_ioctl_lock or qgroup_rescan_lock. I was considering using atomic bit updates but that would be another patch.
On 2020/5/6 下午11:04, David Sterba wrote: > On Wed, Apr 08, 2020 at 08:36:27PM +0800, Qu Wenruo wrote: >> Forgot to mention, although this doesn't cause any data corruption, it >> breaks snapper, which has some kind of "space aware cleaner algorithm", >> which put all newly created snapshots into 1/0, but not the current root >> subvolume. >> >> And since snapper uses snapshot ioctl to assign qgroup relationship >> directly, without using qgrou assign ioctl, it has no way to detect such >> problem. >> >> Hopes we can get this patch into current release cycle. > > It's still time to get it to 5.8, with CC: stable it could get > propagated to other versions. For 5.7 it's not clear at this point as > the bug does not seem to be urgent and as far as I understand it, > there's a workaround. > > Also with an application (snapper) using some semantics of the ioctls, > we need to actually test it with patched and unpatched kernel, or maybe > snapper needs some fixup first. > >>> --- a/fs/btrfs/qgroup.c >>> +++ b/fs/btrfs/qgroup.c >>> @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, >>> struct btrfs_root *quota_root; >>> struct btrfs_qgroup *srcgroup; >>> struct btrfs_qgroup *dstgroup; >>> + bool need_rescan = false; >>> u32 level_size = 0; >>> u64 nums; >>> >>> @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, >>> goto unlock; >>> } >>> ++i_qgroups; >>> + >>> + /* >>> + * If we're doing a snapshot, and adding the snapshot to a new >>> + * qgroup, the numbers are guaranteed to be incorrect. >>> + */ >>> + if (srcid) >>> + need_rescan = true; >>> } >>> >>> for (i = 0; i < inherit->num_ref_copies; ++i, i_qgroups += 2) { >>> @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, >>> >>> dst->rfer = src->rfer - level_size; >>> dst->rfer_cmpr = src->rfer_cmpr - level_size; >>> + >>> + /* Manually tweaking numbers? No way to keep qgroup sane */ >>> + need_rescan = true; >>> } >>> for (i = 0; i < inherit->num_excl_copies; ++i, i_qgroups += 2) { >>> struct btrfs_qgroup *src; >>> @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, >>> >>> dst->excl = src->excl + level_size; >>> dst->excl_cmpr = src->excl_cmpr + level_size; >>> + need_rescan = true; >>> } >>> >>> unlock: >>> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, >>> out: >>> if (!committing) >>> mutex_unlock(&fs_info->qgroup_ioctl_lock); >>> + if (need_rescan) >>> + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; > > This got me curious, a non-atomic change to qgroup_flags and without > any protection. The function is not running in a safe context (like > quota enable or disable) so lack of synchronization seems suspicious. I > grepped for other changes to the qgroup_flags and it's very > inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all > or no obvious lock but likely fs_info::qgroup_ioctl_lock or > qgroup_rescan_lock. > > I was considering using atomic bit updates but that would be another > patch. > Isn't the context safe because we're at the commit transaction critical section? Thanks, Qu
On Thu, May 07, 2020 at 06:48:24AM +0800, Qu Wenruo wrote: > >>> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > >>> out: > >>> if (!committing) > >>> mutex_unlock(&fs_info->qgroup_ioctl_lock); > >>> + if (need_rescan) > >>> + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; > > > > This got me curious, a non-atomic change to qgroup_flags and without > > any protection. The function is not running in a safe context (like > > quota enable or disable) so lack of synchronization seems suspicious. I > > grepped for other changes to the qgroup_flags and it's very > > inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all > > or no obvious lock but likely fs_info::qgroup_ioctl_lock or > > qgroup_rescan_lock. > > > > I was considering using atomic bit updates but that would be another > > patch. > > > Isn't the context safe because we're at the commit transaction critical > section? I don't see why, the qgroup_flags could be changed from ioctls, eg. adding a group relation and other places. Without protection the update can race and some of the changes lost.
On Thu, Apr 02, 2020 at 02:37:35PM +0800, Qu Wenruo wrote: > Anyway, user shouldn't use qgroup inheritance during snapshot creation, > and should add qgroup relationship after snapshot creation by 'btrfs > qgroup assign', which has a much better UI to inform user about qgroup > inconsistent and kick in rescan automatically. If users shouldn't do something, there needs to be a documentation, warning and deprecation of the interface at least. If it's there, somebody will use it. I'll apply the patch despite the qgroup_flags concerns as it's not critical.
On 2020/5/8 上午4:51, David Sterba wrote: > On Thu, May 07, 2020 at 06:48:24AM +0800, Qu Wenruo wrote: >>>>> @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, >>>>> out: >>>>> if (!committing) >>>>> mutex_unlock(&fs_info->qgroup_ioctl_lock); >>>>> + if (need_rescan) >>>>> + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; >>> >>> This got me curious, a non-atomic change to qgroup_flags and without >>> any protection. The function is not running in a safe context (like >>> quota enable or disable) so lack of synchronization seems suspicious. I >>> grepped for other changes to the qgroup_flags and it's very >>> inconsistent. Sometimes it's the fs_info::qgroup_lock, no lokcing at all >>> or no obvious lock but likely fs_info::qgroup_ioctl_lock or >>> qgroup_rescan_lock. >>> >>> I was considering using atomic bit updates but that would be another >>> patch. >>> >> Isn't the context safe because we're at the commit transaction critical >> section? > > I don't see why, the qgroup_flags could be changed from ioctls, eg. > adding a group relation and other places. Without protection the update > can race and some of the changes lost. > Oh, you're right. Do I need another patch to address all the inconsistence? Thanks, Qu
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index c3888fb367e7..81b2efca48b4 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2622,6 +2622,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, struct btrfs_root *quota_root; struct btrfs_qgroup *srcgroup; struct btrfs_qgroup *dstgroup; + bool need_rescan = false; u32 level_size = 0; u64 nums; @@ -2765,6 +2766,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, goto unlock; } ++i_qgroups; + + /* + * If we're doing a snapshot, and adding the snapshot to a new + * qgroup, the numbers are guaranteed to be incorrect. + */ + if (srcid) + need_rescan = true; } for (i = 0; i < inherit->num_ref_copies; ++i, i_qgroups += 2) { @@ -2784,6 +2792,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, dst->rfer = src->rfer - level_size; dst->rfer_cmpr = src->rfer_cmpr - level_size; + + /* Manually tweaking numbers? No way to keep qgroup sane */ + need_rescan = true; } for (i = 0; i < inherit->num_excl_copies; ++i, i_qgroups += 2) { struct btrfs_qgroup *src; @@ -2802,6 +2813,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, dst->excl = src->excl + level_size; dst->excl_cmpr = src->excl_cmpr + level_size; + need_rescan = true; } unlock: @@ -2809,6 +2821,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, out: if (!committing) mutex_unlock(&fs_info->qgroup_ioctl_lock); + if (need_rescan) + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; return ret; }
[BUG] For the following operation, qgroup is guaranteed to be screwed up due to snapshot adding to a new qgroup: # mkfs.btrfs -f $dev # mount $dev $mnt # btrfs qgroup en $mnt # btrfs subv create $mnt/src # xfs_io -f -c "pwrite 0 1m" $mnt/src/file # sync # btrfs qgroup create 1/0 $mnt/src # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot # btrfs qgroup show -prce $mnt/src qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 16.00KiB 16.00KiB none none --- --- 0/257 1.02MiB 16.00KiB none none --- --- 0/258 1.02MiB 16.00KiB none none 1/0 --- 1/0 0.00B 0.00B none none --- 0/258 ^^^^^^^^^^^^^^^^^^^^ [CAUSE] The problem is in btrfs_qgroup_inherit(), we don't have good enough check to determine if the new relation ship would break the existing accounting. Unlike btrfs_add_qgroup_relation(), which has proper check to determine if we can do quick update without a rescan, in btrfs_qgroup_inherit() we can even assign a snapshot to multiple qgroups. [FIX] Fix the problem by manually marking qgroup inconsistent for snapshot inheritance. For subvolume creation, since all its extents are exclusively owned by itself, we don't need to rescan. In theory, we should call relationship check like quick_update_accounting() when doing qgroup inheritance and inform user about qgroup inconsistent. But we don't have good enough mechanism to inform user in the snapshot creation context, thus we can only silently mark the qgroup inconsistent. Anyway, user shouldn't use qgroup inheritance during snapshot creation, and should add qgroup relationship after snapshot creation by 'btrfs qgroup assign', which has a much better UI to inform user about qgroup inconsistent and kick in rescan automatically. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/qgroup.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)