Message ID | 20190913015127.14953-2-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space |
On 13.09.19 at 4:51, Qu Wenruo wrote:
> [BUG]
> The following script can cause btrfs qgroup data space leak:
>
>   mkfs.btrfs -f $dev
>   mount $dev -o nospace_cache $mnt
>
>   btrfs subv create $mnt/subv
>   btrfs quota en $mnt
>   btrfs quota rescan -w $mnt
>   btrfs qgroup limit 128m $mnt/subv
>
>   for (( i = 0; i < 3; i++)); do
>     # Create 3 64M holes for latter fallocate to fail
>     truncate -s 192m $mnt/subv/file
>     xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
>     xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
>     sync
>
>     # it's supposed to fail, and each failure will leak at least 64M
>     # data space
>     xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
>     rm $mnt/subv/file
>     sync
>   done
>
>   # Shouldn't fail after we removed the file
>   xfs_io -f -c "falloc 0 64m" $mnt/subv/file
>
> [CAUSE]
> Btrfs qgroup data reserve code allows multiple reserve happen on a
                                                 ^ reservations to happen
> single extent_changeset:
>
> The only usage is in btrfs_fallocate():
>   struct extent_changeset *data_reserved = NULL;
>   btrfs_qgroup_reserve_data(inode, &data_reserved,
>                             range_start, range_len);
>   ...
>   btrfs_qgroup_reserve_data(inode, &data_reserved,
>                             new_range_start, new_range_len);
>   extent_changeset_free(data_reserved);

I take it you refer to the while() loop in btrfs_fallocate. The code
above is really just a _VERY_ condensed version; extent_changeset_free
is at the end of the function. Instead of putting random lines of code,
just state it explicitly, something along the lines of:

"The only such pattern is in btrfs_fallocate in the main while loop in
that function".

>
> However in btrfs_qgroup_reserve_data(), if one of the call failed, it
> will cleanup all reserved space.
> The cleanup itself is OK, but it only cleans up all
> EXTENT_QGROUP_RESERVED flag, forget to release the reserved bytes.
>
> So if multiple btrfs_qgroup_reserve_data() get called, and the last one
> failed, then previously reserved data space will get leaked.
>
> And due to the fact that EXTENT_QGROUP_RESERVED flag is cleaned
> correctly, btrfs_qgroup_check_reserved_leak() won't catch the leakage.

How about rephrasing the above 3 paragraphs along the lines of:

"btrfs_qgroup_reserve_data's error handling has a bug in that on error
it clears all ranges in the io_tree with the EXTENT_QGROUP_RESERVED flag
but doesn't free the reserved bytes. This behavior has a two-fold
effect:

1. Clearing EXTENT_QGROUP_RESERVED ranges prevents
btrfs_qgroup_check_reserved_leak from catching the leakage.

2. The previously reserved data bytes are leaked.

The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space allocated in the previous ones."

>
> [FIX]
> Also free previously reserved data bytes when btrfs_qgroup_reserve_data
> fails.
>
> Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/qgroup.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 64bdc3e3652d..59f6a9981087 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -3448,6 +3448,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
>  	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
>  		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
>  				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
> +	/* Also free data bytes of already reserved one */
> +	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
> +				  orig_reserved, BTRFS_QGROUP_RSV_DATA);
>  	extent_changeset_release(reserved);
>  	return ret;
>  }
>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 64bdc3e3652d..59f6a9981087 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3448,6 +3448,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
 	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
 				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
+	/* Also free data bytes of already reserved one */
+	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
+				  orig_reserved, BTRFS_QGROUP_RSV_DATA);
 	extent_changeset_release(reserved);
 	return ret;
 }
[BUG]
The following script can cause btrfs qgroup data space leak:

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  btrfs subv create $mnt/subv
  btrfs quota en $mnt
  btrfs quota rescan -w $mnt
  btrfs qgroup limit 128m $mnt/subv

  for (( i = 0; i < 3; i++)); do
    # Create 3 64M holes for latter fallocate to fail
    truncate -s 192m $mnt/subv/file
    xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
    xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
    sync

    # it's supposed to fail, and each failure will leak at least 64M
    # data space
    xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
    rm $mnt/subv/file
    sync
  done

  # Shouldn't fail after we removed the file
  xfs_io -f -c "falloc 0 64m" $mnt/subv/file

[CAUSE]
Btrfs qgroup data reserve code allows multiple reserve happen on a
single extent_changeset:

The only usage is in btrfs_fallocate():
  struct extent_changeset *data_reserved = NULL;
  btrfs_qgroup_reserve_data(inode, &data_reserved,
                            range_start, range_len);
  ...
  btrfs_qgroup_reserve_data(inode, &data_reserved,
                            new_range_start, new_range_len);
  extent_changeset_free(data_reserved);

However in btrfs_qgroup_reserve_data(), if one of the call failed, it
will cleanup all reserved space.
The cleanup itself is OK, but it only cleans up all
EXTENT_QGROUP_RESERVED flag, forget to release the reserved bytes.

So if multiple btrfs_qgroup_reserve_data() get called, and the last one
failed, then previously reserved data space will get leaked.

And due to the fact that EXTENT_QGROUP_RESERVED flag is cleaned
correctly, btrfs_qgroup_check_reserved_leak() won't catch the leakage.

[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.

Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 3 +++
 1 file changed, 3 insertions(+)
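
The leak described in [CAUSE] and the effect of the fix can be
illustrated with a small userspace model. This is only a sketch and not
kernel code: struct toy_qgroup, struct toy_changeset, toy_reserve(),
toy_leak_after_failure() and the free_previous switch are all
hypothetical names invented here, and sizes are plain counters rather
than extent ranges. Only the idea of remembering the already-reserved
amount on entry (orig_reserved) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a qgroup: a limit and the bytes reserved against it. */
struct toy_qgroup {
	uint64_t limit;
	uint64_t reserved;
};

/* Toy stand-in for the changeset shared by successive reservations. */
struct toy_changeset {
	uint64_t flagged_bytes;	/* bytes currently flagged as reserved */
};

/*
 * Reserve len bytes and record them in cs. On failure all flags are
 * cleared; only when free_previous is set are the bytes reserved by
 * earlier successful calls handed back as well (the behavior the patch
 * adds, as modeled here).
 */
static int toy_reserve(struct toy_qgroup *qg, struct toy_changeset *cs,
		       uint64_t len, bool free_previous)
{
	/* Bytes already reserved through this changeset before this call. */
	uint64_t orig_reserved = cs->flagged_bytes;

	assert(qg->reserved <= qg->limit);
	cs->flagged_bytes += len;
	if (len <= qg->limit - qg->reserved) {
		qg->reserved += len;
		return 0;
	}

	/* Error path: drop all flags, so a flag-based leak check sees nothing. */
	cs->flagged_bytes = 0;
	if (free_previous)
		qg->reserved -= orig_reserved;
	return -1;
}

/*
 * Issue three 64-unit reservations against a 128-unit limit and return
 * the bytes still reserved after the failure, i.e. the leaked amount.
 */
static uint64_t toy_leak_after_failure(bool free_previous)
{
	struct toy_qgroup qg = { .limit = 128, .reserved = 0 };
	struct toy_changeset cs = { .flagged_bytes = 0 };
	int ret = 0;

	for (int i = 0; i < 3 && ret == 0; i++)
		ret = toy_reserve(&qg, &cs, 64, free_previous);

	assert(ret < 0);		/* the third call exceeds the limit */
	assert(cs.flagged_bytes == 0);	/* no flags left for a leak check */
	return qg.reserved;
}
```

In this model toy_leak_after_failure(false) leaves the two earlier
64-unit reservations charged to the qgroup even though no flagged range
remains, while toy_leak_after_failure(true) returns the qgroup to zero.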