diff mbox

[2/2] btrfs: extent-tree: Ensure btrfs_trim_fs can trim the whole fs

Message ID 20171121072145.24413-2-wqu@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Qu Wenruo Nov. 21, 2017, 7:21 a.m. UTC
[BUG]
fstrim on some btrfs only trims the unallocated space, not trimming any
space in existing block groups.

[CAUSE]
fstrim_range passed in by default fstrim will be:

range->start = 0
range->len = fs_size (which equals with super->total_bytes)
range->min_len = 512

However btrfs_trim_fs() following above parameter to search block groups
to trim.

While it's quite possible that all chunks start beyond
super->total_bytes if the fs is balanced several times.

In that case, btrfs will skip trimming block groups and only trim the
unallocated space of each device.

[FIX]
For common full fs trimming range passed in, extent its len to (u64)-1
so we will iterate all block groups.

And for custom fs trimming range, due to the fact that the range will
always be truncated by range [0, super->total_bytes), making custom fs
trimming range useless.

Just return -ENOTTY for custom fs trimming range.

Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

Comments

Nikolay Borisov Nov. 21, 2017, 7:41 a.m. UTC | #1
On 21.11.2017 09:21, Qu Wenruo wrote:
> [BUG]
> fstrim on some btrfs only trims the unallocated space, not trimming any
> space in existing block groups.
> 
> [CAUSE]
> fstrim_range passed in by default fstrim will be:
> 
> range->start = 0
> range->len = fs_size (which equals with super->total_bytes)
> range->min_len = 512
> 
> However btrfs_trim_fs() following above parameter to search block groups
> to trim.
> 
> While it's quite possible that all chunks start beyond
> super->total_bytes if the fs is balanced several times.
> 
> In that case, btrfs will skip trimming block groups and only trim the
> unallocated space of each device.
> 
> [FIX]
> For common full fs trimming range passed in, extent its len to (u64)-1
> so we will iterate all block groups.
> 
> And for custom fs trimming range, due to the fact that the range will
> always be truncated by range [0, super->total_bytes), making custom fs
> trimming range useless.
> 
> Just return -ENOTTY for custom fs trimming range.
> 
> Reported-by: Chris Murphy <lists@colorremedies.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

I think we want this tagged for stable ?


> ---
>  fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3a252d7af158..22bbcc8c4f6c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11024,12 +11024,31 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	int ret = 0;
>  
>  	/*
> -	 * try to trim all FS space, our block group may start from non-zero.
> +	 * NOTE: Btrfs uses its own logical address space, where its first
> +	 * chunk can start anywhere if it wants.
> +	 * If we follow common start = 0 and len = fs_size from @range, we
> +	 * can end up without trimming any block groups, since it's highly
> +	 * possible all chunks start beyond that range.
> +	 *
> +	 * So if we want to trim the whole fs, extent the len to (u64)-1 to trim
> +	 * all block groups.
> +	 *
> +	 * Also, since @range will always be truncated to fs size, manually
> +	 * passing range to trim specified range doesn't make much sense.
> +	 * (No mean to trim any block group whose bytenr starts beyond
> +	 *  @total_bytes)
> +	 * So in that case, return -ENOTTY directly to prevent any custom trim
> +	 * request.
>  	 */
> -	if (range->len == total_bytes)
> -		cache = btrfs_lookup_first_block_group(fs_info, range->start);
> -	else
> -		cache = btrfs_lookup_block_group(fs_info, range->start);
> +	if (range->start == 0 && range->len == total_bytes) {
> +		range->len = (u64)-1;
> +	} else {
> +		btrfs_info(fs_info,
> +		"trimming custom range is not supported due to the limitation of fstrim_range");
> +		return -ENOTTY;
> +	}
> +
> +	cache = btrfs_lookup_first_block_group(fs_info, range->start);
>  
>  	for (; cache; cache = next_block_group(fs_info, cache)) {
>  		if (cache->key.objectid >= (range->start + range->len)) {
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Nov. 21, 2017, 8:03 a.m. UTC | #2
On 2017年11月21日 15:41, Nikolay Borisov wrote:
> 
> 
> On 21.11.2017 09:21, Qu Wenruo wrote:
>> [BUG]
>> fstrim on some btrfs only trims the unallocated space, not trimming any
>> space in existing block groups.
>>
>> [CAUSE]
>> fstrim_range passed in by default fstrim will be:
>>
>> range->start = 0
>> range->len = fs_size (which equals with super->total_bytes)
>> range->min_len = 512
>>
>> However btrfs_trim_fs() following above parameter to search block groups
>> to trim.
>>
>> While it's quite possible that all chunks start beyond
>> super->total_bytes if the fs is balanced several times.
>>
>> In that case, btrfs will skip trimming block groups and only trim the
>> unallocated space of each device.
>>
>> [FIX]
>> For common full fs trimming range passed in, extent its len to (u64)-1
>> so we will iterate all block groups.
>>
>> And for custom fs trimming range, due to the fact that the range will
>> always be truncated by range [0, super->total_bytes), making custom fs
>> trimming range useless.
>>
>> Just return -ENOTTY for custom fs trimming range.
>>
>> Reported-by: Chris Murphy <lists@colorremedies.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> I think we want this tagged for stable ?

Sounds pretty good.

Although I'm a little concerned about the ENOTTY return branch, not sure
if it will break script of some guys.

Thanks,
Qu

> 
> 
>> ---
>>  fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 3a252d7af158..22bbcc8c4f6c 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -11024,12 +11024,31 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>  	int ret = 0;
>>  
>>  	/*
>> -	 * try to trim all FS space, our block group may start from non-zero.
>> +	 * NOTE: Btrfs uses its own logical address space, where its first
>> +	 * chunk can start anywhere if it wants.
>> +	 * If we follow common start = 0 and len = fs_size from @range, we
>> +	 * can end up without trimming any block groups, since it's highly
>> +	 * possible all chunks start beyond that range.
>> +	 *
>> +	 * So if we want to trim the whole fs, extent the len to (u64)-1 to trim
>> +	 * all block groups.
>> +	 *
>> +	 * Also, since @range will always be truncated to fs size, manually
>> +	 * passing range to trim specified range doesn't make much sense.
>> +	 * (No mean to trim any block group whose bytenr starts beyond
>> +	 *  @total_bytes)
>> +	 * So in that case, return -ENOTTY directly to prevent any custom trim
>> +	 * request.
>>  	 */
>> -	if (range->len == total_bytes)
>> -		cache = btrfs_lookup_first_block_group(fs_info, range->start);
>> -	else
>> -		cache = btrfs_lookup_block_group(fs_info, range->start);
>> +	if (range->start == 0 && range->len == total_bytes) {
>> +		range->len = (u64)-1;
>> +	} else {
>> +		btrfs_info(fs_info,
>> +		"trimming custom range is not supported due to the limitation of fstrim_range");
>> +		return -ENOTTY;
>> +	}
>> +
>> +	cache = btrfs_lookup_first_block_group(fs_info, range->start);
>>  
>>  	for (; cache; cache = next_block_group(fs_info, cache)) {
>>  		if (cache->key.objectid >= (range->start + range->len)) {
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Filipe Manana Nov. 21, 2017, 3:12 p.m. UTC | #3
On Tue, Nov 21, 2017 at 7:21 AM, Qu Wenruo <wqu@suse.com> wrote:
> [BUG]
> fstrim on some btrfs only trims the unallocated space, not trimming any
> space in existing block groups.
>
> [CAUSE]
> fstrim_range passed in by default fstrim will be:
>
> range->start = 0
> range->len = fs_size (which equals with super->total_bytes)
> range->min_len = 512
>
> However btrfs_trim_fs() following above parameter to search block groups
> to trim.
>
> While it's quite possible that all chunks start beyond
> super->total_bytes if the fs is balanced several times.
>
> In that case, btrfs will skip trimming block groups and only trim the
> unallocated space of each device.
>
> [FIX]
> For common full fs trimming range passed in, extent its len to (u64)-1
> so we will iterate all block groups.
>
> And for custom fs trimming range, due to the fact that the range will
> always be truncated by range [0, super->total_bytes), making custom fs
> trimming range useless.
>
> Just return -ENOTTY for custom fs trimming range.
>
> Reported-by: Chris Murphy <lists@colorremedies.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3a252d7af158..22bbcc8c4f6c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11024,12 +11024,31 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>         int ret = 0;
>
>         /*
> -        * try to trim all FS space, our block group may start from non-zero.
> +        * NOTE: Btrfs uses its own logical address space, where its first
> +        * chunk can start anywhere if it wants.
> +        * If we follow common start = 0 and len = fs_size from @range, we
> +        * can end up without trimming any block groups, since it's highly
> +        * possible all chunks start beyond that range.
> +        *
> +        * So if we want to trim the whole fs, extent the len to (u64)-1 to trim
> +        * all block groups.
> +        *
> +        * Also, since @range will always be truncated to fs size, manually
> +        * passing range to trim specified range doesn't make much sense.
> +        * (No mean to trim any block group whose bytenr starts beyond
> +        *  @total_bytes)
> +        * So in that case, return -ENOTTY directly to prevent any custom trim
> +        * request.
>          */
> -       if (range->len == total_bytes)
> -               cache = btrfs_lookup_first_block_group(fs_info, range->start);
> -       else
> -               cache = btrfs_lookup_block_group(fs_info, range->start);
> +       if (range->start == 0 && range->len == total_bytes) {
> +               range->len = (u64)-1;

After the fs_trim program gets the value for the range's length and
before it invokes the trim ioctl, the value might have changed,
resulting in returning the enotty error below.

> +       } else {
> +               btrfs_info(fs_info,
> +               "trimming custom range is not supported due to the limitation of fstrim_range");

I can't understand this message, and I doubt the average user/admin can.

To me it seems this can be a lot more simple by ignoring the range,
that is, always considering [0, (u64)-1[. After all, due to the way
btrfs organizes space, the range does not make any sense and I doubt
users/programs will have all the necessary knowledge and willing to
compute a range that makes sense to btrfs based on the current block
group layout of the fs...

> +               return -ENOTTY;
> +       }
> +
> +       cache = btrfs_lookup_first_block_group(fs_info, range->start);
>
>         for (; cache; cache = next_block_group(fs_info, cache)) {
>                 if (cache->key.objectid >= (range->start + range->len)) {
> --
> 2.15.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Nov. 22, 2017, 12:42 a.m. UTC | #4
On 2017年11月21日 23:12, Filipe Manana wrote:
> On Tue, Nov 21, 2017 at 7:21 AM, Qu Wenruo <wqu@suse.com> wrote:
>> [BUG]
>> fstrim on some btrfs only trims the unallocated space, not trimming any
>> space in existing block groups.
>>
>> [CAUSE]
>> fstrim_range passed in by default fstrim will be:
>>
>> range->start = 0
>> range->len = fs_size (which equals with super->total_bytes)
>> range->min_len = 512
>>
>> However btrfs_trim_fs() following above parameter to search block groups
>> to trim.
>>
>> While it's quite possible that all chunks start beyond
>> super->total_bytes if the fs is balanced several times.
>>
>> In that case, btrfs will skip trimming block groups and only trim the
>> unallocated space of each device.
>>
>> [FIX]
>> For common full fs trimming range passed in, extent its len to (u64)-1
>> so we will iterate all block groups.
>>
>> And for custom fs trimming range, due to the fact that the range will
>> always be truncated by range [0, super->total_bytes), making custom fs
>> trimming range useless.
>>
>> Just return -ENOTTY for custom fs trimming range.
>>
>> Reported-by: Chris Murphy <lists@colorremedies.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 3a252d7af158..22bbcc8c4f6c 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -11024,12 +11024,31 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>         int ret = 0;
>>
>>         /*
>> -        * try to trim all FS space, our block group may start from non-zero.
>> +        * NOTE: Btrfs uses its own logical address space, where its first
>> +        * chunk can start anywhere if it wants.
>> +        * If we follow common start = 0 and len = fs_size from @range, we
>> +        * can end up without trimming any block groups, since it's highly
>> +        * possible all chunks start beyond that range.
>> +        *
>> +        * So if we want to trim the whole fs, extent the len to (u64)-1 to trim
>> +        * all block groups.
>> +        *
>> +        * Also, since @range will always be truncated to fs size, manually
>> +        * passing range to trim specified range doesn't make much sense.
>> +        * (No mean to trim any block group whose bytenr starts beyond
>> +        *  @total_bytes)
>> +        * So in that case, return -ENOTTY directly to prevent any custom trim
>> +        * request.
>>          */
>> -       if (range->len == total_bytes)
>> -               cache = btrfs_lookup_first_block_group(fs_info, range->start);
>> -       else
>> -               cache = btrfs_lookup_block_group(fs_info, range->start);
>> +       if (range->start == 0 && range->len == total_bytes) {
>> +               range->len = (u64)-1;
> 
> After the fs_trim program gets the value for the range's length and
> before it invokes the trim ioctl, the value might have changed,
> resulting in returning the enotty error below.
> 
>> +       } else {
>> +               btrfs_info(fs_info,
>> +               "trimming custom range is not supported due to the limitation of fstrim_range");
> 
> I can't understand this message, and I doubt the average user/admin can.
> 
> To me it seems this can be a lot more simple by ignoring the range,
> that is, always considering [0, (u64)-1[. After all, due to the way
> btrfs organizes space, the range does not make any sense and I doubt
> users/programs will have all the necessary knowledge and willing to
> compute a range that makes sense to btrfs based on the current block
> group layout of the fs...

Yes, just ignoring the range at all is the simplest solution.

Although I'm a little worried about false bug report about wrong trimmed
bytes later.

Current bug is exposed by Chris Murphy who takes extra care about the
trimmed bytes.

If we change the behavior to always trim the whole fs, maybe some other
careful guy trying to trim some strange range will report bug about this.

Maybe put some note in btrfs(5) and always trim the whole fs will be a
better solution?

Thanks,
Qu

> 
>> +               return -ENOTTY;
>> +       }
>> +
>> +       cache = btrfs_lookup_first_block_group(fs_info, range->start);
>>
>>         for (; cache; cache = next_block_group(fs_info, cache)) {
>>                 if (cache->key.objectid >= (range->start + range->len)) {
>> --
>> 2.15.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
>
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3a252d7af158..22bbcc8c4f6c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11024,12 +11024,31 @@  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	int ret = 0;
 
 	/*
-	 * try to trim all FS space, our block group may start from non-zero.
+	 * NOTE: Btrfs uses its own logical address space, where its first
+	 * chunk can start anywhere if it wants.
+	 * If we follow common start = 0 and len = fs_size from @range, we
+	 * can end up without trimming any block groups, since it's highly
+	 * possible all chunks start beyond that range.
+	 *
+	 * So if we want to trim the whole fs, extent the len to (u64)-1 to trim
+	 * all block groups.
+	 *
+	 * Also, since @range will always be truncated to fs size, manually
+	 * passing range to trim specified range doesn't make much sense.
+	 * (No mean to trim any block group whose bytenr starts beyond
+	 *  @total_bytes)
+	 * So in that case, return -ENOTTY directly to prevent any custom trim
+	 * request.
 	 */
-	if (range->len == total_bytes)
-		cache = btrfs_lookup_first_block_group(fs_info, range->start);
-	else
-		cache = btrfs_lookup_block_group(fs_info, range->start);
+	if (range->start == 0 && range->len == total_bytes) {
+		range->len = (u64)-1;
+	} else {
+		btrfs_info(fs_info,
+		"trimming custom range is not supported due to the limitation of fstrim_range");
+		return -ENOTTY;
+	}
+
+	cache = btrfs_lookup_first_block_group(fs_info, range->start);
 
 	for (; cache; cache = next_block_group(fs_info, cache)) {
 		if (cache->key.objectid >= (range->start + range->len)) {