diff mbox series

btrfs: trim: Check the range passed into to prevent overflow

Message ID 20190528082154.6450-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: trim: Check the range passed into to prevent overflow | expand

Commit Message

Qu Wenruo May 28, 2019, 8:21 a.m. UTC
Normally the range->len is set to default value (U64_MAX), but when it's
not default value, we should check if the range overflows.

And if overflows, return -EINVAL before doing anything.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Nikolay Borisov May 28, 2019, 12:22 p.m. UTC | #1
On 28.05.19 г. 11:21 ч., Qu Wenruo wrote:
> Normally the range->len is set to default value (U64_MAX), but when it's
> not default value, we should check if the range overflows.
> 
> And if overflows, return -EINVAL before doing anything.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


> ---
>  fs/btrfs/extent-tree.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f79e477a378e..62bfba6d3c07 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11245,6 +11245,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	struct btrfs_device *device;
>  	struct list_head *devices;
>  	u64 group_trimmed;
> +	u64 range_end = U64_MAX;
>  	u64 start;
>  	u64 end;
>  	u64 trimmed = 0;
> @@ -11254,16 +11255,23 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	int dev_ret = 0;
>  	int ret = 0;
>  
> +	/*
> +	 * Check range overflow if range->len is set.
> +	 * The default range->len is U64_MAX.
> +	 */
> +	if (range->len != U64_MAX && check_add_overflow(range->start,
> +				range->len, &range_end))
> +		return -EINVAL;
> +
>  	cache = btrfs_lookup_first_block_group(fs_info, range->start);
>  	for (; cache; cache = next_block_group(cache)) {
> -		if (cache->key.objectid >= (range->start + range->len)) {
> +		if (cache->key.objectid >= range_end) {
>  			btrfs_put_block_group(cache);
>  			break;
>  		}
>  
>  		start = max(range->start, cache->key.objectid);
> -		end = min(range->start + range->len,
> -				cache->key.objectid + cache->key.offset);
> +		end = min(range_end, cache->key.objectid + cache->key.offset);
>  
>  		if (end - start >= range->minlen) {
>  			if (!block_group_cache_done(cache)) {
>
David Sterba May 30, 2019, 12:04 p.m. UTC | #2
On Tue, May 28, 2019 at 04:21:54PM +0800, Qu Wenruo wrote:
> Normally the range->len is set to default value (U64_MAX), but when it's
> not default value, we should check if the range overflows.
> 
> And if overflows, return -EINVAL before doing anything.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

The range support of TRIM will be reverted so this patch won't be
needed.
Qu Wenruo May 30, 2019, 12:05 p.m. UTC | #3
On 2019/5/30 下午8:04, David Sterba wrote:
> On Tue, May 28, 2019 at 04:21:54PM +0800, Qu Wenruo wrote:
>> Normally the range->len is set to default value (U64_MAX), but when it's
>> not default value, we should check if the range overflows.
>>
>> And if overflows, return -EINVAL before doing anything.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> The range support of TRIM will be reverted so this patch won't be
> needed.

No, this patch is an independent one.

The objective is to behavior the same when @range->start + @range->len
overflows when @range->len is not default value (U64MAX).

With the trim range patch reverted, we should still detect overflow and
return -EINVAL to meet the new generic/260 check.

So please still include this patch.

Thanks,
Qu
Nikolay Borisov May 30, 2019, 12:13 p.m. UTC | #4
On 30.05.19 г. 15:04 ч., David Sterba wrote:
> On Tue, May 28, 2019 at 04:21:54PM +0800, Qu Wenruo wrote:
>> Normally the range->len is set to default value (U64_MAX), but when it's
>> not default value, we should check if the range overflows.
>>
>> And if overflows, return -EINVAL before doing anything.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> The range support of TRIM will be reverted so this patch won't be
> needed.
> 

Range support is going to be removed only from unallocated device space
trimming, not block group trimming. So it's still required.
Anand Jain May 31, 2019, 5:35 a.m. UTC | #5
On 5/28/19 4:21 PM, Qu Wenruo wrote:
> Normally the range->len is set to default value (U64_MAX), but when it's
> not default value, we should check if the range overflows.
> 
> And if overflows, return -EINVAL before doing anything.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

fstests patch
    https://patchwork.kernel.org/patch/10964105/
makes the sub-test like [1] in generic/260 skipped

[1]
-----
fssize=$($DF_PROG -k | grep "$SCRATCH_MNT" | grep "$SCRATCH_DEV"  | awk 
'{print $3}')
beyond_eofs=$((fssize*2048))
fstrim -o $beyond_eofs $SCRATCH_MNT <-- should_fail
-----

Originally [1] reported expected EINVAL until the patch
   6ba9fc8e628b btrfs: Ensure btrfs_trim_fs can trim the whole filesystem

Not sure how will some of the production machines will find this as,
not compatible with previous versions? Nevertheless in practical terms
things are fine.

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/extent-tree.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f79e477a378e..62bfba6d3c07 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11245,6 +11245,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>   	struct btrfs_device *device;
>   	struct list_head *devices;
>   	u64 group_trimmed;
> +	u64 range_end = U64_MAX;
>   	u64 start;
>   	u64 end;
>   	u64 trimmed = 0;
> @@ -11254,16 +11255,23 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>   	int dev_ret = 0;
>   	int ret = 0;
>   
> +	/*
> +	 * Check range overflow if range->len is set.
> +	 * The default range->len is U64_MAX.
> +	 */
> +	if (range->len != U64_MAX && check_add_overflow(range->start,
> +				range->len, &range_end))
> +		return -EINVAL;
> +
>   	cache = btrfs_lookup_first_block_group(fs_info, range->start);
>   	for (; cache; cache = next_block_group(cache)) {
> -		if (cache->key.objectid >= (range->start + range->len)) {
> +		if (cache->key.objectid >= range_end) {
>   			btrfs_put_block_group(cache);
>   			break;
>   		}
>   
>   		start = max(range->start, cache->key.objectid);
> -		end = min(range->start + range->len,
> -				cache->key.objectid + cache->key.offset);
> +		end = min(range_end, cache->key.objectid + cache->key.offset);
>   
>   		if (end - start >= range->minlen) {
>   			if (!block_group_cache_done(cache)) {
>
Qu Wenruo May 31, 2019, 5:56 a.m. UTC | #6
On 2019/5/31 下午1:35, Anand Jain wrote:
> On 5/28/19 4:21 PM, Qu Wenruo wrote:
>> Normally the range->len is set to default value (U64_MAX), but when it's
>> not default value, we should check if the range overflows.
>>
>> And if overflows, return -EINVAL before doing anything.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> fstests patch
>    https://patchwork.kernel.org/patch/10964105/
> makes the sub-test like [1] in generic/260 skipped
> 
> [1]
> -----
> fssize=$($DF_PROG -k | grep "$SCRATCH_MNT" | grep "$SCRATCH_DEV"  | awk
> '{print $3}')
> beyond_eofs=$((fssize*2048))
> fstrim -o $beyond_eofs $SCRATCH_MNT <-- should_fail
> -----

As I mentioned in the commit message and offline, the idea of *end of
filesystem* is not clear enough.

For regular fs, they have every byte mapped directly to its block device
(except external journal).
So its end of filesystem is easy to determine.
But we can still argue, how to trim the external journal device? Or
should the external journal device contribute to the end of the fs?


Now for btrfs, it's a dm-linear space, then dm-raid/dm-linear for each
chunk. Thus we can argue either the end of btrfs is U64MAX (from
dm-linear view), or the end of last block group (from mapped chunk view).

Further more, how to define end of a filesystem when the fs spans across
several devices?

I'd say this is a good timing for us to make the fstrim behavior more clear.

Thanks,
Qu

> 
> Originally [1] reported expected EINVAL until the patch
>   6ba9fc8e628b btrfs: Ensure btrfs_trim_fs can trim the whole filesystem
> 
> Not sure how will some of the production machines will find this as,
> not compatible with previous versions? Nevertheless in practical terms
> things are fine.
> 
>  Reviewed-by: Anand Jain <anand.jain@oracle.com>
> 
> Thanks, Anand
> 
>> ---
>>   fs/btrfs/extent-tree.c | 14 +++++++++++---
>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index f79e477a378e..62bfba6d3c07 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -11245,6 +11245,7 @@ int btrfs_trim_fs(struct btrfs_fs_info
>> *fs_info, struct fstrim_range *range)
>>       struct btrfs_device *device;
>>       struct list_head *devices;
>>       u64 group_trimmed;
>> +    u64 range_end = U64_MAX;
>>       u64 start;
>>       u64 end;
>>       u64 trimmed = 0;
>> @@ -11254,16 +11255,23 @@ int btrfs_trim_fs(struct btrfs_fs_info
>> *fs_info, struct fstrim_range *range)
>>       int dev_ret = 0;
>>       int ret = 0;
>>   +    /*
>> +     * Check range overflow if range->len is set.
>> +     * The default range->len is U64_MAX.
>> +     */
>> +    if (range->len != U64_MAX && check_add_overflow(range->start,
>> +                range->len, &range_end))
>> +        return -EINVAL;
>> +
>>       cache = btrfs_lookup_first_block_group(fs_info, range->start);
>>       for (; cache; cache = next_block_group(cache)) {
>> -        if (cache->key.objectid >= (range->start + range->len)) {
>> +        if (cache->key.objectid >= range_end) {
>>               btrfs_put_block_group(cache);
>>               break;
>>           }
>>             start = max(range->start, cache->key.objectid);
>> -        end = min(range->start + range->len,
>> -                cache->key.objectid + cache->key.offset);
>> +        end = min(range_end, cache->key.objectid + cache->key.offset);
>>             if (end - start >= range->minlen) {
>>               if (!block_group_cache_done(cache)) {
>>
>
Qu Wenruo May 31, 2019, 6:01 a.m. UTC | #7
On 2019/5/31 下午1:56, Qu Wenruo wrote:
> 
> 
> On 2019/5/31 下午1:35, Anand Jain wrote:
>> On 5/28/19 4:21 PM, Qu Wenruo wrote:
>>> Normally the range->len is set to default value (U64_MAX), but when it's
>>> not default value, we should check if the range overflows.
>>>
>>> And if overflows, return -EINVAL before doing anything.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>> fstests patch
>>    https://patchwork.kernel.org/patch/10964105/
>> makes the sub-test like [1] in generic/260 skipped
>>
>> [1]
>> -----
>> fssize=$($DF_PROG -k | grep "$SCRATCH_MNT" | grep "$SCRATCH_DEV"  | awk
>> '{print $3}')
>> beyond_eofs=$((fssize*2048))
>> fstrim -o $beyond_eofs $SCRATCH_MNT <-- should_fail
>> -----
> 
> As I mentioned in the commit message and offline, the idea of *end of
> filesystem* is not clear enough.
> 
> For regular fs, they have almost every byte mapped directly to its block device
> (except external journal).
> So its end of filesystem is easy to determine.
> But we can still argue, how to trim the external journal device? Or
> should the external journal device contribute to the end of the fs?
> 
> 
> Now for btrfs, it's a dm-linear space, then dm-raid/dm-linear for each
> chunk. Thus we can argue either the end of btrfs is U64MAX (from
> dm-linear view), or the end of last block group (from mapped chunk view).
> 
> Further more, how to define end of a filesystem when the fs spans across
> several devices?
> 
> I'd say this is a good timing for us to make the fstrim behavior more clear.

Also add fsdevel list into the discussion.

> 
> Thanks,
> Qu
> 
>>
>> Originally [1] reported expected EINVAL until the patch
>>   6ba9fc8e628b btrfs: Ensure btrfs_trim_fs can trim the whole filesystem
>>
>> Not sure how will some of the production machines will find this as,
>> not compatible with previous versions? Nevertheless in practical terms
>> things are fine.
>>
>>  Reviewed-by: Anand Jain <anand.jain@oracle.com>
>>
>> Thanks, Anand
>>
>>> ---
>>>   fs/btrfs/extent-tree.c | 14 +++++++++++---
>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index f79e477a378e..62bfba6d3c07 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -11245,6 +11245,7 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>> *fs_info, struct fstrim_range *range)
>>>       struct btrfs_device *device;
>>>       struct list_head *devices;
>>>       u64 group_trimmed;
>>> +    u64 range_end = U64_MAX;
>>>       u64 start;
>>>       u64 end;
>>>       u64 trimmed = 0;
>>> @@ -11254,16 +11255,23 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>> *fs_info, struct fstrim_range *range)
>>>       int dev_ret = 0;
>>>       int ret = 0;
>>>   +    /*
>>> +     * Check range overflow if range->len is set.
>>> +     * The default range->len is U64_MAX.
>>> +     */
>>> +    if (range->len != U64_MAX && check_add_overflow(range->start,
>>> +                range->len, &range_end))
>>> +        return -EINVAL;
>>> +
>>>       cache = btrfs_lookup_first_block_group(fs_info, range->start);
>>>       for (; cache; cache = next_block_group(cache)) {
>>> -        if (cache->key.objectid >= (range->start + range->len)) {
>>> +        if (cache->key.objectid >= range_end) {
>>>               btrfs_put_block_group(cache);
>>>               break;
>>>           }
>>>             start = max(range->start, cache->key.objectid);
>>> -        end = min(range->start + range->len,
>>> -                cache->key.objectid + cache->key.offset);
>>> +        end = min(range_end, cache->key.objectid + cache->key.offset);
>>>             if (end - start >= range->minlen) {
>>>               if (!block_group_cache_done(cache)) {
>>>
>>
>
Qu Wenruo Aug. 7, 2019, 12:03 p.m. UTC | #8
Gentle ping?

Thanks to the discussion with Anand, I find this patch is not merged yet.

Thanks,
Qu

On 2019/5/28 下午4:21, Qu Wenruo wrote:
> Normally the range->len is set to default value (U64_MAX), but when it's
> not default value, we should check if the range overflows.
> 
> And if overflows, return -EINVAL before doing anything.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f79e477a378e..62bfba6d3c07 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11245,6 +11245,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	struct btrfs_device *device;
>  	struct list_head *devices;
>  	u64 group_trimmed;
> +	u64 range_end = U64_MAX;
>  	u64 start;
>  	u64 end;
>  	u64 trimmed = 0;
> @@ -11254,16 +11255,23 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	int dev_ret = 0;
>  	int ret = 0;
>  
> +	/*
> +	 * Check range overflow if range->len is set.
> +	 * The default range->len is U64_MAX.
> +	 */
> +	if (range->len != U64_MAX && check_add_overflow(range->start,
> +				range->len, &range_end))
> +		return -EINVAL;
> +
>  	cache = btrfs_lookup_first_block_group(fs_info, range->start);
>  	for (; cache; cache = next_block_group(cache)) {
> -		if (cache->key.objectid >= (range->start + range->len)) {
> +		if (cache->key.objectid >= range_end) {
>  			btrfs_put_block_group(cache);
>  			break;
>  		}
>  
>  		start = max(range->start, cache->key.objectid);
> -		end = min(range->start + range->len,
> -				cache->key.objectid + cache->key.offset);
> +		end = min(range_end, cache->key.objectid + cache->key.offset);
>  
>  		if (end - start >= range->minlen) {
>  			if (!block_group_cache_done(cache)) {
>
David Sterba Aug. 7, 2019, 12:30 p.m. UTC | #9
On Wed, Aug 07, 2019 at 08:03:41PM +0800, Qu Wenruo wrote:
> Gentle ping?
> 
> Thanks to the discussion with Anand, I find this patch is not merged yet.

I had the patch marked as dropped as it was during the trim range
changes. Reading the discussion again it's ok to merge and I'll add it
to the queue.
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f79e477a378e..62bfba6d3c07 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11245,6 +11245,7 @@  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	struct btrfs_device *device;
 	struct list_head *devices;
 	u64 group_trimmed;
+	u64 range_end = U64_MAX;
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
@@ -11254,16 +11255,23 @@  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	int dev_ret = 0;
 	int ret = 0;
 
+	/*
+	 * Check range overflow if range->len is set.
+	 * The default range->len is U64_MAX.
+	 */
+	if (range->len != U64_MAX && check_add_overflow(range->start,
+				range->len, &range_end))
+		return -EINVAL;
+
 	cache = btrfs_lookup_first_block_group(fs_info, range->start);
 	for (; cache; cache = next_block_group(cache)) {
-		if (cache->key.objectid >= (range->start + range->len)) {
+		if (cache->key.objectid >= range_end) {
 			btrfs_put_block_group(cache);
 			break;
 		}
 
 		start = max(range->start, cache->key.objectid);
-		end = min(range->start + range->len,
-				cache->key.objectid + cache->key.offset);
+		end = min(range_end, cache->key.objectid + cache->key.offset);
 
 		if (end - start >= range->minlen) {
 			if (!block_group_cache_done(cache)) {