diff mbox series

[v4] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary

Message ID 20200731112911.115665-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [v4] btrfs: trim: fix underflow in trim length to prevent access beyond device boundary | expand

Commit Message

Qu Wenruo July 31, 2020, 11:29 a.m. UTC
[BUG]
The following script can lead to tons of beyond device boundary access:

  mkfs.btrfs -f $dev -b 10G
  mount $dev $mnt
  trimfs $mnt
  btrfs filesystem resize 1:-1G $mnt
  trimfs $mnt

[CAUSE]
Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
find_first_clear_extent_bit"), we try to avoid trimming ranges that's
already trimmed.

So we check device->alloc_state by finding the first range which doesn't
have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.

But if we shrunk the device, that bits are not cleared, thus we could
easily got a range starts beyond the shrunk device size.

This results the returned @start and @end are all beyond device size,
then we call "end = min(end, device->total_bytes -1);" making @end
smaller than device size.

Then finally we goes "len = end - start + 1", totally underflow the
result, and lead to the beyond-device-boundary access.

[FIX]
This patch will fix the problem in two ways:
- Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
  This is the root fix

- Add extra safe net when trimming free device extents
  We check and warn if the returned range is already beyond current
  device.

Link: https://github.com/kdave/btrfs-progs/issues/282
Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
Changelog:
v2:
- Add proper fixes tag
- Add extra warning for beyond device end case
- Add graceful exit for already trimmed case
v3:
- Don't return EUCLEAN for beyond boundary access
- Rephrase the warning message for beyond boundary access
v4:
- Remove one duplicated check on exiting the trim loop
---
 fs/btrfs/extent-tree.c | 14 ++++++++++++++
 fs/btrfs/volumes.c     | 12 ++++++++++++
 2 files changed, 26 insertions(+)

Comments

David Sterba July 31, 2020, 2:08 p.m. UTC | #1
On Fri, Jul 31, 2020 at 07:29:11PM +0800, Qu Wenruo wrote:
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>  	}
>  
>  	mutex_lock(&fs_info->chunk_mutex);
> +	/*
> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> +	 * current device boundary.
> +	 * This shouldn't fail, as alloc_state should only utilize those two
> +	 * bits, thus we shouldn't alloc new memory for clearing the status.

If this fails or not depends on implementation details of
clear_extent_bits and this comment will get out of sync eventually, so I
don't think it should be that specific.

If the new_size is somewhere in the middle of an existing state, it'll
need to be split anyway, no?

alloc_state |-----+++++|
clear             |------------------------- ... (u64)-1|

So we'd need to keep the state "-" and unset bits only from "+", and
this will require a split.

But I still have doubts about just clearing the range, why are there any
device->alloc_state entries at all after device is shrunk? Using
clear_extent_bits here is not wrong if we look at the end result of
clearing the range, but otherwise it leaves some state information
and allocated memory behind.
Qu Wenruo July 31, 2020, 11:35 p.m. UTC | #2
On 2020/7/31 下午10:08, David Sterba wrote:
> On Fri, Jul 31, 2020 at 07:29:11PM +0800, Qu Wenruo wrote:
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>  	}
>>  
>>  	mutex_lock(&fs_info->chunk_mutex);
>> +	/*
>> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>> +	 * current device boundary.
>> +	 * This shouldn't fail, as alloc_state should only utilize those two
>> +	 * bits, thus we shouldn't alloc new memory for clearing the status.
> 
> If this fails or not depends on implementation details of
> clear_extent_bits and this comment will get out of sync eventually, so I
> don't think it should be that specific.
> 
> If the new_size is somewhere in the middle of an existing state, it'll
> need to be split anyway, no?

Nope. Because in alloc_state we only have two bits utilized,
CHUNK_TRIMMED and CHUNK_ALLOCATED.

Thus what we're doing is to clear all utilized bits.

> 
> alloc_state |-----+++++|
> clear             |------------------------- ... (u64)-1|
> 
> So we'd need to keep the state "-" and unset bits only from "+", and
> this will require a split.

In this case, we would only reduce the the size of the existing status,
or just remove it completely.

> 
> But I still have doubts about just clearing the range, why are there any
> device->alloc_state entries at all after device is shrunk?

Because the alloc_state is mostly only utilized by trim facility, thus
existing functions won't bother clearing/setting it.

In this particular case, previous fstrim run would set the CHUNK_TRIMMED
bit for all unallocated range (except the super reserve).
Then shrink doesn't clear the exceed range, and cause problem.

Thus clearing the bit in btrfs_shrink_device() makes sense.

> Using
> clear_extent_bits here is not wrong if we look at the end result of
> clearing the range, but otherwise it leaves some state information
> and allocated memory behind.
> 
Not that complex case, just plain not fully considered corner case.

Thanks,
Qu
David Sterba Aug. 11, 2020, 7:22 a.m. UTC | #3
On Sat, Aug 01, 2020 at 07:35:26AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/7/31 下午10:08, David Sterba wrote:
> > On Fri, Jul 31, 2020 at 07:29:11PM +0800, Qu Wenruo wrote:
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
> >>  	}
> >>  
> >>  	mutex_lock(&fs_info->chunk_mutex);
> >> +	/*
> >> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> >> +	 * current device boundary.
> >> +	 * This shouldn't fail, as alloc_state should only utilize those two
> >> +	 * bits, thus we shouldn't alloc new memory for clearing the status.
> > 
> > If this fails or not depends on implementation details of
> > clear_extent_bits and this comment will get out of sync eventually, so I
> > don't think it should be that specific.
> > 
> > If the new_size is somewhere in the middle of an existing state, it'll
> > need to be split anyway, no?
> 
> Nope. Because in alloc_state we only have two bits utilized,
> CHUNK_TRIMMED and CHUNK_ALLOCATED.
> 
> Thus what we're doing is to clear all utilized bits.

Which is true for now, adding a new bit would change that.

> > 
> > alloc_state |-----+++++|
> > clear             |------------------------- ... (u64)-1|
> > 
> > So we'd need to keep the state "-" and unset bits only from "+", and
> > this will require a split.
> 
> In this case, we would only reduce the the size of the existing status,
> or just remove it completely.

I haven't found the 'only reduce the size' in the code, thre's always
some split. The case in __clear_extent_bit is

 773          *     | ---- desired range ---- |
 774          *  | state | or
 775          *  | ------------- state -------------- |

the case on line 774 and followed by split_state.

> > But I still have doubts about just clearing the range, why are there any
> > device->alloc_state entries at all after device is shrunk?
> 
> Because the alloc_state is mostly only utilized by trim facility, thus
> existing functions won't bother clearing/setting it.
> 
> In this particular case, previous fstrim run would set the CHUNK_TRIMMED
> bit for all unallocated range (except the super reserve).
> Then shrink doesn't clear the exceed range, and cause problem.

So the unallocated range on a device is also represented in the
alloc_state tree?

> Thus clearing the bit in btrfs_shrink_device() makes sense.
> 
> > Using
> > clear_extent_bits here is not wrong if we look at the end result of
> > clearing the range, but otherwise it leaves some state information
> > and allocated memory behind.
> > 
> Not that complex case, just plain not fully considered corner case.

So what to do about that? I expect the alloc_state tree to represent the
device accurately and don't want to leave known issues unfixed.
Qu Wenruo Aug. 11, 2020, 7:42 a.m. UTC | #4
On 2020/8/11 下午3:22, David Sterba wrote:
> On Sat, Aug 01, 2020 at 07:35:26AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/7/31 下午10:08, David Sterba wrote:
>>> On Fri, Jul 31, 2020 at 07:29:11PM +0800, Qu Wenruo wrote:
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>>>  	}
>>>>
>>>>  	mutex_lock(&fs_info->chunk_mutex);
>>>> +	/*
>>>> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>>>> +	 * current device boundary.
>>>> +	 * This shouldn't fail, as alloc_state should only utilize those two
>>>> +	 * bits, thus we shouldn't alloc new memory for clearing the status.
>>>
>>> If this fails or not depends on implementation details of
>>> clear_extent_bits and this comment will get out of sync eventually, so I
>>> don't think it should be that specific.
>>>
>>> If the new_size is somewhere in the middle of an existing state, it'll
>>> need to be split anyway, no?
>>
>> Nope. Because in alloc_state we only have two bits utilized,
>> CHUNK_TRIMMED and CHUNK_ALLOCATED.
>>
>> Thus what we're doing is to clear all utilized bits.
>
> Which is true for now, adding a new bit would change that.

Yep, that's also why I had the comment here.

>
>>>
>>> alloc_state |-----+++++|
>>> clear             |------------------------- ... (u64)-1|
>>>
>>> So we'd need to keep the state "-" and unset bits only from "+", and
>>> this will require a split.
>>
>> In this case, we would only reduce the the size of the existing status,
>> or just remove it completely.
>
> I haven't found the 'only reduce the size' in the code, thre's always
> some split. The case in __clear_extent_bit is
>
>  773          *     | ---- desired range ---- |
>  774          *  | state | or
>  775          *  | ------------- state -------------- |
>
> the case on line 774 and followed by split_state.

You're right, we still need to allocate memory in this case.
And it's the BUG_ON() and the preallocated state saving us from the
memory failure.

>
>>> But I still have doubts about just clearing the range, why are there any
>>> device->alloc_state entries at all after device is shrunk?
>>
>> Because the alloc_state is mostly only utilized by trim facility, thus
>> existing functions won't bother clearing/setting it.
>>
>> In this particular case, previous fstrim run would set the CHUNK_TRIMMED
>> bit for all unallocated range (except the super reserve).
>> Then shrink doesn't clear the exceed range, and cause problem.
>
> So the unallocated range on a device is also represented in the
> alloc_state tree?

If the unallocated range has been trimmed, then it has an state, with
CHUNK_TRIMMED bit set.

>
>> Thus clearing the bit in btrfs_shrink_device() makes sense.
>>
>>> Using
>>> clear_extent_bits here is not wrong if we look at the end result of
>>> clearing the range, but otherwise it leaves some state information
>>> and allocated memory behind.
>>>
>> Not that complex case, just plain not fully considered corner case.
>
> So what to do about that? I expect the alloc_state tree to represent the
> device accurately and don't want to leave known issues unfixed.
>
The patch still stand as it is.

The reproducer is still the same:
- fstrim
  This marks the unallocated range of one device with CHUNK_TRIMMED bit
  And the range starts from the last dev extent end, to the end of the
device.

- shrink device
  We need to remove the CHUNK_* bits, or problems will happen in next step

- fstrim again
  Due to the bad check, we could underflow the length of the range, leads to
  access beyond boundary.

The patch fixes it in two locations.
When shrink, we clear the CHUNK_* bits, as these bits makes no sense for
range beyond device boundary.

When fstrim, we do extra check to ensure we don't underflow/overflow
anything.

Is there anything unclear that needs extra comments?

Thanks,
Qu
Nikolay Borisov Aug. 11, 2020, 8:41 a.m. UTC | #5
On 31.07.20 г. 14:29 ч., Qu Wenruo wrote:
> [BUG]
> The following script can lead to tons of beyond device boundary access:
> 
>   mkfs.btrfs -f $dev -b 10G
>   mount $dev $mnt
>   trimfs $mnt
>   btrfs filesystem resize 1:-1G $mnt
>   trimfs $mnt
> 
> [CAUSE]
> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> already trimmed.
> 
> So we check device->alloc_state by finding the first range which doesn't
> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
> 
> But if we shrunk the device, that bits are not cleared, thus we could
> easily got a range starts beyond the shrunk device size.
> 
> This results the returned @start and @end are all beyond device size,
> then we call "end = min(end, device->total_bytes -1);" making @end
> smaller than device size.
> 
> Then finally we goes "len = end - start + 1", totally underflow the
> result, and lead to the beyond-device-boundary access.
> 
> [FIX]
> This patch will fix the problem in two ways:
> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>   This is the root fix
> 
> - Add extra safe net when trimming free device extents
>   We check and warn if the returned range is already beyond current
>   device.
> 
> Link: https://github.com/kdave/btrfs-progs/issues/282
> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> ---
> Changelog:
> v2:
> - Add proper fixes tag
> - Add extra warning for beyond device end case
> - Add graceful exit for already trimmed case
> v3:
> - Don't return EUCLEAN for beyond boundary access
> - Rephrase the warning message for beyond boundary access
> v4:
> - Remove one duplicated check on exiting the trim loop
> ---
>  fs/btrfs/extent-tree.c | 14 ++++++++++++++
>  fs/btrfs/volumes.c     | 12 ++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fa7d83051587..6b1b5dfba4b3 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -33,6 +33,7 @@
>  #include "delalloc-space.h"
>  #include "block-group.h"
>  #include "discard.h"
> +#include "rcu-string.h"
>  
>  #undef SCRAMBLE_DELAYED_REFS
>  
> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>  					    &start, &end,
>  					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
>  
> +		/* CHUNK_* bits not cleared properly */
> +		if (start > device->total_bytes) {
> +			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +			btrfs_warn_in_rcu(fs_info,
> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> +					  start, end - start + 1,
> +					  rcu_str_deref(device->name),
> +					  device->total_bytes);
> +			mutex_unlock(&fs_info->chunk_mutex);
> +			ret = 0;
> +			break;
> +		}

Isn't this a NOOP, because the latter chunk ensures we can never cross
device->total_bytes. Since this is a purely defensive mechanism and
following this patch we *should* never have CHUNK_* bits set beyond
device->total_bytes I'd say make this an ASSERT(). Otherwise you force
people to pay the cost of the check for every trim ...


> +
>  		/* Ensure we skip the reserved area in the first 1M */
>  		start = max_t(u64, start, SZ_1M);
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d7670e2a9f39..4e51ef68ea72 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>  	}
>  
>  	mutex_lock(&fs_info->chunk_mutex);
> +	/*
> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> +	 * current device boundary.
> +	 * This shouldn't fail, as alloc_state should only utilize those two
> +	 * bits, thus we shouldn't alloc new memory for clearing the status.
> +	 *
> +	 * So here we just do an ASSERT() to catch future behavior change.
> +	 */
> +	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> +				CHUNK_TRIMMED | CHUNK_ALLOCATED);
> +	ASSERT(!ret);

I agree with this part.

> +
>  	btrfs_device_set_disk_total_bytes(device, new_size);
>  	if (list_empty(&device->post_commit_list))
>  		list_add_tail(&device->post_commit_list,
>
Qu Wenruo Aug. 11, 2020, 8:46 a.m. UTC | #6
On 2020/8/11 下午4:41, Nikolay Borisov wrote:
> 
> 
> On 31.07.20 г. 14:29 ч., Qu Wenruo wrote:
>> [BUG]
>> The following script can lead to tons of beyond device boundary access:
>>
>>   mkfs.btrfs -f $dev -b 10G
>>   mount $dev $mnt
>>   trimfs $mnt
>>   btrfs filesystem resize 1:-1G $mnt
>>   trimfs $mnt
>>
>> [CAUSE]
>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>> already trimmed.
>>
>> So we check device->alloc_state by finding the first range which doesn't
>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>
>> But if we shrunk the device, that bits are not cleared, thus we could
>> easily got a range starts beyond the shrunk device size.
>>
>> This results the returned @start and @end are all beyond device size,
>> then we call "end = min(end, device->total_bytes -1);" making @end
>> smaller than device size.
>>
>> Then finally we goes "len = end - start + 1", totally underflow the
>> result, and lead to the beyond-device-boundary access.
>>
>> [FIX]
>> This patch will fix the problem in two ways:
>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>>   This is the root fix
>>
>> - Add extra safe net when trimming free device extents
>>   We check and warn if the returned range is already beyond current
>>   device.
>>
>> Link: https://github.com/kdave/btrfs-progs/issues/282
>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Add proper fixes tag
>> - Add extra warning for beyond device end case
>> - Add graceful exit for already trimmed case
>> v3:
>> - Don't return EUCLEAN for beyond boundary access
>> - Rephrase the warning message for beyond boundary access
>> v4:
>> - Remove one duplicated check on exiting the trim loop
>> ---
>>  fs/btrfs/extent-tree.c | 14 ++++++++++++++
>>  fs/btrfs/volumes.c     | 12 ++++++++++++
>>  2 files changed, 26 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index fa7d83051587..6b1b5dfba4b3 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -33,6 +33,7 @@
>>  #include "delalloc-space.h"
>>  #include "block-group.h"
>>  #include "discard.h"
>> +#include "rcu-string.h"
>>  
>>  #undef SCRAMBLE_DELAYED_REFS
>>  
>> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>>  					    &start, &end,
>>  					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>  
>> +		/* CHUNK_* bits not cleared properly */
>> +		if (start > device->total_bytes) {
>> +			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +			btrfs_warn_in_rcu(fs_info,
>> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
>> +					  start, end - start + 1,
>> +					  rcu_str_deref(device->name),
>> +					  device->total_bytes);
>> +			mutex_unlock(&fs_info->chunk_mutex);
>> +			ret = 0;
>> +			break;
>> +		}
> 
> Isn't this a NOOP, because the latter chunk ensures we can never cross
> device->total_bytes. Since this is a purely defensive mechanism and
> following this patch we *should* never have CHUNK_* bits set beyond
> device->total_bytes I'd say make this an ASSERT(). Otherwise you force
> people to pay the cost of the check for every trim ...

I'm fine with the ASSERT() idea.

But on the other hand, we really don't know how things can go wrong, and
such graceful exit makes us way easier to expose and fix bugs when it
happens in a production system.

So currently I'm 50-50 on change it to ASSERT().

Thanks,
Qu

> 
> 
>> +
>>  		/* Ensure we skip the reserved area in the first 1M */
>>  		start = max_t(u64, start, SZ_1M);
>>  
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d7670e2a9f39..4e51ef68ea72 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>  	}
>>  
>>  	mutex_lock(&fs_info->chunk_mutex);
>> +	/*
>> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>> +	 * current device boundary.
>> +	 * This shouldn't fail, as alloc_state should only utilize those two
>> +	 * bits, thus we shouldn't alloc new memory for clearing the status.
>> +	 *
>> +	 * So here we just do an ASSERT() to catch future behavior change.
>> +	 */
>> +	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>> +				CHUNK_TRIMMED | CHUNK_ALLOCATED);
>> +	ASSERT(!ret);
> 
> I agree with this part.
> 
>> +
>>  	btrfs_device_set_disk_total_bytes(device, new_size);
>>  	if (list_empty(&device->post_commit_list))
>>  		list_add_tail(&device->post_commit_list,
>>
>
Filipe Manana Aug. 11, 2020, 10:24 a.m. UTC | #7
On Tue, Aug 11, 2020 at 9:48 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2020/8/11 下午4:41, Nikolay Borisov wrote:
> >
> >
> > On 31.07.20 г. 14:29 ч., Qu Wenruo wrote:
> >> [BUG]
> >> The following script can lead to tons of beyond device boundary access:
> >>
> >>   mkfs.btrfs -f $dev -b 10G
> >>   mount $dev $mnt
> >>   trimfs $mnt
> >>   btrfs filesystem resize 1:-1G $mnt
> >>   trimfs $mnt
> >>
> >> [CAUSE]
> >> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> >> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> >> already trimmed.
> >>
> >> So we check device->alloc_state by finding the first range which doesn't
> >> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
> >>
> >> But if we shrunk the device, that bits are not cleared, thus we could
> >> easily got a range starts beyond the shrunk device size.
> >>
> >> This results the returned @start and @end are all beyond device size,
> >> then we call "end = min(end, device->total_bytes -1);" making @end
> >> smaller than device size.
> >>
> >> Then finally we goes "len = end - start + 1", totally underflow the
> >> result, and lead to the beyond-device-boundary access.
> >>
> >> [FIX]
> >> This patch will fix the problem in two ways:
> >> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
> >>   This is the root fix
> >>
> >> - Add extra safe net when trimming free device extents
> >>   We check and warn if the returned range is already beyond current
> >>   device.
> >>
> >> Link: https://github.com/kdave/btrfs-progs/issues/282
> >> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >> ---
> >> Changelog:
> >> v2:
> >> - Add proper fixes tag
> >> - Add extra warning for beyond device end case
> >> - Add graceful exit for already trimmed case
> >> v3:
> >> - Don't return EUCLEAN for beyond boundary access
> >> - Rephrase the warning message for beyond boundary access
> >> v4:
> >> - Remove one duplicated check on exiting the trim loop
> >> ---
> >>  fs/btrfs/extent-tree.c | 14 ++++++++++++++
> >>  fs/btrfs/volumes.c     | 12 ++++++++++++
> >>  2 files changed, 26 insertions(+)
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index fa7d83051587..6b1b5dfba4b3 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -33,6 +33,7 @@
> >>  #include "delalloc-space.h"
> >>  #include "block-group.h"
> >>  #include "discard.h"
> >> +#include "rcu-string.h"
> >>
> >>  #undef SCRAMBLE_DELAYED_REFS
> >>
> >> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
> >>                                          &start, &end,
> >>                                          CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >>
> >> +            /* CHUNK_* bits not cleared properly */
> >> +            if (start > device->total_bytes) {
> >> +                    WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> >> +                    btrfs_warn_in_rcu(fs_info,
> >> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> >> +                                      start, end - start + 1,
> >> +                                      rcu_str_deref(device->name),
> >> +                                      device->total_bytes);
> >> +                    mutex_unlock(&fs_info->chunk_mutex);
> >> +                    ret = 0;
> >> +                    break;
> >> +            }
> >
> > Isn't this a NOOP, because the latter chunk ensures we can never cross
> > device->total_bytes. Since this is a purely defensive mechanism and
> > following this patch we *should* never have CHUNK_* bits set beyond
> > device->total_bytes I'd say make this an ASSERT(). Otherwise you force
> > people to pay the cost of the check for every trim ...
>
> I'm fine with the ASSERT() idea.
>
> But on the other hand, we really don't know how things can go wrong, and
> such graceful exit makes us way easier to expose and fix bugs when it
> happens in a production system.
>
> So currently I'm 50-50 on change it to ASSERT().

Typical non-debug kernels provided by at least some distros (looking
at debian) don't have btrfs asserts enabled by default.
So such a type of bug can lead to losing any data a user might have
stored beyond the new size boundary.
And if they are enabled, it results in a crash / BUG_ON(). So I'm
strongly for the warning and skipping trim requests beyond the fs
size.

Thanks.

>
> Thanks,
> Qu
>
> >
> >
> >> +
> >>              /* Ensure we skip the reserved area in the first 1M */
> >>              start = max_t(u64, start, SZ_1M);
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index d7670e2a9f39..4e51ef68ea72 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
> >>      }
> >>
> >>      mutex_lock(&fs_info->chunk_mutex);
> >> +    /*
> >> +     * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> >> +     * current device boundary.
> >> +     * This shouldn't fail, as alloc_state should only utilize those two
> >> +     * bits, thus we shouldn't alloc new memory for clearing the status.
> >> +     *
> >> +     * So here we just do an ASSERT() to catch future behavior change.
> >> +     */
> >> +    ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> >> +                            CHUNK_TRIMMED | CHUNK_ALLOCATED);
> >> +    ASSERT(!ret);
> >
> > I agree with this part.
> >
> >> +
> >>      btrfs_device_set_disk_total_bytes(device, new_size);
> >>      if (list_empty(&device->post_commit_list))
> >>              list_add_tail(&device->post_commit_list,
> >>
> >
>
David Sterba Aug. 12, 2020, 6:10 a.m. UTC | #8
On Tue, Aug 11, 2020 at 03:42:31PM +0800, Qu Wenruo wrote:
> >> CHUNK_TRIMMED and CHUNK_ALLOCATED.
> >>
> >> Thus what we're doing is to clear all utilized bits.
> >
> > Which is true for now, adding a new bit would change that.
> 
> Yep, that's also why I had the comment here.

The comment is not enough to make the code future proof, as we really
want do clear all bits in the range beyond device->total_size and that
works only if we specify them all (clear_state_bit).

Either we can define a bitmask of all the bits next to their definition
(slightly more error prone if we forget to add that manually), or pass a
all ones to the clear_extent_bit. Though this is a bit uncommon, with a
comment I'd prefer this option.

> >>> But I still have doubts about just clearing the range, why are there any
> >>> device->alloc_state entries at all after device is shrunk?
> >>
> >> Because the alloc_state is mostly only utilized by trim facility, thus
> >> existing functions won't bother clearing/setting it.
> >>
> >> In this particular case, previous fstrim run would set the CHUNK_TRIMMED
> >> bit for all unallocated range (except the super reserve).
> >> Then shrink doesn't clear the exceed range, and cause problem.
> >
> > So the unallocated range on a device is also represented in the
> > alloc_state tree?
> 
> If the unallocated range has been trimmed, then it has an state, with
> CHUNK_TRIMMED bit set.

To clarify my previous doubts: I was talking about the slack space (ie.
space beyond device->total_size up to block device size), 'unallocated'
is ambiguous and confusing here.

The missing bit about clearing the whole range lies inside
clear_state_bit that deletes all state tracking once all bits are
dropped.
David Sterba Aug. 12, 2020, 6:14 a.m. UTC | #9
On Tue, Aug 11, 2020 at 11:24:29AM +0100, Filipe Manana wrote:
> On Tue, Aug 11, 2020 at 9:48 AM Qu Wenruo <wqu@suse.com> wrote:
> >
> >
> >
> > On 2020/8/11 下午4:41, Nikolay Borisov wrote:
> > >
> > >
> > > On 31.07.20 г. 14:29 ч., Qu Wenruo wrote:
> > >> [BUG]
> > >> The following script can lead to tons of beyond device boundary access:
> > >>
> > >>   mkfs.btrfs -f $dev -b 10G
> > >>   mount $dev $mnt
> > >>   trimfs $mnt
> > >>   btrfs filesystem resize 1:-1G $mnt
> > >>   trimfs $mnt
> > >>
> > >> [CAUSE]
> > >> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> > >> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> > >> already trimmed.
> > >>
> > >> So we check device->alloc_state by finding the first range which doesn't
> > >> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
> > >>
> > >> But if we shrunk the device, that bits are not cleared, thus we could
> > >> easily got a range starts beyond the shrunk device size.
> > >>
> > >> This results the returned @start and @end are all beyond device size,
> > >> then we call "end = min(end, device->total_bytes -1);" making @end
> > >> smaller than device size.
> > >>
> > >> Then finally we goes "len = end - start + 1", totally underflow the
> > >> result, and lead to the beyond-device-boundary access.
> > >>
> > >> [FIX]
> > >> This patch will fix the problem in two ways:
> > >> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
> > >>   This is the root fix
> > >>
> > >> - Add extra safe net when trimming free device extents
> > >>   We check and warn if the returned range is already beyond current
> > >>   device.
> > >>
> > >> Link: https://github.com/kdave/btrfs-progs/issues/282
> > >> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> > >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> > >> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> > >> ---
> > >> Changelog:
> > >> v2:
> > >> - Add proper fixes tag
> > >> - Add extra warning for beyond device end case
> > >> - Add graceful exit for already trimmed case
> > >> v3:
> > >> - Don't return EUCLEAN for beyond boundary access
> > >> - Rephrase the warning message for beyond boundary access
> > >> v4:
> > >> - Remove one duplicated check on exiting the trim loop
> > >> ---
> > >>  fs/btrfs/extent-tree.c | 14 ++++++++++++++
> > >>  fs/btrfs/volumes.c     | 12 ++++++++++++
> > >>  2 files changed, 26 insertions(+)
> > >>
> > >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > >> index fa7d83051587..6b1b5dfba4b3 100644
> > >> --- a/fs/btrfs/extent-tree.c
> > >> +++ b/fs/btrfs/extent-tree.c
> > >> @@ -33,6 +33,7 @@
> > >>  #include "delalloc-space.h"
> > >>  #include "block-group.h"
> > >>  #include "discard.h"
> > >> +#include "rcu-string.h"
> > >>
> > >>  #undef SCRAMBLE_DELAYED_REFS
> > >>
> > >> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
> > >>                                          &start, &end,
> > >>                                          CHUNK_TRIMMED | CHUNK_ALLOCATED);
> > >>
> > >> +            /* CHUNK_* bits not cleared properly */
> > >> +            if (start > device->total_bytes) {
> > >> +                    WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> > >> +                    btrfs_warn_in_rcu(fs_info,
> > >> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> > >> +                                      start, end - start + 1,
> > >> +                                      rcu_str_deref(device->name),
> > >> +                                      device->total_bytes);
> > >> +                    mutex_unlock(&fs_info->chunk_mutex);
> > >> +                    ret = 0;
> > >> +                    break;
> > >> +            }
> > >
> > > Isn't this a NOOP, because the latter chunk ensures we can never cross
> > > device->total_bytes. Since this is a purely defensive mechanism and
> > > following this patch we *should* never have CHUNK_* bits set beyond
> > > device->total_bytes I'd say make this an ASSERT(). Otherwise you force
> > > people to pay the cost of the check for every trim ...
> >
> > I'm fine with the ASSERT() idea.
> >
> > But on the other hand, we really don't know how things can go wrong, and
> > such graceful exit makes us way easier to expose and fix bugs when it
> > happens in a production system.
> >
> > So currently I'm 50-50 on change it to ASSERT().
> 
> Typical non-debug kernels provided by at least some distros (looking
> at debian) don't have btrfs asserts enabled by default.
> So such a type of bug can lead to losing any data a user might have
> stored beyond the new size boundary.
> And if they are enabled, it results in a crash / BUG_ON(). So I'm
> strongly for the warning and skipping trim requests beyond the fs
> size.

I agree, the check should be a always enabled and just warn.
Qu Wenruo Aug. 12, 2020, 6:33 a.m. UTC | #10
On 2020/8/12 下午2:10, David Sterba wrote:
> On Tue, Aug 11, 2020 at 03:42:31PM +0800, Qu Wenruo wrote:
>>>> CHUNK_TRIMMED and CHUNK_ALLOCATED.
>>>>
>>>> Thus what we're doing is to clear all utilized bits.
>>>
>>> Which is true for now, adding a new bit would change that.
>>
>> Yep, that's also why I had the comment here.
>
> The comment is not enough to make the code future proof, as we really
> want do clear all bits in the range beyond device->total_size and that
> works only if we specify them all (clear_state_bit).
>
> Either we can define a bitmask of all the bits next to their definition
> (slightly more error prone if we forget to add that manually), or pass a
> all ones to the clear_extent_bit. Though this is a bit uncommon, with a
> comment I'd prefer this option.

OK, I'd go that (u64)-1 as clear_bits direction, with updated comment.

Thanks,
Qu

>
>>>>> But I still have doubts about just clearing the range, why are there any
>>>>> device->alloc_state entries at all after device is shrunk?
>>>>
>>>> Because the alloc_state is mostly only utilized by trim facility, thus
>>>> existing functions won't bother clearing/setting it.
>>>>
>>>> In this particular case, previous fstrim run would set the CHUNK_TRIMMED
>>>> bit for all unallocated range (except the super reserve).
>>>> Then shrink doesn't clear the exceed range, and cause problem.
>>>
>>> So the unallocated range on a device is also represented in the
>>> alloc_state tree?
>>
>> If the unallocated range has been trimmed, then it has an state, with
>> CHUNK_TRIMMED bit set.
>
> To clarify my previous doubts: I was talking about the slack space (ie.
> space beyond device->total_size up to block device size), 'unallocated'
> is ambiguous and confusing here.
>
> The missing bit about clearing the whole range lies inside
> clear_state_bit that deletes all state tracking once all bits are
> dropped.
>
David Sterba Aug. 12, 2020, 6:37 a.m. UTC | #11
On Wed, Aug 12, 2020 at 02:33:33PM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/8/12 下午2:10, David Sterba wrote:
> > On Tue, Aug 11, 2020 at 03:42:31PM +0800, Qu Wenruo wrote:
> >>>> CHUNK_TRIMMED and CHUNK_ALLOCATED.
> >>>>
> >>>> Thus what we're doing is to clear all utilized bits.
> >>>
> >>> Which is true for now, adding a new bit would change that.
> >>
> >> Yep, that's also why I had the comment here.
> >
> > The comment is not enough to make the code future proof, as we really
> > want do clear all bits in the range beyond device->total_size and that
> > works only if we specify them all (clear_state_bit).
> >
> > Either we can define a bitmask of all the bits next to their definition
> > (slightly more error prone if we forget to add that manually), or pass a
> > all ones to the clear_extent_bit. Though this is a bit uncommon, with a
> > comment I'd prefer this option.
> 
> OK, I'd go that (u64)-1 as clear_bits direction, with updated comment.

Hold, on, I'm sending out v5 that I'm about to commit.
David Sterba Aug. 12, 2020, 6:43 a.m. UTC | #12
The v5 changes were discussed but were not all trivial to be just
committed. I need to add the patch to pull request branch soon so am
not waiting for your v5

v5:

- add mask for chunk state bits and use that to clear the range a after
  device shrink; on a second thought doing all ones did not look clean
  to me

- removed assert after clear_extent_bits - make it consistent with all
  other calls where we don't check the return value for now

- reworded comments

---

From: Qu Wenruo <wqu@suse.com>
Subject: [PATCH] btrfs: trim: fix underflow in trim length to prevent access
 beyond device boundary

[BUG]
The following script can lead to tons of beyond device boundary access:

  mkfs.btrfs -f $dev -b 10G
  mount $dev $mnt
  trimfs $mnt
  btrfs filesystem resize 1:-1G $mnt
  trimfs $mnt

[CAUSE]
Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
find_first_clear_extent_bit"), we try to avoid trimming ranges that's
already trimmed.

So we check device->alloc_state by finding the first range which doesn't
have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.

But if we shrunk the device, that bits are not cleared, thus we could
easily got a range starts beyond the shrunk device size.

This results the returned @start and @end are all beyond device size,
then we call "end = min(end, device->total_bytes -1);" making @end
smaller than device size.

Then finally we goes "len = end - start + 1", totally underflow the
result, and lead to the beyond-device-boundary access.

[FIX]
This patch will fix the problem in two ways:

- Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
  This is the root fix

- Add extra safety check when trimming free device extents
  We check and warn if the returned range is already beyond current
  device.

Link: https://github.com/kdave/btrfs-progs/issues/282
Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent-io-tree.h |  2 ++
 fs/btrfs/extent-tree.c    | 14 ++++++++++++++
 fs/btrfs/volumes.c        |  4 ++++
 3 files changed, 20 insertions(+)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index f39d47a2d01a..219a09a2b734 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -34,6 +34,8 @@ struct io_failure_record;
  */
 #define CHUNK_ALLOCATED				EXTENT_DIRTY
 #define CHUNK_TRIMMED				EXTENT_DEFRAG
+#define CHUNK_STATE_MASK			(CHUNK_ALLOCATED |		\
+						 CHUNK_TRIMMED)
 
 enum {
 	IO_TREE_FS_PINNED_EXTENTS,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa7d83051587..597505df90b4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,7 @@
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "discard.h"
+#include "rcu-string.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
 					    &start, &end,
 					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
 
+		/* Check if there are any CHUNK_* bits left */
+		if (start > device->total_bytes) {
+			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+			btrfs_warn_in_rcu(fs_info,
+"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
+					  start, end - start + 1,
+					  rcu_str_deref(device->name),
+					  device->total_bytes);
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = 0;
+			break;
+		}
+
 		/* Ensure we skip the reserved area in the first 1M */
 		start = max_t(u64, start, SZ_1M);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d7670e2a9f39..ee96c5869f57 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4720,6 +4720,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
+	/* Clear all state bits beyond the shrunk device size */
+	clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
+			  CHUNK_STATE_MASK);
+
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,
Qu Wenruo Aug. 12, 2020, 6:57 a.m. UTC | #13
On 2020/8/12 下午2:43, David Sterba wrote:
> The v5 changes were discussed but were not all trivial to be just
> committed. I need to add the patch to pull request branch soon so am
> not waiting for your v5
> 
> v5:
> 
> - add mask for chunk state bits and use that to clear the range a after
>   device shrink; on a second thought doing all ones did not look clean
>   to me
> 
> - removed assert after clear_extent_bits - make it consistent with all
>   other calls where we don't check the return value for now
> 
> - reworded comments

Looks good to me.

Like to give a reviewed-by but that won't make any sense...

Thanks for your effort!
Qu

> 
> ---
> 
> From: Qu Wenruo <wqu@suse.com>
> Subject: [PATCH] btrfs: trim: fix underflow in trim length to prevent access
>  beyond device boundary
> 
> [BUG]
> The following script can lead to tons of beyond device boundary access:
> 
>   mkfs.btrfs -f $dev -b 10G
>   mount $dev $mnt
>   trimfs $mnt
>   btrfs filesystem resize 1:-1G $mnt
>   trimfs $mnt
> 
> [CAUSE]
> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> already trimmed.
> 
> So we check device->alloc_state by finding the first range which doesn't
> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
> 
> But if we shrunk the device, that bits are not cleared, thus we could
> easily got a range starts beyond the shrunk device size.
> 
> This results the returned @start and @end are all beyond device size,
> then we call "end = min(end, device->total_bytes -1);" making @end
> smaller than device size.
> 
> Then finally we goes "len = end - start + 1", totally underflow the
> result, and lead to the beyond-device-boundary access.
> 
> [FIX]
> This patch will fix the problem in two ways:
> 
> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>   This is the root fix
> 
> - Add extra safety check when trimming free device extents
>   We check and warn if the returned range is already beyond current
>   device.
> 
> Link: https://github.com/kdave/btrfs-progs/issues/282
> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> CC: stable@vger.kernel.org # 5.4+
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/extent-io-tree.h |  2 ++
>  fs/btrfs/extent-tree.c    | 14 ++++++++++++++
>  fs/btrfs/volumes.c        |  4 ++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index f39d47a2d01a..219a09a2b734 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -34,6 +34,8 @@ struct io_failure_record;
>   */
>  #define CHUNK_ALLOCATED				EXTENT_DIRTY
>  #define CHUNK_TRIMMED				EXTENT_DEFRAG
> +#define CHUNK_STATE_MASK			(CHUNK_ALLOCATED |		\
> +						 CHUNK_TRIMMED)
>  
>  enum {
>  	IO_TREE_FS_PINNED_EXTENTS,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fa7d83051587..597505df90b4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -33,6 +33,7 @@
>  #include "delalloc-space.h"
>  #include "block-group.h"
>  #include "discard.h"
> +#include "rcu-string.h"
>  
>  #undef SCRAMBLE_DELAYED_REFS
>  
> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>  					    &start, &end,
>  					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
>  
> +		/* Check if there are any CHUNK_* bits left */
> +		if (start > device->total_bytes) {
> +			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +			btrfs_warn_in_rcu(fs_info,
> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> +					  start, end - start + 1,
> +					  rcu_str_deref(device->name),
> +					  device->total_bytes);
> +			mutex_unlock(&fs_info->chunk_mutex);
> +			ret = 0;
> +			break;
> +		}
> +
>  		/* Ensure we skip the reserved area in the first 1M */
>  		start = max_t(u64, start, SZ_1M);
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d7670e2a9f39..ee96c5869f57 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4720,6 +4720,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>  	}
>  
>  	mutex_lock(&fs_info->chunk_mutex);
> +	/* Clear all state bits beyond the shrunk device size */
> +	clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> +			  CHUNK_STATE_MASK);
> +
>  	btrfs_device_set_disk_total_bytes(device, new_size);
>  	if (list_empty(&device->post_commit_list))
>  		list_add_tail(&device->post_commit_list,
>
Qu Wenruo Aug. 12, 2020, 11:14 a.m. UTC | #14
On 2020/8/12 下午2:43, David Sterba wrote:
> The v5 changes were discussed but were not all trivial to be just
> committed. I need to add the patch to pull request branch soon so am
> not waiting for your v5
>
> v5:
>
> - add mask for chunk state bits and use that to clear the range a after
>   device shrink; on a second thought doing all ones did not look clean
>   to me

Extra idea inspired by this patch.

We can do extra extent_io_tree bits sanity check for DEBUG build.

In the past, extent_io_tree got its owner member, which each
extent_io_tree should have one. (Unfortunately, when alloc_state is
added, we didn't add a new entry for it)

With that, we can easily verify the set/clear bits against its owner to
ensure we don't set wrong bits for wrong extent_io_tree.
E.g. CHUNK_* bits are only for alloc_state, while
DELALLOC/QGROUP_RESERVED are only for inode io tree.

Of course, this would be in a new patch.

Thanks,
Qu
>
> - removed assert after clear_extent_bits - make it consistent with all
>   other calls where we don't check the return value for now
>
> - reworded comments
>
> ---
>
> From: Qu Wenruo <wqu@suse.com>
> Subject: [PATCH] btrfs: trim: fix underflow in trim length to prevent access
>  beyond device boundary
>
> [BUG]
> The following script can lead to tons of beyond device boundary access:
>
>   mkfs.btrfs -f $dev -b 10G
>   mount $dev $mnt
>   trimfs $mnt
>   btrfs filesystem resize 1:-1G $mnt
>   trimfs $mnt
>
> [CAUSE]
> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> already trimmed.
>
> So we check device->alloc_state by finding the first range which doesn't
> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>
> But if we shrunk the device, that bits are not cleared, thus we could
> easily got a range starts beyond the shrunk device size.
>
> This results the returned @start and @end are all beyond device size,
> then we call "end = min(end, device->total_bytes -1);" making @end
> smaller than device size.
>
> Then finally we goes "len = end - start + 1", totally underflow the
> result, and lead to the beyond-device-boundary access.
>
> [FIX]
> This patch will fix the problem in two ways:
>
> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>   This is the root fix
>
> - Add extra safety check when trimming free device extents
>   We check and warn if the returned range is already beyond current
>   device.
>
> Link: https://github.com/kdave/btrfs-progs/issues/282
> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> CC: stable@vger.kernel.org # 5.4+
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/extent-io-tree.h |  2 ++
>  fs/btrfs/extent-tree.c    | 14 ++++++++++++++
>  fs/btrfs/volumes.c        |  4 ++++
>  3 files changed, 20 insertions(+)
>
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index f39d47a2d01a..219a09a2b734 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -34,6 +34,8 @@ struct io_failure_record;
>   */
>  #define CHUNK_ALLOCATED				EXTENT_DIRTY
>  #define CHUNK_TRIMMED				EXTENT_DEFRAG
> +#define CHUNK_STATE_MASK			(CHUNK_ALLOCATED |		\
> +						 CHUNK_TRIMMED)
>
>  enum {
>  	IO_TREE_FS_PINNED_EXTENTS,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fa7d83051587..597505df90b4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -33,6 +33,7 @@
>  #include "delalloc-space.h"
>  #include "block-group.h"
>  #include "discard.h"
> +#include "rcu-string.h"
>
>  #undef SCRAMBLE_DELAYED_REFS
>
> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>  					    &start, &end,
>  					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
>
> +		/* Check if there are any CHUNK_* bits left */
> +		if (start > device->total_bytes) {
> +			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +			btrfs_warn_in_rcu(fs_info,
> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
> +					  start, end - start + 1,
> +					  rcu_str_deref(device->name),
> +					  device->total_bytes);
> +			mutex_unlock(&fs_info->chunk_mutex);
> +			ret = 0;
> +			break;
> +		}
> +
>  		/* Ensure we skip the reserved area in the first 1M */
>  		start = max_t(u64, start, SZ_1M);
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d7670e2a9f39..ee96c5869f57 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4720,6 +4720,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>  	}
>
>  	mutex_lock(&fs_info->chunk_mutex);
> +	/* Clear all state bits beyond the shrunk device size */
> +	clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> +			  CHUNK_STATE_MASK);
> +
>  	btrfs_device_set_disk_total_bytes(device, new_size);
>  	if (list_empty(&device->post_commit_list))
>  		list_add_tail(&device->post_commit_list,
>
Nikolay Borisov Aug. 12, 2020, 11:24 a.m. UTC | #15
On 12.08.20 г. 14:14 ч., Qu Wenruo wrote:
> 
> 
> On 2020/8/12 下午2:43, David Sterba wrote:
>> The v5 changes were discussed but were not all trivial to be just
>> committed. I need to add the patch to pull request branch soon so am
>> not waiting for your v5
>>
>> v5:
>>
>> - add mask for chunk state bits and use that to clear the range a after
>>   device shrink; on a second thought doing all ones did not look clean
>>   to me
> 
> Extra idea inspired by this patch.
> 
> We can do extra extent_io_tree bits sanity check for DEBUG build.
> 
> In the past, extent_io_tree got its owner member, which each
> extent_io_tree should have one. (Unfortunately, when alloc_state is
> added, we didn't add a new entry for it)
> 
> With that, we can easily verify the set/clear bits against its owner to
> ensure we don't set wrong bits for wrong extent_io_tree.
> E.g. CHUNK_* bits are only for alloc_state, while
> DELALLOC/QGROUP_RESERVED are only for inode io tree.

Will this work given the CHUNK_* bits are defined to 2 existing flags,
chosen such that to not clash with the special logic in bit management
functions? (check comment above CHUNK_* bits defines).


> 
> Of course, this would be in a new patch.
> 
> Thanks,
> Qu
>>
>> - removed assert after clear_extent_bits - make it consistent with all
>>   other calls where we don't check the return value for now
>>
>> - reworded comments
>>
>> ---
>>
>> From: Qu Wenruo <wqu@suse.com>
>> Subject: [PATCH] btrfs: trim: fix underflow in trim length to prevent access
>>  beyond device boundary
>>
>> [BUG]
>> The following script can lead to tons of beyond device boundary access:
>>
>>   mkfs.btrfs -f $dev -b 10G
>>   mount $dev $mnt
>>   trimfs $mnt
>>   btrfs filesystem resize 1:-1G $mnt
>>   trimfs $mnt
>>
>> [CAUSE]
>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>> already trimmed.
>>
>> So we check device->alloc_state by finding the first range which doesn't
>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>
>> But if we shrunk the device, that bits are not cleared, thus we could
>> easily got a range starts beyond the shrunk device size.
>>
>> This results the returned @start and @end are all beyond device size,
>> then we call "end = min(end, device->total_bytes -1);" making @end
>> smaller than device size.
>>
>> Then finally we goes "len = end - start + 1", totally underflow the
>> result, and lead to the beyond-device-boundary access.
>>
>> [FIX]
>> This patch will fix the problem in two ways:
>>
>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>>   This is the root fix
>>
>> - Add extra safety check when trimming free device extents
>>   We check and warn if the returned range is already beyond current
>>   device.
>>
>> Link: https://github.com/kdave/btrfs-progs/issues/282
>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
>> CC: stable@vger.kernel.org # 5.4+
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> ---
>>  fs/btrfs/extent-io-tree.h |  2 ++
>>  fs/btrfs/extent-tree.c    | 14 ++++++++++++++
>>  fs/btrfs/volumes.c        |  4 ++++
>>  3 files changed, 20 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
>> index f39d47a2d01a..219a09a2b734 100644
>> --- a/fs/btrfs/extent-io-tree.h
>> +++ b/fs/btrfs/extent-io-tree.h
>> @@ -34,6 +34,8 @@ struct io_failure_record;
>>   */
>>  #define CHUNK_ALLOCATED				EXTENT_DIRTY
>>  #define CHUNK_TRIMMED				EXTENT_DEFRAG
>> +#define CHUNK_STATE_MASK			(CHUNK_ALLOCATED |		\
>> +						 CHUNK_TRIMMED)
>>
>>  enum {
>>  	IO_TREE_FS_PINNED_EXTENTS,
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index fa7d83051587..597505df90b4 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -33,6 +33,7 @@
>>  #include "delalloc-space.h"
>>  #include "block-group.h"
>>  #include "discard.h"
>> +#include "rcu-string.h"
>>
>>  #undef SCRAMBLE_DELAYED_REFS
>>
>> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>>  					    &start, &end,
>>  					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>
>> +		/* Check if there are any CHUNK_* bits left */
>> +		if (start > device->total_bytes) {
>> +			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +			btrfs_warn_in_rcu(fs_info,
>> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
>> +					  start, end - start + 1,
>> +					  rcu_str_deref(device->name),
>> +					  device->total_bytes);
>> +			mutex_unlock(&fs_info->chunk_mutex);
>> +			ret = 0;
>> +			break;
>> +		}
>> +
>>  		/* Ensure we skip the reserved area in the first 1M */
>>  		start = max_t(u64, start, SZ_1M);
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d7670e2a9f39..ee96c5869f57 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4720,6 +4720,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>  	}
>>
>>  	mutex_lock(&fs_info->chunk_mutex);
>> +	/* Clear all state bits beyond the shrunk device size */
>> +	clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>> +			  CHUNK_STATE_MASK);
>> +
>>  	btrfs_device_set_disk_total_bytes(device, new_size);
>>  	if (list_empty(&device->post_commit_list))
>>  		list_add_tail(&device->post_commit_list,
>>
>
Qu Wenruo Aug. 12, 2020, 11:26 a.m. UTC | #16
On 2020/8/12 下午7:24, Nikolay Borisov wrote:
>
>
> On 12.08.20 г. 14:14 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/8/12 下午2:43, David Sterba wrote:
>>> The v5 changes were discussed but were not all trivial to be just
>>> committed. I need to add the patch to pull request branch soon so am
>>> not waiting for your v5
>>>
>>> v5:
>>>
>>> - add mask for chunk state bits and use that to clear the range a after
>>>   device shrink; on a second thought doing all ones did not look clean
>>>   to me
>>
>> Extra idea inspired by this patch.
>>
>> We can do extra extent_io_tree bits sanity check for DEBUG build.
>>
>> In the past, extent_io_tree got its owner member, which each
>> extent_io_tree should have one. (Unfortunately, when alloc_state is
>> added, we didn't add a new entry for it)
>>
>> With that, we can easily verify the set/clear bits against its owner to
>> ensure we don't set wrong bits for wrong extent_io_tree.
>> E.g. CHUNK_* bits are only for alloc_state, while
>> DELALLOC/QGROUP_RESERVED are only for inode io tree.
>
> Will this work given the CHUNK_* bits are defined to 2 existing flags,
> chosen such that to not clash with the special logic in bit management
> functions? (check comment above CHUNK_* bits defines).

No problem.

We're setting a allowed mask for each owner, although we're reusing
bits, it doesn't really matter that much.

Thanks,
Qu
>
>
>>
>> Of course, this would be in a new patch.
>>
>> Thanks,
>> Qu
>>>
>>> - removed assert after clear_extent_bits - make it consistent with all
>>>   other calls where we don't check the return value for now
>>>
>>> - reworded comments
>>>
>>> ---
>>>
>>> From: Qu Wenruo <wqu@suse.com>
>>> Subject: [PATCH] btrfs: trim: fix underflow in trim length to prevent access
>>>  beyond device boundary
>>>
>>> [BUG]
>>> The following script can lead to tons of beyond device boundary access:
>>>
>>>   mkfs.btrfs -f $dev -b 10G
>>>   mount $dev $mnt
>>>   trimfs $mnt
>>>   btrfs filesystem resize 1:-1G $mnt
>>>   trimfs $mnt
>>>
>>> [CAUSE]
>>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>>> already trimmed.
>>>
>>> So we check device->alloc_state by finding the first range which doesn't
>>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>>
>>> But if we shrunk the device, that bits are not cleared, thus we could
>>> easily got a range starts beyond the shrunk device size.
>>>
>>> This results the returned @start and @end are all beyond device size,
>>> then we call "end = min(end, device->total_bytes -1);" making @end
>>> smaller than device size.
>>>
>>> Then finally we goes "len = end - start + 1", totally underflow the
>>> result, and lead to the beyond-device-boundary access.
>>>
>>> [FIX]
>>> This patch will fix the problem in two ways:
>>>
>>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>>>   This is the root fix
>>>
>>> - Add extra safety check when trimming free device extents
>>>   We check and warn if the returned range is already beyond current
>>>   device.
>>>
>>> Link: https://github.com/kdave/btrfs-progs/issues/282
>>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
>>> CC: stable@vger.kernel.org # 5.4+
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>> Signed-off-by: David Sterba <dsterba@suse.com>
>>> ---
>>>  fs/btrfs/extent-io-tree.h |  2 ++
>>>  fs/btrfs/extent-tree.c    | 14 ++++++++++++++
>>>  fs/btrfs/volumes.c        |  4 ++++
>>>  3 files changed, 20 insertions(+)
>>>
>>> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
>>> index f39d47a2d01a..219a09a2b734 100644
>>> --- a/fs/btrfs/extent-io-tree.h
>>> +++ b/fs/btrfs/extent-io-tree.h
>>> @@ -34,6 +34,8 @@ struct io_failure_record;
>>>   */
>>>  #define CHUNK_ALLOCATED				EXTENT_DIRTY
>>>  #define CHUNK_TRIMMED				EXTENT_DEFRAG
>>> +#define CHUNK_STATE_MASK			(CHUNK_ALLOCATED |		\
>>> +						 CHUNK_TRIMMED)
>>>
>>>  enum {
>>>  	IO_TREE_FS_PINNED_EXTENTS,
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index fa7d83051587..597505df90b4 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -33,6 +33,7 @@
>>>  #include "delalloc-space.h"
>>>  #include "block-group.h"
>>>  #include "discard.h"
>>> +#include "rcu-string.h"
>>>
>>>  #undef SCRAMBLE_DELAYED_REFS
>>>
>>> @@ -5669,6 +5670,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>>>  					    &start, &end,
>>>  					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>>
>>> +		/* Check if there are any CHUNK_* bits left */
>>> +		if (start > device->total_bytes) {
>>> +			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>>> +			btrfs_warn_in_rcu(fs_info,
>>> +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
>>> +					  start, end - start + 1,
>>> +					  rcu_str_deref(device->name),
>>> +					  device->total_bytes);
>>> +			mutex_unlock(&fs_info->chunk_mutex);
>>> +			ret = 0;
>>> +			break;
>>> +		}
>>> +
>>>  		/* Ensure we skip the reserved area in the first 1M */
>>>  		start = max_t(u64, start, SZ_1M);
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index d7670e2a9f39..ee96c5869f57 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -4720,6 +4720,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>>  	}
>>>
>>>  	mutex_lock(&fs_info->chunk_mutex);
>>> +	/* Clear all state bits beyond the shrunk device size */
>>> +	clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>>> +			  CHUNK_STATE_MASK);
>>> +
>>>  	btrfs_device_set_disk_total_bytes(device, new_size);
>>>  	if (list_empty(&device->post_commit_list))
>>>  		list_add_tail(&device->post_commit_list,
>>>
>>
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa7d83051587..6b1b5dfba4b3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,7 @@ 
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "discard.h"
+#include "rcu-string.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -5669,6 +5670,19 @@  static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
 					    &start, &end,
 					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
 
+		/* CHUNK_* bits not cleared properly */
+		if (start > device->total_bytes) {
+			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+			btrfs_warn_in_rcu(fs_info,
+"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu",
+					  start, end - start + 1,
+					  rcu_str_deref(device->name),
+					  device->total_bytes);
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = 0;
+			break;
+		}
+
 		/* Ensure we skip the reserved area in the first 1M */
 		start = max_t(u64, start, SZ_1M);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d7670e2a9f39..4e51ef68ea72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4720,6 +4720,18 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
+	/*
+	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
+	 * current device boundary.
+	 * This shouldn't fail, as alloc_state should only utilize those two
+	 * bits, thus we shouldn't alloc new memory for clearing the status.
+	 *
+	 * So here we just do an ASSERT() to catch future behavior change.
+	 */
+	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
+				CHUNK_TRIMMED | CHUNK_ALLOCATED);
+	ASSERT(!ret);
+
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,