diff mbox series

btrfs: trim: fix underflow in trim length to prevent access beyond device boundary

Message ID 20200730111921.60051-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: trim: fix underflow in trim length to prevent access beyond device boundary | expand

Commit Message

Qu Wenruo July 30, 2020, 11:19 a.m. UTC
[BUG]
The following script can lead to tons of beyond device boundary access:

  mkfs.btrfs -f $dev -b 10G
  mount $dev $mnt
  trimfs $mnt
  btrfs filesystem resize 1:-1G $mnt
  trimfs $mnt

[CAUSE]
Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
find_first_clear_extent_bit"), we try to avoid trimming ranges that's
already trimmed.

So we check device->alloc_state by finding the first range which doesn't
have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.

But if we shrunk the device, that bits are not cleared, thus we could
easily got a range starts beyond the shrunk device size.

This results the returned @start and @end are all beyond device size,
then we call "end = min(end, device->total_bytes -1);" making @end
smaller than device size.

Then finally we goes "len = end - start + 1", totally underflow the
result, and lead to the beyond-device-boundary access.

[FIX]
This patch will fix the problem in two ways:
- Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
  This is the root fix

- Add extra safe net when trimming free device extents
  We check if the returned range is already beyond current device
  boundary.

Link: https://github.com/kdave/btrfs-progs/issues/282
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c |  5 +++++
 fs/btrfs/volumes.c     | 12 ++++++++++++
 2 files changed, 17 insertions(+)

Comments

Filipe Manana July 30, 2020, 4:54 p.m. UTC | #1
On Thu, Jul 30, 2020 at 12:20 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> The following script can lead to tons of beyond device boundary access:
>
>   mkfs.btrfs -f $dev -b 10G
>   mount $dev $mnt
>   trimfs $mnt
>   btrfs filesystem resize 1:-1G $mnt
>   trimfs $mnt
>
> [CAUSE]
> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> already trimmed.
>
> So we check device->alloc_state by finding the first range which doesn't
> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>
> But if we shrunk the device, that bits are not cleared, thus we could
> easily got a range starts beyond the shrunk device size.
>
> This results the returned @start and @end are all beyond device size,
> then we call "end = min(end, device->total_bytes -1);" making @end
> smaller than device size.
>
> Then finally we goes "len = end - start + 1", totally underflow the
> result, and lead to the beyond-device-boundary access.
>
> [FIX]
> This patch will fix the problem in two ways:
> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>   This is the root fix
>
> - Add extra safe net when trimming free device extents
>   We check if the returned range is already beyond current device
>   boundary.
>
> Link: https://github.com/kdave/btrfs-progs/issues/282
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c |  5 +++++
>  fs/btrfs/volumes.c     | 12 ++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 61ede335f6c3..758f963feb96 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5667,6 +5667,11 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>                 find_first_clear_extent_bit(&device->alloc_state, start,
>                                             &start, &end,
>                                             CHUNK_TRIMMED | CHUNK_ALLOCATED);
> +               if (start >= device->total_bytes) {
> +                       mutex_unlock(&fs_info->chunk_mutex);
> +                       ret = 0;
> +                       break;
> +               }

It's good to ensure we never trim beyond the end of the fs, to avoid
corruption of whatever lies beyond it.
However we should have at least a WARN_ON / WARN_ON_ONCE here to help
more easily detect other current or future bugs that lead to the same
type of issue.

Other than that it looks good to me.
After that you can have,

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
>                 /* Ensure we skip the reserved area in the first 1M */
>                 start = max_t(u64, start, SZ_1M);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 537ccf66ee20..906704c61a51 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4705,6 +4705,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>         }
>
>         mutex_lock(&fs_info->chunk_mutex);
> +       /*
> +        * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> +        * current device boundary.
> +        */
> +       ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> +                               CHUNK_TRIMMED | CHUNK_ALLOCATED);
> +       if (ret < 0) {
> +               mutex_unlock(&fs_info->chunk_mutex);
> +               btrfs_abort_transaction(trans, ret);
> +               btrfs_end_transaction(trans);
> +               goto done;
> +       }
>         btrfs_device_set_disk_total_bytes(device, new_size);
>         if (list_empty(&device->post_commit_list))
>                 list_add_tail(&device->post_commit_list,
> --
> 2.28.0
>
David Sterba July 30, 2020, 5:03 p.m. UTC | #2
On Thu, Jul 30, 2020 at 07:19:21PM +0800, Qu Wenruo wrote:
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4705,6 +4705,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>  	}
>  
>  	mutex_lock(&fs_info->chunk_mutex);
> +	/*
> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> +	 * current device boundary.
> +	 */
> +	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> +				CHUNK_TRIMMED | CHUNK_ALLOCATED);
> +	if (ret < 0) {
> +		mutex_unlock(&fs_info->chunk_mutex);
> +		btrfs_abort_transaction(trans, ret);
> +		btrfs_end_transaction(trans);

Can this be made more lightweight? If the device shrink succeeded, we
now update only the device status, so failing the whole transaction here
seems to be too much.

As the device size is being reduced, we could possibly update the last
relevant entry in the alloc_state tree and delete everything after that,
instead of clearing the bits.
Qu Wenruo July 30, 2020, 11:44 p.m. UTC | #3
On 2020/7/31 上午12:54, Filipe Manana wrote:
> On Thu, Jul 30, 2020 at 12:20 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> The following script can lead to tons of beyond device boundary access:
>>
>>   mkfs.btrfs -f $dev -b 10G
>>   mount $dev $mnt
>>   trimfs $mnt
>>   btrfs filesystem resize 1:-1G $mnt
>>   trimfs $mnt
>>
>> [CAUSE]
>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>> already trimmed.
>>
>> So we check device->alloc_state by finding the first range which doesn't
>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>
>> But if we shrunk the device, that bits are not cleared, thus we could
>> easily got a range starts beyond the shrunk device size.
>>
>> This results the returned @start and @end are all beyond device size,
>> then we call "end = min(end, device->total_bytes -1);" making @end
>> smaller than device size.
>>
>> Then finally we goes "len = end - start + 1", totally underflow the
>> result, and lead to the beyond-device-boundary access.
>>
>> [FIX]
>> This patch will fix the problem in two ways:
>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>>   This is the root fix
>>
>> - Add extra safe net when trimming free device extents
>>   We check if the returned range is already beyond current device
>>   boundary.
>>
>> Link: https://github.com/kdave/btrfs-progs/issues/282
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c |  5 +++++
>>  fs/btrfs/volumes.c     | 12 ++++++++++++
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 61ede335f6c3..758f963feb96 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5667,6 +5667,11 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>>                 find_first_clear_extent_bit(&device->alloc_state, start,
>>                                             &start, &end,
>>                                             CHUNK_TRIMMED | CHUNK_ALLOCATED);
>> +               if (start >= device->total_bytes) {
>> +                       mutex_unlock(&fs_info->chunk_mutex);
>> +                       ret = 0;
>> +                       break;
>> +               }
>
> It's good to ensure we never trim beyond the end of the fs, to avoid
> corruption of whatever lies beyond it.
> However we should have at least a WARN_ON / WARN_ON_ONCE here to help
> more easily detect other current or future bugs that lead to the same
> type of issue.

Since you're also on the idea of extra warning, it's definitely worthy then.

Thanks for the advice,
Qu

>
> Other than that it looks good to me.
> After that you can have,
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> Thanks.
>
>>
>>                 /* Ensure we skip the reserved area in the first 1M */
>>                 start = max_t(u64, start, SZ_1M);
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 537ccf66ee20..906704c61a51 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4705,6 +4705,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>         }
>>
>>         mutex_lock(&fs_info->chunk_mutex);
>> +       /*
>> +        * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>> +        * current device boundary.
>> +        */
>> +       ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>> +                               CHUNK_TRIMMED | CHUNK_ALLOCATED);
>> +       if (ret < 0) {
>> +               mutex_unlock(&fs_info->chunk_mutex);
>> +               btrfs_abort_transaction(trans, ret);
>> +               btrfs_end_transaction(trans);
>> +               goto done;
>> +       }
>>         btrfs_device_set_disk_total_bytes(device, new_size);
>>         if (list_empty(&device->post_commit_list))
>>                 list_add_tail(&device->post_commit_list,
>> --
>> 2.28.0
>>
>
>
Qu Wenruo July 31, 2020, 12:01 a.m. UTC | #4
On 2020/7/31 上午1:03, David Sterba wrote:
> On Thu, Jul 30, 2020 at 07:19:21PM +0800, Qu Wenruo wrote:
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4705,6 +4705,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>  	}
>>
>>  	mutex_lock(&fs_info->chunk_mutex);
>> +	/*
>> +	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>> +	 * current device boundary.
>> +	 */
>> +	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>> +				CHUNK_TRIMMED | CHUNK_ALLOCATED);
>> +	if (ret < 0) {
>> +		mutex_unlock(&fs_info->chunk_mutex);
>> +		btrfs_abort_transaction(trans, ret);
>> +		btrfs_end_transaction(trans);
>
> Can this be made more lightweight? If the device shrink succeeded, we
> now update only the device status, so failing the whole transaction here
> seems to be too much.

It can be lightweight by just ignoring the return value.

1. Currently clear_extent_bits() only return 0
As currently clear_extent_bits() uses BUG_ON() to handle the memory
allocation error, and can only return 0.

2. We have the extra safenet to prevent problems
But we may want to add extra warning in the safenet itself, so we
shouldn't completely rely on that.

3. Since we're clearing the only two utilized bits, it should not
allocate mew memory
Thus it should always return 0.

So yep, we can safely ignore the return value.

Thanks,
Qu


>
> As the device size is being reduced, we could possibly update the last
> relevant entry in the alloc_state tree and delete everything after that,
> instead of clearing the bits.
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 61ede335f6c3..758f963feb96 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5667,6 +5667,11 @@  static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
 		find_first_clear_extent_bit(&device->alloc_state, start,
 					    &start, &end,
 					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
+		if (start >= device->total_bytes) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = 0;
+			break;
+		}
 
 		/* Ensure we skip the reserved area in the first 1M */
 		start = max_t(u64, start, SZ_1M);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 537ccf66ee20..906704c61a51 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4705,6 +4705,18 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
+	/*
+	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
+	 * current device boundary.
+	 */
+	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
+				CHUNK_TRIMMED | CHUNK_ALLOCATED);
+	if (ret < 0) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		goto done;
+	}
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,