btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
diff mbox series

Message ID 20200731072258.85861-1-wqu@suse.com
State New
Headers show
Series
  • btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
Related show

Commit Message

Qu Wenruo July 31, 2020, 7:22 a.m. UTC
[BUG]
The following script can lead to tons of beyond device boundary access:

  mkfs.btrfs -f $dev -b 10G
  mount $dev $mnt
  trimfs $mnt
  btrfs filesystem resize 1:-1G $mnt
  trimfs $mnt

[CAUSE]
Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
find_first_clear_extent_bit"), we try to avoid trimming ranges that's
already trimmed.

So we check device->alloc_state by finding the first range which doesn't
have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.

But if we shrunk the device, that bits are not cleared, thus we could
easily got a range starts beyond the shrunk device size.

This results the returned @start and @end are all beyond device size,
then we call "end = min(end, device->total_bytes -1);" making @end
smaller than device size.

Then finally we goes "len = end - start + 1", totally underflow the
result, and lead to the beyond-device-boundary access.

[FIX]
This patch will fix the problem in two ways:
- Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
  This is the root fix

- Add extra safe net when trimming free device extents
  We check and warn if the returned range is already beyond current
  device.

Link: https://github.com/kdave/btrfs-progs/issues/282
Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
Changelog:
v2:
- Add proper fixes tag
- Add extra warning for beyond device end case
- Add graceful exit for already trimmed case
---
 fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
 fs/btrfs/volumes.c     | 12 ++++++++++++
 2 files changed, 30 insertions(+)

Comments

Filipe Manana July 31, 2020, 9:10 a.m. UTC | #1
On Fri, Jul 31, 2020 at 8:24 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> The following script can lead to tons of beyond device boundary access:
>
>   mkfs.btrfs -f $dev -b 10G
>   mount $dev $mnt
>   trimfs $mnt
>   btrfs filesystem resize 1:-1G $mnt
>   trimfs $mnt
>
> [CAUSE]
> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
> already trimmed.
>
> So we check device->alloc_state by finding the first range which doesn't
> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>
> But if we shrunk the device, that bits are not cleared, thus we could
> easily got a range starts beyond the shrunk device size.
>
> This results the returned @start and @end are all beyond device size,
> then we call "end = min(end, device->total_bytes -1);" making @end
> smaller than device size.
>
> Then finally we goes "len = end - start + 1", totally underflow the
> result, and lead to the beyond-device-boundary access.
>
> [FIX]
> This patch will fix the problem in two ways:
> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>   This is the root fix
>
> - Add extra safe net when trimming free device extents
>   We check and warn if the returned range is already beyond current
>   device.
>
> Link: https://github.com/kdave/btrfs-progs/issues/282
> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> ---
> Changelog:
> v2:
> - Add proper fixes tag
> - Add extra warning for beyond device end case
> - Add graceful exit for already trimmed case
> ---
>  fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
>  fs/btrfs/volumes.c     | 12 ++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index fa7d83051587..84ec24506fc1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5669,6 +5669,24 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>                                             &start, &end,
>                                             CHUNK_TRIMMED | CHUNK_ALLOCATED);
>
> +               /* CHUNK_* bits not cleared properly */
> +               if (start > device->total_bytes) {
> +                       WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +                       btrfs_err(fs_info,
> +               "alloc_state not cleared properly for shrink, devid %llu",
> +                                 device->devid);

Hum, I find the message a bit cryptic: referring to the name of
attributes of structures, like alloc_state is not very user friendly.
Plus this message is assuming all possible current and future bugs of
attempts to trim beyond the device size are caused by a past shrink
operation.
I'm pretty sure we had such bugs in the past due to other causes.

How about something like:

btrfs_warn("ignoring attempt to trim beyond device size: offset %llu
length %llu device %s device size %llu")

I find it a lot more helpful for both users and developers.

> +                       mutex_unlock(&fs_info->chunk_mutex);
> +                       ret = -EUCLEAN;
> +                       break;

I don't see a reason to return an error. Especially EUCLEAN since
nothing is really corrupted on disk.

Thanks.

> +               }
> +
> +               /* The remaining part has already been trimmed */
> +               if (start == device->total_bytes) {
> +                       mutex_unlock(&fs_info->chunk_mutex);
> +                       ret = 0;
> +                       break;
> +               }
> +
>                 /* Ensure we skip the reserved area in the first 1M */
>                 start = max_t(u64, start, SZ_1M);
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d7670e2a9f39..4e51ef68ea72 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>         }
>
>         mutex_lock(&fs_info->chunk_mutex);
> +       /*
> +        * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
> +        * current device boundary.
> +        * This shouldn't fail, as alloc_state should only utilize those two
> +        * bits, thus we shouldn't alloc new memory for clearing the status.
> +        *
> +        * So here we just do an ASSERT() to catch future behavior change.
> +        */
> +       ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
> +                               CHUNK_TRIMMED | CHUNK_ALLOCATED);
> +       ASSERT(!ret);
> +
>         btrfs_device_set_disk_total_bytes(device, new_size);
>         if (list_empty(&device->post_commit_list))
>                 list_add_tail(&device->post_commit_list,
> --
> 2.28.0
>
Qu Wenruo July 31, 2020, 9:19 a.m. UTC | #2
On 2020/7/31 下午5:10, Filipe Manana wrote:
> On Fri, Jul 31, 2020 at 8:24 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> The following script can lead to tons of beyond device boundary access:
>>
>>   mkfs.btrfs -f $dev -b 10G
>>   mount $dev $mnt
>>   trimfs $mnt
>>   btrfs filesystem resize 1:-1G $mnt
>>   trimfs $mnt
>>
>> [CAUSE]
>> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to
>> find_first_clear_extent_bit"), we try to avoid trimming ranges that's
>> already trimmed.
>>
>> So we check device->alloc_state by finding the first range which doesn't
>> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set.
>>
>> But if we shrunk the device, that bits are not cleared, thus we could
>> easily got a range starts beyond the shrunk device size.
>>
>> This results the returned @start and @end are all beyond device size,
>> then we call "end = min(end, device->total_bytes -1);" making @end
>> smaller than device size.
>>
>> Then finally we goes "len = end - start + 1", totally underflow the
>> result, and lead to the beyond-device-boundary access.
>>
>> [FIX]
>> This patch will fix the problem in two ways:
>> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device
>>   This is the root fix
>>
>> - Add extra safe net when trimming free device extents
>>   We check and warn if the returned range is already beyond current
>>   device.
>>
>> Link: https://github.com/kdave/btrfs-progs/issues/282
>> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Add proper fixes tag
>> - Add extra warning for beyond device end case
>> - Add graceful exit for already trimmed case
>> ---
>>  fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
>>  fs/btrfs/volumes.c     | 12 ++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index fa7d83051587..84ec24506fc1 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5669,6 +5669,24 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
>>                                             &start, &end,
>>                                             CHUNK_TRIMMED | CHUNK_ALLOCATED);
>>
>> +               /* CHUNK_* bits not cleared properly */
>> +               if (start > device->total_bytes) {
>> +                       WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +                       btrfs_err(fs_info,
>> +               "alloc_state not cleared properly for shrink, devid %llu",
>> +                                 device->devid);
> 
> Hum, I find the message a bit cryptic: referring to the name of
> attributes of structures, like alloc_state is not very user friendly.
> Plus this message is assuming all possible current and future bugs of
> attempts to trim beyond the device size are caused by a past shrink
> operation.
> I'm pretty sure we had such bugs in the past due to other causes.
> 
> How about something like:
> 
> btrfs_warn("ignoring attempt to trim beyond device size: offset %llu
> length %llu device %s device size %llu")

That's indeed much better.

> 
> I find it a lot more helpful for both users and developers.
> 
>> +                       mutex_unlock(&fs_info->chunk_mutex);
>> +                       ret = -EUCLEAN;
>> +                       break;
> 
> I don't see a reason to return an error. Especially EUCLEAN since
> nothing is really corrupted on disk.

OK, let's just return 0 as usual.

Thanks,
Qu
> 
> Thanks.
> 
>> +               }
>> +
>> +               /* The remaining part has already been trimmed */
>> +               if (start == device->total_bytes) {
>> +                       mutex_unlock(&fs_info->chunk_mutex);
>> +                       ret = 0;
>> +                       break;
>> +               }
>> +
>>                 /* Ensure we skip the reserved area in the first 1M */
>>                 start = max_t(u64, start, SZ_1M);
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d7670e2a9f39..4e51ef68ea72 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>         }
>>
>>         mutex_lock(&fs_info->chunk_mutex);
>> +       /*
>> +        * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
>> +        * current device boundary.
>> +        * This shouldn't fail, as alloc_state should only utilize those two
>> +        * bits, thus we shouldn't alloc new memory for clearing the status.
>> +        *
>> +        * So here we just do an ASSERT() to catch future behavior change.
>> +        */
>> +       ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
>> +                               CHUNK_TRIMMED | CHUNK_ALLOCATED);
>> +       ASSERT(!ret);
>> +
>>         btrfs_device_set_disk_total_bytes(device, new_size);
>>         if (list_empty(&device->post_commit_list))
>>                 list_add_tail(&device->post_commit_list,
>> --
>> 2.28.0
>>
> 
>

Patch
diff mbox series

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa7d83051587..84ec24506fc1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5669,6 +5669,24 @@  static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
 					    &start, &end,
 					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
 
+		/* CHUNK_* bits not cleared properly */
+		if (start > device->total_bytes) {
+			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+			btrfs_err(fs_info,
+		"alloc_state not cleared properly for shrink, devid %llu",
+				  device->devid);
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = -EUCLEAN;
+			break;
+		}
+
+		/* The remaining part has already been trimmed */
+		if (start == device->total_bytes) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = 0;
+			break;
+		}
+
 		/* Ensure we skip the reserved area in the first 1M */
 		start = max_t(u64, start, SZ_1M);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d7670e2a9f39..4e51ef68ea72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4720,6 +4720,18 @@  int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
+	/*
+	 * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the
+	 * current device boundary.
+	 * This shouldn't fail, as alloc_state should only utilize those two
+	 * bits, thus we shouldn't alloc new memory for clearing the status.
+	 *
+	 * So here we just do an ASSERT() to catch future behavior change.
+	 */
+	ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1,
+				CHUNK_TRIMMED | CHUNK_ALLOCATED);
+	ASSERT(!ret);
+
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,