Message ID | 20200731072258.85861-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: trim: fix underflow in trim length to prevent access beyond device boundary | expand |
On Fri, Jul 31, 2020 at 8:24 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > The following script can lead to tons of beyond device boundary access: > > mkfs.btrfs -f $dev -b 10G > mount $dev $mnt > trimfs $mnt > btrfs filesystem resize 1:-1G $mnt > trimfs $mnt > > [CAUSE] > Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to > find_first_clear_extent_bit"), we try to avoid trimming ranges that's > already trimmed. > > So we check device->alloc_state by finding the first range which doesn't > have CHUNK_TRIMMED and CHUNK_ALLOCATED not set. > > But if we shrunk the device, that bits are not cleared, thus we could > easily got a range starts beyond the shrunk device size. > > This results the returned @start and @end are all beyond device size, > then we call "end = min(end, device->total_bytes -1);" making @end > smaller than device size. > > Then finally we goes "len = end - start + 1", totally underflow the > result, and lead to the beyond-device-boundary access. > > [FIX] > This patch will fix the problem in two ways: > - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device > This is the root fix > > - Add extra safe net when trimming free device extents > We check and warn if the returned range is already beyond current > device. > > Link: https://github.com/kdave/btrfs-progs/issues/282 > Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") > Signed-off-by: Qu Wenruo <wqu@suse.com> > Reviewed-by: Filipe Manana <fdmanana@suse.com> > --- > Changelog: > v2: > - Add proper fixes tag > - Add extra warning for beyond device end case > - Add graceful exit for already trimmed case > --- > fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ > fs/btrfs/volumes.c | 12 ++++++++++++ > 2 files changed, 30 insertions(+) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index fa7d83051587..84ec24506fc1 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -5669,6 +5669,24 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed) > &start, &end, > CHUNK_TRIMMED | CHUNK_ALLOCATED); > > + /* CHUNK_* bits not cleared properly */ > + if (start > device->total_bytes) { > + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); > + btrfs_err(fs_info, > + "alloc_state not cleared properly for shrink, devid %llu", > + device->devid); Hum, I find the message a bit cryptic: referring to the name of attributes of structures, like alloc_state is not very user friendly. Plus this message is assuming all possible current and future bugs of attempts to trim beyond the device size are caused by a past shrink operation. I'm pretty sure we had such bugs in the past due to other causes. How about something like: btrfs_warn("ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu") I find it a lot more helpful for both users and developers. > + mutex_unlock(&fs_info->chunk_mutex); > + ret = -EUCLEAN; > + break; I don't see a reason to return an error. Especially EUCLEAN since nothing is really corrupted on disk. Thanks. > + } > + > + /* The remaining part has already been trimmed */ > + if (start == device->total_bytes) { > + mutex_unlock(&fs_info->chunk_mutex); > + ret = 0; > + break; > + } > + > /* Ensure we skip the reserved area in the first 1M */ > start = max_t(u64, start, SZ_1M); > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index d7670e2a9f39..4e51ef68ea72 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) > } > > mutex_lock(&fs_info->chunk_mutex); > + /* > + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the > + * current device boundary. > + * This shouldn't fail, as alloc_state should only utilize those two > + * bits, thus we shouldn't alloc new memory for clearing the status. > + * > + * So here we just do an ASSERT() to catch future behavior change. > + */ > + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1, > + CHUNK_TRIMMED | CHUNK_ALLOCATED); > + ASSERT(!ret); > + > btrfs_device_set_disk_total_bytes(device, new_size); > if (list_empty(&device->post_commit_list)) > list_add_tail(&device->post_commit_list, > -- > 2.28.0 >
On 2020/7/31 下午5:10, Filipe Manana wrote: > On Fri, Jul 31, 2020 at 8:24 AM Qu Wenruo <wqu@suse.com> wrote: >> >> [BUG] >> The following script can lead to tons of beyond device boundary access: >> >> mkfs.btrfs -f $dev -b 10G >> mount $dev $mnt >> trimfs $mnt >> btrfs filesystem resize 1:-1G $mnt >> trimfs $mnt >> >> [CAUSE] >> Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to >> find_first_clear_extent_bit"), we try to avoid trimming ranges that's >> already trimmed. >> >> So we check device->alloc_state by finding the first range which doesn't >> have CHUNK_TRIMMED and CHUNK_ALLOCATED not set. >> >> But if we shrunk the device, that bits are not cleared, thus we could >> easily got a range starts beyond the shrunk device size. >> >> This results the returned @start and @end are all beyond device size, >> then we call "end = min(end, device->total_bytes -1);" making @end >> smaller than device size. >> >> Then finally we goes "len = end - start + 1", totally underflow the >> result, and lead to the beyond-device-boundary access. >> >> [FIX] >> This patch will fix the problem in two ways: >> - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device >> This is the root fix >> >> - Add extra safe net when trimming free device extents >> We check and warn if the returned range is already beyond current >> device. >> >> Link: https://github.com/kdave/btrfs-progs/issues/282 >> Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> Reviewed-by: Filipe Manana <fdmanana@suse.com> >> --- >> Changelog: >> v2: >> - Add proper fixes tag >> - Add extra warning for beyond device end case >> - Add graceful exit for already trimmed case >> --- >> fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ >> fs/btrfs/volumes.c | 12 ++++++++++++ >> 2 files changed, 30 insertions(+) >> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index fa7d83051587..84ec24506fc1 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -5669,6 +5669,24 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed) >> &start, &end, >> CHUNK_TRIMMED | CHUNK_ALLOCATED); >> >> + /* CHUNK_* bits not cleared properly */ >> + if (start > device->total_bytes) { >> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); >> + btrfs_err(fs_info, >> + "alloc_state not cleared properly for shrink, devid %llu", >> + device->devid); > > Hum, I find the message a bit cryptic: referring to the name of > attributes of structures, like alloc_state is not very user friendly. > Plus this message is assuming all possible current and future bugs of > attempts to trim beyond the device size are caused by a past shrink > operation. > I'm pretty sure we had such bugs in the past due to other causes. > > How about something like: > > btrfs_warn("ignoring attempt to trim beyond device size: offset %llu > length %llu device %s device size %llu") That's indeed much better. > > I find it a lot more helpful for both users and developers. > >> + mutex_unlock(&fs_info->chunk_mutex); >> + ret = -EUCLEAN; >> + break; > > I don't see a reason to return an error. Especially EUCLEAN since > nothing is really corrupted on disk. OK, let's just return 0 as usual. Thanks, Qu > > Thanks. > >> + } >> + >> + /* The remaining part has already been trimmed */ >> + if (start == device->total_bytes) { >> + mutex_unlock(&fs_info->chunk_mutex); >> + ret = 0; >> + break; >> + } >> + >> /* Ensure we skip the reserved area in the first 1M */ >> start = max_t(u64, start, SZ_1M); >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index d7670e2a9f39..4e51ef68ea72 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) >> } >> >> mutex_lock(&fs_info->chunk_mutex); >> + /* >> + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the >> + * current device boundary. >> + * This shouldn't fail, as alloc_state should only utilize those two >> + * bits, thus we shouldn't alloc new memory for clearing the status. >> + * >> + * So here we just do an ASSERT() to catch future behavior change. >> + */ >> + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1, >> + CHUNK_TRIMMED | CHUNK_ALLOCATED); >> + ASSERT(!ret); >> + >> btrfs_device_set_disk_total_bytes(device, new_size); >> if (list_empty(&device->post_commit_list)) >> list_add_tail(&device->post_commit_list, >> -- >> 2.28.0 >> > >
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index fa7d83051587..84ec24506fc1 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5669,6 +5669,24 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed) &start, &end, CHUNK_TRIMMED | CHUNK_ALLOCATED); + /* CHUNK_* bits not cleared properly */ + if (start > device->total_bytes) { + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); + btrfs_err(fs_info, + "alloc_state not cleared properly for shrink, devid %llu", + device->devid); + mutex_unlock(&fs_info->chunk_mutex); + ret = -EUCLEAN; + break; + } + + /* The remaining part has already been trimmed */ + if (start == device->total_bytes) { + mutex_unlock(&fs_info->chunk_mutex); + ret = 0; + break; + } + /* Ensure we skip the reserved area in the first 1M */ start = max_t(u64, start, SZ_1M); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index d7670e2a9f39..4e51ef68ea72 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4720,6 +4720,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size) } mutex_lock(&fs_info->chunk_mutex); + /* + * Also clear any CHUNK_TRIMMED and CHUNK_ALLOCATED bits beyond the + * current device boundary. + * This shouldn't fail, as alloc_state should only utilize those two + * bits, thus we shouldn't alloc new memory for clearing the status. + * + * So here we just do an ASSERT() to catch future behavior change. + */ + ret = clear_extent_bits(&device->alloc_state, new_size, (u64)-1, + CHUNK_TRIMMED | CHUNK_ALLOCATED); + ASSERT(!ret); + btrfs_device_set_disk_total_bytes(device, new_size); if (list_empty(&device->post_commit_list)) list_add_tail(&device->post_commit_list,