Message ID | c26d8d377147d3a80e352ee31e432591c28e3f4b.1651905487.git.wqu@suse.com |
---|---|
State | New, archived |
Series | btrfs: allow defrag to convert inline extents to regular extents |
On 5/7/22 12:09, Qu Wenruo wrote:
[...]
> +		/*
> +		 * For inline extent it should be the first extent and it
> +		 * should not have a next extent.
> +		 * If the inlined extent passed all above checks, just add it
> +		 * for defrag, and be converted to regular extents.
> +		 */
> +		if (em->block_start == EXTENT_MAP_INLINE)
> +			goto add;
> +
>  		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
>  						extent_thresh, newer_than, locked);
>  		if (!next_mergeable) {

Why not also let the inline extent have the next_mergeable checked,
so that the new regular extent will be defragged too? No?

-Anand
On 2022/5/9 10:15, Anand Jain wrote:
> On 5/7/22 12:09, Qu Wenruo wrote:
[...]
>> +		if (em->block_start == EXTENT_MAP_INLINE)
>> +			goto add;
>> +
>>  		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
>>  						extent_thresh, newer_than, locked);
>>  		if (!next_mergeable) {
> Why not also let the inline extent have the next_mergeable checked,
> so that the new regular extent will be defragged too? No?

You definitely forgot the fact that an inline extent should NOT have
regular extents following it.
On 5/9/22 07:47, Qu Wenruo wrote:
> On 2022/5/9 10:15, Anand Jain wrote:
>> On 5/7/22 12:09, Qu Wenruo wrote:
[...]
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

[...]
>> Why not also let the inline extent have the next_mergeable checked,
>> so that the new regular extent will be defragged too? No?
>
> You definitely forgot the fact that an inline extent should NOT have
> regular extents following it.
On Sat, May 07, 2022 at 02:39:27PM +0800, Qu Wenruo wrote:
[...]
> +		/*
> +		 * For inline extent it should be the first extent and it
> +		 * should not have a next extent.

This is misleading.

As you know, and as we've discussed in a few threads in the past, there are
at least a couple of cases where we can have an inline extent followed by
other extents. One has to do with compression and the other with fallocate.

So either this part of the comment should be rephrased or it should go away.

This is also a good opportunity to convert cases where we have an inlined
compressed extent followed by one (or more) extents:

  $ mount -o compress /dev/sdi /mnt
  $ xfs_io -f -s -c "pwrite -S 0xab 0 4K" -c "pwrite -S 0xcd -b 16K 4K 16K" /mnt/foobar

In this case, defrag could mark the range [0, 20K) for defrag, and we would
end up saving both data and metadata space (one less extent item in the fs
tree and maybe in the extent tree too).

Thanks.

> +		 * If the inlined extent passed all above checks, just add it
> +		 * for defrag, and be converted to regular extents.
> +		 */
> +		if (em->block_start == EXTENT_MAP_INLINE)
> +			goto add;
[...]
On 2022/5/9 17:56, Filipe Manana wrote:
> On Sat, May 07, 2022 at 02:39:27PM +0800, Qu Wenruo wrote:
[...]
>> +		/*
>> +		 * For inline extent it should be the first extent and it
>> +		 * should not have a next extent.
>
> This is misleading.
>
> As you know, and as we've discussed in a few threads in the past, there are
> at least a couple of cases where we can have an inline extent followed by
> other extents. One has to do with compression and the other with fallocate.

Yes, I totally know those cases. That's why I only said "should", not
"must".

>
> So either this part of the comment should be rephrased or it should go away.

Since Anand questioned why we need to skip the next mergeable check, it
looks like we'd better rephrase it.

What about "normally inline extents should have no more extent after it,
thus @next_mergeable would be false under most cases."?

Thanks,
Qu

[...]
On Mon, May 9, 2022 at 12:12 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2022/5/9 17:56, Filipe Manana wrote:
[...]
> > So either this part of the comment should be rephrased or it should go away.
>
> Since Anand questioned why we need to skip the next mergeable check, it
> looks like we'd better rephrase it.
>
> What about "normally inline extents should have no more extent after it,
> thus @next_mergeable would be false under most cases."?

Yes, that's fine.

Thanks.
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d8e46815ee4..852c49565ab2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1420,8 +1420,19 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (!em)
 			break;
 
-		/* Skip hole/inline/preallocated extents */
-		if (em->block_start >= EXTENT_MAP_LAST_BYTE ||
+		/*
+		 * If the file extent is an inlined one, we may still want to
+		 * defrag it (fallthrough) if it will cause a regular extent.
+		 * This is for users who want to convert inline extents to
+		 * regular ones through max_inline= mount option.
+		 */
+		if (em->block_start == EXTENT_MAP_INLINE &&
+		    em->len <= inode->root->fs_info->max_inline)
+			goto next;
+
+		/* Skip hole/delalloc/preallocated extents */
+		if (em->block_start == EXTENT_MAP_HOLE ||
+		    em->block_start == EXTENT_MAP_DELALLOC ||
 		    test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 			goto next;
 
@@ -1480,6 +1491,15 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (em->len >= get_extent_max_capacity(em))
 			goto next;
 
+		/*
+		 * For inline extent it should be the first extent and it
+		 * should not have a next extent.
+		 * If the inlined extent passed all above checks, just add it
+		 * for defrag, and be converted to regular extents.
+		 */
+		if (em->block_start == EXTENT_MAP_INLINE)
+			goto add;
+
 		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
 						extent_thresh, newer_than, locked);
 		if (!next_mergeable) {
Btrfs defaults to max_inline=2K to make small writes inlined into metadata.

The default value is always a win: even though DUP/RAID1/RAID10 doubles the metadata usage, an inline extent should still use less physical space than a 4K regular extent.

But since the introduction of RAID1C3 and RAID1C4, this is no longer the case; users may find inlined extents wasting too much space and want to convert those inlined extents back to regular extents.

Unfortunately, defrag unconditionally skips all inline extents, even if the user is trying to convert them back to regular extents.

So this patch adds a small exception to defrag_collect_targets() to allow defragging inline extents if and only if the inlined extents are larger than max_inline, allowing users to convert them to regular ones.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)