btrfs: update writeback index when starting defrag

Message ID 20aad8ccf0fbdecddd49216f25fa772754f77978.1642700395.git.fdmanana@suse.com (mailing list archive)
State New, archived
Series btrfs: update writeback index when starting defrag

Commit Message

Filipe Manana Jan. 20, 2022, 5:41 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When starting a defrag, we should update the writeback index of the
inode's mapping in case it currently has a value beyond the start of the
range we are defragging. This can help performance and often results in
fewer extents after writeback - for example, if the current value of the
writeback index sits somewhere in the middle of a range that gets
dirtied by the defrag, then after writeback we can get two smaller
extents instead of a single, larger extent.

We used to have this before the refactoring in 5.16, but it was removed
without any stated reason. Originally it was added in kernel 3.1, by
commit 2a0f7f5769992b ("Btrfs: fix recursive auto-defrag"), in order to
fix a loop where autodefrag kept dirtying and writing pages over and
over, but some testing on current code did not show that happening, at
least with the test described in that commit.

So add back the behaviour, as at the very least it is a nice-to-have
optimization.

Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 9 +++++++++
 1 file changed, 9 insertions(+)
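
To illustrate the mechanism (a simplified, editorial sketch of the
generic writeback code, not the literal kernel source): when writeback
runs without an explicit range, write_cache_pages() in
mm/page-writeback.c operates in range_cyclic mode and resumes from
mapping->writeback_index, roughly like this:

        /*
         * Simplified sketch of write_cache_pages(), mm/page-writeback.c.
         * Tagging, error handling and wrap-around details elided.
         */
        if (wbc->range_cyclic) {
                index = mapping->writeback_index; /* resume where we stopped */
                end = -1;                         /* run to EOF, then wrap */
        } else {
                index = wbc->range_start >> PAGE_SHIFT;
                end = wbc->range_end >> PAGE_SHIFT;
        }
        /* ... write back dirty pages from index to end ... */
        if (wbc->range_cyclic)
                mapping->writeback_index = done_index; /* save for next cycle */

So when writeback_index points into the middle of the freshly dirtied
range, cyclic writeback writes the tail of the range first, wraps
around, and only then writes the head, producing two allocations where
one sequential allocation would do. Lowering writeback_index to the
start of the range avoids that split.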

Comments

Qu Wenruo Jan. 21, 2022, 12:03 a.m. UTC | #1
On 2022/1/21 01:41, fdmanana@kernel.org wrote:
> [...]

writeback_index has always been a mystery to me.

In fact, just re-checking the writeback_index usage, I found that
metadata writeback reads that value just like data writeback does.
But we don't have any call sites setting writeback_index for the btree
inode.

Is there any better doc on the proper behavior of writeback_index?

Thanks,
Qu

Filipe Manana Jan. 21, 2022, 10:18 a.m. UTC | #2
On Fri, Jan 21, 2022 at 12:03 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
>
> Writeback_index is always one mystery to me.
>
> In fact just re-checking the writeback_index usage, I found the metadata
> writeback is reading that value just like data writeback.
> But we don't have any call sites to set writeback_index for btree inode.
>
> Is there any better doc for the proper behavior for writeback_index?

I don't know... maybe somewhere in the generic writeback code there
are comments.
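
For what it's worth, the field itself only carries a one line comment
in include/linux/fs.h, in struct address_space:

        pgoff_t writeback_index; /* writeback starts here */

so the range_cyclic handling in mm/page-writeback.c is probably the
closest thing to documentation of its semantics.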

David Sterba Jan. 24, 2022, 5:09 p.m. UTC | #3
On Thu, Jan 20, 2022 at 05:41:17PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> [...]

Added to misc-next, thanks.

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index bfe5ed65e92b..95d0e210f063 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1535,6 +1535,7 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 	int compress_type = BTRFS_COMPRESS_ZLIB;
 	int ret = 0;
 	u32 extent_thresh = range->extent_thresh;
+	pgoff_t start_index;
 
 	if (isize == 0)
 		return 0;
@@ -1576,6 +1577,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 			file_ra_state_init(ra, inode->i_mapping);
 	}
 
+	/*
+	 * Make writeback start from the beginning of the range, so that the
+	 * defrag range can be written sequentially.
+	 */
+	start_index = cur >> PAGE_SHIFT;
+	if (start_index < inode->i_mapping->writeback_index)
+		inode->i_mapping->writeback_index = start_index;
+
 	while (cur < last_byte) {
 		const unsigned long prev_sectors_defragged = sectors_defragged;
 		u64 cluster_end;
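
For completeness, a minimal user space sketch (editorial, not part of
the thread) that drives the btrfs_defrag_file() path above through the
BTRFS_IOC_DEFRAG_RANGE ioctl; the file path is a placeholder and error
handling is kept minimal:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

int main(void)
{
        struct btrfs_ioctl_defrag_range_args args;
        int fd = open("/mnt/btrfs/file", O_RDWR);  /* placeholder path */

        if (fd < 0) {
                perror("open");
                return 1;
        }

        memset(&args, 0, sizeof(args));
        args.start = 0;                   /* defrag from the start... */
        args.len = (__u64)-1;             /* ...through EOF */
        args.flags = BTRFS_DEFRAG_RANGE_START_IO;  /* kick off writeback */
        args.extent_thresh = 256 * 1024;  /* skip extents >= 256K */

        if (ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &args) < 0)
                perror("BTRFS_IOC_DEFRAG_RANGE");

        close(fd);
        return 0;
}

The resulting layout can then be inspected with filefrag -v: with the
writeback_index reset in place, a range dirtied by defrag should more
often come back as a single extent after writeback.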