Message ID | 20aad8ccf0fbdecddd49216f25fa772754f77978.1642700395.git.fdmanana@suse.com (mailing list archive)
---|---
State | New, archived
Series | btrfs: update writeback index when starting defrag
On 2022/1/21 01:41, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When starting a defrag, we should update the writeback index of the
> inode's mapping in case it currently has a value beyond the start of the
> range we are defragging. This can help performance and often results in
> getting fewer extents after writeback - e.g., if the current value of
> the writeback index sits somewhere in the middle of a range that gets
> dirtied by the defrag, then after writeback we can get two smaller
> extents instead of a single, larger extent.
>
> We used to have this before the refactoring in 5.16, but it was removed
> without any reason to do so. Originally it was added in kernel 3.1, by
> commit 2a0f7f5769992b ("Btrfs: fix recursive auto-defrag"), in order to
> fix a loop with autodefrag resulting in dirtying and writing pages over
> and over, but some testing on current code did not show that happening,
> at least with the test described in that commit.
>
> So add back the behaviour, as at the very least it is a nice to have
> optimization.

writeback_index has always been a mystery to me.

In fact, just re-checking the writeback_index usage, I found that
metadata writeback reads that value just like data writeback does.
But we don't have any call sites that set writeback_index for the
btree inode.

Is there any better documentation on the proper behavior of
writeback_index?

Thanks,
Qu

> Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/ioctl.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index bfe5ed65e92b..95d0e210f063 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1535,6 +1535,7 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  	int compress_type = BTRFS_COMPRESS_ZLIB;
>  	int ret = 0;
>  	u32 extent_thresh = range->extent_thresh;
> +	pgoff_t start_index;
>
>  	if (isize == 0)
>  		return 0;
> @@ -1576,6 +1577,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  		file_ra_state_init(ra, inode->i_mapping);
>  	}
>
> +	/*
> +	 * Make writeback start from the beginning of the range, so that the
> +	 * defrag range can be written sequentially.
> +	 */
> +	start_index = cur >> PAGE_SHIFT;
> +	if (start_index < inode->i_mapping->writeback_index)
> +		inode->i_mapping->writeback_index = start_index;
> +
>  	while (cur < last_byte) {
>  		const unsigned long prev_sectors_defragged = sectors_defragged;
>  		u64 cluster_end;
On Fri, Jan 21, 2022 at 12:03 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> On 2022/1/21 01:41, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When starting a defrag, we should update the writeback index of the
> > inode's mapping in case it currently has a value beyond the start of the
> > range we are defragging. This can help performance and often results in
> > getting fewer extents after writeback - e.g., if the current value of
> > the writeback index sits somewhere in the middle of a range that gets
> > dirtied by the defrag, then after writeback we can get two smaller
> > extents instead of a single, larger extent.
> >
> > We used to have this before the refactoring in 5.16, but it was removed
> > without any reason to do so. Originally it was added in kernel 3.1, by
> > commit 2a0f7f5769992b ("Btrfs: fix recursive auto-defrag"), in order to
> > fix a loop with autodefrag resulting in dirtying and writing pages over
> > and over, but some testing on current code did not show that happening,
> > at least with the test described in that commit.
> >
> > So add back the behaviour, as at the very least it is a nice to have
> > optimization.
>
> writeback_index has always been a mystery to me.
>
> In fact, just re-checking the writeback_index usage, I found that
> metadata writeback reads that value just like data writeback does.
> But we don't have any call sites that set writeback_index for the
> btree inode.
>
> Is there any better documentation on the proper behavior of
> writeback_index?

I don't know... maybe somewhere in the generic writeback code there are
comments.

>
> Thanks,
> Qu
>
> > Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >  fs/btrfs/ioctl.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index bfe5ed65e92b..95d0e210f063 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -1535,6 +1535,7 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
> >  	int compress_type = BTRFS_COMPRESS_ZLIB;
> >  	int ret = 0;
> >  	u32 extent_thresh = range->extent_thresh;
> > +	pgoff_t start_index;
> >
> >  	if (isize == 0)
> >  		return 0;
> > @@ -1576,6 +1577,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
> >  		file_ra_state_init(ra, inode->i_mapping);
> >  	}
> >
> > +	/*
> > +	 * Make writeback start from the beginning of the range, so that the
> > +	 * defrag range can be written sequentially.
> > +	 */
> > +	start_index = cur >> PAGE_SHIFT;
> > +	if (start_index < inode->i_mapping->writeback_index)
> > +		inode->i_mapping->writeback_index = start_index;
> > +
> >  	while (cur < last_byte) {
> >  		const unsigned long prev_sectors_defragged = sectors_defragged;
> >  		u64 cluster_end;
On Thu, Jan 20, 2022 at 05:41:17PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When starting a defrag, we should update the writeback index of the
> inode's mapping in case it currently has a value beyond the start of the
> range we are defragging. This can help performance and often results in
> getting fewer extents after writeback - e.g., if the current value of
> the writeback index sits somewhere in the middle of a range that gets
> dirtied by the defrag, then after writeback we can get two smaller
> extents instead of a single, larger extent.
>
> We used to have this before the refactoring in 5.16, but it was removed
> without any reason to do so. Originally it was added in kernel 3.1, by
> commit 2a0f7f5769992b ("Btrfs: fix recursive auto-defrag"), in order to
> fix a loop with autodefrag resulting in dirtying and writing pages over
> and over, but some testing on current code did not show that happening,
> at least with the test described in that commit.
>
> So add back the behaviour, as at the very least it is a nice to have
> optimization.
>
> Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index bfe5ed65e92b..95d0e210f063 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1535,6 +1535,7 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 	int compress_type = BTRFS_COMPRESS_ZLIB;
 	int ret = 0;
 	u32 extent_thresh = range->extent_thresh;
+	pgoff_t start_index;
 
 	if (isize == 0)
 		return 0;
@@ -1576,6 +1577,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		file_ra_state_init(ra, inode->i_mapping);
 	}
 
+	/*
+	 * Make writeback start from the beginning of the range, so that the
+	 * defrag range can be written sequentially.
+	 */
+	start_index = cur >> PAGE_SHIFT;
+	if (start_index < inode->i_mapping->writeback_index)
+		inode->i_mapping->writeback_index = start_index;
+
 	while (cur < last_byte) {
 		const unsigned long prev_sectors_defragged = sectors_defragged;
 		u64 cluster_end;