Message ID | 66698e7eb0589e818eec555abc3b04969dcb48f1.1742443383.git.wqu@suse.com
---|---
State | New
Series | btrfs: refactor btrfs_buffered_write() for the incoming large data folios
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On Thu, Mar 20, 2025 at 5:36 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Commit c87c299776e4 ("btrfs: make buffered write to copy one page a
> time") changed how the variable @force_page_uptodate is updated.
>
> Before that commit the variable is only initialized to false at the
> beginning of the function, and after hitting a short copy, the next
> retry on the same folio will force the foilio to be read from the disk.

foilio -> folio

>
> But after the commit, the variable is always updated to false for each

I think saying "is always initialized to false at the beginning of the
loop's scope" is clearer. Saying "updated" gives the idea it was
declared in an outer scope, but that's not the case anymore.

> iteration, causing prepare_one_folio() never to get a true value passed
> in.
>
> The change in behavior is not a huge deal, it only makes a difference
> on how we handle short copies:
>
> Old: Allow the buffer to be split
>
>   The first short copy will be rejected, that's the same for both
>   cases.
>
>   But for the next retry, we require the folio to be read from disk.
>
>   Then even if we hit a short copy again, since the folio is already
>   uptodate, we do not need to handle partial uptodate range, and can
>   continue, marking the short copied range as dirty and continue.
>
>   This will split the buffer write into the folio as two buffered
>   writes.
>
> New: Do not allow the buffer to be split
>
>   The first short copy will be rejected, that's the same for both
>   cases.
>
>   For the next retry, we do nothing special, thus if the short copy
>   happened again, we reject it again, until either the short copy is
>   gone, or we failed to fault in the buffer.
>
>   This will mean the buffer write into the folio will either fail or
>   success, no split will happen.
>
> To me, either solution is fine, but the newer one makes it simpler and
> requires no special handling, so I prefer that solution.

Ok, so this explanation of the different behaviour is something that
should have been in the change log of commit c87c299776e4 ("btrfs: make
buffered write to copy one page a time").

With the new behaviour, when folios larger than 1 page are supported,
I wonder if we don't risk looping over the same subrange many times,
in case we keep needing to fault in due to memory pressure.

>
> And since @force_page_uptodate is always false when passed into
> prepare_one_folio(), we can just remove the variable.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Anyway, this change looks good, and at least with the typo fixed:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.
> ---
>  fs/btrfs/file.c | 19 ++++++-------------
>  1 file changed, 6 insertions(+), 13 deletions(-)
[...]
On 2025/3/21 22:30, Filipe Manana wrote:
> On Thu, Mar 20, 2025 at 5:36 AM Qu Wenruo <wqu@suse.com> wrote:
[...]
>>
>> To me, either solution is fine, but the newer one makes it simpler and
>> requires no special handling, so I prefer that solution.
>
> Ok, so this explanation of the different behaviour is something that
> should have been in the change log of commit c87c299776e4 ("btrfs: make
> buffered write to copy one page a time").

Yeah, that's in fact an unexpected behavior change, or you can call it
a bug in this case.

>
> With the new behaviour, when folios larger than 1 page are supported,
> I wonder if we don't risk looping over the same subrange many times,
> in case we keep needing to fault in due to memory pressure.

There is a small safeguard there: when we fault in the iov_iter, any
bytes that cannot be faulted in immediately trigger an -EFAULT error.
So in that case we would hit -EFAULT instead of looping over the same
range again and again.

Although this may become a problem for the future iomap migration, as
iomap only requires part of the iov_iter to be faulted in.

Thanks,
Qu

>
>>
>> And since @force_page_uptodate is always false when passed into
>> prepare_one_folio(), we can just remove the variable.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Anyway, this change looks good, and at least with the typo fixed:
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> Thanks.
>
[...]
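[For reference, the contrast Qu describes comes down to how callers
check the return value of fault_in_iov_iter_readable(), which returns
the number of bytes it could NOT fault in (0 means complete success).
A minimal sketch, simplified from the mainline callers rather than
quoted verbatim:

	/*
	 * btrfs_buffered_write() style: require the whole range to be
	 * faulted in. Any missing byte fails the attempt with -EFAULT,
	 * so we do not keep looping over the same subrange.
	 */
	if (unlikely(fault_in_iov_iter_readable(i, write_bytes))) {
		ret = -EFAULT;
		break;
	}

	/*
	 * generic/iomap style: fail only when *nothing* could be
	 * faulted in; a partially faulted-in buffer is retried with a
	 * shorter copy.
	 */
	if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
		status = -EFAULT;
		break;
	}
]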
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c2648858772a..b7eb1f0164bb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -800,7 +800,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  * On success return a locked folio and 0
  */
 static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64 pos,
-				  u64 len, bool force_uptodate)
+				  u64 len)
 {
 	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
 	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
@@ -810,8 +810,7 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
 	if (folio_test_uptodate(folio))
 		return 0;
 
-	if (!force_uptodate &&
-	    IS_ALIGNED(clamp_start, blocksize) &&
+	if (IS_ALIGNED(clamp_start, blocksize) &&
 	    IS_ALIGNED(clamp_end, blocksize))
 		return 0;
 
@@ -858,7 +857,7 @@ static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
  */
 static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_ret,
 				      loff_t pos, size_t write_bytes,
-				      bool force_uptodate, bool nowait)
+				      bool nowait)
 {
 	unsigned long index = pos >> PAGE_SHIFT;
 	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
@@ -881,7 +880,7 @@ static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_
 		folio_put(folio);
 		return ret;
 	}
-	ret = prepare_uptodate_folio(inode, folio, pos, write_bytes, force_uptodate);
+	ret = prepare_uptodate_folio(inode, folio, pos, write_bytes);
 	if (ret) {
 		/* The folio is already unlocked. */
 		folio_put(folio);
@@ -1127,7 +1126,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 		size_t num_sectors;
 		struct folio *folio = NULL;
 		int extents_locked;
-		bool force_page_uptodate = false;
 
 		/*
 		 * Fault pages before locking them in prepare_one_folio()
@@ -1196,8 +1194,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
-		ret = prepare_one_folio(inode, &folio, pos, write_bytes,
-					force_page_uptodate, false);
+		ret = prepare_one_folio(inode, &folio, pos, write_bytes, false);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
@@ -1240,12 +1237,8 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 				       fs_info->sectorsize);
 		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
 
-		if (copied == 0) {
-			force_page_uptodate = true;
+		if (copied == 0)
 			dirty_sectors = 0;
-		} else {
-			force_page_uptodate = false;
-		}
 
 		if (num_sectors > dirty_sectors) {
 			/* release everything except the sectors we dirtied */
Commit c87c299776e4 ("btrfs: make buffered write to copy one page a
time") changed how the variable @force_page_uptodate is updated.

Before that commit the variable is only initialized to false at the
beginning of the function, and after hitting a short copy, the next
retry on the same folio will force the folio to be read from the disk.

But after the commit, the variable is always initialized to false at
the beginning of the loop's scope, causing prepare_one_folio() never to
get a true value passed in.

The change in behavior is not a huge deal; it only makes a difference
in how we handle short copies:

Old: Allow the buffer to be split

  The first short copy will be rejected; that's the same for both
  cases.

  But for the next retry, we require the folio to be read from disk.

  Then even if we hit a short copy again, since the folio is already
  uptodate, we do not need to handle a partially uptodate range and can
  continue, marking the short-copied range as dirty.

  This will split the buffered write into the folio into two buffered
  writes.

New: Do not allow the buffer to be split

  The first short copy will be rejected; that's the same for both
  cases.

  For the next retry, we do nothing special, thus if the short copy
  happens again, we reject it again, until either the short copy is
  gone or we fail to fault in the buffer.

  This means the buffered write into the folio will either fail or
  succeed as a whole; no split will happen.

To me, either solution is fine, but the newer one makes it simpler and
requires no special handling, so I prefer that solution.

And since @force_page_uptodate is always false when passed into
prepare_one_folio(), we can just remove the variable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)
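[To make the Old/New difference concrete, below is a condensed,
hypothetical sketch of the copy loop. The function names are the real
ones, but the control flow is heavily simplified, and the space
reservation, extent locking and delalloc accounting of the real
btrfs_buffered_write() are omitted:

	while (iov_iter_count(i) > 0) {
		/* btrfs requires the whole range be faulted in up front. */
		if (unlikely(fault_in_iov_iter_readable(i, write_bytes))) {
			ret = -EFAULT;
			break;
		}

		/*
		 * The old code passed force_page_uptodate here; after this
		 * patch the flag is gone and only @nowait remains.
		 */
		ret = prepare_one_folio(inode, &folio, pos, write_bytes, false);
		if (ret)
			break;

		copied = copy_folio_from_iter_atomic(folio, offset,
						     write_bytes, i);
		if (copied == 0) {
			/*
			 * Short copy rejected.
			 * Old: set force_page_uptodate = true, so the retry
			 *      read the folio from disk and the write could
			 *      be split in two.
			 * New: nothing special; retry the same range until it
			 *      either copies fully or the fault-in above
			 *      fails.
			 */
			folio_unlock(folio);
			folio_put(folio);
			continue;
		}

		/* Mark the copied range dirty and advance to the next one. */
		pos += copied;
	}
]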