
[1/4] btrfs: remove force_page_uptodate variable from btrfs_buffered_write()

Message ID 66698e7eb0589e818eec555abc3b04969dcb48f1.1742443383.git.wqu@suse.com (mailing list archive)
State New
Series btrfs: refactor btrfs_buffered_write() for the incoming large data folios

Commit Message

Qu Wenruo March 20, 2025, 5:34 a.m. UTC
Commit c87c299776e4 ("btrfs: make buffered write to copy one page a
time") changed how the variable @force_page_uptodate is updated.

Before that commit, the variable was only initialized to false at the
beginning of the function, so after hitting a short copy, the next
retry on the same folio would force the folio to be read from disk.

But after the commit, the variable is always initialized to false at
the beginning of each loop iteration, so prepare_one_folio() never gets
a true value passed in.

The change in behavior is not a huge deal; it only makes a difference
in how we handle short copies:

Old: Allow the buffer to be split

     The first short copy will be rejected, that's the same for both
     cases.

     But for the next retry, we require the folio to be read from disk.

     Then even if we hit a short copy again, since the folio is already
     uptodate, we do not need to handle a partially uptodate range and
     can continue, marking the short-copied range dirty.

     As a result, the buffered write into the folio is split into two
     writes.

New: Do not allow the buffer to be split

     The first short copy will be rejected, that's the same for both
     cases.

     For the next retry, we do nothing special, so if the short copy
     happens again we reject it again, until either the short copy is
     gone or we fail to fault in the buffer.

     This means the buffered write into the folio will either fail or
     succeed as a whole; no split will happen.

To me, either solution is fine, but the newer one is simpler and
requires no special handling, so I prefer it.
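The no-split behavior can be sketched as a small userspace simulation (all names here are illustrative stand-ins, not the actual kernel code): `try_copy()` models the atomic user copy, scripted to short-copy twice before the source buffer is faulted in, and the loop simply retries the full range each time instead of dirtying a partial one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the "new" no-split retry loop described above.
 * try_copy() is scripted to short-copy (return 0) twice before
 * succeeding, as if the source buffer was not faulted in yet.
 * Illustrative only, not kernel code.
 */
static size_t copy_script[] = { 0, 0, 8 };
static int copy_attempt;

static size_t try_copy(size_t write_bytes)
{
	(void)write_bytes;
	return copy_script[copy_attempt++];
}

/* Returns how many separate dirty ranges the write into one folio made. */
static int buffered_write_into_folio(size_t write_bytes)
{
	int dirty_ranges = 0;
	size_t copied;

	do {
		copied = try_copy(write_bytes);
		/*
		 * copied == 0: nothing special, just retry the same full
		 * range (after faulting in the buffer).  No partial range
		 * is marked dirty, so the write cannot be split.
		 */
	} while (copied == 0);

	dirty_ranges++;	/* the whole copied range is dirtied at once */
	return dirty_ranges;
}
```

Under this model a write into one folio is never split into multiple dirty ranges, matching the "either fail or succeed" outcome described above.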

And since @force_page_uptodate is always false when passed into
prepare_one_folio(), we can just remove the variable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

Comments

Johannes Thumshirn March 21, 2025, 9:41 a.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Filipe Manana March 21, 2025, noon UTC | #2
On Thu, Mar 20, 2025 at 5:36 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Commit c87c299776e4 ("btrfs: make buffered write to copy one page a
> time") changed how the variable @force_page_uptodate is updated.
>
> Before that commit the variable is only initialized to false at the
> beginning of the function, and after hitting a short copy, the next
> retry on the same folio will force the foilio to be read from the disk.

foilio -> folio

>
> But after the commit, the variable is always updated to false for each

I think saying "is always initialized to false at the beginning of the
loop's scope" is clearer.
When you say "updated", it gives the idea it was declared in an outer
scope, but that's no longer the case.

> iteration, causing prepare_one_folio() never to get a true value passed
> in.
>
> The change in behavior is not a huge deal, it only makes a difference
> on how we handle short copies:
>
> Old: Allow the buffer to be split
>
>      The first short copy will be rejected, that's the same for both
>      cases.
>
>      But for the next retry, we require the folio to be read from disk.
>
>      Then even if we hit a short copy again, since the folio is already
>      uptodate, we do not need to handle partial uptodate range, and can
>      continue, marking the short copied range as dirty and continue.
>
>      This will split the buffer write into the folio as two buffered
>      writes.
>
> New: Do not allow the buffer to be split
>
>      The first short copy will be rejected, that's the same for both
>      cases.
>
>      For the next retry, we do nothing special, thus if the short copy
>      happened again, we reject it again, until either the short copy is
>      gone, or we failed to fault in the buffer.
>
>      This will mean the buffer write into the folio will either fail or
>      success, no split will happen.
>
> To me, either solution is fine, but the newer one makes it simpler and
> requires no special handling, so I prefer that solution.

Ok so this explanation of different behaviour is something that should
have been in the change log of commit c87c299776e4 ("btrfs: make
buffered write to copy one page a time").

With the new behaviour, when folios larger than 1 page are supported,
I wonder if we don't risk looping over the same subrange many times,
in case we keep needing to fault in due to memory pressure.

>
> And since @force_page_uptodate is always false when passed into
> prepare_one_folio(), we can just remove the variable.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Anyway, this change looks good, and at least with the typo fixed:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/file.c | 19 ++++++-------------
>  1 file changed, 6 insertions(+), 13 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index c2648858772a..b7eb1f0164bb 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -800,7 +800,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
>   * On success return a locked folio and 0
>   */
>  static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64 pos,
> -                                 u64 len, bool force_uptodate)
> +                                 u64 len)
>  {
>         u64 clamp_start = max_t(u64, pos, folio_pos(folio));
>         u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
> @@ -810,8 +810,7 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
>         if (folio_test_uptodate(folio))
>                 return 0;
>
> -       if (!force_uptodate &&
> -           IS_ALIGNED(clamp_start, blocksize) &&
> +       if (IS_ALIGNED(clamp_start, blocksize) &&
>             IS_ALIGNED(clamp_end, blocksize))
>                 return 0;
>
> @@ -858,7 +857,7 @@ static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
>   */
>  static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_ret,
>                                       loff_t pos, size_t write_bytes,
> -                                     bool force_uptodate, bool nowait)
> +                                     bool nowait)
>  {
>         unsigned long index = pos >> PAGE_SHIFT;
>         gfp_t mask = get_prepare_gfp_flags(inode, nowait);
> @@ -881,7 +880,7 @@ static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_
>                 folio_put(folio);
>                 return ret;
>         }
> -       ret = prepare_uptodate_folio(inode, folio, pos, write_bytes, force_uptodate);
> +       ret = prepare_uptodate_folio(inode, folio, pos, write_bytes);
>         if (ret) {
>                 /* The folio is already unlocked. */
>                 folio_put(folio);
> @@ -1127,7 +1126,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
>                 size_t num_sectors;
>                 struct folio *folio = NULL;
>                 int extents_locked;
> -               bool force_page_uptodate = false;
>
>                 /*
>                  * Fault pages before locking them in prepare_one_folio()
> @@ -1196,8 +1194,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
>                         break;
>                 }
>
> -               ret = prepare_one_folio(inode, &folio, pos, write_bytes,
> -                                       force_page_uptodate, false);
> +               ret = prepare_one_folio(inode, &folio, pos, write_bytes, false);
>                 if (ret) {
>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
>                                                        reserve_bytes);
> @@ -1240,12 +1237,8 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
>                                         fs_info->sectorsize);
>                 dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
>
> -               if (copied == 0) {
> -                       force_page_uptodate = true;
> +               if (copied == 0)
>                         dirty_sectors = 0;
> -               } else {
> -                       force_page_uptodate = false;
> -               }
>
>                 if (num_sectors > dirty_sectors) {
>                         /* release everything except the sectors we dirtied */
> --
> 2.49.0
>
>
Qu Wenruo March 22, 2025, 8:50 a.m. UTC | #3
On 2025/3/21 22:30, Filipe Manana wrote:
> On Thu, Mar 20, 2025 at 5:36 AM Qu Wenruo <wqu@suse.com> wrote:
[...]
>>
>> To me, either solution is fine, but the newer one makes it simpler and
>> requires no special handling, so I prefer that solution.
> 
> Ok so this explanation of different behaviour is something that should
> have been in the change log of commit c87c299776e4 ("btrfs: make
> buffered write to copy one page a time").

Yeah, that's in fact an unexpected behavior change, or you can call it a 
bug in this case.

> 
> With the new behaviour, when folios larger than 1 page are supported,
> I wonder if we don't risk looping over the same subrange many times,
> in case we keep needing to faultin due to memory pressure.

There is a small saving grace there: when we fault in the iov_iter, any
bytes not faulted in will immediately trigger an -EFAULT error.

So in that case we would hit -EFAULT instead of looping over the same
range again and again.

Although this may become a problem for the future iomap migration, as
iomap only requires faulting in part of the iov_iter.

Thanks,
Qu
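The difference between the two fault-in policies can be modeled in userspace (hypothetical helper names, not the kernel or iomap API): one policy requires the whole range to be faulted in up front and fails with -EFAULT on any shortfall, the other accepts partial fault-in and loops over the remainder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Toy model of the two fault-in policies discussed above (illustrative
 * names only, not the kernel or iomap API).  `faultable` is how many
 * bytes of the user buffer memory pressure lets us fault in per attempt.
 */

/* btrfs-style: require the full range to be faulted in, else fail. */
static long write_require_full_faultin(size_t want, size_t faultable)
{
	if (faultable < want)
		return -EFAULT;	/* any shortfall fails immediately */
	return (long)want;
}

/* iomap-style: accept partial fault-in and loop over the remainder. */
static long write_partial_faultin(size_t want, size_t faultable, int *loops)
{
	size_t done = 0;

	*loops = 0;
	while (done < want) {
		size_t step = faultable < want - done ? faultable : want - done;

		if (step == 0)
			return -EFAULT;	/* no progress possible at all */
		done += step;
		(*loops)++;
	}
	return (long)done;
}
```

This is why, with the current btrfs loop, a subrange under memory pressure tends to fail fast with -EFAULT rather than being retried many times, while an iomap-style loop would iterate over it.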

> 
>>
>> And since @force_page_uptodate is always false when passed into
>> prepare_one_folio(), we can just remove the variable.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Anyway, this change looks good, and at least with the typo fixed:
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Thanks.
> 
>> ---
>>   fs/btrfs/file.c | 19 ++++++-------------
>>   1 file changed, 6 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index c2648858772a..b7eb1f0164bb 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -800,7 +800,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
>>    * On success return a locked folio and 0
>>    */
>>   static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64 pos,
>> -                                 u64 len, bool force_uptodate)
>> +                                 u64 len)
>>   {
>>          u64 clamp_start = max_t(u64, pos, folio_pos(folio));
>>          u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
>> @@ -810,8 +810,7 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
>>          if (folio_test_uptodate(folio))
>>                  return 0;
>>
>> -       if (!force_uptodate &&
>> -           IS_ALIGNED(clamp_start, blocksize) &&
>> +       if (IS_ALIGNED(clamp_start, blocksize) &&
>>              IS_ALIGNED(clamp_end, blocksize))
>>                  return 0;
>>
>> @@ -858,7 +857,7 @@ static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
>>    */
>>   static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_ret,
>>                                        loff_t pos, size_t write_bytes,
>> -                                     bool force_uptodate, bool nowait)
>> +                                     bool nowait)
>>   {
>>          unsigned long index = pos >> PAGE_SHIFT;
>>          gfp_t mask = get_prepare_gfp_flags(inode, nowait);
>> @@ -881,7 +880,7 @@ static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_
>>                  folio_put(folio);
>>                  return ret;
>>          }
>> -       ret = prepare_uptodate_folio(inode, folio, pos, write_bytes, force_uptodate);
>> +       ret = prepare_uptodate_folio(inode, folio, pos, write_bytes);
>>          if (ret) {
>>                  /* The folio is already unlocked. */
>>                  folio_put(folio);
>> @@ -1127,7 +1126,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
>>                  size_t num_sectors;
>>                  struct folio *folio = NULL;
>>                  int extents_locked;
>> -               bool force_page_uptodate = false;
>>
>>                  /*
>>                   * Fault pages before locking them in prepare_one_folio()
>> @@ -1196,8 +1194,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
>>                          break;
>>                  }
>>
>> -               ret = prepare_one_folio(inode, &folio, pos, write_bytes,
>> -                                       force_page_uptodate, false);
>> +               ret = prepare_one_folio(inode, &folio, pos, write_bytes, false);
>>                  if (ret) {
>>                          btrfs_delalloc_release_extents(BTRFS_I(inode),
>>                                                         reserve_bytes);
>> @@ -1240,12 +1237,8 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
>>                                          fs_info->sectorsize);
>>                  dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
>>
>> -               if (copied == 0) {
>> -                       force_page_uptodate = true;
>> +               if (copied == 0)
>>                          dirty_sectors = 0;
>> -               } else {
>> -                       force_page_uptodate = false;
>> -               }
>>
>>                  if (num_sectors > dirty_sectors) {
>>                          /* release everything except the sectors we dirtied */
>> --
>> 2.49.0
>>
>>

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c2648858772a..b7eb1f0164bb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -800,7 +800,7 @@  int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  * On success return a locked folio and 0
  */
 static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64 pos,
-				  u64 len, bool force_uptodate)
+				  u64 len)
 {
 	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
 	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
@@ -810,8 +810,7 @@  static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
 	if (folio_test_uptodate(folio))
 		return 0;
 
-	if (!force_uptodate &&
-	    IS_ALIGNED(clamp_start, blocksize) &&
+	if (IS_ALIGNED(clamp_start, blocksize) &&
 	    IS_ALIGNED(clamp_end, blocksize))
 		return 0;
 
@@ -858,7 +857,7 @@  static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
  */
 static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_ret,
 				      loff_t pos, size_t write_bytes,
-				      bool force_uptodate, bool nowait)
+				      bool nowait)
 {
 	unsigned long index = pos >> PAGE_SHIFT;
 	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
@@ -881,7 +880,7 @@  static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_
 		folio_put(folio);
 		return ret;
 	}
-	ret = prepare_uptodate_folio(inode, folio, pos, write_bytes, force_uptodate);
+	ret = prepare_uptodate_folio(inode, folio, pos, write_bytes);
 	if (ret) {
 		/* The folio is already unlocked. */
 		folio_put(folio);
@@ -1127,7 +1126,6 @@  ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 		size_t num_sectors;
 		struct folio *folio = NULL;
 		int extents_locked;
-		bool force_page_uptodate = false;
 
 		/*
 		 * Fault pages before locking them in prepare_one_folio()
@@ -1196,8 +1194,7 @@  ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
-		ret = prepare_one_folio(inode, &folio, pos, write_bytes,
-					force_page_uptodate, false);
+		ret = prepare_one_folio(inode, &folio, pos, write_bytes, false);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
@@ -1240,12 +1237,8 @@  ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 					fs_info->sectorsize);
 		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
 
-		if (copied == 0) {
-			force_page_uptodate = true;
+		if (copied == 0)
 			dirty_sectors = 0;
-		} else {
-			force_page_uptodate = false;
-		}
 
 		if (num_sectors > dirty_sectors) {
 			/* release everything except the sectors we dirtied */