
[RFC,v2,02/11] fs/buffer: add a for_each_bh() for block_read_full_folio()

Message ID 20241214031050.1337920-3-mcgrof@kernel.org (mailing list archive)
State New
Series enable bs > ps for block devices

Commit Message

Luis Chamberlain Dec. 14, 2024, 3:10 a.m. UTC
We want to be able to work through all buffer heads on a folio
for an async read, but in the future we want to support the option
to stop before we've processed all linked buffer heads. To make
the code easier to read and follow, and easier to expand in
subsequent patches, adopt a for_each_bh(tmp, head) loop instead of
a do { ... } while () loop.

This introduces no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/buffer.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Matthew Wilcox (Oracle) Dec. 14, 2024, 4:02 a.m. UTC | #1
On Fri, Dec 13, 2024 at 07:10:40PM -0800, Luis Chamberlain wrote:
> -	do {
> +	for_each_bh(bh, head) {
>  		if (buffer_uptodate(bh))
>  			continue;
>  
> @@ -2454,7 +2464,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>  				continue;
>  		}
>  		arr[nr++] = bh;
> -	} while (i++, iblock++, (bh = bh->b_this_page) != head);
> +		i++;
> +		iblock++;
> +	}

This is non-equivalent.  That 'continue' you can see would increment i
and iblock.  Now it doesn't.
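
In other words: in a do { ... } while (expr) loop a 'continue' jumps to the
evaluation of expr, so the comma expression 'i++, iblock++' still ran on
every pass, while in the new form the increments sit at the bottom of the
body and a 'continue' skips them. A reduced userspace sketch of the two
loop shapes (not the kernel code itself):

#include <stdio.h>

int main(void)
{
	int i, iblock, n;

	/* do/while form: increments live in the controlling expression */
	i = 0; iblock = 10; n = 0;
	do {
		if (n % 2)
			continue;	/* the increments below still run */
	} while (i++, iblock++, ++n < 4);
	printf("do/while: i=%d iblock=%d\n", i, iblock);	/* i=4 iblock=14 */

	/* for form: increments at the bottom of the body */
	i = 0; iblock = 10;
	for (n = 0; n < 4; n++) {
		if (n % 2)
			continue;	/* the increments below are skipped */
		i++;
		iblock++;
	}
	printf("for: i=%d iblock=%d\n", i, iblock);	/* i=2 iblock=12 */

	return 0;
}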
Luis Chamberlain Dec. 16, 2024, 6:56 p.m. UTC | #2
On Sat, Dec 14, 2024 at 04:02:53AM +0000, Matthew Wilcox wrote:
> On Fri, Dec 13, 2024 at 07:10:40PM -0800, Luis Chamberlain wrote:
> > -	do {
> > +	for_each_bh(bh, head) {
> >  		if (buffer_uptodate(bh))
> >  			continue;
> >  
> > @@ -2454,7 +2464,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
> >  				continue;
> >  		}
> >  		arr[nr++] = bh;
> > -	} while (i++, iblock++, (bh = bh->b_this_page) != head);
> > +		i++;
> > +		iblock++;
> > +	}
> 
> This is non-equivalent.  That 'continue' you can see would increment i
> and iblock.  Now it doesn't.

Thanks, not sure how I missed that! With that fix in place I ran a full
baseline against ext4 and all XFS profiles.

For ext4 the new failures I see are just:

  * generic/044
  * generic/045
  * generic/046

These cover cases where we race writing to a file, truncate it, and then
verify that a non-zero file has extents. I'll run a regression test to
see which commit introduced this.

For XFS I've tested 20 XFS profiles (non-LBS) and 4 LBS profiles, using
the latest kdevops-results-archive test results for "fixes-6.13_2024-12-11"
as the baseline and these patches + the loop fix you mentioned as the
test. I mostly see these failures that I need to look into:

  * xfs/009
  * xfs/059
  * xfs/155
  * xfs/168
  * xfs/185
  * xfs/301
  * generic/753

I'm not sure yet if these are flaky or real. The LBS profiles are using 4k sector sizes.

Also, when testing with the XFS 32k sector size profile, generic/470
reveals that device mapper needs to be updated to reject larger sector
sizes if it does not yet support them, as we do with the nvme block driver.

The full set of failures for XFS with 32k sector sizes:

generic/054 generic/055 generic/081 generic/102 generic/172 generic/223
generic/347 generic/405 generic/455 generic/457 generic/482 generic/500
generic/741 xfs/014 xfs/020 xfs/032 xfs/049 xfs/078 xfs/129 xfs/144
xfs/149 xfs/164 xfs/165 xfs/170 xfs/174 xfs/188 xfs/206 xfs/216 xfs/234
xfs/250 xfs/253 xfs/284 xfs/289 xfs/292 xfs/294 xfs/503 xfs/514 xfs/522
xfs/524 xfs/543 xfs/597 xfs/598 xfs/604 xfs/605 xfs/606 xfs/614 xfs/631
xfs/806

The full output I get by comparing the test results from
fixes-6.13_2024-12-11 and the run I just did inside
kdevops-results-archive:

./bin/compare-results-fstests.py d48182fc621f87bc941ef4445e4585a3891923e9 cd7aa6fc6e46733a5dcf6a10b89566cabe0beaf

Comparing commits:
Baseline:      d48182fc621f | linux-xfs-kpd: Merge tag 'fixes-6.13_2024-12-11' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into next-rc
Test:          cd7aa6fc6e46 | linux-xfs-kpd: loop fix noted by willy

Baseline Kernel:6.13.0-rc2+
Test Kernel:   6.13.0-rc2+

Test Results Comparison:
================================================================================

Profile: xfs_crc
  New Failures:
    + xfs/059

Profile: xfs_crc_rtdev_extsize_28k
  New Failures:
    + xfs/301
  Resolved Failures:
    - xfs/185

Profile: xfs_crc_rtdev_extsize_64k
  New Failures:
    + xfs/155
    + xfs/301
  Resolved Failures:
    - xfs/629

Profile: xfs_nocrc
  New Failures:
    + generic/753

Profile: xfs_nocrc_2k
  New Failures:
    + xfs/009

Profile: xfs_nocrc_4k
  New Failures:
    + xfs/301

Profile: xfs_reflink_1024
  New Failures:
    + xfs/168
  Resolved Failures:
    - xfs/033

Profile: xfs_reflink_16k_4ks
  New Failures:
    + xfs/059

Profile: xfs_reflink_8k_4ks
  New Failures:
    + xfs/301

Profile: xfs_reflink_dir_bsize_8k
  New Failures:
    + xfs/301

Profile: xfs_reflink_stripe_len
  New Failures:
    + xfs/301

[0] https://github.com/linux-kdevops/kdevops-results-archive

  Luis
Luis Chamberlain Dec. 16, 2024, 8:05 p.m. UTC | #3
On Mon, Dec 16, 2024 at 10:56:44AM -0800, Luis Chamberlain wrote:
> On Sat, Dec 14, 2024 at 04:02:53AM +0000, Matthew Wilcox wrote:
> > On Fri, Dec 13, 2024 at 07:10:40PM -0800, Luis Chamberlain wrote:
> > > -	do {
> > > +	for_each_bh(bh, head) {
> > >  		if (buffer_uptodate(bh))
> > >  			continue;
> > >  
> > > @@ -2454,7 +2464,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
> > >  				continue;
> > >  		}
> > >  		arr[nr++] = bh;
> > > -	} while (i++, iblock++, (bh = bh->b_this_page) != head);
> > > +		i++;
> > > +		iblock++;
> > > +	}
> > 
> > This is non-equivalent.  That 'continue' you can see would increment i
> > and iblock.  Now it doesn't.
> 
> Thanks, not sure how I missed that! With that fix in place I ran a full
> baseline against ext4 and all XFS profiles.
> 
> For ext4 the new failures I see are just:
> 
>   * generic/044
>   * generic/045
>   * generic/046

Oh my, these all fail on vanilla v6.12-rc2, so it's not the code that
is at fault.

 Luis
Luis Chamberlain Dec. 16, 2024, 9:46 p.m. UTC | #4
On Mon, Dec 16, 2024 at 12:05:54PM -0800, Luis Chamberlain wrote:
> On Mon, Dec 16, 2024 at 10:56:44AM -0800, Luis Chamberlain wrote:
> > On Sat, Dec 14, 2024 at 04:02:53AM +0000, Matthew Wilcox wrote:
> > > On Fri, Dec 13, 2024 at 07:10:40PM -0800, Luis Chamberlain wrote:
> > > > -	do {
> > > > +	for_each_bh(bh, head) {
> > > >  		if (buffer_uptodate(bh))
> > > >  			continue;
> > > >  
> > > > @@ -2454,7 +2464,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
> > > >  				continue;
> > > >  		}
> > > >  		arr[nr++] = bh;
> > > > -	} while (i++, iblock++, (bh = bh->b_this_page) != head);
> > > > +		i++;
> > > > +		iblock++;
> > > > +	}
> > > 
> > > This is non-equivalent.  That 'continue' you can see would increment i
> > > and iblock.  Now it doesn't.
> > 
> > Thanks, not sure how I missed that! With that fix in place I ran a full
> > baseline against ext4 and all XFS profiles.
> > 
> > For ext4 the new failures I see are just:
> > 
> >   * generic/044
> >   * generic/045
> >   * generic/046
> 
> Oh my, these all fail on vanilla v6.12-rc2, so it's not the code that
> is at fault.

I looked inside my bag of "tribal knowledge" and found that these are
known to fail because by default ext4 uses the mount -o data=ordered mode
in favor of performance instead of the mount -o data=journal mode.
And I confirm that using mount -o data=journal fixes this for both the
v6.13-rc2 baseline and these patches. In fstests you do that with:

MOUNT_OPTIONS='-o data=journal'
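
For example, a minimal fstests local.config sketch carrying that
override (the device paths below are placeholders, not taken from this
thread):

export TEST_DEV=/dev/vdb
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdc
export SCRATCH_MNT=/mnt/scratch
export MOUNT_OPTIONS='-o data=journal'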

And so these failures are part of the baseline, and so far no
regressions have been found with ext4 with this patch series.

  Luis
Luis Chamberlain Dec. 17, 2024, 8:46 a.m. UTC | #5
So all the XFS failures were due to flaky tests, and are reproducible
on the baseline.
 
  Luis
Hannes Reinecke Dec. 17, 2024, 9:57 a.m. UTC | #6
On 12/14/24 04:10, Luis Chamberlain wrote:
> We want to be able to work through all buffer heads on a folio
> for an async read, but in the future we want to support the option
> to stop before we've processed all linked buffer heads. To make
> the code easier to read and follow, and easier to expand in
> subsequent patches, adopt a for_each_bh(tmp, head) loop instead of
> a do { ... } while () loop.
> 
> This introduces no functional changes.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   fs/buffer.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 580451337efa..108e1c36fc1a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2392,6 +2392,17 @@  static void bh_read_batch_async(struct folio *folio,
 	}
 }
 
+#define bh_is_last(__bh, __head) ((__bh)->b_this_page == (__head))
+
+#define bh_next(__bh, __head) \
+    (bh_is_last(__bh, __head) ? NULL : (__bh)->b_this_page)
+
+/* Starts from the provided head */
+#define for_each_bh(__tmp, __head)			\
+    for ((__tmp) = (__head);				\
+         (__tmp);					\
+         (__tmp) = bh_next(__tmp, __head))
+
 /*
  * Generic "read_folio" function for block devices that have the normal
  * get_block functionality. This is most of the block device filesystems.
@@ -2421,11 +2432,10 @@  int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 
 	iblock = div_u64(folio_pos(folio), blocksize);
 	lblock = div_u64(limit + blocksize - 1, blocksize);
-	bh = head;
 	nr = 0;
 	i = 0;
 
-	do {
+	for_each_bh(bh, head) {
 		if (buffer_uptodate(bh))
 			continue;
 
@@ -2454,7 +2464,9 @@  int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 				continue;
 		}
 		arr[nr++] = bh;
-	} while (i++, iblock++, (bh = bh->b_this_page) != head);
+		i++;
+		iblock++;
+	}
 
 	bh_read_batch_async(folio, nr, arr, fully_mapped, nr == 0, page_error);