diff mbox series

[v14] mm: don't set readahead flag on a folio when lookahead_size > nr_to_read

Message ID 20241015164106.465253-1-kernel@pankajraghav.com (mailing list archive)
State New
Headers show
Series [v14] mm: don't set readahead flag on a folio when lookahead_size > nr_to_read | expand

Commit Message

Pankaj Raghav (Samsung) Oct. 15, 2024, 4:41 p.m. UTC
From: Pankaj Raghav <p.raghav@samsung.com>

The readahead flag is set on a folio based on the lookahead_size and
nr_to_read. For example, when the readahead happens from index to index
+ nr_to_read, then the readahead `mark` offset from index is set at
nr_to_read - lookahead_size.

There are some scenarios where the lookahead_size > nr_to_read. If this
happens, readahead flag is not set on any folio on the current
readahead window.

There are two problems at the moment in the way `mark` is calculated
when lookahead_size > nr_to_read:

- unsigned long `mark` will be assigned a negative value which can lead
  to unexpected results in extreme cases due to wrap around.

- The current calculation for `mark` with mapping_min_order > 0 gives
  incorrect results when lookahead_size > nr_to_read due to rounding
  up operation.

Explicitly initialize `mark` to be ULONG_MAX and only calculate it
when lookahead_size is within the readahead window.

Fixes: 26cfdb395eef ("readahead: allocate folios with mapping_min_order in readahead")
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 mm/readahead.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)


base-commit: d61a00525464bfc5fe92c6ad713350988e492b88

Comments

Pankaj Raghav (Samsung) Oct. 15, 2024, 5:29 p.m. UTC | #1
Oops. This is v1 not v14.

--
Pankaj
Matthew Wilcox Oct. 15, 2024, 5:33 p.m. UTC | #2
On Tue, Oct 15, 2024 at 06:41:06PM +0200, Pankaj Raghav (Samsung) wrote:

v14?  Where are v1..13 of this patch?  It's the first time I've seen it.

> The readahead flag is set on a folio based on the lookahead_size and
> nr_to_read. For example, when the readahead happens from index to index
> + nr_to_read, then the readahead `mark` offset from index is set at
> nr_to_read - lookahead_size.
> 
> There are some scenarios where the lookahead_size > nr_to_read. If this
> happens, readahead flag is not set on any folio on the current
> readahead window.

Please describe those scenarios, as that's the important bit.

> There are two problems at the moment in the way `mark` is calculated
> when lookahead_size > nr_to_read:
> 
> - unsigned long `mark` will be assigned a negative value which can lead
>   to unexpected results in extreme cases due to wrap around.

Can such an extreme case happen?

> - The current calculation for `mark` with mapping_min_order > 0 gives
>   incorrect results when lookahead_size > nr_to_read due to rounding
>   up operation.
> 
> Explicitly initialize `mark` to be ULONG_MAX and only calculate it
> when lookahead_size is within the readahead window.

You haven't really spelled out the consequences of this properly.
Perhaps a worked example would help.

I think the worst case scenario is that we set the flag on the wrong
folio, causing readahead to occur when it should not.  But maybe I'm
wrong?
Pankaj Raghav (Samsung) Oct. 16, 2024, 10:05 a.m. UTC | #3
On Tue, Oct 15, 2024 at 06:33:11PM +0100, Matthew Wilcox wrote:
> On Tue, Oct 15, 2024 at 06:41:06PM +0200, Pankaj Raghav (Samsung) wrote:
> 
> v14?  Where are v1..13 of this patch?  It's the first time I've seen it.

Sorry for the confusion. My git send script messed up the version
number. It is v1 :)

> 
> > The readahead flag is set on a folio based on the lookahead_size and
> > nr_to_read. For example, when the readahead happens from index to index
> > + nr_to_read, then the readahead `mark` offset from index is set at
> > nr_to_read - lookahead_size.
> > 
> > There are some scenarios where the lookahead_size > nr_to_read. If this
> > happens, readahead flag is not set on any folio on the current
> > readahead window.
> 
> Please describe those scenarios, as that's the important bit.
> 

Yes. I will do that in the next version. do_page_cache_ra() can clamp
the nr_to_read if the readahead window extends beyond EOF.

I think this probably happens when readahead window was created and
the file was truncated before the readahead starts.

> > There are two problems at the moment in the way `mark` is calculated
> > when lookahead_size > nr_to_read:
> > 
> > - unsigned long `mark` will be assigned a negative value which can lead
> >   to unexpected results in extreme cases due to wrap around.
> 
> Can such an extreme case happen?
> 

I think this is highly unlikely. I will probably remove this reason
from the commit message. It was just a bit weird to me that we are
assigning a negative number to an unsigned value which is supposed to be
the offset.

> > - The current calculation for `mark` with mapping_min_order > 0 gives
> >   incorrect results when lookahead_size > nr_to_read due to rounding
> >   up operation.
> > 
> > Explicitly initialize `mark` to be ULONG_MAX and only calculate it
> > when lookahead_size is within the readahead window.
> 
> You haven't really spelled out the consequences of this properly.
> Perhaps a worked example would help.
> 

Got it. I saw this while running generic/476 on XFS with 64k block size.

Let's assume the following values:
index = 128
nr_to_read = 16
lookahead_size = 28
mapping_min_order = 4 (16 pages)

The lookahead_size is actually lying outside the current readahead
window. The calculation without this patch will result in incorrect mark
as follows:

ra_folio_index = round_up(128 + 16 - 28, 16) = 128;
mark = 128 - 128 = 0;

So we will be marking the folio on 0th index with RA flag, even though
we shouldn't have. Does that make sense?

> I think the worst case scenario is that we set the flag on the wrong
> folio, causing readahead to occur when it should not.  But maybe I'm
> wrong?

You are right. We might unnecessarily trigger a readahead where we
should not. I think it is good to mention this consequence as well in
the commit message to be clear.

--
Pankaj
Matthew Wilcox Oct. 16, 2024, 11:57 a.m. UTC | #4
On Wed, Oct 16, 2024 at 03:35:27PM +0530, Pankaj Raghav (Samsung) wrote:
> > > - The current calculation for `mark` with mapping_min_order > 0 gives
> > >   incorrect results when lookahead_size > nr_to_read due to rounding
> > >   up operation.
> > > 
> > > Explicitly initialize `mark` to be ULONG_MAX and only calculate it
> > > when lookahead_size is within the readahead window.
> > 
> > You haven't really spelled out the consequences of this properly.
> > Perhaps a worked example would help.
> 
> Got it. I saw this while running generic/476 on XFS with 64k block size.
> 
> Let's assume the following values:
> index = 128
> nr_to_read = 16
> lookahead_size = 28
> mapping_min_order = 4 (16 pages)
> 
> The lookahead_size is actually lying outside the current readahead
> window. The calculation without this patch will result in incorrect mark
> as follows:
> 
> ra_folio_index = round_up(128 + 16 - 28, 16) = 128;
> mark = 128 - 128 = 0;
> 
> So we will be marking the folio on 0th index with RA flag, even though
> we shouldn't have. Does that make sense?

But we don't go back and find the folio for index 0.  We only consider
the folios we're actually reading for marking.  So if 'mark' lies
outside the readahead window, we simply won't mark any of them.  So I
don't think your patch changes anything.  Or did I miss something?
Pankaj Raghav (Samsung) Oct. 16, 2024, 1:06 p.m. UTC | #5
On Wed, Oct 16, 2024 at 12:57:44PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 16, 2024 at 03:35:27PM +0530, Pankaj Raghav (Samsung) wrote:
> > > > - The current calculation for `mark` with mapping_min_order > 0 gives
> > > >   incorrect results when lookahead_size > nr_to_read due to rounding
> > > >   up operation.
> > > > 
> > > > Explicitly initialize `mark` to be ULONG_MAX and only calculate it
> > > > when lookahead_size is within the readahead window.
> > > 
> > > You haven't really spelled out the consequences of this properly.
> > > Perhaps a worked example would help.
> > 
> > Got it. I saw this while running generic/476 on XFS with 64k block size.
> > 
> > Let's assume the following values:
> > index = 128
> > nr_to_read = 16
> > lookahead_size = 28
> > mapping_min_order = 4 (16 pages)
> > 
> > The lookahead_size is actually lying outside the current readahead
> > window. The calculation without this patch will result in incorrect mark
> > as follows:
> > 
> > ra_folio_index = round_up(128 + 16 - 28, 16) = 128;
> > mark = 128 - 128 = 0;
> > 
> > So we will be marking the folio on 0th index with RA flag, even though

Oops. I shouldn't have said 0th index. I meant at offset 0 from the
index.

> > we shouldn't have. Does that make sense?
> 
> But we don't go back and find the folio for index 0.  We only consider
> the folios we're actually reading for marking.  So if 'mark' lies
> outside the readahead window, we simply won't mark any of them.  So I

`mark` is the offset from index. So we compare `mark` with the 
iterator `i`, which starts at 0. So we will set the RA flag on index
128 in this example, which is not correct.

if (i == mark)
	folio_set_readahead(folio);

With this patch, mark will be ULONG_MAX and not 0.

> don't think your patch changes anything.  Or did I miss something?

Does that make sense?
diff mbox series

Patch

diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..475d2940a1ed 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -206,9 +206,9 @@  void page_cache_ra_unbounded(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct address_space *mapping = ractl->mapping;
-	unsigned long ra_folio_index, index = readahead_index(ractl);
+	unsigned long index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long mark, i = 0;
+	unsigned long mark = ULONG_MAX, i = 0;
 	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	/*
@@ -232,9 +232,14 @@  void page_cache_ra_unbounded(struct readahead_control *ractl,
 	 * index that only has lookahead or "async_region" to set the
 	 * readahead flag.
 	 */
-	ra_folio_index = round_up(readahead_index(ractl) + nr_to_read - lookahead_size,
-				  min_nrpages);
-	mark = ra_folio_index - index;
+	if (lookahead_size <= nr_to_read) {
+		unsigned long ra_folio_index;
+
+		ra_folio_index = round_up(readahead_index(ractl) +
+					  nr_to_read - lookahead_size,
+					  min_nrpages);
+		mark = ra_folio_index - index;
+	}
 	nr_to_read += readahead_index(ractl) - index;
 	ractl->_index = index;