diff mbox series

[01/10] readahead: Make sure sync readahead reads needed page

Message ID 20240625101909.12234-1-jack@suse.cz (mailing list archive)
State New
Headers show
Series mm: Fix various readahead quirks | expand

Commit Message

Jan Kara June 25, 2024, 10:18 a.m. UTC
page_cache_sync_ra() is called when a folio we want to read is not in
the page cache. It is expected that it creates the folio (and perhaps
the following folios as well) and submits reads for them unless some
error happens. However if index == ra->start + ra->size,
ondemand_readahead() will treat the call as another async readahead hit.
Thus ra->start will be advanced and we create pages and queue reads from
ra->start + ra->size further. Consequentially the page at 'index' is not
created and filemap_get_pages() has to always go through
filemap_create_folio() path.

This behavior has particularly unfortunate consequences
when we have two IO threads sequentially reading from a shared file (as
is the case when NFS serves sequential reads). In that case what can
happen is:

suppose ra->size == ra->async_size == 128, ra->start = 512

T1					T2
reads 128 pages at index 512
  - hits async readahead mark
    filemap_readahead()
      ondemand_readahead()
        if (index == expected ...)
	  ra->start = 512 + 128 = 640
          ra->size = 128
	  ra->async_size = 128
	page_cache_ra_order()
	  blocks in ra_alloc_folio()
					reads 128 pages at index 640
					  - no page found
					  page_cache_sync_readahead()
					    ondemand_readahead()
					      if (index == expected ...)
						ra->start = 640 + 128 = 768
						ra->size = 128
						ra->async_size = 128
					    page_cache_ra_order()
					      submits reads from 768
					  - still no page found at index 640
					    filemap_create_folio()
					  - goes on to index 641
					  page_cache_sync_readahead()
					    ondemand_readahead()
					      - founds ra is confused,
					        trims is to small size
  	  finds pages were already inserted

And as a result read performance suffers.

Fix the problem by triggering async readahead case in
ondemand_readahead() only if we are calling the function because we hit
the readahead marker. In any other case we need to read the folio at
'index' and thus we cannot really use the current ra state.

Note that the above situation could be viewed as a special case of
file->f_ra state corruption. In fact two thread reading using the shared
file can also seemingly corrupt file->f_ra in interesting ways due to
concurrent access. I never saw that in practice and the fix is going to
be much more complex so for now at least fix this practical problem
while we ponder about the theoretically correct solution.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/readahead.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/mm/readahead.c b/mm/readahead.c
index c1b23989d9ca..af0fbd302a38 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -580,7 +580,7 @@  static void ondemand_readahead(struct readahead_control *ractl,
 	 */
 	expected = round_down(ra->start + ra->size - ra->async_size,
 			1UL << order);
-	if (index == expected || index == (ra->start + ra->size)) {
+	if (folio && index == expected) {
 		ra->start += ra->size;
 		ra->size = get_next_ra_size(ra, max_pages);
 		ra->async_size = ra->size;