readahead: Avoid multiple marked readahead pages

Message ID: 20240104085839.21029-1-jack@suse.cz
State: New
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 544FA1EA90
	for <linux-fsdevel@vger.kernel.org>; Thu, 4 Jan 2024 08:58:44 +0000 (UTC)
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@infradead.org>
Cc: <linux-fsdevel@vger.kernel.org>,
	<linux-mm@kvack.org>,
	Jan Kara <jack@suse.cz>,
	Guo Xuenan <guoxuenan@huawei.com>
Subject: [PATCH] readahead: Avoid multiple marked readahead pages
Date: Thu, 4 Jan 2024 09:58:39 +0100
Message-Id: <20240104085839.21029-1-jack@suse.cz>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series: readahead: Avoid multiple marked readahead pages

Commit Message

Jan Kara Jan. 4, 2024, 8:58 a.m. UTC

ra_alloc_folio() marks the page that should trigger the next round of
async readahead. However, it rounds the computed mark index up to the
order of the page being allocated, which can lead to multiple
consecutive pages being marked with the readahead flag. Consider the
situation with index == 1, mark == 1, order == 0. We insert an order 0
page at index 1 and mark it. Then we bump order to 1 and index to 2;
mark (still == 1) is rounded up to 2, so the page at index 2 is marked
as well. Then we bump order to 2 and index to 4; mark gets rounded up to
4, so the page at index 4 is marked as well. The fact that multiple
pages get marked within a single readahead window confuses the readahead
logic and results in the readahead window being trimmed back to 1. This
situation is triggered in particular when the maximum readahead window
size is not a power of two (in the observed case it was 768 KB), and as
a result sequential read throughput suffers.
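
The sequence above can be replayed in user space. The following is only
a minimal sketch, not the kernel code: struct alloc_step,
count_marked(), example_steps and the two rounding helpers are
hypothetical names introduced for illustration, with the helpers
standing in for the kernel's round_up()/round_down() on power-of-two
alignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for the kernel's round_up()/round_down();
 * 'align' must be a power of two. */
static unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/* One folio allocation; hypothetical type used only by this sketch. */
struct alloc_step {
	unsigned long index;
	unsigned int order;
};

/* Count how many of the given allocations would get the readahead flag
 * when 'mark' is rounded up (use_round_up) or rounded down. */
static size_t count_marked(const struct alloc_step *steps, size_t nr,
			   unsigned long mark, bool use_round_up)
{
	size_t marked = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long align = 1UL << steps[i].order;
		unsigned long m;

		/* The index is assumed naturally aligned to the order. */
		assert((steps[i].index & (align - 1)) == 0);

		m = use_round_up ? round_up_pow2(mark, align)
				 : round_down_pow2(mark, align);
		if (steps[i].index == m)
			marked++;
	}
	return marked;
}

/* The sequence from the description: index/order 1/0, 2/1, 4/2. */
static const struct alloc_step example_steps[] = {
	{ 1, 0 }, { 2, 1 }, { 4, 2 },
};
```

With mark == 1, count_marked(example_steps, 3, 1, true) counts all
three pages as marked, while passing false (rounding down) counts only
the page at index 1.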

Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' ==
index iff 'mark' is within the page we are allocating at 'index' and
thus exactly one page is marked with readahead flag as required by the
readahead code and sequential read performance is restored.
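
The "iff" claim above can be checked by brute force over small values.
This is a sketch under the stated assumption that 'index' is naturally
aligned to 'order'; folio_contains(), marked_by_round_down() and
equivalence_holds() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>

/* True if 'mark' lies inside the folio of the given order at 'index'. */
static bool folio_contains(unsigned long index, unsigned int order,
			   unsigned long mark)
{
	return mark >= index && mark - index < (1UL << order);
}

/* True if rounding 'mark' down to the folio order yields 'index', i.e.
 * the condition under which the readahead flag would be set. */
static bool marked_by_round_down(unsigned long index, unsigned int order,
				 unsigned long mark)
{
	return (mark & ~((1UL << order) - 1)) == index;
}

/* Compare both predicates for every order up to 'max_order', every
 * naturally aligned index below 'limit' and every mark below 'limit'. */
static bool equivalence_holds(unsigned int max_order, unsigned long limit)
{
	for (unsigned int order = 0; order <= max_order; order++)
		for (unsigned long index = 0; index < limit;
		     index += 1UL << order)
			for (unsigned long mark = 0; mark < limit; mark++)
				if (folio_contains(index, order, mark) !=
				    marked_by_round_down(index, order, mark))
					return false;
	return true;
}
```

Since the folios of one readahead window do not overlap, a given mark
falls inside at most one of them, which is why at most one page ends up
flagged.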

This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:

"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."

Although this is true, this was always the case with read sizes not
aligned to folio boundaries, and large folios in the page cache just
make the situation more obvious (and frequent). Also, for sequential
read workloads it is better to trigger the readahead earlier rather
than later. It is true that the difference in the rounding, and thus
the earlier triggering of the readahead, can result in reading more for
semi-random workloads. However, workloads really suffering from this
seem to be rare. In particular, I have verified that the workload
described in commit b9ff43dd2743 ("mm/readahead: Fix readahead with
large folios") of reading random 100k blocks from a file like:

[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s

is not impacted by the rounding change and achieves ~70MB/s in both
cases.

Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
CC: Guo Xuenan <guoxuenan@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/readahead.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 6925e6959fd3..3032fbdce276 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -469,7 +469,7 @@  static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 
 	if (!folio)
 		return -ENOMEM;
-	mark = round_up(mark, 1UL << order);
+	mark = round_down(mark, 1UL << order);
 	if (index == mark)
 		folio_set_readahead(folio);
 	err = filemap_add_folio(ractl->mapping, folio, index, gfp);
