Message ID | 20241106092114.8408-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm/readahead: Fix large folio support in async readahead |
On Wed, 6 Nov 2024 17:21:14 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:

> When large folio support is enabled and read_ahead_kb is set to a smaller
> value, ra->size (4MB) may exceed the maximum allowed size (e.g., 128KB). To
> address this, we need to add a conditional check for such cases. However,
> this alone is insufficient, as users might set read_ahead_kb to a larger,
> non-hugepage-aligned value (e.g., 4MB + 128KB). In these instances, it is
> essential to explicitly align ra->size with the hugepage size.

How much performance improvement is this likely to offer our users? IOW,
should we consider backporting it?

(I bet anyone who comes across this will say "oh goody" and backport it
anyway, so why not do this for them?)
diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..9e2c6168ebfa 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
 		return 4 * cur;
 	if (cur <= max / 2)
 		return 2 * cur;
+	if (cur > max)
+		return cur;
 	return max;
 }
 
@@ -642,7 +644,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
 			  1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		ra->size = ALIGN(get_next_ra_size(ra, max_pages), 1 << order);
 		ra->async_size = ra->size;
 		goto readit;
 	}
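[Editor's note] To illustrate the combined effect of the two hunks, here is a
minimal userspace sketch of the arithmetic (not kernel code; ALIGN() is a
userspace re-implementation of the kernel macro). The read_ahead_kb value,
page size, and folio order below are assumptions chosen to match the 128KB
example in the changelog:

    #include <stdio.h>

    #define ALIGN(x, a)	(((x) + (a) - 1) & ~((unsigned long)(a) - 1))

    /* Patched get_next_ra_size(), taking ra->size directly as 'cur'. */
    static unsigned long next_ra_size(unsigned long cur, unsigned long max)
    {
    	if (cur < max / 16)
    		return 4 * cur;
    	if (cur <= max / 2)
    		return 2 * cur;
    	if (cur > max)		/* new check: keep the current window */
    		return cur;
    	return max;
    }

    int main(void)
    {
    	unsigned long max_pages = 32;	/* read_ahead_kb = 128, 4KB pages */
    	unsigned long order = 9;	/* MADV_HUGEPAGE: 2MB folios */
    	unsigned long cur = 512;	/* current ra->size, in pages */
    	unsigned long next = next_ra_size(cur, max_pages);

    	/* Without the patch this window would collapse to max_pages (32
    	 * pages, i.e. 128KB); with it, the window stays at 512 pages and
    	 * is kept aligned to the folio order. */
    	printf("next=%lu aligned=%lu\n", next, ALIGN(next, 1UL << order));
    	return 0;
    }

In other words, the new `cur > max` check prevents an already-hugepage-sized
window from being clamped below the folio size, and the ALIGN() keeps
subsequent windows hugepage-aligned even when read_ahead_kb is not.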
When testing large folio support with XFS on our servers, we observed that
only a few large folios are mapped when reading large files via mmap. After
a thorough analysis, I identified that it was caused by the
`/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
parameter is set to 128KB. After tuning it to 2MB, large folios work as
expected. However, the large folio behavior should not depend on the value
of read_ahead_kb; it would be more robust if the kernel could automatically
adapt to it.

With `/sys/block/*/queue/read_ahead_kb` set to a non-2MB-aligned size, this
issue can be verified with a simple test case, as shown below:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LEN (1024 * 1024 * 1024) // 1GB file

    int main(int argc, char *argv[])
    {
    	char *addr;
    	int fd, i;

    	fd = open("data", O_RDWR);
    	if (fd < 0) {
    		perror("open");
    		exit(-1);
    	}

    	addr = mmap(NULL, LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    	if (addr == MAP_FAILED) {
    		perror("mmap");
    		exit(-1);
    	}

    	if (madvise(addr, LEN, MADV_HUGEPAGE)) {
    		perror("madvise");
    		exit(-1);
    	}

    	for (i = 0; i < LEN / 4096; i++)
    		memset(addr + i * 4096, 1, 1);

    	while (1) {} // Verifiable with /proc/meminfo

    	munmap(addr, LEN);
    	close(fd);
    	exit(0);
    }

When large folio support is enabled and read_ahead_kb is set to a smaller
value, ra->size (4MB) may exceed the maximum allowed size (e.g., 128KB). To
address this, we need to add a conditional check for such cases. However,
this alone is insufficient, as users might set read_ahead_kb to a larger,
non-hugepage-aligned value (e.g., 4MB + 128KB). In these instances, it is
essential to explicitly align ra->size with the hugepage size.

Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
 mm/readahead.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Changes:
RFC->v1:
- Simplify the code as suggested by Matthew

RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.com/
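[Editor's note] The "Verifiable with /proc/meminfo" comment in the test case
refers to the file-backed THP counters. A small sketch like the following
(assuming a kernel that exposes the FileHugePages and FilePmdMapped fields)
can be run while the test program spins to confirm that large folios are
actually being used:

    #include <stdio.h>
    #include <string.h>

    /* Dump the file-THP counters from /proc/meminfo. */
    int main(void)
    {
    	char line[256];
    	FILE *fp = fopen("/proc/meminfo", "r");

    	if (!fp) {
    		perror("fopen");
    		return 1;
    	}
    	while (fgets(line, sizeof(line), fp)) {
    		if (!strncmp(line, "FileHugePages:", 14) ||
    		    !strncmp(line, "FilePmdMapped:", 14))
    			fputs(line, stdout);
    	}
    	fclose(fp);
    	return 0;
    }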