diff mbox series

[RFC,01/26] block: bdev: blockdev page cache is movable

Message ID 20230418191313.268131-2-hannes@cmpxchg.org (mailing list archive)
State New
Headers show
Series mm: reliable huge page allocator | expand

Commit Message

Johannes Weiner April 18, 2023, 7:12 p.m. UTC
While inspecting page blocks for the type of pages in them, I noticed
a large number of blockdev cache in unmovable blocks. However, these
pages are actually on the LRU, and their mapping has a .migrate_folio
callback; they can be reclaimed and compacted as necessary.

Put them into movable blocks, so they don't cause pollution, and
subsequent proliferation, of unmovable blocks.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 block/bdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox April 19, 2023, 4:07 a.m. UTC | #1
On Tue, Apr 18, 2023 at 03:12:48PM -0400, Johannes Weiner wrote:
> While inspecting page blocks for the type of pages in them, I noticed
> a large number of blockdev cache in unmovable blocks. However, these
> pages are actually on the LRU, and their mapping has a .migrate_folio
> callback; they can be reclaimed and compacted as necessary.

Wise to split this out into a separate patch.  Perhaps we can get it
into -next for a while to shake out any problems with it.  I don't have
any specific code that I think is broken, but code like this is in ext2:

	bh = sb_bread(sb, logic_sb_block);
        es = (struct ext2_super_block *) (((char *)bh->b_data) + offset);
        sbi->s_es = es;

ie it reads into the page cache and then keeps a pointer to it during
the lifetime of the mount.

This specific example is, I believe, safe.  There's a refcount on
the buffer (released by brelse() at unmount) and so the page cannot
be migrated.

But that speaks to a different problem; sometimes buffers are held
pinned for short periods of time (eg reading a directory, modifying a
bitmap) and other times they're held pinned for a long period of time
(a superblock).  We notice that pages are being long-term pinned (eg
GUP) and migrate them out of the MOVABLE zone when that happens to them.
Perhaps we need something similar for buffer heads where the filesystem
can specify if it's just having a quick look or if it intends for this
buffer to be pinned over, let's say, a return to userspace.
Mel Gorman April 21, 2023, 12:25 p.m. UTC | #2
On Tue, Apr 18, 2023 at 03:12:48PM -0400, Johannes Weiner wrote:
> While inspecting page blocks for the type of pages in them, I noticed
> a large number of blockdev cache in unmovable blocks. However, these
> pages are actually on the LRU, and their mapping has a .migrate_folio
> callback; they can be reclaimed and compacted as necessary.
> 
> Put them into movable blocks, so they don't cause pollution, and
> subsequent proliferation, of unmovable blocks.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Deserves to be split out and needs careful treatment. I don't recall exactly
but I believe I encountered a problem with these allocations in the very
distant past. Some of the allocations were movable and others weren't
or had very long lifetimes. Something like being superblock-related or
directory inodes being pinned for very long lengths of time. This is a
*long* time ago and the picture almost certainly has changed but it's
very possible that a high percentage, but not 100% of these allocations
are not movable in practice.
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index edc110d90df4..6abe4766d073 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -488,7 +488,7 @@  struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	inode->i_mode = S_IFBLK;
 	inode->i_rdev = 0;
 	inode->i_data.a_ops = &def_blk_aops;
-	mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+	mapping_set_gfp_mask(&inode->i_data, GFP_USER|__GFP_MOVABLE);
 
 	bdev = I_BDEV(inode);
 	mutex_init(&bdev->bd_fsfreeze_mutex);