Message ID | 20220720022118.1495752-1-yang.yang29@zte.com.cn (mailing list archive) |
---|---|
State | New |
Series | fs: drop_caches: skip dropping pagecache which is always dirty |
On Wed, Jul 20, 2022 at 02:21:19AM +0000, cgel.zte@gmail.com wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
>
> Pagecache of some kinds of fs has the PG_dirty bit set as soon as it
> is allocated, so it can never be dropped. These fs include ramfs and
> tmpfs. Skipping them can make drop_pagecache_sb() more efficient.

Why do we want to make drop_pagecache_sb() more efficient?
On Wed, Jul 20, 2022 at 04:02:40AM +0100, Matthew Wilcox wrote:
> On Wed, Jul 20, 2022 at 02:21:19AM +0000, cgel.zte@gmail.com wrote:
> > From: Yang Yang <yang.yang29@zte.com.cn>
> >
> > Pagecache of some kinds of fs has the PG_dirty bit set as soon as it
> > is allocated, so it can never be dropped. These fs include ramfs and
> > tmpfs. Skipping them can make drop_pagecache_sb() more efficient.
>
> Why do we want to make drop_pagecache_sb() more efficient?

Some users may use drop_caches beyond testing or debugging. For example,
some systems create a lot of pagecache during boot while reading bzImage,
the ramdisk, docker images, etc. Most of this pagecache is useless after
boot, and it can have long-term negative effects on the workload once
page reclaim is triggered. It is especially harmful when direct reclaim
is triggered or when pages must be allocated in atomic context. So users
may choose to run drop_caches after boot.
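For context, "run drop_caches" here means writing to the documented
/proc/sys/vm/drop_caches sysctl, usually via `echo 1 >
/proc/sys/vm/drop_caches`. A minimal C sketch of the same operation,
assuming root privileges ("1" drops clean page cache, "2" slab objects,
"3" both):

```c
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

	if (!f) {
		perror("fopen /proc/sys/vm/drop_caches");
		return 1;
	}
	/* "1" = drop clean pagecache, "2" = slab objects, "3" = both */
	fputs("1", f);
	if (fclose(f)) {	/* fclose flushes; the write happens here */
		perror("write");
		return 1;
	}
	return 0;
}
```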
On Wed, Jul 20, 2022 at 06:02:32AM +0000, CGEL wrote:
> For example, some systems create a lot of pagecache during boot while
> reading bzImage, the ramdisk, docker images, etc. Most of this
> pagecache is useless after boot, and it can have long-term negative
> effects on the workload once page reclaim is triggered. It is
> especially harmful when direct reclaim is triggered or when pages must
> be allocated in atomic context. So users may choose to run drop_caches
> after boot.

It is purely a debug interface. If you want to drop specific page cache,
that needs to be done through madvise.
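A sketch of the targeted alternative being suggested (an illustration,
not code from the thread): posix_fadvise(POSIX_FADV_DONTNEED) is the
file-descriptor member of the madvise/fadvise family, and asks the
kernel to drop one file's clean page cache; dirty pages are left alone.

```c
#define _XOPEN_SOURCE 600	/* for posix_fadvise() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	int fd, ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* offset 0, len 0 means "to the end of the file" */
	ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (ret)	/* returns an errno value, does not set errno */
		fprintf(stderr, "posix_fadvise: %s\n", strerror(ret));
	return ret ? 1 : 0;
}
```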
On Tue, Jul 19, 2022 at 11:04:36PM -0700, Christoph Hellwig wrote:
> On Wed, Jul 20, 2022 at 06:02:32AM +0000, CGEL wrote:
> > For example, some systems create a lot of pagecache during boot while
> > reading bzImage, the ramdisk, docker images, etc. Most of this
> > pagecache is useless after boot, and it can have long-term negative
> > effects on the workload once page reclaim is triggered. It is
> > especially harmful when direct reclaim is triggered or when pages
> > must be allocated in atomic context. So users may choose to run
> > drop_caches after boot.
>
> It is purely a debug interface. If you want to drop specific page
> cache, that needs to be done through madvise.

It's not easy for users to use madvise in a complex system; it has a
cost. Since drop_caches is not forbidden, users may simply want to use
it in the scenario the mail above described.
On Wed, Jul 20, 2022 at 06:02:32AM +0000, CGEL wrote:
> On Wed, Jul 20, 2022 at 04:02:40AM +0100, Matthew Wilcox wrote:
> > On Wed, Jul 20, 2022 at 02:21:19AM +0000, cgel.zte@gmail.com wrote:
> > > From: Yang Yang <yang.yang29@zte.com.cn>
> > >
> > > Pagecache of some kinds of fs has the PG_dirty bit set as soon as
> > > it is allocated, so it can never be dropped. These fs include ramfs
> > > and tmpfs. Skipping them can make drop_pagecache_sb() more
> > > efficient.
> >
> > Why do we want to make drop_pagecache_sb() more efficient?
>
> Some users may use drop_caches beyond testing or debugging.

This is a terrible reason.

> For example, some systems create a lot of pagecache during boot while
> reading bzImage, the ramdisk, docker images, etc. Most of this
> pagecache is useless after boot, and it can have long-term negative
> effects on the workload once page reclaim is triggered. It is
> especially harmful when direct reclaim is triggered or when pages must
> be allocated in atomic context. So users may choose to run drop_caches
> after boot.

If that's actually a problem, work on fixing that.
On Wed, Jul 20, 2022 at 04:02:04PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 20, 2022 at 06:02:32AM +0000, CGEL wrote:
> > On Wed, Jul 20, 2022 at 04:02:40AM +0100, Matthew Wilcox wrote:
> > > On Wed, Jul 20, 2022 at 02:21:19AM +0000, cgel.zte@gmail.com wrote:
> > > > From: Yang Yang <yang.yang29@zte.com.cn>
> > > >
> > > > Pagecache of some kinds of fs has the PG_dirty bit set as soon as
> > > > it is allocated, so it can never be dropped. These fs include
> > > > ramfs and tmpfs. Skipping them can make drop_pagecache_sb() more
> > > > efficient.
> > >
> > > Why do we want to make drop_pagecache_sb() more efficient?
> >
> > Some users may use drop_caches beyond testing or debugging.
>
> This is a terrible reason.

Another case that may use drop_caches: "Migration of virtual machines
will go faster if there are fewer pages to copy, so administrators would
like to be able to force a virtual machine to reclaim as much memory as
possible before the migration begins."
See https://lwn.net/Articles/894849/

> > For example, some systems create a lot of pagecache during boot while
> > reading bzImage, the ramdisk, docker images, etc. Most of this
> > pagecache is useless after boot, and it can have long-term negative
> > effects on the workload once page reclaim is triggered. It is
> > especially harmful when direct reclaim is triggered or when pages
> > must be allocated in atomic context. So users may choose to run
> > drop_caches after boot.
>
> If that's actually a problem, work on fixing that.
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index e619c31b6bd9..16956d5d3922 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -19,6 +19,13 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
 {
 	struct inode *inode, *toput_inode = NULL;
 
+	/*
+	 * Pagecache of this kind of fs has PG_dirty bit set once it was
+	 * allocated, so it can't be dropped.
+	 */
+	if (sb->s_type->fs_flags & FS_ALWAYS_DIRTY)
+		return;
+
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		spin_lock(&inode->i_lock);
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index bc66d0173e33..5fb62d37618f 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -289,7 +289,7 @@ static struct file_system_type ramfs_fs_type = {
 	.init_fs_context = ramfs_init_fs_context,
 	.parameters	= ramfs_fs_parameters,
 	.kill_sb	= ramfs_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_ALWAYS_DIRTY,
 };
 
 static int __init init_ramfs_fs(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e285bd9d6188..90cdd10d683e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2532,6 +2532,7 @@ struct file_system_type {
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
 #define FS_ALLOW_IDMAP		32	/* FS has been updated to handle vfs idmappings. */
+#define FS_ALWAYS_DIRTY		64	/* Pagecache is always dirty. */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
diff --git a/mm/shmem.c b/mm/shmem.c
index 8baf26eda989..5d549f61735f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3974,7 +3974,7 @@ static struct file_system_type shmem_fs_type = {
 	.parameters	= shmem_fs_parameters,
 #endif
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT | FS_ALWAYS_DIRTY,
+	.fs_flags	= FS_USERNS_MOUNT | FS_ALWAYS_DIRTY,
 };
 
 void __init shmem_init(void)
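Design note on the patch above: ramfs and tmpfs pages are marked dirty
as soon as they are allocated because these filesystems have no backing
store to write the pages back to. invalidate_mapping_pages(), which
drop_pagecache_sb() relies on, already refuses to toss dirty pages, so
walking the entire s_inodes list of such a superblock frees nothing and
is pure overhead. The FS_ALWAYS_DIRTY flag simply lets
drop_pagecache_sb() skip that walk up front.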