
[v10,12/14] mm: multi-gen LRU: debugfs interface

Message ID 20220407031525.2368067-13-yuzhao@google.com (mailing list archive)
State New
Series Multi-Gen LRU Framework

Commit Message

Yu Zhao April 7, 2022, 3:15 a.m. UTC
Add /sys/kernel/debug/lru_gen for working set estimation and proactive
reclaim. These features are required to optimize job scheduling (bin
packing) in data centers [1][2].

Compared with the page table-based approach and the PFN-based
approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has
the following advantages:
1. It offers better choices because it is aware of memcgs, NUMA nodes,
   shared mappings and unmapped page cache.
2. It is more scalable because it is O(nr_hot_pages), whereas the
   PFN-based approach is O(nr_total_pages).

Add /sys/kernel/debug/lru_gen_full for debugging.
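
As a usage illustration, the sketch below (not part of this patch; the
memcg ID, node ID and generation numbers are made up) writes an aging
command ('+') and an eviction command ('-') in the format parsed by
lru_gen_seq_write() and dispatched by run_cmd():

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = open("/sys/kernel/debug/lru_gen", O_WRONLY);

          if (fd < 0)
                  return 1;

          /* '+': age memcg 1 on node 0 up to generation 4. */
          dprintf(fd, "+ 1 0 4\n");

          /* '-': evict generation 2 of memcg 1 on node 0,
           * swappiness 200, at most 64 pages.
           */
          dprintf(fd, "- 1 0 2 200 64\n");

          close(fd);
          return 0;
  }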

[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731

Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 include/linux/nodemask.h |   1 +
 mm/vmscan.c              | 346 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 347 insertions(+)

Comments

Andrew Morton April 12, 2022, 2:16 a.m. UTC | #1
On Wed,  6 Apr 2022 21:15:24 -0600 Yu Zhao <yuzhao@google.com> wrote:

> Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> reclaim. These features are required to optimize job scheduling (bin
> packing) in data centers [1][2].

debugfs is for ephemeral temp stuff which can and will change or
disappear at any time.  Anything which is "required" by userspace
should not be in debugfs.

Presumably sysfs is the place.  Fully documented and with usage
examples in the changelog so we can carefully review the proposed
extensions to Linux's ABI.  Extensions which must be maintained
unchanged for all time.
Yu Zhao April 16, 2022, 12:03 a.m. UTC | #2
On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed,  6 Apr 2022 21:15:24 -0600 Yu Zhao <yuzhao@google.com> wrote:
>
> > Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> > reclaim. These features are required to optimize job scheduling (bin
> > packing) in data centers [1][2].
>
> debugfs is for ephemeral temp stuff which can and will change or
> disappear at any time.  Anything which is "required" by userspace
> should not be in debugfs.

Right. "required" is probably a poor choice of words. "These
techniques are commonly used to optimize job scheduling" would sound
better.

> Presumably sysfs is the place.  Fully documented and with usage
> examples in the changelog so we can carefully review the proposed
> extensions to Linux's ABI.  Extensions which must be maintained
> unchanged for all time.

Eventually, yes. There still is a long way to go. Rest assured, this
is something Google will keep investing resources on.
Andrew Morton April 16, 2022, 4:20 a.m. UTC | #3
On Fri, 15 Apr 2022 18:03:16 -0600 Yu Zhao <yuzhao@google.com> wrote:

> > Presumably sysfs is the place.  Fully documented and with usage
> > examples in the changelog so we can carefully review the proposed
> > extensions to Linux's ABI.  Extensions which must be maintained
> > unchanged for all time.
> 
> Eventually, yes. There still is a long way to go. Rest assured, this
> is something Google will keep investing resources on.

So.  The plan is to put these interfaces in debugfs for now, with a
view to migrating stabilized interfaces into sysfs (or procfs or
whatever) once end-user requirements and use cases are better
understood?

If so, that sounds totally great to me.  But it should have been in
the darn changelog!  This is the sort of thing which we care about most
keenly.

It would be helpful for reviewers to understand the proposed timeline
for this process, because the entire feature isn't really real until
this is completed, is it?  I do think we should get this nailed down
relatively rapidly, otherwise people will be reluctant to invest much
into a moving target.

And I must say, I see dissonance between the overall maturity of the
feature as described in these emails versus the immaturity of these
userspace control interfaces.  What's happening there?
Yu Zhao April 26, 2022, 6:59 a.m. UTC | #4
On Fri, Apr 15, 2022 at 10:20 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Fri, 15 Apr 2022 18:03:16 -0600 Yu Zhao <yuzhao@google.com> wrote:
>
> > > Presumably sysfs is the place.  Fully documented and with usage
> > > examples in the changelog so we can carefully review the proposed
> > > extensions to Linux's ABI.  Extensions which must be maintained
> > > unchanged for all time.
> >
> > Eventually, yes. There still is a long way to go. Rest assured, this
> > is something Google will keep investing resources on.
>
> So.  The plan is to put these interfaces in debugfs for now, with a
> view to migrating stabilized interfaces into sysfs (or procfs or
> whatever) once end-user requirements and use cases are better
> understood?

The requirements are well understood and the use cases are proven,
e.g., Google [1], Meta [2] and Alibaba [3].

[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731
[3] https://gitee.com/anolis/cloud-kernel/blob/release-5.10/mm/kidled.c

> If so, that sounds totally great to me.  But it should have been in
> the darn changelog!  This is the sort of thing which we care about most
> keenly.
>
> It would be helpful for reviewers to understand the proposed timeline
> for this process, because the entire feature isn't really real until
> this is completed, is it?  I do think we should get this nailed down
> relatively rapidly, otherwise people will be reluctant to invest much
> into a moving target.
>
> And I must say, I see dissonance between the overall maturity of the
> feature as described in these emails versus the immaturity of these
> userspace control interfaces.  What's happening there?

Very observant. To answer both of the questions above: each iteration
of the entire stack is a multi-year effort.

Given its ROI, companies I know of constantly pour money into this
area. Given its scale, this debugfs is the least of their concerns. A
good example is the proactive reclaim sysfs interface [4]. It's been
used at Google for many years and at Meta for a few years. We only
started finalizing it recently.

[4] https://lore.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com/
Andrew Morton April 26, 2022, 9:30 p.m. UTC | #5
On Tue, 26 Apr 2022 00:59:37 -0600 Yu Zhao <yuzhao@google.com> wrote:

> On Fri, Apr 15, 2022 at 10:20 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 15 Apr 2022 18:03:16 -0600 Yu Zhao <yuzhao@google.com> wrote:
> >
> > > > Presumably sysfs is the place.  Fully documented and with usage
> > > > examples in the changelog so we can carefully review the proposed
> > > > extensions to Linux's ABI.  Extensions which must be maintained
> > > > unchanged for all time.
> > >
> > > Eventually, yes. There still is a long way to go. Rest assured, this
> > > is something Google will keep investing resources on.
> >
> > So.  The plan is to put these interfaces in debugfs for now, with a
> > view to migrating stabilized interfaces into sysfs (or procfs or
> > whatever) once end-user requirements and use cases are better
> > understood?
> 
> The requirements are well understood and the use cases are proven,
> e.g., Google [1], Meta [2] and Alibaba [3].
> 
> [1] https://dl.acm.org/doi/10.1145/3297858.3304053
> [2] https://dl.acm.org/doi/10.1145/3503222.3507731
> [3] https://gitee.com/anolis/cloud-kernel/blob/release-5.10/mm/kidled.c

So will these interfaces be moved into sysfs?

> > If so, that sounds totally great to me.  But it should have been in
> > the darn changelog!  This is the sort of thing which we care about most
> > keenly.
> >
> > It would be helpful for reviewers to understand the proposed timeline
> > for this process, because the entire feature isn't really real until
> > this is completed, is it?  I do think we should get this nailed down
> > relatively rapidly, otherwise people will be reluctant to invest much
> > into a moving target.
> >
> > And I must say, I see dissonance between the overall maturity of the
> > feature as described in these emails versus the immaturity of these
> > userspace control interfaces.  What's happening there?
> 
> Very observant. To answer both of the questions above: each iteration
> of the entire stack is a multi-year effort.
> 
> Given its ROI, companies I know of constantly pour money into this
> area. Given its scale, this debugfs is the least of their concerns. A
> good example is the proactive reclaim sysfs interface [4]. It's been
> used at Google for many years and at Meta for a few years. We only
> started finalizing it recently.
> 
> [4] https://lore.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com/

Sure, if one organization is involved in both the userspace code and
the kernel interfaces then the alteration of kernel interfaces can be
handled in a coordinated fashion.

But releasing interfaces to the whole world is a different deal.  It's
acceptable to say "this is in debugfs for now because it's a work
in progress" but it sounds like mglru's interfaces are beyond that
stage?
Yu Zhao April 26, 2022, 10:15 p.m. UTC | #6
On Tue, Apr 26, 2022 at 3:30 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 26 Apr 2022 00:59:37 -0600 Yu Zhao <yuzhao@google.com> wrote:
>
> > On Fri, Apr 15, 2022 at 10:20 PM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 15 Apr 2022 18:03:16 -0600 Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > > > Presumably sysfs is the place.  Fully documented and with usage
> > > > > examples in the changelog so we can carefully review the proposed
> > > > > extensions to Linux's ABI.  Extensions which must be maintained
> > > > > unchanged for all time.
> > > >
> > > > Eventually, yes. There still is a long way to go. Rest assured, this
> > > > is something Google will keep investing resources on.
> > >
> > > So.  The plan is to put these interfaces in debugfs for now, with a
> > > view to migrating stabilized interfaces into sysfs (or procfs or
> > > whatever) once end-user requirements and use cases are better
> > > understood?
> >
> > The requirements are well understood and the use cases are proven,
> > e.g., Google [1], Meta [2] and Alibaba [3].
> >
> > [1] https://dl.acm.org/doi/10.1145/3297858.3304053
> > [2] https://dl.acm.org/doi/10.1145/3503222.3507731
> > [3] https://gitee.com/anolis/cloud-kernel/blob/release-5.10/mm/kidled.c
>
> So will these interfaces be moved into sysfs?

So the debugfs interface from this patch provides:
1. proactive reclaim,
2. working set estimation.

The sysfs interface for item 1 is being finalized by [4]; it is a
subset of this debugfs interface because we want the sysfs interface
to be more general. The sysfs interface for item 2 will eventually be
proposed as well, following the same approach. It will look like a
histogram in which the active/inactive LRU has two bins whereas MGLRU
has more: each bin contains pages, and different bins represent
different degrees of hotness/coldness. Since [4] took about two years,
I'd say this histogram-like interface will take no less than a couple
of years as well.

This debugfs interface stays even after that, and it will serve its
true purpose (debugging) rather than act as a substitute for the sysfs
interfaces.
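
For item 2, here is a rough sketch of how the read side can be
consumed today. It assumes the per-generation line format produced by
lru_gen_seq_show() in this patch (generation number, age in
milliseconds, anon and file page counts) and makes no claim of ABI
stability:

  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/sys/kernel/debug/lru_gen", "r");
          char line[256];

          if (!f)
                  return 1;

          /* Skip "memcg" and "node" headers; parse generation lines:
           * <gen_nr> <age_in_ms> <anon_pages> <file_pages>.
           */
          while (fgets(line, sizeof(line), f)) {
                  unsigned long gen, anon, file;
                  unsigned int age_ms;

                  if (sscanf(line, " %lu %u %lu %lu",
                             &gen, &age_ms, &anon, &file) == 4)
                          printf("gen %lu: %u ms old, %lu anon + %lu file pages\n",
                                 gen, age_ms, anon, file);
          }

          fclose(f);
          return 0;
  }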

> > > If so, that sounds totally great to me.  But it should have been in
> > > the darn changelog!  This is the sort of thing which we care about most
> > > keenly.
> > >
> > > It would be helpful for reviewers to understand the proposed timeline
> > > for this process, because the entire feature isn't really real until
> > > this is completed, is it?  I do think we should get this nailed down
> > > relatively rapidly, otherwise people will be reluctant to invest much
> > > into a moving target.
> > >
> > > And I must say, I see dissonance between the overall maturity of the
> > > feature as described in these emails versus the immaturity of these
> > > userspace control interfaces.  What's happening there?
> >
> > Very observant. To answer both of the questions above: each iteration
> > of the entire stack is a multi-year effort.
> >
> > Given its ROI, companies I know of constantly pour money into this
> > area. Given its scale, this debugfs is the least of their concerns. A
> > good example is the proactive reclaim sysfs interface [4]. It's been
> > used at Google for many years and at Meta for a few years. We only
> > started finalizing it recently.
> >
> > [4] https://lore.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com/
>
> Sure, if one organization is involved in both the userspace code and
> the kernel interfaces then the alteration of kernel interfaces can be
> handled in a coordinated fashion.
>
> But releasing interfaces to the whole world is a different deal.  It's
> acceptable to say "this is in debugfs for now because it's a work
> in progress" but it sounds like mglru's interfaces are beyond that
> stage?

Correct. It's a WIP in the sense of "know what needs to be done but
can't get it done immediately", not "don't know what's next; try this
for now".

Patch

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 567c3ddba2c4..90840c459abc 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -486,6 +486,7 @@  static inline int num_node_state(enum node_states state)
 #define first_online_node	0
 #define first_memory_node	0
 #define next_online_node(nid)	(MAX_NUMNODES)
+#define next_memory_node(nid)	(MAX_NUMNODES)
 #define nr_node_ids		1U
 #define nr_online_nodes		1U
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4b7da68b8750..913c28805236 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,6 +53,7 @@ 
 #include <linux/pagewalk.h>
 #include <linux/shmem_fs.h>
 #include <linux/ctype.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -5191,6 +5192,348 @@  static struct attribute_group lru_gen_attr_group = {
 	.attrs = lru_gen_attrs,
 };
 
+/******************************************************************************
+ *                          debugfs interface
+ ******************************************************************************/
+
+static void *lru_gen_seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct mem_cgroup *memcg;
+	loff_t nr_to_skip = *pos;
+
+	m->private = kvmalloc(PATH_MAX, GFP_KERNEL);
+	if (!m->private)
+		return ERR_PTR(-ENOMEM);
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		int nid;
+
+		for_each_node_state(nid, N_MEMORY) {
+			if (!nr_to_skip--)
+				return get_lruvec(memcg, nid);
+		}
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
+
+	return NULL;
+}
+
+static void lru_gen_seq_stop(struct seq_file *m, void *v)
+{
+	if (!IS_ERR_OR_NULL(v))
+		mem_cgroup_iter_break(NULL, lruvec_memcg(v));
+
+	kvfree(m->private);
+	m->private = NULL;
+}
+
+static void *lru_gen_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	int nid = lruvec_pgdat(v)->node_id;
+	struct mem_cgroup *memcg = lruvec_memcg(v);
+
+	++*pos;
+
+	nid = next_memory_node(nid);
+	if (nid == MAX_NUMNODES) {
+		memcg = mem_cgroup_iter(NULL, memcg, NULL);
+		if (!memcg)
+			return NULL;
+
+		nid = first_memory_node;
+	}
+
+	return get_lruvec(memcg, nid);
+}
+
+static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
+				  unsigned long max_seq, unsigned long *min_seq,
+				  unsigned long seq)
+{
+	int i;
+	int type, tier;
+	int hist = lru_hist_from_seq(seq);
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+
+	for (tier = 0; tier < MAX_NR_TIERS; tier++) {
+		seq_printf(m, "            %10d", tier);
+		for (type = 0; type < ANON_AND_FILE; type++) {
+			unsigned long n[3] = {};
+
+			if (seq == max_seq) {
+				n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
+				n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
+
+				seq_printf(m, " %10luR %10luT %10lu ", n[0], n[1], n[2]);
+			} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
+				n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]);
+				n[1] = atomic_long_read(&lrugen->evicted[hist][type][tier]);
+				if (tier)
+					n[2] = READ_ONCE(lrugen->protected[hist][type][tier - 1]);
+
+				seq_printf(m, " %10lur %10lue %10lup", n[0], n[1], n[2]);
+			} else
+				seq_puts(m, "          0           0           0 ");
+		}
+		seq_putc(m, '\n');
+	}
+
+	seq_puts(m, "                      ");
+	for (i = 0; i < NR_MM_STATS; i++) {
+		if (seq == max_seq && NR_HIST_GENS == 1)
+			seq_printf(m, " %10lu%c", READ_ONCE(lruvec->mm_state.stats[hist][i]),
+				   toupper(MM_STAT_CODES[i]));
+		else if (seq != max_seq && NR_HIST_GENS > 1)
+			seq_printf(m, " %10lu%c", READ_ONCE(lruvec->mm_state.stats[hist][i]),
+				   MM_STAT_CODES[i]);
+		else
+			seq_puts(m, "          0 ");
+	}
+	seq_putc(m, '\n');
+}
+
+static int lru_gen_seq_show(struct seq_file *m, void *v)
+{
+	unsigned long seq;
+	bool full = !debugfs_real_fops(m->file)->write;
+	struct lruvec *lruvec = v;
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	int nid = lruvec_pgdat(lruvec)->node_id;
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MAX_SEQ(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	if (nid == first_memory_node) {
+		const char *path = memcg ? m->private : "";
+
+#ifdef CONFIG_MEMCG
+		if (memcg)
+			cgroup_path(memcg->css.cgroup, m->private, PATH_MAX);
+#endif
+		seq_printf(m, "memcg %5hu %s\n", mem_cgroup_id(memcg), path);
+	}
+
+	seq_printf(m, " node %5d\n", nid);
+
+	if (!full)
+		seq = min_seq[LRU_GEN_ANON];
+	else if (max_seq >= MAX_NR_GENS)
+		seq = max_seq - MAX_NR_GENS + 1;
+	else
+		seq = 0;
+
+	for (; seq <= max_seq; seq++) {
+		int type, zone;
+		int gen = lru_gen_from_seq(seq);
+		unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
+
+		seq_printf(m, " %10lu %10u", seq, jiffies_to_msecs(jiffies - birth));
+
+		for (type = 0; type < ANON_AND_FILE; type++) {
+			long size = 0;
+			char mark = full && seq < min_seq[type] ? 'x' : ' ';
+
+			for (zone = 0; zone < MAX_NR_ZONES; zone++)
+				size += READ_ONCE(lrugen->nr_pages[gen][type][zone]);
+
+			seq_printf(m, " %10lu%c", max(size, 0L), mark);
+		}
+
+		seq_putc(m, '\n');
+
+		if (full)
+			lru_gen_seq_show_full(m, lruvec, max_seq, min_seq, seq);
+	}
+
+	return 0;
+}
+
+static const struct seq_operations lru_gen_seq_ops = {
+	.start = lru_gen_seq_start,
+	.stop = lru_gen_seq_stop,
+	.next = lru_gen_seq_next,
+	.show = lru_gen_seq_show,
+};
+
+static int run_aging(struct lruvec *lruvec, unsigned long seq, struct scan_control *sc,
+		     bool can_swap, bool full_scan)
+{
+	DEFINE_MAX_SEQ(lruvec);
+
+	if (seq == max_seq)
+		try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, full_scan);
+
+	return seq > max_seq ? -EINVAL : 0;
+}
+
+static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_control *sc,
+			int swappiness, unsigned long nr_to_reclaim)
+{
+	struct blk_plug plug;
+	int err = -EINTR;
+	DEFINE_MAX_SEQ(lruvec);
+
+	if (seq + MIN_NR_GENS > max_seq)
+		return -EINVAL;
+
+	sc->nr_reclaimed = 0;
+
+	blk_start_plug(&plug);
+
+	while (!signal_pending(current)) {
+		DEFINE_MIN_SEQ(lruvec);
+
+		if (seq < min_seq[!swappiness] || sc->nr_reclaimed >= nr_to_reclaim ||
+		    !evict_folios(lruvec, sc, swappiness, NULL)) {
+			err = 0;
+			break;
+		}
+
+		cond_resched();
+	}
+
+	blk_finish_plug(&plug);
+
+	return err;
+}
+
+static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq,
+		   struct scan_control *sc, int swappiness, unsigned long opt)
+{
+	struct lruvec *lruvec;
+	int err = -EINVAL;
+	struct mem_cgroup *memcg = NULL;
+
+	if (!mem_cgroup_disabled()) {
+		rcu_read_lock();
+		memcg = mem_cgroup_from_id(memcg_id);
+#ifdef CONFIG_MEMCG
+		if (memcg && !css_tryget(&memcg->css))
+			memcg = NULL;
+#endif
+		rcu_read_unlock();
+
+		if (!memcg)
+			goto done;
+	}
+	if (memcg_id != mem_cgroup_id(memcg))
+		goto done;
+
+	if (nid < 0 || nid >= MAX_NUMNODES || !node_state(nid, N_MEMORY))
+		goto done;
+
+	lruvec = get_lruvec(memcg, nid);
+
+	if (swappiness < 0)
+		swappiness = get_swappiness(lruvec, sc);
+	else if (swappiness > 200)
+		goto done;
+
+	switch (cmd) {
+	case '+':
+		err = run_aging(lruvec, seq, sc, swappiness, opt);
+		break;
+	case '-':
+		err = run_eviction(lruvec, seq, sc, swappiness, opt);
+		break;
+	}
+done:
+	mem_cgroup_put(memcg);
+
+	return err;
+}
+
+static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
+				 size_t len, loff_t *pos)
+{
+	void *buf;
+	char *cur, *next;
+	unsigned int flags;
+	int err = 0;
+	struct scan_control sc = {
+		.may_writepage = true,
+		.may_unmap = true,
+		.may_swap = true,
+		.reclaim_idx = MAX_NR_ZONES - 1,
+		.gfp_mask = GFP_KERNEL,
+	};
+
+	buf = kvmalloc(len + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	if (copy_from_user(buf, src, len)) {
+		kvfree(buf);
+		return -EFAULT;
+	}
+
+	next = buf;
+	next[len] = '\0';
+
+	sc.reclaim_state.mm_walk = alloc_mm_walk();
+	if (!sc.reclaim_state.mm_walk) {
+		kvfree(buf);
+		return -ENOMEM;
+	}
+
+	set_task_reclaim_state(current, &sc.reclaim_state);
+	flags = memalloc_noreclaim_save();
+
+	while ((cur = strsep(&next, ",;\n"))) {
+		int n;
+		int end;
+		char cmd;
+		unsigned int memcg_id;
+		unsigned int nid;
+		unsigned long seq;
+		unsigned int swappiness = -1;
+		unsigned long opt = -1;
+
+		cur = skip_spaces(cur);
+		if (!*cur)
+			continue;
+
+		n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid,
+			   &seq, &end, &swappiness, &end, &opt, &end);
+		if (n < 4 || cur[end]) {
+			err = -EINVAL;
+			break;
+		}
+
+		err = run_cmd(cmd, memcg_id, nid, seq, &sc, swappiness, opt);
+		if (err)
+			break;
+	}
+
+	memalloc_noreclaim_restore(flags);
+	set_task_reclaim_state(current, NULL);
+
+	free_mm_walk(sc.reclaim_state.mm_walk);
+	kvfree(buf);
+
+	return err ? : len;
+}
+
+static int lru_gen_seq_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &lru_gen_seq_ops);
+}
+
+static const struct file_operations lru_gen_rw_fops = {
+	.open = lru_gen_seq_open,
+	.read = seq_read,
+	.write = lru_gen_seq_write,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
+static const struct file_operations lru_gen_ro_fops = {
+	.open = lru_gen_seq_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
 /******************************************************************************
  *                          initialization
  ******************************************************************************/
@@ -5249,6 +5592,9 @@  static int __init init_lru_gen(void)
 	if (sysfs_create_group(mm_kobj, &lru_gen_attr_group))
 		pr_err("lru_gen: failed to create sysfs group\n");
 
+	debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops);
+	debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops);
+
 	return 0;
 };
 late_initcall(init_lru_gen);