
[3/4] mm: workingset: add vmstat counter for shadow nodes

Message ID 20181009184732.762-4-hannes@cmpxchg.org (mailing list archive)
State New, archived
Series: mm: workingset & shrinker fixes

Commit Message

Johannes Weiner Oct. 9, 2018, 6:47 p.m. UTC
Make it easier to catch bugs in the shadow node shrinker by adding a
counter for the shadow nodes in circulation.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h |  1 +
 mm/vmstat.c            |  1 +
 mm/workingset.c        | 12 ++++++++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

Comments

Andrew Morton Oct. 9, 2018, 10:04 p.m. UTC | #1
On Tue,  9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Make it easier to catch bugs in the shadow node shrinker by adding a
> counter for the shadow nodes in circulation.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/mmzone.h |  1 +
>  mm/vmstat.c            |  1 +
>  mm/workingset.c        | 12 ++++++++++--
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4179e67add3d..d82e80d82aa6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -161,6 +161,7 @@ enum node_stat_item {
>  	NR_SLAB_UNRECLAIMABLE,
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
> +	WORKINGSET_NODES,

Documentation/admin-guide/cgroup-v2.rst, please.  And please check for
any other missing items while in there?
Andrew Morton Oct. 9, 2018, 10:08 p.m. UTC | #2
On Tue,  9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
>  	 * as node->private_list is protected by the i_pages lock.
>  	 */
>  	if (node->count && node->count == node->nr_values) {
> -		if (list_empty(&node->private_list))
> +		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);
> +			__inc_lruvec_page_state(virt_to_page(node),
> +						WORKINGSET_NODES);
> +		}
>  	} else {
> -		if (!list_empty(&node->private_list))
> +		if (!list_empty(&node->private_list)) {
>  			list_lru_del(&shadow_nodes, &node->private_list);
> +			__dec_lruvec_page_state(virt_to_page(node),
> +						WORKINGSET_NODES);
> +		}
>  	}
>  }

A bit worried that we're depending on the caller's caller to have
disabled interrupts to avoid subtle and rare errors.

Can we do this?

--- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
+++ a/mm/workingset.c
@@ -377,6 +377,8 @@ void workingset_update_node(struct radix
 	 * already where they should be. The list_empty() test is safe
 	 * as node->private_list is protected by the i_pages lock.
 	 */
+	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
+
 	if (node->count && node->count == node->exceptional) {
 		if (list_empty(&node->private_list)) {
 			list_lru_add(&shadow_nodes, &node->private_list);
Johannes Weiner Oct. 10, 2018, 2:02 p.m. UTC | #3
On Tue, Oct 09, 2018 at 03:04:01PM -0700, Andrew Morton wrote:
> On Tue,  9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Make it easier to catch bugs in the shadow node shrinker by adding a
> > counter for the shadow nodes in circulation.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  include/linux/mmzone.h |  1 +
> >  mm/vmstat.c            |  1 +
> >  mm/workingset.c        | 12 ++++++++++--
> >  3 files changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 4179e67add3d..d82e80d82aa6 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -161,6 +161,7 @@ enum node_stat_item {
> >  	NR_SLAB_UNRECLAIMABLE,
> >  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
> >  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
> > +	WORKINGSET_NODES,
> 
> Documentation/admin-guide/cgroup-v2.rst, please.  And please check for
> any other missing items while in there?

The new counter isn't being added to the per-cgroup memory.stat,
actually; it only shows up in /proc/vmstat.

It seemed a bit too low-level for the cgroup interface, and the other
stats in there are in bytes, which isn't straightforward to calculate
with sl*b packing.

Not that I'm against adding a cgroup breakdown in general, but the
global counter was enough to see if things were working right or not,
so I'd cross that bridge when somebody needs it per cgroup.
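[Editor's note: the point about bytes vs. counts can be illustrated with a small userspace sketch. Converting the `workingset_nodes` count into bytes requires assuming a per-node size and ignoring slab packing overhead, which is exactly why the patch exports a raw count. The 576-byte node size and the sample vmstat text below are illustrative assumptions, not values from this thread.]

```python
# Back-of-envelope estimate of shadow-node memory from a /proc/vmstat
# style dump. The node size is an ASSUMED figure for a 64-bit
# struct radix_tree_node; real usage also depends on slab packing.
SAMPLE_VMSTAT = """nr_isolated_file 0
workingset_nodes 2048
workingset_refault 1234
"""

def parse_vmstat(text):
    """Parse 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[name] = int(value.strip())
    return stats

def shadow_node_bytes(stats, node_size=576):
    # Lower bound only: slab fragmentation makes the true figure larger.
    return stats.get("workingset_nodes", 0) * node_size

stats = parse_vmstat(SAMPLE_VMSTAT)
print(shadow_node_bytes(stats))  # 2048 * 576 = 1179648
```

On a live system one would read `/proc/vmstat` instead of the sample string; the point stands that the kernel can only export a trustworthy count, while any byte figure is an estimate.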

But I checked cgroup-v2.rst anyway: all the exported items are
documented. Only the reclaim vs. refault stats were in different
orders: the doc has the refault stats first, the interface leads with
the reclaim stats. The refault stats go better with the page fault
stats, and are probably of more interest (since they have higher
impact on performance) than the LRU shuffling, so maybe this?

---
Subject: [PATCH] mm: memcontrol: fix memory.stat item ordering

The refault stats go better with the page fault stats, and are of
higher interest than the stats on LRU operations. In fact they used to
be grouped together; when the LRU operation stats were added later on,
they were wedged in between.

Move them back together. Documentation/admin-guide/cgroup-v2.rst
already lists them in the right order.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 81b47d0b14d7..ed15f233d31d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5575,6 +5575,13 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	seq_printf(m, "pgfault %lu\n", acc.events[PGFAULT]);
 	seq_printf(m, "pgmajfault %lu\n", acc.events[PGMAJFAULT]);
 
+	seq_printf(m, "workingset_refault %lu\n",
+		   acc.stat[WORKINGSET_REFAULT]);
+	seq_printf(m, "workingset_activate %lu\n",
+		   acc.stat[WORKINGSET_ACTIVATE]);
+	seq_printf(m, "workingset_nodereclaim %lu\n",
+		   acc.stat[WORKINGSET_NODERECLAIM]);
+
 	seq_printf(m, "pgrefill %lu\n", acc.events[PGREFILL]);
 	seq_printf(m, "pgscan %lu\n", acc.events[PGSCAN_KSWAPD] +
 		   acc.events[PGSCAN_DIRECT]);
@@ -5585,13 +5592,6 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	seq_printf(m, "pglazyfree %lu\n", acc.events[PGLAZYFREE]);
 	seq_printf(m, "pglazyfreed %lu\n", acc.events[PGLAZYFREED]);
 
-	seq_printf(m, "workingset_refault %lu\n",
-		   acc.stat[WORKINGSET_REFAULT]);
-	seq_printf(m, "workingset_activate %lu\n",
-		   acc.stat[WORKINGSET_ACTIVATE]);
-	seq_printf(m, "workingset_nodereclaim %lu\n",
-		   acc.stat[WORKINGSET_NODERECLAIM]);
-
 	return 0;
 }
Johannes Weiner Oct. 10, 2018, 3:05 p.m. UTC | #4
On Tue, Oct 09, 2018 at 03:08:45PM -0700, Andrew Morton wrote:
> On Tue,  9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
> >  	 * as node->private_list is protected by the i_pages lock.
> >  	 */
> >  	if (node->count && node->count == node->nr_values) {
> > -		if (list_empty(&node->private_list))
> > +		if (list_empty(&node->private_list)) {
> >  			list_lru_add(&shadow_nodes, &node->private_list);
> > +			__inc_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	} else {
> > -		if (!list_empty(&node->private_list))
> > +		if (!list_empty(&node->private_list)) {
> >  			list_lru_del(&shadow_nodes, &node->private_list);
> > +			__dec_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	}
> >  }
> 
> A bit worried that we're depending on the caller's caller to have
> disabled interrupts to avoid subtle and rare errors.
> 
> Can we do this?

I'm not opposed to it, but the i_pages lock is guaranteed to be held
during the tree update, and that lock is also taken from the io
completion irq to maintain the tree's dirty/writeback state. It seems
like a robust assumption that interrupts will be disabled here.

But all that isn't very obvious from the code at hand, so I wouldn't
mind adding the check for documentation purposes.

It's not a super hot path, but maybe VM_WARN_ON_ONCE()?

> --- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
> +++ a/mm/workingset.c
> @@ -377,6 +377,8 @@ void workingset_update_node(struct radix
>  	 * already where they should be. The list_empty() test is safe
>  	 * as node->private_list is protected by the i_pages lock.
>  	 */
> +	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
> +
>  	if (node->count && node->count == node->exceptional) {
>  		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);
> _
Mel Gorman Oct. 16, 2018, 8:49 a.m. UTC | #5
On Tue, Oct 09, 2018 at 03:08:45PM -0700, Andrew Morton wrote:
> On Tue,  9 Oct 2018 14:47:32 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
> >  	 * as node->private_list is protected by the i_pages lock.
> >  	 */
> >  	if (node->count && node->count == node->nr_values) {
> > -		if (list_empty(&node->private_list))
> > +		if (list_empty(&node->private_list)) {
> >  			list_lru_add(&shadow_nodes, &node->private_list);
> > +			__inc_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	} else {
> > -		if (!list_empty(&node->private_list))
> > +		if (!list_empty(&node->private_list)) {
> >  			list_lru_del(&shadow_nodes, &node->private_list);
> > +			__dec_lruvec_page_state(virt_to_page(node),
> > +						WORKINGSET_NODES);
> > +		}
> >  	}
> >  }
> 
> A bit worried that we're depending on the caller's caller to have
> disabled interrupts to avoid subtle and rare errors.
> 
> Can we do this?
> 
> --- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
> +++ a/mm/workingset.c
> @@ -377,6 +377,8 @@ void workingset_update_node(struct radix
>  	 * already where they should be. The list_empty() test is safe
>  	 * as node->private_list is protected by the i_pages lock.
>  	 */
> +	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
> +
>  	if (node->count && node->count == node->exceptional) {
>  		if (list_empty(&node->private_list)) {
>  			list_lru_add(&shadow_nodes, &node->private_list);

Note that for whatever reason, I've observed that irqs_disabled() is
actually quite an expensive call. I'm not saying the warning is a bad
idea, but it should not be sprinkled around unnecessarily and may be
more suitable as a debug option.
Andrew Morton Oct. 16, 2018, 10:27 p.m. UTC | #6
On Tue, 16 Oct 2018 09:49:23 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:

> > Can we do this?
> > 
> > --- a/mm/workingset.c~mm-workingset-add-vmstat-counter-for-shadow-nodes-fix
> > +++ a/mm/workingset.c
> > @@ -377,6 +377,8 @@ void workingset_update_node(struct radix
> >  	 * already where they should be. The list_empty() test is safe
> >  	 * as node->private_list is protected by the i_pages lock.
> >  	 */
> > +	WARN_ON_ONCE(!irqs_disabled());	/* For __inc_lruvec_page_state */
> > +
> >  	if (node->count && node->count == node->exceptional) {
> >  		if (list_empty(&node->private_list)) {
> >  			list_lru_add(&shadow_nodes, &node->private_list);
> 
> Note that for whatever reason, I've observed that irqs_disabled() is
> actually quite an expensive call. I'm not saying the warning is a bad
> idea but it should not be sprinkled around unnecessary and may be more
> suitable as a debug option.

Yup, it is now VM_WARN_ON_ONCE().
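[Editor's note: the invariant the thread converges on — the counter is adjusted only on actual list-membership transitions, guarded by the list_empty() checks — can be modeled in miniature. This is a toy sketch of the pattern in workingset_update_node(); the class and method names are illustrative, not kernel API, and the set stands in for list_lru membership.]

```python
# Toy model of the counter-maintenance pattern: increment only on the
# off-list -> on-list transition, decrement only on the reverse, so
# repeated calls with the same state never skew the counter.
class ShadowTracker:
    def __init__(self):
        self.on_lru = set()          # stands in for shadow_nodes list_lru
        self.workingset_nodes = 0    # stands in for the vmstat counter

    def update_node(self, node_id, is_shadow_only):
        if is_shadow_only:
            if node_id not in self.on_lru:     # list_empty() guard
                self.on_lru.add(node_id)       # list_lru_add()
                self.workingset_nodes += 1     # __inc_lruvec_page_state()
        else:
            if node_id in self.on_lru:         # !list_empty() guard
                self.on_lru.remove(node_id)    # list_lru_del()
                self.workingset_nodes -= 1     # __dec_lruvec_page_state()

t = ShadowTracker()
t.update_node(1, True)
t.update_node(1, True)    # idempotent: no double count
print(t.workingset_nodes)  # 1
t.update_node(1, False)
print(t.workingset_nodes)  # 0
```

The kernel version additionally relies on the i_pages lock (with interrupts disabled) to make the guard-plus-update pair atomic, which is what the VM_WARN_ON_ONCE check documents.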

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4179e67add3d..d82e80d82aa6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -161,6 +161,7 @@ enum node_stat_item {
 	NR_SLAB_UNRECLAIMABLE,
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
+	WORKINGSET_NODES,
 	WORKINGSET_REFAULT,
 	WORKINGSET_ACTIVATE,
 	WORKINGSET_RESTORE,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d08ed044759d..6038ce593ce3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1143,6 +1143,7 @@ const char * const vmstat_text[] = {
 	"nr_slab_unreclaimable",
 	"nr_isolated_anon",
 	"nr_isolated_file",
+	"workingset_nodes",
 	"workingset_refault",
 	"workingset_activate",
 	"workingset_restore",
diff --git a/mm/workingset.c b/mm/workingset.c
index f564aaa6b71d..cfdf6adf7e7c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -378,11 +378,17 @@ void workingset_update_node(struct xa_node *node)
 	 * as node->private_list is protected by the i_pages lock.
 	 */
 	if (node->count && node->count == node->nr_values) {
-		if (list_empty(&node->private_list))
+		if (list_empty(&node->private_list)) {
 			list_lru_add(&shadow_nodes, &node->private_list);
+			__inc_lruvec_page_state(virt_to_page(node),
+						WORKINGSET_NODES);
+		}
 	} else {
-		if (!list_empty(&node->private_list))
+		if (!list_empty(&node->private_list)) {
 			list_lru_del(&shadow_nodes, &node->private_list);
+			__dec_lruvec_page_state(virt_to_page(node),
+						WORKINGSET_NODES);
+		}
 	}
 }
 
@@ -472,6 +478,8 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	}
 
 	list_lru_isolate(lru, item);
+	__dec_lruvec_page_state(virt_to_page(node), WORKINGSET_NODES);
+
 	spin_unlock(lru_lock);
 
 	/*