diff mbox series

[RFC,7/9] mm/vmscan: Consider anonymous pages without swap

Message ID 20201007161749.4C56D1F1@viggo.jf.intel.com (mailing list archive)
State New, archived
Series Migrate Pages in lieu of discard

Commit Message

Dave Hansen Oct. 7, 2020, 4:17 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

Age and reclaim anonymous pages if a migration path is available. The
node then has recourse for inactive anonymous pages beyond swap.

#Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Keith Busch <kbusch@kernel.org>
[vishal: fixup the migration->demotion rename]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>

--

Changes from Dave 06/2020:
 * rename reclaim_anon_pages()->can_reclaim_anon_pages()

Note: Keith's Intel SoB is commented out because he is no
longer at Intel and his @intel.com mail will bounce.
---

 b/include/linux/node.h |    9 +++++++++
 b/mm/vmscan.c          |   33 ++++++++++++++++++++++++++++-----
 2 files changed, 37 insertions(+), 5 deletions(-)

Comments

Oscar Salvador Oct. 29, 2020, 8:14 a.m. UTC | #1
On Wed, Oct 07, 2020 at 09:17:49AM -0700, Dave Hansen wrote:
> 
> From: Keith Busch <kbusch@kernel.org>
> 
> Age and reclaim anonymous pages if a migration path is available. The
> node then has recourse for inactive anonymous pages beyond swap.
> 
> #Signed-off-by: Keith Busch <keith.busch@intel.com>
> Cc: Keith Busch <kbusch@kernel.org>
> [vishal: fixup the migration->demotion rename]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>

I have a question regarding this one.

It seems that we do have places where we read total_swap_pages directly and other
places where we use get_nr_swap_pages.
One seems to give the total number of swap pages, while the other gives 
the number of free swap pages.

With this patch, we will always use the atomic version, get_nr_swap_pages(),
from now on.
Is that ok? I guess so, but it might warrant a mention in the changelog?

E.g.: age_active_anon() seems to base one of its decisions on whether we
have swap at all (it does not seem to care whether swap space is still
available).
Dave Hansen Oct. 29, 2020, 2:33 p.m. UTC | #2
On 10/29/20 1:14 AM, Oscar Salvador wrote:
> With this patch, we will always use the atomic version,
> get_nr_swap_pages(), from now on. Is that ok? I guess so, but it might
> warrant a mention in the changelog?

I _think_ it's OK.  But, you're right that it's a potential behavior
change that's not mentioned in the changelog.

I'll mention it in the changelog and see if I can dream up any other
practical implications from this change.

Thanks for taking a look!
Yang Shi Oct. 29, 2020, 3:57 p.m. UTC | #3
On Thu, Oct 29, 2020 at 7:33 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/29/20 1:14 AM, Oscar Salvador wrote:
> > With this patch, we will always use the atomic version,
> > get_nr_swap_pages(), from now on. Is that ok? I guess so, but it
> > might warrant a mention in the changelog?
>
> I _think_ it's OK.  But, you're right that it's a potential behavior
> change that's not mentioned in the changelog.
>
> I'll mention it in the changelog and see if I can dream up any other
> practical implications from this change.

IMHO, we don't have to modify those two places at all. They are used
to rebalance the anon lru active/inactive ratio even if we did not try
to evict anon pages at all, so "total_swap_pages" is used instead of
checking swappiness and available swap space.

The changes may result in an imbalanced anon lru.

>
> Thanks for taking a look!
>
Oscar Salvador Oct. 29, 2020, 7:08 p.m. UTC | #4
On Thu, Oct 29, 2020 at 08:57:32AM -0700, Yang Shi wrote:
> IMHO, we don't have to modify those two places at all. They are used
> to rebalance the anon lru active/inactive ratio even if we did not try
> to evict anon pages at all, so "total_swap_pages" is used instead of
> checking swappiness and available swap space.
> 
> The changes may result in imbalanced anon lru.

I might be missing something, so bear with me.

It is true that since we are only rebalancing the lists, we do not need to
check for swap space yet, but here we are also adding a new endpoint we can
migrate to in case of memory pressure.

So in case we can demote pages, it makes sense to proceed with the aging
and rebalancing regardless of whether we have swap in place, right?

But maybe the right procedure would be to perform a check along the
following lines in those two places:

	if (total_swap_pages || can_migrate_to_demote_node)
		- proceed_with_rebalancing_or_aging

--
Oscar Salvador
SUSE L3
Yang Shi Oct. 29, 2020, 7:30 p.m. UTC | #5
On Thu, Oct 29, 2020 at 12:08 PM osalvador <osalvador@suse.de> wrote:
>
> On Thu, Oct 29, 2020 at 08:57:32AM -0700, Yang Shi wrote:
> > IMHO, we don't have to modify those two places at all. They are used
> > to rebalance the anon lru active/inactive ratio even if we did not try
> > to evict anon pages at all, so "total_swap_pages" is used instead of
> > checking swappiness and available swap space.
> >
> > The changes may result in imbalanced anon lru.
>
> I might be missing something, so bear with me.
>
> It is true that since we are only rebalancing the lists, we do not need to
> check for swap space yet, but here we are also adding a new end-point where we
> can migrate to in case of memory pressure.
>
> So in case we can demote pages, it makes sense to proceed with the aging
> and rebalancing regardless of whether we have swap in place, right?

Yes, makes sense. I missed that point.

>
> But maybe the right procedure would be to perform some sort of the
> following check in those two places:
>
>         if (total_swap_pages || can_migrate_to_demote_node)
>                 - proceed_with_rebalancing_or_aging

Looks sane to me.

>
> --
> Oscar Salvador
> SUSE L3

Patch

diff -puN include/linux/node.h~0009-mm-vmscan-Consider-anonymous-pages-without-swap include/linux/node.h
--- a/include/linux/node.h~0009-mm-vmscan-Consider-anonymous-pages-without-swap	2020-10-07 09:15:33.390642436 -0700
+++ b/include/linux/node.h	2020-10-07 09:15:33.399642436 -0700
@@ -180,4 +180,13 @@  static inline void register_hugetlbfs_wi
 
 #define to_node(device) container_of(device, struct node, dev)
 
+#ifdef CONFIG_MIGRATION
+extern int next_demotion_node(int node);
+#else
+static inline int next_demotion_node(int node)
+{
+	return NUMA_NO_NODE;
+}
+#endif
+
 #endif /* _LINUX_NODE_H_ */
diff -puN mm/vmscan.c~0009-mm-vmscan-Consider-anonymous-pages-without-swap mm/vmscan.c
--- a/mm/vmscan.c~0009-mm-vmscan-Consider-anonymous-pages-without-swap	2020-10-07 09:15:33.392642436 -0700
+++ b/mm/vmscan.c	2020-10-07 09:15:33.400642436 -0700
@@ -290,6 +290,26 @@  static bool writeback_throttling_sane(st
 }
 #endif
 
+static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
+					  int node_id)
+{
+	/* Always age anon pages when we have swap */
+	if (memcg == NULL) {
+		if (get_nr_swap_pages() > 0)
+			return true;
+	} else {
+		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
+			return true;
+	}
+
+	/* Also age anon pages if we can auto-migrate them */
+	if (next_demotion_node(node_id) >= 0)
+		return true;
+
+	/* No way to reclaim anon pages */
+	return false;
+}
+
 /*
  * This misses isolated pages which are not accounted for to save counters.
  * As the data only determines if reclaim or compaction continues, it is
@@ -301,7 +321,7 @@  unsigned long zone_reclaimable_pages(str
 
 	nr = zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_FILE) +
 		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
+	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone)))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
 
@@ -2337,6 +2357,7 @@  enum scan_balance {
 static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			   unsigned long *nr)
 {
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
 	int swappiness = mem_cgroup_swappiness(memcg);
@@ -2347,7 +2368,7 @@  static void get_scan_count(struct lruvec
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
+	if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2631,7 +2652,9 @@  static void shrink_lruvec(struct lruvec
 	 * Even if we did not try to evict anon pages at all, we want to
 	 * rebalance the anon lru active/inactive ratio.
 	 */
-	if (total_swap_pages && inactive_is_low(lruvec, LRU_INACTIVE_ANON))
+	if (can_reclaim_anon_pages(lruvec_memcg(lruvec),
+			       lruvec_pgdat(lruvec)->node_id) &&
+	    inactive_is_low(lruvec, LRU_INACTIVE_ANON))
 		shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
 				   sc, LRU_ACTIVE_ANON);
 }
@@ -2701,7 +2724,7 @@  static inline bool should_continue_recla
 	 */
 	pages_for_compaction = compact_gap(sc->order);
 	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
+	if (can_reclaim_anon_pages(NULL, pgdat->node_id))
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 
 	return inactive_lru_pages > pages_for_compaction;
@@ -3460,7 +3483,7 @@  static void age_active_anon(struct pglis
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	if (!total_swap_pages)
+	if (!can_reclaim_anon_pages(NULL, pgdat->node_id))
 		return;
 
 	lruvec = mem_cgroup_lruvec(NULL, pgdat);