[RFC,12/26] mm: page_alloc: per-migratetype free counts

Message ID 20230418191313.268131-13-hannes@cmpxchg.org (mailing list archive)
State New
Series mm: reliable huge page allocator

Commit Message

Johannes Weiner April 18, 2023, 7:12 p.m. UTC
Increase visibility into the defragmentation behavior by tracking and
reporting per-migratetype free counters.

Subsequent patches will also use those counters to make more targeted
reclaim/compaction decisions.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h |  5 +++++
 mm/page_alloc.c        | 29 +++++++++++++++++++++++++----
 mm/vmstat.c            |  5 +++++
 3 files changed, 35 insertions(+), 4 deletions(-)

Comments

Mel Gorman April 21, 2023, 2:28 p.m. UTC | #1
On Tue, Apr 18, 2023 at 03:12:59PM -0400, Johannes Weiner wrote:
> Increase visibility into the defragmentation behavior by tracking and
> reporting per-migratetype free counters.
> 
> Subsequent patches will also use those counters to make more targeted
> reclaim/compaction decisions.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Visibility into fragmentation behaviour is information that is
almost certainly only useful to a developer and even then, there is
/proc/pagetypeinfo. At minimum, move this patch to later in the series
but I'm skeptical about its benefit.

Johannes Weiner April 21, 2023, 3:35 p.m. UTC | #2
On Fri, Apr 21, 2023 at 03:28:41PM +0100, Mel Gorman wrote:
> On Tue, Apr 18, 2023 at 03:12:59PM -0400, Johannes Weiner wrote:
> > Increase visibility into the defragmentation behavior by tracking and
> > reporting per-migratetype free counters.
> > 
> > Subsequent patches will also use those counters to make more targeted
> > reclaim/compaction decisions.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Visibility into fragmentation behaviour is information that is
> almost certainly only useful to a developer and even then, there is
> /proc/pagetypeinfo. At minimum, move this patch to later in the series
> but I'm skeptical about its benefit.

Having them available in the memory dump (OOM, sysrq) was essential
while debugging problems in later patches. For OOMs or lockups,
pagetypeinfo isn't available. It would be useful to have them included
in user reports if any issues pop up.

They're used internally in several places later on, too.

I'll expand on the changelog and move them ahead in the series.

Thanks

Mel Gorman April 21, 2023, 4:03 p.m. UTC | #3
On Fri, Apr 21, 2023 at 11:35:01AM -0400, Johannes Weiner wrote:
> On Fri, Apr 21, 2023 at 03:28:41PM +0100, Mel Gorman wrote:
> > On Tue, Apr 18, 2023 at 03:12:59PM -0400, Johannes Weiner wrote:
> > > Increase visibility into the defragmentation behavior by tracking and
> > > reporting per-migratetype free counters.
> > > 
> > > Subsequent patches will also use those counters to make more targeted
> > > reclaim/compaction decisions.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > Visibility into fragmentation behaviour is information that is
> > almost certainly only useful to a developer and even then, there is
> > /proc/pagetypeinfo. At minimum, move this patch to later in the series
> > but I'm skeptical about its benefit.
> 
> Having them available in the memory dump (OOM, sysrq) was essential
> while debugging problems in later patches. For OOMs or lockups,
> pagetypeinfo isn't available. It would be useful to have them included
> in user reports if any issues pop up.
> 

OOM+sysrq could optionally take the very expensive step of traversing the
lists to get the count so yes, it helps debugging, but not necessarily
critical.

> They're used internally in several places later on, too.
> 

I did see that for deciding the suitability for compaction. Minimally, put
the patches adjacent in the series and later if possible so that the series
can be taken in parts. There are a lot of patches that should be relatively
uncontroversial so maybe make "mm: page_alloc: introduce MIGRATE_FREE" the
pivot point between incremental improvements and "everything on and after
this patch is relatively high risk, could excessively compact/reclaim,
could livelock etc".
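
On the traversal-versus-counter point above: reading a maintained counter is
a single load, while counting at dump time means walking every free list.
Below is a minimal userspace sketch of the offset-based accounting that the
account_freepages() hunk in this patch performs. Everything prefixed toy_ or
TOY_ is hypothetical and only models the idea. In particular, the migratetype
order shown here is an assumption inferred from the order of the new
NR_FREE_* counters; MIGRATE_FREE itself is introduced by another patch in the
series that is not shown in this thread.

```c
#include <assert.h>

/* Assumed order: mirrors the new NR_FREE_* counters, not taken from the series. */
enum toy_migratetype {
	TOY_MIGRATE_UNMOVABLE,
	TOY_MIGRATE_MOVABLE,
	TOY_MIGRATE_RECLAIMABLE,
	TOY_MIGRATE_HIGHATOMIC,
	TOY_MIGRATE_FREE,
	TOY_MIGRATE_CMA,
	TOY_MIGRATE_ISOLATE,
};

enum toy_stat_item {
	TOY_NR_FREE_PAGES,
	TOY_NR_FREE_UNMOVABLE,
	TOY_NR_FREE_MOVABLE,
	TOY_NR_FREE_RECLAIMABLE,
	TOY_NR_FREE_HIGHATOMIC,
	TOY_NR_FREE_FREE,
	TOY_NR_FREE_CMA_PAGES,
	TOY_NR_STAT_ITEMS,
};

/* The "base + migratetype" indexing only works while both enums stay in lockstep. */
static_assert((int)TOY_NR_FREE_UNMOVABLE + (int)TOY_MIGRATE_FREE ==
	      (int)TOY_NR_FREE_FREE,
	      "per-migratetype counters must follow migratetype order");

struct toy_zone {
	long stat[TOY_NR_STAT_ITEMS];
};

/*
 * Adjust the total and the matching per-type counter by nr_pages
 * (negative on allocation). Returns 0 on success, or -1 for a
 * migratetype without a counter, which is where the patch warns.
 */
static int toy_account_freepages(struct toy_zone *zone, int migratetype,
				 long nr_pages)
{
	/* As in the patch, the total is updated before the type is checked. */
	zone->stat[TOY_NR_FREE_PAGES] += nr_pages;

	if (migratetype >= 0 && migratetype <= TOY_MIGRATE_FREE)
		zone->stat[TOY_NR_FREE_UNMOVABLE + migratetype] += nr_pages;
	else if (migratetype == TOY_MIGRATE_CMA)
		zone->stat[TOY_NR_FREE_CMA_PAGES] += nr_pages;
	else
		return -1;

	return 0;
}
```

In this model the per-type counters sum to the total only as long as every
accounted migratetype lands in one of the first two branches; any type that
reaches the fallback still changes the total, so the counters drift apart.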

Johannes Weiner April 21, 2023, 4:32 p.m. UTC | #4
On Fri, Apr 21, 2023 at 05:03:20PM +0100, Mel Gorman wrote:
> On Fri, Apr 21, 2023 at 11:35:01AM -0400, Johannes Weiner wrote:
> > On Fri, Apr 21, 2023 at 03:28:41PM +0100, Mel Gorman wrote:
> > > On Tue, Apr 18, 2023 at 03:12:59PM -0400, Johannes Weiner wrote:
> > > > Increase visibility into the defragmentation behavior by tracking and
> > > > reporting per-migratetype free counters.
> > > > 
> > > > Subsequent patches will also use those counters to make more targeted
> > > > reclaim/compaction decisions.
> > > > 
> > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > 
> > > Visibility into fragmentation behaviour is information that is
> > > almost certainly only useful to a developer and even then, there is
> > > /proc/pagetypeinfo. At minimum, move this patch to later in the series
> > > but I'm skeptical about its benefit.
> > 
> > Having them available in the memory dump (OOM, sysrq) was essential
> > while debugging problems in later patches. For OOMs or lockups,
> > pagetypeinfo isn't available. It would be useful to have them included
> > in user reports if any issues pop up.
> > 
> 
> OOM+sysrq could optionally take the very expensive step of traversing the
> lists to get the count so yes, it helps debugging, but not necessarily
> critical.
> 
> > They're used internally in several places later on, too.
> > 
> 
> I did see that for deciding the suitability for compaction. Minimally, put
> the patches adjacent in the series and later if possible so that the series
> can be taken in parts. There are a lot of patches that should be relatively
> uncontroversial so maybe make "mm: page_alloc: introduce MIGRATE_FREE" the
> pivot point between incremental improvements and "everything on and after
> this patch is relatively high risk, could excessively compact/reclaim,
> could livelock etc".

Okay, I see now where you're coming from. That's good feedback.

Actually most of the patches work toward the final goal of managing
free memory in whole blocks. The only exceptions are the block pages,
the nofs deadlock, the page_isolation kernel doc, and *maybe* the
should_[compact|reclaim]_retry cleanups. I tried to find the
standalone value in each of the prep patches as well to avoid
forward-referencing in the series too much. But obviously these
standalone reasons tend to be on the weak side.

I'll rework the changelogs (and patch ordering) where applicable to
try to make the dependencies clearer.
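
For user reports taken from a live system, the new counters would also show
up in /proc/vmstat under the names that the mm/vmstat.c hunk of this patch
adds. Below is a minimal userspace sketch for pulling them out, assuming the
usual "name value" line format of that file. The helper names
parse_free_counter() and dump_free_counters() are hypothetical, and
nr_free_unmovable and friends exist only on a kernel carrying this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FREE_PREFIX "nr_free_"

/*
 * Parse one "name value" line. Returns 1 and fills name/value if the
 * line is well-formed and the name starts with "nr_free_", else 0.
 */
static int parse_free_counter(const char *line, char name[64],
			      unsigned long long *value)
{
	assert(line && name && value);

	/* %63s leaves room for the terminating NUL in the 64-byte buffer. */
	if (sscanf(line, "%63s %llu", name, value) != 2)
		return 0;

	return strncmp(name, FREE_PREFIX, strlen(FREE_PREFIX)) == 0;
}

/*
 * Print every nr_free_* counter found in the given file, normally
 * /proc/vmstat. Returns 0 on success, -1 if the file cannot be opened.
 */
static int dump_free_counters(const char *path)
{
	char line[128], name[64];
	unsigned long long value;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f))
		if (parse_free_counter(line, name, &value))
			printf("%-24s %llu pages\n", name, value);

	fclose(f);
	return 0;
}
```

Calling dump_free_counters("/proc/vmstat") on an unpatched kernel would still
print nr_free_pages (and nr_free_cma where that counter is built in), so the
same tool works before and after the patch. It does not cover the OOM and
lockup cases discussed above, where only the kernel's own memory dump is
available.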

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 20542e5a0a43..d1083ab81998 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -139,6 +139,11 @@  enum numa_stat_item {
 enum zone_stat_item {
 	/* First 128 byte cacheline (assuming 64 bit words) */
 	NR_FREE_PAGES,
+	NR_FREE_UNMOVABLE,
+	NR_FREE_MOVABLE,
+	NR_FREE_RECLAIMABLE,
+	NR_FREE_HIGHATOMIC,
+	NR_FREE_FREE,
 	NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
 	NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
 	NR_ZONE_ACTIVE_ANON,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44da23625f51..5f2a0037bed1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -959,8 +959,12 @@  static inline void account_freepages(struct page *page, struct zone *zone,
 
 	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
 
-	if (is_migrate_cma(migratetype))
+	if (migratetype <= MIGRATE_FREE)
+		__mod_zone_page_state(zone, NR_FREE_UNMOVABLE + migratetype, nr_pages);
+	else if (is_migrate_cma(migratetype))
 		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
+	else
+		VM_WARN_ONCE(1, "unexpected migratetype %d\n", migratetype);
 }
 
 /* Used for pages not on another list */
@@ -6175,7 +6179,9 @@  void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 		" mapped:%lu shmem:%lu pagetables:%lu\n"
 		" sec_pagetables:%lu bounce:%lu\n"
 		" kernel_misc_reclaimable:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" free:%lu free_unmovable:%lu free_movable:%lu\n"
+		" free_reclaimable:%lu free_highatomic:%lu free_free:%lu\n"
+		" free_cma:%lu free_pcp:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
 		global_node_page_state(NR_INACTIVE_ANON),
 		global_node_page_state(NR_ISOLATED_ANON),
@@ -6194,8 +6200,13 @@  void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 		global_zone_page_state(NR_BOUNCE),
 		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE),
 		global_zone_page_state(NR_FREE_PAGES),
-		free_pcp,
-		global_zone_page_state(NR_FREE_CMA_PAGES));
+		global_zone_page_state(NR_FREE_UNMOVABLE),
+		global_zone_page_state(NR_FREE_MOVABLE),
+		global_zone_page_state(NR_FREE_RECLAIMABLE),
+		global_zone_page_state(NR_FREE_HIGHATOMIC),
+		global_zone_page_state(NR_FREE_FREE),
+		global_zone_page_state(NR_FREE_CMA_PAGES),
+		free_pcp);
 
 	for_each_online_pgdat(pgdat) {
 		if (show_mem_node_skip(filter, pgdat->node_id, nodemask))
@@ -6273,6 +6284,11 @@  void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 		printk(KERN_CONT
 			"%s"
 			" free:%lukB"
+			" free_unmovable:%lukB"
+			" free_movable:%lukB"
+			" free_reclaimable:%lukB"
+			" free_highatomic:%lukB"
+			" free_free:%lukB"
 			" boost:%lukB"
 			" min:%lukB"
 			" low:%lukB"
@@ -6294,6 +6310,11 @@  void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 			"\n",
 			zone->name,
 			K(zone_page_state(zone, NR_FREE_PAGES)),
+			K(zone_page_state(zone, NR_FREE_UNMOVABLE)),
+			K(zone_page_state(zone, NR_FREE_MOVABLE)),
+			K(zone_page_state(zone, NR_FREE_RECLAIMABLE)),
+			K(zone_page_state(zone, NR_FREE_HIGHATOMIC)),
+			K(zone_page_state(zone, NR_FREE_FREE)),
 			K(zone->watermark_boost),
 			K(min_wmark_pages(zone)),
 			K(low_wmark_pages(zone)),
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..c8b8e6e259da 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1168,6 +1168,11 @@  int fragmentation_index(struct zone *zone, unsigned int order)
 const char * const vmstat_text[] = {
 	/* enum zone_stat_item counters */
 	"nr_free_pages",
+	"nr_free_unmovable",
+	"nr_free_movable",
+	"nr_free_reclaimable",
+	"nr_free_highatomic",
+	"nr_free_free",
 	"nr_zone_inactive_anon",
 	"nr_zone_active_anon",
 	"nr_zone_inactive_file",