Message ID | 20220216073815.2505536-2-ying.huang@intel.com
---|---
State | New
Series | NUMA balancing: optimize memory placement for memory tiering system
On Wed, Feb 16, 2022 at 03:38:13PM +0800, Huang Ying wrote:
> In a system with multiple memory types, e.g. DRAM and PMEM, the CPU
> and DRAM in one socket will be put in one NUMA node as before, while
> the PMEM will be put in another NUMA node as described in the
> description of the commit c221c0b0308f ("device-dax: "Hotplug"
> persistent memory for use like normal RAM"). So, the NUMA balancing
> mechanism will identify all PMEM accesses as remote access and try to
> promote the PMEM pages to DRAM.
>
> To distinguish the number of the inter-type promoted pages from that
> of the inter-socket migrated pages. A new vmstat count is added. The
> counter is per-node (count in the target node). So this can be used
> to identify promotion imbalance among the NUMA nodes.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Wei Xu <weixugc@google.com>
> Cc: osalvador <osalvador@suse.de>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---

...

> @@ -2072,6 +2072,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> 	pg_data_t *pgdat = NODE_DATA(node);
> 	int isolated;
> 	int nr_remaining;
> +	int nr_succeeded;

I think we should make this consistent and make it "unsigned int".
That is what migrate_pages() expects, and what the other caller using
nr_succeeded (demote_page_list()) already uses.
Unless there is a strong reason not to do so.

Reviewed-by: Oscar Salvador <osalvador@suse.de>
Oscar Salvador <osalvador@suse.de> writes:

> On Wed, Feb 16, 2022 at 03:38:13PM +0800, Huang Ying wrote:
>> In a system with multiple memory types, e.g. DRAM and PMEM, the CPU
>> and DRAM in one socket will be put in one NUMA node as before, while
>> the PMEM will be put in another NUMA node as described in the
>> description of the commit c221c0b0308f ("device-dax: "Hotplug"
>> persistent memory for use like normal RAM"). So, the NUMA balancing
>> mechanism will identify all PMEM accesses as remote access and try to
>> promote the PMEM pages to DRAM.
>>
>> To distinguish the number of the inter-type promoted pages from that
>> of the inter-socket migrated pages. A new vmstat count is added. The
>> counter is per-node (count in the target node). So this can be used
>> to identify promotion imbalance among the NUMA nodes.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Reviewed-by: Yang Shi <shy828301@gmail.com>
>> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Rik van Riel <riel@surriel.com>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Wei Xu <weixugc@google.com>
>> Cc: osalvador <osalvador@suse.de>
>> Cc: Shakeel Butt <shakeelb@google.com>
>> Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> ---
> ...
>
>> @@ -2072,6 +2072,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>> 	pg_data_t *pgdat = NODE_DATA(node);
>> 	int isolated;
>> 	int nr_remaining;
>> +	int nr_succeeded;
>
> I think we should make this consistent and make it "unsigned int".
> That is what migrate_pages() expects, and what the other caller using
> nr_succeeded (demote_page_list()) already uses.
> Unless there is a strong reason not to do so.

Yes.  Will do that in the next version.

> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks!

Best Regards,
Huang, Ying
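For reference, a minimal sketch of the type change agreed above -- this is not the posted next revision, just an illustration on top of the patch below, assuming migrate_pages() keeps taking an "unsigned int *ret_succeeded" as the demote_page_list() caller already relies on:

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ ... @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	pg_data_t *pgdat = NODE_DATA(node);
 	int isolated;
 	int nr_remaining;
-	int nr_succeeded;
+	unsigned int nr_succeeded;	/* match migrate_pages()'s ret_succeeded */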
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aed44e9b5d89..44bd054ca12b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -210,6 +210,9 @@ enum node_stat_item {
 	NR_PAGETABLE,		/* used for pagetables */
 #ifdef CONFIG_SWAP
 	NR_SWAPCACHE,
+#endif
+#ifdef CONFIG_NUMA_BALANCING
+	PGPROMOTE_SUCCESS,	/* promote successfully */
 #endif
 	NR_VM_NODE_STAT_ITEMS
 };
diff --git a/include/linux/node.h b/include/linux/node.h
index bb21fd631b16..81bbf1c0afd3 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -181,4 +181,9 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg,

 #define to_node(device) container_of(device, struct node, dev)

+static inline bool node_is_toptier(int node)
+{
+	return node_state(node, N_CPU);
+}
+
 #endif /* _LINUX_NODE_H_ */
diff --git a/mm/migrate.c b/mm/migrate.c
index 665dbe8cad72..cb6f3d2a57ce 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2072,6 +2072,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	pg_data_t *pgdat = NODE_DATA(node);
 	int isolated;
 	int nr_remaining;
+	int nr_succeeded;
 	LIST_HEAD(migratepages);
 	new_page_t *new;
 	bool compound;
@@ -2110,7 +2111,8 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,

 	list_add(&page->lru, &migratepages);
 	nr_remaining = migrate_pages(&migratepages, *new, NULL, node,
-				     MIGRATE_ASYNC, MR_NUMA_MISPLACED, NULL);
+				     MIGRATE_ASYNC, MR_NUMA_MISPLACED,
+				     &nr_succeeded);
 	if (nr_remaining) {
 		if (!list_empty(&migratepages)) {
 			list_del(&page->lru);
@@ -2119,8 +2121,13 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 			putback_lru_page(page);
 		}
 		isolated = 0;
-	} else
-		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_pages);
+	}
+	if (nr_succeeded) {
+		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
+		if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node))
+			mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
+					    nr_succeeded);
+	}
 	BUG_ON(!list_empty(&migratepages));

 	return isolated;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4057372745d0..846b670dd346 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1242,6 +1242,9 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_SWAP
 	"nr_swapcached",
 #endif
+#ifdef CONFIG_NUMA_BALANCING
+	"pgpromote_success",
+#endif

 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
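Since PGPROMOTE_SUCCESS is a per-node node_stat_item, it should show up under each node's vmstat once the series lands. A rough, hypothetical userspace helper (not part of the patch) that compares promotion counts across nodes, assuming the counter is exported as "pgpromote_success" in /sys/devices/system/node/nodeN/vmstat as the vmstat_text hunk above suggests:

/*
 * Hypothetical helper, not part of the patch: print the per-node
 * promotion counter for every present node so promotion imbalance
 * between DRAM nodes can be spotted.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char path[64], name[64];
	unsigned long long val;
	FILE *f;
	int node;

	for (node = 0; node < 1024; node++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/vmstat", node);
		f = fopen(path, "r");
		if (!f)
			continue;	/* node IDs may be sparse */
		/* each line of the per-node vmstat file is "<name> <value>" */
		while (fscanf(f, "%63s %llu", name, &val) == 2) {
			if (!strcmp(name, "pgpromote_success"))
				printf("node%d: pgpromote_success %llu\n",
				       node, val);
		}
		fclose(f);
	}
	return 0;
}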