Message ID | 20230320180745.858515310@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fold per-CPU vmstats remotely | expand |
On Mon, 2023-03-20 at 15:03 -0300, Marcelo Tosatti wrote: > > + > + if (do_pagesets) { > + cond_resched(); > + /* > + * Deal with draining the remote pageset of a > + * processor > + * > + * Check if there are pages remaining in this pageset > + * if not then there is nothing to expire. > + */ > + if (!pcp->expire || !pcp->count) > + continue;refresh_cpu_vm_stats > + > + /* > + * We never drain zones local to this processor. > + */ > + if (zone_to_nid(zone) == cpu_to_node(cpu)) { > + pcp->expire = 0; > + continue; > + } > + > + WARN_ON(pcp->expire < 0); > + /* > + * pcp->expire is only accessed from vmstat_shepherd context, > + * therefore no locking is required. > + */ > + if (--pcp->expire) > + continue; > + > + if (pcp->count) > + drain_zone_pages(zone, pcp); > + } This logic is the same to that for the do_pagesets portion of code in refresh_cpu_vm_stats(). Is it possible to consolidate to avoid replicating the logic across two functions? Tim
On Mon, Mar 20, 2023 at 01:43:09PM -0700, Tim Chen wrote: > On Mon, 2023-03-20 at 15:03 -0300, Marcelo Tosatti wrote: > > > > + > > + if (do_pagesets) { > > + cond_resched(); > > + /* > > + * Deal with draining the remote pageset of a > > + * processor > > + * > > + * Check if there are pages remaining in this pageset > > + * if not then there is nothing to expire. > > + */ > > + if (!pcp->expire || !pcp->count) > > + continue;refresh_cpu_vm_stats > > + > > + /* > > + * We never drain zones local to this processor. > > + */ > > + if (zone_to_nid(zone) == cpu_to_node(cpu)) { > > + pcp->expire = 0; > > + continue; > > + } > > + > > + WARN_ON(pcp->expire < 0); > > + /* > > + * pcp->expire is only accessed from vmstat_shepherd context, > > + * therefore no locking is required. > > + */ > > + if (--pcp->expire) > > + continue; > > + > > + if (pcp->count) > > + drain_zone_pages(zone, pcp); > > + } > > This logic is the same to that for the do_pagesets portion of code in refresh_cpu_vm_stats(). > Is it possible to consolidate to avoid replicating the logic across two functions? > > Tim Tim, One of the versions is for remote access, the other version is for local access (thats why merging the two is not so simple). I suppose the next logical step would be ensure that the current !CONFIG_HAVE_CMPXCHG_LOCAL also uses the cmpxchg remotely (then refresh_cpu_vm_stats can be removed).
Index: linux-vmstat-remote/mm/vmstat.c =================================================================== --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -928,7 +928,7 @@ static int refresh_cpu_vm_stats(bool do_ * There cannot be any access by the offline cpu and therefore * synchronization is simplified. */ -void cpu_vm_stats_fold(int cpu) +void cpu_vm_stats_fold(int cpu, bool do_pagesets) { struct pglist_data *pgdat; struct zone *zone; @@ -938,6 +938,9 @@ void cpu_vm_stats_fold(int cpu) for_each_populated_zone(zone) { struct per_cpu_zonestat *pzstats; +#ifdef CONFIG_NUMA + struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); +#endif pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); @@ -948,6 +951,11 @@ void cpu_vm_stats_fold(int cpu) v = xchg(&pzstats->vm_stat_diff[i], 0); atomic_long_add(v, &zone->vm_stat[i]); global_zone_diff[i] += v; +#ifdef CONFIG_NUMA + /* 3 seconds idle till flush */ + if (do_pagesets) + pcp->expire = 3; +#endif } } #ifdef CONFIG_NUMA @@ -959,6 +967,38 @@ void cpu_vm_stats_fold(int cpu) zone_numa_event_add(v, zone, i); } } + + if (do_pagesets) { + cond_resched(); + /* + * Deal with draining the remote pageset of a + * processor + * + * Check if there are pages remaining in this pageset + * if not then there is nothing to expire. + */ + if (!pcp->expire || !pcp->count) + continue; + + /* + * We never drain zones local to this processor. + */ + if (zone_to_nid(zone) == cpu_to_node(cpu)) { + pcp->expire = 0; + continue; + } + + WARN_ON(pcp->expire < 0); + /* + * pcp->expire is only accessed from vmstat_shepherd context, + * therefore no locking is required. + */ + if (--pcp->expire) + continue; + + if (pcp->count) + drain_zone_pages(zone, pcp); + } #endif } @@ -2060,7 +2100,7 @@ static int refresh_all_vm_stats(void) cpus_read_lock(); for_each_online_cpu(cpu) { - cpu_vm_stats_fold(cpu); + cpu_vm_stats_fold(cpu, true); cond_resched(); } cpus_read_unlock(); Index: linux-vmstat-remote/include/linux/vmstat.h =================================================================== --- linux-vmstat-remote.orig/include/linux/vmstat.h +++ linux-vmstat-remote/include/linux/vmstat.h @@ -291,7 +291,7 @@ extern void __dec_zone_state(struct zone extern void __dec_node_state(struct pglist_data *, enum node_stat_item); void quiet_vmstat(void); -void cpu_vm_stats_fold(int cpu); +void cpu_vm_stats_fold(int cpu, bool do_pagesets); void refresh_zone_stat_thresholds(void); struct ctl_table; Index: linux-vmstat-remote/mm/page_alloc.c =================================================================== --- linux-vmstat-remote.orig/mm/page_alloc.c +++ linux-vmstat-remote/mm/page_alloc.c @@ -8629,7 +8629,7 @@ static int page_alloc_cpu_dead(unsigned * Zero the differential counters of the dead processor * so that the vm statistics are consistent. */ - cpu_vm_stats_fold(cpu); + cpu_vm_stats_fold(cpu, false); for_each_populated_zone(zone) zone_pcp_update(zone, 0);
Large NUMA systems might have significant portions of system memory to be trapped in pcp queues. The number of pcp is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps which is okay. But a system with 1024 processors and 512 nodes has 512k pcps with a high potential for large amount of memory being caught in them. Enable remote node draining for the CONFIG_HAVE_CMPXCHG_LOCAL case, where vmstat_shepherd will perform the aging and draining via cpu_vm_stats_fold. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> ---