
[v2,09/11] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold

Message ID: 20230209153204.873999366@redhat.com
State: New
Series: fold per-CPU vmstats remotely

Commit Message

Marcelo Tosatti Feb. 9, 2023, 3:01 p.m. UTC
In preparation for switching the vmstat shepherd to flushing
per-CPU counters remotely, use a cmpxchg loop
instead of a pair of read/write instructions.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Comments

Peter Xu March 1, 2023, 10:57 p.m. UTC | #1
On Thu, Feb 09, 2023 at 12:01:59PM -0300, Marcelo Tosatti wrote:
>  /*
> - * Fold the data for an offline cpu into the global array.
> + * Fold the data for a cpu into the global array.
>   * There cannot be any access by the offline cpu and therefore
>   * synchronization is simplified.
>   */
> @@ -906,8 +906,9 @@ void cpu_vm_stats_fold(int cpu)
>  			if (pzstats->vm_stat_diff[i]) {
>  				int v;
>  
> -				v = pzstats->vm_stat_diff[i];
> -				pzstats->vm_stat_diff[i] = 0;
> +				do {
> +					v = pzstats->vm_stat_diff[i];
> +				} while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));

IIUC try_cmpxchg will update "v" already, so I'd assume this'll work the
same:

        while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));

Then I figured, maybe it's easier to use xchg()?
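
Something like this for the first hunk (untested sketch; the same
pattern would apply to the other two counters):

	if (pzstats->vm_stat_diff[i]) {
		int v;

		/* atomically grab the diff and reset it to zero */
		v = xchg(&pzstats->vm_stat_diff[i], 0);
		atomic_long_add(v, &zone->vm_stat[i]);
		global_zone_diff[i] += v;
	}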

I've no knowledge at all on cpu offline code, so sorry if this will be a
naive question.  But from what I understand this should not be touched by
anyone else.  Reasons:

  (1) cpu_vm_stats_fold() is only called in page_alloc_cpu_dead(), and the
      comment says:
  
	/*
	 * Zero the differential counters of the dead processor
	 * so that the vm statistics are consistent.
	 *
	 * This is only okay since the processor is dead and cannot
	 * race with what we are doing.
	 */
	cpu_vm_stats_fold(cpu);

      so.. I think that's what it says..

  (2) If someone can modify the dead cpu's vm_stat_diff, what guarantees it
      won't be e.g. boosted again right after try_cmpxchg() / xchg()
      returns?  What to do with the left-overs?

Thanks,
Marcelo Tosatti March 2, 2023, 1:55 p.m. UTC | #2
On Wed, Mar 01, 2023 at 05:57:08PM -0500, Peter Xu wrote:
> On Thu, Feb 09, 2023 at 12:01:59PM -0300, Marcelo Tosatti wrote:
> >  /*
> > - * Fold the data for an offline cpu into the global array.
> > + * Fold the data for a cpu into the global array.
> >   * There cannot be any access by the offline cpu and therefore
> >   * synchronization is simplified.
> >   */
> > @@ -906,8 +906,9 @@ void cpu_vm_stats_fold(int cpu)
> >  			if (pzstats->vm_stat_diff[i]) {
> >  				int v;
> >  
> > -				v = pzstats->vm_stat_diff[i];
> > -				pzstats->vm_stat_diff[i] = 0;
> > +				do {
> > +					v = pzstats->vm_stat_diff[i];
> > +				} while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));
> 
> IIUC try_cmpxchg will update "v" already, so I'd assume this'll work the
> same:
> 
>         while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));
> 
> Then I figured, maybe it's easier to use xchg()?

Yes, fixed.

> I've no knowledge at all on cpu offline code, so sorry if this will be a
> naive question.  But from what I understand this should not be touched by
> anyone else.  Reasons:
> 
>   (1) cpu_vm_stats_fold() is only called in page_alloc_cpu_dead(), and the
>       comment says:
>   
> 	/*
> 	 * Zero the differential counters of the dead processor
> 	 * so that the vm statistics are consistent.
> 	 *
> 	 * This is only okay since the processor is dead and cannot
> 	 * race with what we are doing.
> 	 */
> 	cpu_vm_stats_fold(cpu);
> 
>       so.. I think that's what it says..

This refers to the counter updates being performed with this_cpu
operations.

If both the updater and the reader use atomic accesses (which is the
case after patch 8, "mm/vmstat: switch counter modification to cmpxchg"),
and CONFIG_HAVE_CMPXCHG_LOCAL is set, then the comment is stale.

Removed it.
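
(For reference, after patch 8 the updater side is roughly the following
cmpxchg loop as well; sketch only, assuming CONFIG_HAVE_CMPXCHG_LOCAL,
with hypothetical o/n/delta/item names:

	do {
		o = this_cpu_read(pzstats->vm_stat_diff[item]);
		n = o + delta;
	} while (this_cpu_cmpxchg(pzstats->vm_stat_diff[item], o, n) != o);

so both the updater and the folding side access the counter atomically.)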

>   (2) If someone can modify the dead cpu's vm_stat_diff,

The only contexts that can modify the cpu's vm_stat_diff are:

1) The CPU itself (increases the counter).
2) cpu_vm_stats_fold (from the vmstat_shepherd kernel thread),
   from x -> 0 only.

So the counter should not be increased after this point.
I suppose this is what the comment refers to.

>       what guarantees it
>       won't be e.g. boosted again right after try_cmpxchg() / xchg()
>       returns?  What to do with the left-overs?

If any code runs on the CPU being hot-unplugged after cpu_vm_stats_fold
(called from page_alloc_cpu_dead), then there will be left-overs. But
such a bug would exist today as well.

Or, if that bug exists, you could replace "for_each_online_cpu" with
"for_each_cpu" here:

static void vmstat_shepherd(struct work_struct *w)
{
        int cpu;

        cpus_read_lock();
        /* Check processors whose vmstat worker threads have been disabled */
        for_each_online_cpu(cpu) {
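                ...

i.e. something like this (sketch only; exact iterator/mask to be
decided, and only needed if that bug exists):

        /* check all possible CPUs, including ones currently offline */
        for_each_possible_cpu(cpu) {
                ...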
Peter Xu March 2, 2023, 9:19 p.m. UTC | #3
On Thu, Mar 02, 2023 at 10:55:09AM -0300, Marcelo Tosatti wrote:
> >   (2) If someone can modify the dead cpu's vm_stat_diff,
> 
> The only contexts that can modify the cpu's vm_stat_diff are:
> 
> 1) The CPU itself (increases the counter).
> 2) cpu_vm_stats_fold (from the vmstat_shepherd kernel thread),
>    from x -> 0 only.

I think I hadn't read far enough when commenting, so I didn't see that
cpu_vm_stats_fold() will be reused, sorry.

Now with a reworked (and SMP-safe) cpu_vm_stats_fold() and vmstats, I'm
wondering about the possibility of merging it with refresh_cpu_vm_stats(),
since they really look similar.

IIUC the new refresh_cpu_vm_stats() logically doesn't need the small
preempt-disabled sections anymore, if a cpu_id is passed over to
cpu_vm_stats_fold(), which even seems to be a good side effect. But I
may have missed something.
Marcelo Tosatti March 3, 2023, 3:17 p.m. UTC | #4
On Thu, Mar 02, 2023 at 04:19:50PM -0500, Peter Xu wrote:
> On Thu, Mar 02, 2023 at 10:55:09AM -0300, Marcelo Tosatti wrote:
> > >   (2) If someone can modify the dead cpu's vm_stat_diff,
> > 
> > The only contexts that can modify the cpu's vm_stat_diff are:
> > 
> > 1) The CPU itself (increases the counter).
> > 2) cpu_vm_stats_fold (from the vmstat_shepherd kernel thread),
> >    from x -> 0 only.
> 
> I think I hadn't read far enough when commenting, so I didn't see that
> cpu_vm_stats_fold() will be reused, sorry.
> 
> Now with a reworked (and SMP-safe) cpu_vm_stats_fold() and vmstats, I'm
> wondering about the possibility of merging it with refresh_cpu_vm_stats(),
> since they really look similar.

Seems like a possibility. However, that might require replacing

v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);

with

pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);

which would drop the this_cpu optimization described in commit
7340a0b15280c9d902c7dd0608b8e751b5a7c403.
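
For illustration, the two access patterns are roughly (sketch, types
abbreviated):

	struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
	struct per_cpu_zonestat *p;
	int v;

	/* local fast path (refresh_cpu_vm_stats): this_cpu ops on the
	 * current CPU's counters, no pointer calculation needed */
	v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);

	/* remote-capable path (cpu_vm_stats_fold): resolve the target
	 * CPU's address first, then a plain atomic xchg */
	p = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
	v = xchg(&p->vm_stat_diff[i], 0);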

Also, you would not want the unified function to sync NUMA events
(as it would be called from NOHZ entry and exit).

See commit f19298b9516c1a031b34b4147773457e3efe743b.

> IIUC the new refresh_cpu_vm_stats() logically doesn't need the small
> preempt-disabled sections anymore,

What preempt-disabled sections are you referring to?

> if a cpu_id is passed over to
> cpu_vm_stats_fold(), which even seems to be a good side effect. But I
> may have missed something.

Patch

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -885,7 +885,7 @@  static int refresh_cpu_vm_stats(void)
 }
 
 /*
- * Fold the data for an offline cpu into the global array.
+ * Fold the data for a cpu into the global array.
  * There cannot be any access by the offline cpu and therefore
  * synchronization is simplified.
  */
@@ -906,8 +906,9 @@  void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_stat_diff[i]) {
 				int v;
 
-				v = pzstats->vm_stat_diff[i];
-				pzstats->vm_stat_diff[i] = 0;
+				do {
+					v = pzstats->vm_stat_diff[i];
+				} while (!try_cmpxchg(&pzstats->vm_stat_diff[i], &v, 0));
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
 			}
@@ -917,8 +918,9 @@  void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_numa_event[i]) {
 				unsigned long v;
 
-				v = pzstats->vm_numa_event[i];
-				pzstats->vm_numa_event[i] = 0;
+				do {
+					v = pzstats->vm_numa_event[i];
+				} while (!try_cmpxchg(&pzstats->vm_numa_event[i], &v, 0));
 				zone_numa_event_add(v, zone, i);
 			}
 		}
@@ -934,8 +936,9 @@  void cpu_vm_stats_fold(int cpu)
 			if (p->vm_node_stat_diff[i]) {
 				int v;
 
-				v = p->vm_node_stat_diff[i];
-				p->vm_node_stat_diff[i] = 0;
+				do {
+					v = p->vm_node_stat_diff[i];
+				} while (!try_cmpxchg(&p->vm_node_stat_diff[i], &v, 0));
 				atomic_long_add(v, &pgdat->vm_stat[i]);
 				global_node_diff[i] += v;
 			}