[v5,2/2] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too

Message ID 20220801234258.134609-3-atomlin@redhat.com (mailing list archive)
State New

Commit Message

Aaron Tomlin Aug. 1, 2022, 11:42 p.m. UTC
In the context of the idle task on an adaptive-tick mode or nohz_full
CPU, quiet_vmstat() can be called before stopping the idle tick, when
entering an idle state, and on exit. In the last case, when the idle
task is required to reschedule, the idle tick can remain stopped with
the timer expiration time left at KTIME_MAX, i.e. never. Before a
nohz_full CPU enters an idle state, its CPU-specific vmstat counters
should be processed so that the respective values are reset and folded
into the zone-specific 'vm_stat[]'. That said, this processing can be
skipped only when the idle tick was previously stopped and
reprogramming of the timer is not required.
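
As a rough illustration of what "reset and folded" means here, the
following is a minimal userspace sketch, not kernel code. All names
(zone_stat, cpu_diff, model_mod_state(), model_fold_cpu()) and both
constants are hypothetical stand-ins chosen for this example; the real
threshold is computed per zone and the real counters are per-CPU data.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL        4   /* hypothetical CPU count for the model */
#define STAT_THRESHOLD_MODEL 32  /* hypothetical per-CPU stat threshold */

/* Zone-wide counter, similar in spirit to one entry of vm_stat[]. */
static long zone_stat;
/* Per-CPU pending deltas, similar in spirit to vm_stat_diff[]. */
static long cpu_diff[NR_CPUS_MODEL];

/*
 * Apply a delta on one CPU. The zone-wide counter is only touched once
 * the pending delta exceeds the threshold in either direction.
 */
static void model_mod_state(int cpu, long delta)
{
	long x;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	x = cpu_diff[cpu] + delta;
	if (x > STAT_THRESHOLD_MODEL || x < -STAT_THRESHOLD_MODEL) {
		zone_stat += x;
		x = 0;
	}
	cpu_diff[cpu] = x;
}

/*
 * Fold one CPU's pending delta into the zone-wide counter and zero it.
 * Returns true if there was anything to fold.
 */
static bool model_fold_cpu(int cpu)
{
	long x;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	x = cpu_diff[cpu];
	if (x == 0)
		return false;

	zone_stat += x;
	cpu_diff[cpu] = 0;
	return true;
}
```

In this model a small update such as model_mod_state(1, 5) stays in
cpu_diff[1] and is invisible in zone_stat until model_fold_cpu(1) runs,
which is the kind of lingering differential the commit message is
concerned about.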

A customer provided evidence indicating that the idle tick was stopped
while CPU-specific vmstat counters remained populated. One can
therefore only assume that quiet_vmstat() was not invoked on return to
the idle loop.

If I understand correctly, this divergence might erroneously prevent a
reclaim attempt by kswapd. If the number of zone-specific free pages is
below the per-CPU drift value, zone_page_state_snapshot() is used to
compute a more accurate view of that statistic. Thus any task blocked
on the NUMA node specific pfmemalloc_wait queue will be unable to make
significant progress via direct reclaim unless it is killed after being
woken up by kswapd (see throttle_direct_reclaim()).
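
To make the difference between the cheap read and the snapshot read
concrete, here is a small userspace sketch. It is an illustration under
stated assumptions, not the kernel implementation: struct zone_model,
SNAP_NR_CPUS and both function names are hypothetical, and the model
tracks a single statistic only.

```c
#include <assert.h>

#define SNAP_NR_CPUS 4  /* hypothetical CPU count for the model */

/* One statistic of one zone: the folded value plus unfolded deltas. */
struct zone_model {
	long global_free;             /* already folded, zone-wide value */
	long cpu_diff[SNAP_NR_CPUS];  /* per-CPU deltas not yet folded */
};

/* Cheap read: looks at the folded value only, clamped at zero. */
static long model_page_state(const struct zone_model *z)
{
	assert(z);
	return z->global_free < 0 ? 0 : z->global_free;
}

/*
 * Snapshot read: adds every CPU's unfolded delta to the folded value,
 * then clamps at zero. More accurate, but it walks all CPUs.
 */
static long model_page_state_snapshot(const struct zone_model *z)
{
	long x;
	int cpu;

	assert(z);
	x = z->global_free;
	for (cpu = 0; cpu < SNAP_NR_CPUS; cpu++)
		x += z->cpu_diff[cpu];

	return x < 0 ? 0 : x;
}
```

With global_free at 100 and per-CPU deltas of 20 and -5 outstanding, the
cheap read reports 100 while the snapshot reports 115. The two only
agree again once the deltas have been folded.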

Consider the following theoretical scenario:

        1.      CPU Y migrated running task A to CPU X, which was
                in an idle state, i.e. waiting for an IRQ and not
                polling; marked the current task on CPU X as
                needing a reschedule, i.e. set TIF_NEED_RESCHED;
                and sent a reschedule IPI to CPU X (see
                sched_move_task())

        2.      CPU X acknowledged the reschedule IPI from CPU Y;
                the generic idle loop code noticed the
                TIF_NEED_RESCHED flag against the idle task,
                exited the loop and called the main scheduler
                function, i.e. __schedule().

                Since the idle tick was previously stopped, no
                scheduling-clock tick would occur, so no deferred
                timers would be handled
        3.      After the transition to kernel execution, task A,
                running on CPU Y, indirectly released a few pages
                (e.g. see __free_one_page()); CPU Y's
                'vm_stat_diff[NR_FREE_PAGES]' was updated and the
                zone-specific 'vm_stat[]' update was deferred as
                per the CPU-specific stat threshold

        4.      Task A invoked exit(2) and the kernel removed the
                task from the run-queue; the idle task was
                selected to execute next since there were no
                other runnable tasks assigned to the given CPU
                (see pick_next_task() and pick_next_task_idle())

        5.      On return to the idle loop, since the idle tick
                was already stopped and can remain so (see [1]
                below), e.g. no pending soft IRQs, no attempt is
                made to zero and fold CPU Y's vmstat counters
                because reprogramming of the scheduling-clock
                tick is not required (see [2])

		  ...
		    do_idle
		    {

		      __current_set_polling()
		      tick_nohz_idle_enter()

		      while (!need_resched()) {

			local_irq_disable()

			...

			/* No polling or broadcast event */
			cpuidle_idle_call()
			{

			  if (cpuidle_not_available(drv, dev)) {
			    tick_nohz_idle_stop_tick()
			      __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched))
			      {
				int cpu = smp_processor_id()

				if (ts->timer_expires_base)
				  expires = ts->timer_expires
				else if (can_stop_idle_tick(cpu, ts))
	      (1) ------->        expires = tick_nohz_next_event(ts, cpu)
				else
				  return

				ts->idle_calls++

				if (expires > 0LL) {

				  tick_nohz_stop_tick(ts, cpu)
				  {

				    if (ts->tick_stopped && (expires == ts->next_tick)) {
	      (2) ------->            if (tick == KTIME_MAX || ts->next_tick ==
					hrtimer_get_expires(&ts->sched_timer))
					return
				    }
				    ...
				  }

The idea of this patch is to ensure that refresh_cpu_vm_stats(false) is
called, when appropriate, on return to the idle loop even when the idle
tick was previously stopped. Additionally, in the context of nohz_full,
when the scheduling-clock tick is stopped, ensure that no CPU-specific
vmstat differentials remain before exiting to user mode.
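
The ordering change can be modelled in a few lines of userspace C. This
is a simplified sketch, not the kernel function: struct tick_model and
the model_*() helpers are hypothetical, the fold is reduced to moving
one number, and the clockevent sanity check inside the early-return
branch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU tick state plus vmstat deltas. */
struct tick_model {
	bool tick_stopped;
	long long next_tick;
	long pending_diff;  /* this CPU's unfolded vmstat delta */
	long folded;        /* zone-wide counter the delta folds into */
};

/* Simplified fold: move the pending delta into the zone-wide counter. */
static void model_quiet_vmstat(struct tick_model *ts)
{
	ts->folded += ts->pending_diff;
	ts->pending_diff = 0;
}

/* Pre-patch ordering: the fold happens only when the tick is first stopped. */
static void model_stop_tick_old(struct tick_model *ts, long long expires)
{
	assert(ts);

	/* Tick already stopped and nothing to reprogram: fold is skipped. */
	if (ts->tick_stopped && expires == ts->next_tick)
		return;

	if (!ts->tick_stopped) {
		model_quiet_vmstat(ts);
		ts->tick_stopped = true;
	}
	ts->next_tick = expires;
}

/* Patched ordering: the fold is attempted before the early return. */
static void model_stop_tick_new(struct tick_model *ts, long long expires)
{
	assert(ts);

	model_quiet_vmstat(ts);

	if (ts->tick_stopped && expires == ts->next_tick)
		return;

	ts->tick_stopped = true;
	ts->next_tick = expires;
}
```

Calling model_stop_tick_old() with the tick already stopped and an
unchanged expiry leaves pending_diff untouched, whereas
model_stop_tick_new() folds it in the same situation. Both variants
behave the same on the first stop.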

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 include/linux/tick.h     |  9 ++-------
 kernel/time/tick-sched.c | 19 ++++++++++++++++++-
 2 files changed, 20 insertions(+), 8 deletions(-)

Comments

kernel test robot Aug. 2, 2022, 10:11 p.m. UTC | #1
Hi Aaron,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19 next-20220728]
[cannot apply to tip/timers/nohz]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9de1f9c8ca5100a02a2e271bdbde36202e251b4b
config: i386-randconfig-a002-20220801 (https://download.01.org/0day-ci/archive/20220803/202208030608.T5WKMCpb-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 52cd00cabf479aa7eb6dbb063b7ba41ea57bce9e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a0d3b9fe31484c4c44c430d10d0b60e2e0551525
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
        git checkout a0d3b9fe31484c4c44c430d10d0b60e2e0551525
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: tick_nohz_user_enter_prepare
   >>> referenced by common.c
   >>>               entry/common.o:(exit_to_user_mode_prepare) in archive kernel/built-in.a
   >>> referenced by common.c
   >>>               entry/common.o:(exit_to_user_mode_prepare) in archive kernel/built-in.a
kernel test robot Aug. 3, 2022, 5:43 a.m. UTC | #2
Hi Aaron,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19 next-20220802]
[cannot apply to tip/timers/nohz]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9de1f9c8ca5100a02a2e271bdbde36202e251b4b
config: x86_64-randconfig-a016-20220801 (https://download.01.org/0day-ci/archive/20220803/202208031315.Wwa9w3jr-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a0d3b9fe31484c4c44c430d10d0b60e2e0551525
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
        git checkout a0d3b9fe31484c4c44c430d10d0b60e2e0551525
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `exit_to_user_mode_loop':
>> kernel/entry/common.c:182: undefined reference to `tick_nohz_user_enter_prepare'
   ld: vmlinux.o: in function `exit_to_user_mode_prepare':
   kernel/entry/common.c:198: undefined reference to `tick_nohz_user_enter_prepare'


vim +182 kernel/entry/common.c

a9f3a74a29af095 Thomas Gleixner     2020-07-22  144  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  145  static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
a9f3a74a29af095 Thomas Gleixner     2020-07-22  146  					    unsigned long ti_work)
a9f3a74a29af095 Thomas Gleixner     2020-07-22  147  {
a9f3a74a29af095 Thomas Gleixner     2020-07-22  148  	/*
a9f3a74a29af095 Thomas Gleixner     2020-07-22  149  	 * Before returning to user space ensure that all pending work
a9f3a74a29af095 Thomas Gleixner     2020-07-22  150  	 * items have been completed.
a9f3a74a29af095 Thomas Gleixner     2020-07-22  151  	 */
a9f3a74a29af095 Thomas Gleixner     2020-07-22  152  	while (ti_work & EXIT_TO_USER_MODE_WORK) {
a9f3a74a29af095 Thomas Gleixner     2020-07-22  153  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  154  		local_irq_enable_exit_to_user(ti_work);
a9f3a74a29af095 Thomas Gleixner     2020-07-22  155  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  156  		if (ti_work & _TIF_NEED_RESCHED)
a9f3a74a29af095 Thomas Gleixner     2020-07-22  157  			schedule();
a9f3a74a29af095 Thomas Gleixner     2020-07-22  158  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  159  		if (ti_work & _TIF_UPROBE)
a9f3a74a29af095 Thomas Gleixner     2020-07-22  160  			uprobe_notify_resume(regs);
a9f3a74a29af095 Thomas Gleixner     2020-07-22  161  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  162  		if (ti_work & _TIF_PATCH_PENDING)
a9f3a74a29af095 Thomas Gleixner     2020-07-22  163  			klp_update_patch_state(current);
a9f3a74a29af095 Thomas Gleixner     2020-07-22  164  
12db8b690010ccf Jens Axboe          2020-10-26  165  		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
8ba62d37949e248 Eric W. Biederman   2022-02-09  166  			arch_do_signal_or_restart(regs);
a9f3a74a29af095 Thomas Gleixner     2020-07-22  167  
a68de80f61f6af3 Sean Christopherson 2021-09-01  168  		if (ti_work & _TIF_NOTIFY_RESUME)
03248addadf1a5e Eric W. Biederman   2022-02-09  169  			resume_user_mode_work(regs);
a9f3a74a29af095 Thomas Gleixner     2020-07-22  170  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  171  		/* Architecture specific TIF work */
a9f3a74a29af095 Thomas Gleixner     2020-07-22  172  		arch_exit_to_user_mode_work(regs, ti_work);
a9f3a74a29af095 Thomas Gleixner     2020-07-22  173  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  174  		/*
a9f3a74a29af095 Thomas Gleixner     2020-07-22  175  		 * Disable interrupts and reevaluate the work flags as they
a9f3a74a29af095 Thomas Gleixner     2020-07-22  176  		 * might have changed while interrupts and preemption was
a9f3a74a29af095 Thomas Gleixner     2020-07-22  177  		 * enabled above.
a9f3a74a29af095 Thomas Gleixner     2020-07-22  178  		 */
a9f3a74a29af095 Thomas Gleixner     2020-07-22  179  		local_irq_disable_exit_to_user();
47b8ff194c1fd73 Frederic Weisbecker 2021-02-01  180  
47b8ff194c1fd73 Frederic Weisbecker 2021-02-01  181  		/* Check if any of the above work has queued a deferred wakeup */
f268c3737ecaefc Frederic Weisbecker 2021-05-27 @182  		tick_nohz_user_enter_prepare();
47b8ff194c1fd73 Frederic Weisbecker 2021-02-01  183  
6ce895128b3bff7 Mark Rutland        2021-11-29  184  		ti_work = read_thread_flags();
a9f3a74a29af095 Thomas Gleixner     2020-07-22  185  	}
a9f3a74a29af095 Thomas Gleixner     2020-07-22  186  
a9f3a74a29af095 Thomas Gleixner     2020-07-22  187  	/* Return the latest work state for arch_exit_to_user_mode() */
a9f3a74a29af095 Thomas Gleixner     2020-07-22  188  	return ti_work;
a9f3a74a29af095 Thomas Gleixner     2020-07-22  189  }
a9f3a74a29af095 Thomas Gleixner     2020-07-22  190
kernel test robot Aug. 3, 2022, 6:14 a.m. UTC | #3
Hi Aaron,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19 next-20220802]
[cannot apply to tip/timers/nohz]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9de1f9c8ca5100a02a2e271bdbde36202e251b4b
config: x86_64-randconfig-a013-20220801 (https://download.01.org/0day-ci/archive/20220803/202208031440.kq5bbt4F-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a0d3b9fe31484c4c44c430d10d0b60e2e0551525
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
        git checkout a0d3b9fe31484c4c44c430d10d0b60e2e0551525
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `exit_to_user_mode_prepare':
>> common.c:(.text+0x1d4569): undefined reference to `tick_nohz_user_enter_prepare'
>> ld: common.c:(.text+0x1d45df): undefined reference to `tick_nohz_user_enter_prepare'
   ld: common.c:(.text+0x1d460c): undefined reference to `tick_nohz_user_enter_prepare'
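
All three reports above fail at link time on the same missing symbol.
One plausible reading, offered as an assumption rather than something
the reports state, is that the patch declares
tick_nohz_user_enter_prepare() unconditionally while defining it in a
file that is not built in every configuration. A common kernel idiom
for that situation pairs the out-of-line declaration with an empty
static inline stub. The sketch below models the idiom in plain C with
hypothetical names (MODEL_TICK_SCHED_BUILT, model_user_enter_prepare(),
model_exit_to_user()); it is not necessarily how the issue was resolved.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for the config option that decides
 * whether the file holding the out-of-line definition is built.
 * 0 models the failing randconfig builds.
 */
#ifndef MODEL_TICK_SCHED_BUILT
#define MODEL_TICK_SCHED_BUILT 0
#endif

#if MODEL_TICK_SCHED_BUILT
/* Real work lives in another object file that is part of this build. */
void model_user_enter_prepare(void);
#else
/* Nothing to do in this configuration; the empty stub keeps callers linking. */
static inline void model_user_enter_prepare(void)
{
}
#endif

/* Number of times the exit path below has run. */
static int model_exit_calls;

/* Caller that must link in both configurations. */
static int model_exit_to_user(void)
{
	assert(MODEL_TICK_SCHED_BUILT == 0 || MODEL_TICK_SCHED_BUILT == 1);

	model_user_enter_prepare();
	return ++model_exit_calls;
}
```

With the switch at 0 the caller resolves to the stub and links without
any external symbol; with it at 1 the linker expects the definition
from the other object file, which is the case the failing
configurations apparently do not satisfy.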

Patch

diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfd571f18cfd..4c576c9ca0a2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -11,7 +11,6 @@ 
 #include <linux/context_tracking_state.h>
 #include <linux/cpumask.h>
 #include <linux/sched.h>
-#include <linux/rcupdate.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -123,6 +122,8 @@  enum tick_dep_bits {
 #define TICK_DEP_MASK_RCU		(1 << TICK_DEP_BIT_RCU)
 #define TICK_DEP_MASK_RCU_EXP		(1 << TICK_DEP_BIT_RCU_EXP)
 
+void tick_nohz_user_enter_prepare(void);
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern bool tick_nohz_enabled;
 extern bool tick_nohz_tick_stopped(void);
@@ -305,10 +306,4 @@  static inline void tick_nohz_task_switch(void)
 		__tick_nohz_task_switch();
 }
 
-static inline void tick_nohz_user_enter_prepare(void)
-{
-	if (tick_nohz_full_cpu(smp_processor_id()))
-		rcu_nocb_flush_deferred_wakeup();
-}
-
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 30049580cd62..c7c69a974414 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@ 
 #include <linux/posix-timers.h>
 #include <linux/context_tracking.h>
 #include <linux/mm.h>
+#include <linux/rcupdate.h>
 
 #include <asm/irq_regs.h>
 
@@ -43,6 +44,20 @@  struct tick_sched *tick_get_tick_sched(int cpu)
 	return &per_cpu(tick_cpu_sched, cpu);
 }
 
+void tick_nohz_user_enter_prepare(void)
+{
+	struct tick_sched *ts;
+
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+		ts = this_cpu_ptr(&tick_cpu_sched);
+
+		if (ts->tick_stopped)
+			quiet_vmstat();
+		rcu_nocb_flush_deferred_wakeup();
+	}
+}
+EXPORT_SYMBOL(tick_nohz_user_enter_prepare);
+
 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 /*
  * The time, when the last jiffy update happened. Write access must hold
@@ -890,6 +905,9 @@  static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 		ts->do_timer_last = 0;
 	}
 
+	/* Attempt to fold when the idle tick is stopped or not */
+	quiet_vmstat();
+
 	/* Skip reprogram of event if its not changed */
 	if (ts->tick_stopped && (expires == ts->next_tick)) {
 		/* Sanity check: make sure clockevent is actually programmed */
@@ -911,7 +929,6 @@  static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;