Message ID | 20220801234258.134609-3-atomlin@redhat.com (mailing list archive) |
---|---|
State | New |
Series | tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too |
Hi Aaron,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.19 next-20220728]
[cannot apply to tip/timers/nohz]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9de1f9c8ca5100a02a2e271bdbde36202e251b4b
config: i386-randconfig-a002-20220801 (https://download.01.org/0day-ci/archive/20220803/202208030608.T5WKMCpb-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 52cd00cabf479aa7eb6dbb063b7ba41ea57bce9e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/a0d3b9fe31484c4c44c430d10d0b60e2e0551525
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
git checkout a0d3b9fe31484c4c44c430d10d0b60e2e0551525
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> ld.lld: error: undefined symbol: tick_nohz_user_enter_prepare
>>> referenced by common.c
>>> entry/common.o:(exit_to_user_mode_prepare) in archive kernel/built-in.a
>>> referenced by common.c
>>> entry/common.o:(exit_to_user_mode_prepare) in archive kernel/built-in.a
Hi Aaron,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.19 next-20220802]
[cannot apply to tip/timers/nohz]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9de1f9c8ca5100a02a2e271bdbde36202e251b4b
config: x86_64-randconfig-a016-20220801 (https://download.01.org/0day-ci/archive/20220803/202208031315.Wwa9w3jr-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/a0d3b9fe31484c4c44c430d10d0b60e2e0551525
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
git checkout a0d3b9fe31484c4c44c430d10d0b60e2e0551525
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
ld: vmlinux.o: in function `exit_to_user_mode_loop':
>> kernel/entry/common.c:182: undefined reference to `tick_nohz_user_enter_prepare'
ld: vmlinux.o: in function `exit_to_user_mode_prepare':
kernel/entry/common.c:198: undefined reference to `tick_nohz_user_enter_prepare'
vim +182 kernel/entry/common.c
a9f3a74a29af095 Thomas Gleixner 2020-07-22 144
a9f3a74a29af095 Thomas Gleixner 2020-07-22 145 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
a9f3a74a29af095 Thomas Gleixner 2020-07-22 146 unsigned long ti_work)
a9f3a74a29af095 Thomas Gleixner 2020-07-22 147 {
a9f3a74a29af095 Thomas Gleixner 2020-07-22 148 /*
a9f3a74a29af095 Thomas Gleixner 2020-07-22 149 * Before returning to user space ensure that all pending work
a9f3a74a29af095 Thomas Gleixner 2020-07-22 150 * items have been completed.
a9f3a74a29af095 Thomas Gleixner 2020-07-22 151 */
a9f3a74a29af095 Thomas Gleixner 2020-07-22 152 while (ti_work & EXIT_TO_USER_MODE_WORK) {
a9f3a74a29af095 Thomas Gleixner 2020-07-22 153
a9f3a74a29af095 Thomas Gleixner 2020-07-22 154 local_irq_enable_exit_to_user(ti_work);
a9f3a74a29af095 Thomas Gleixner 2020-07-22 155
a9f3a74a29af095 Thomas Gleixner 2020-07-22 156 if (ti_work & _TIF_NEED_RESCHED)
a9f3a74a29af095 Thomas Gleixner 2020-07-22 157 schedule();
a9f3a74a29af095 Thomas Gleixner 2020-07-22 158
a9f3a74a29af095 Thomas Gleixner 2020-07-22 159 if (ti_work & _TIF_UPROBE)
a9f3a74a29af095 Thomas Gleixner 2020-07-22 160 uprobe_notify_resume(regs);
a9f3a74a29af095 Thomas Gleixner 2020-07-22 161
a9f3a74a29af095 Thomas Gleixner 2020-07-22 162 if (ti_work & _TIF_PATCH_PENDING)
a9f3a74a29af095 Thomas Gleixner 2020-07-22 163 klp_update_patch_state(current);
a9f3a74a29af095 Thomas Gleixner 2020-07-22 164
12db8b690010ccf Jens Axboe 2020-10-26 165 if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
8ba62d37949e248 Eric W. Biederman 2022-02-09 166 arch_do_signal_or_restart(regs);
a9f3a74a29af095 Thomas Gleixner 2020-07-22 167
a68de80f61f6af3 Sean Christopherson 2021-09-01 168 if (ti_work & _TIF_NOTIFY_RESUME)
03248addadf1a5e Eric W. Biederman 2022-02-09 169 resume_user_mode_work(regs);
a9f3a74a29af095 Thomas Gleixner 2020-07-22 170
a9f3a74a29af095 Thomas Gleixner 2020-07-22 171 /* Architecture specific TIF work */
a9f3a74a29af095 Thomas Gleixner 2020-07-22 172 arch_exit_to_user_mode_work(regs, ti_work);
a9f3a74a29af095 Thomas Gleixner 2020-07-22 173
a9f3a74a29af095 Thomas Gleixner 2020-07-22 174 /*
a9f3a74a29af095 Thomas Gleixner 2020-07-22 175 * Disable interrupts and reevaluate the work flags as they
a9f3a74a29af095 Thomas Gleixner 2020-07-22 176 * might have changed while interrupts and preemption was
a9f3a74a29af095 Thomas Gleixner 2020-07-22 177 * enabled above.
a9f3a74a29af095 Thomas Gleixner 2020-07-22 178 */
a9f3a74a29af095 Thomas Gleixner 2020-07-22 179 local_irq_disable_exit_to_user();
47b8ff194c1fd73 Frederic Weisbecker 2021-02-01 180
47b8ff194c1fd73 Frederic Weisbecker 2021-02-01 181 /* Check if any of the above work has queued a deferred wakeup */
f268c3737ecaefc Frederic Weisbecker 2021-05-27 @182 tick_nohz_user_enter_prepare();
47b8ff194c1fd73 Frederic Weisbecker 2021-02-01 183
6ce895128b3bff7 Mark Rutland 2021-11-29 184 ti_work = read_thread_flags();
a9f3a74a29af095 Thomas Gleixner 2020-07-22 185 }
a9f3a74a29af095 Thomas Gleixner 2020-07-22 186
a9f3a74a29af095 Thomas Gleixner 2020-07-22 187 /* Return the latest work state for arch_exit_to_user_mode() */
a9f3a74a29af095 Thomas Gleixner 2020-07-22 188 return ti_work;
a9f3a74a29af095 Thomas Gleixner 2020-07-22 189 }
a9f3a74a29af095 Thomas Gleixner 2020-07-22 190
Hi Aaron,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.19 next-20220802]
[cannot apply to tip/timers/nohz]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9de1f9c8ca5100a02a2e271bdbde36202e251b4b
config: x86_64-randconfig-a013-20220801 (https://download.01.org/0day-ci/archive/20220803/202208031440.kq5bbt4F-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/a0d3b9fe31484c4c44c430d10d0b60e2e0551525
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Aaron-Tomlin/tick-sched-Ensure-quiet_vmstat-is-called-when-the-idle-tick-was-stopped-too/20220802-074341
git checkout a0d3b9fe31484c4c44c430d10d0b60e2e0551525
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
ld: vmlinux.o: in function `exit_to_user_mode_prepare':
>> common.c:(.text+0x1d4569): undefined reference to `tick_nohz_user_enter_prepare'
>> ld: common.c:(.text+0x1d45df): undefined reference to `tick_nohz_user_enter_prepare'
ld: common.c:(.text+0x1d460c): undefined reference to `tick_nohz_user_enter_prepare'
diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfd571f18cfd..4c576c9ca0a2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -11,7 +11,6 @@
 #include <linux/context_tracking_state.h>
 #include <linux/cpumask.h>
 #include <linux/sched.h>
-#include <linux/rcupdate.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -123,6 +122,8 @@ enum tick_dep_bits {
 #define TICK_DEP_MASK_RCU	(1 << TICK_DEP_BIT_RCU)
 #define TICK_DEP_MASK_RCU_EXP	(1 << TICK_DEP_BIT_RCU_EXP)
 
+void tick_nohz_user_enter_prepare(void);
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern bool tick_nohz_enabled;
 extern bool tick_nohz_tick_stopped(void);
@@ -305,10 +306,4 @@ static inline void tick_nohz_task_switch(void)
 		__tick_nohz_task_switch();
 }
 
-static inline void tick_nohz_user_enter_prepare(void)
-{
-	if (tick_nohz_full_cpu(smp_processor_id()))
-		rcu_nocb_flush_deferred_wakeup();
-}
-
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 30049580cd62..c7c69a974414 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include <linux/posix-timers.h>
 #include <linux/context_tracking.h>
 #include <linux/mm.h>
+#include <linux/rcupdate.h>
 
 #include <asm/irq_regs.h>
 
@@ -43,6 +44,20 @@ struct tick_sched *tick_get_tick_sched(int cpu)
 	return &per_cpu(tick_cpu_sched, cpu);
 }
 
+void tick_nohz_user_enter_prepare(void)
+{
+	struct tick_sched *ts;
+
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+		ts = this_cpu_ptr(&tick_cpu_sched);
+
+		if (ts->tick_stopped)
+			quiet_vmstat();
+		rcu_nocb_flush_deferred_wakeup();
+	}
+}
+EXPORT_SYMBOL(tick_nohz_user_enter_prepare);
+
 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 /*
  * The time, when the last jiffy update happened. Write access must hold
@@ -890,6 +905,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 		ts->do_timer_last = 0;
 	}
 
+	/* Attempt to fold when the idle tick is stopped or not */
+	quiet_vmstat();
+
 	/* Skip reprogram of event if its not changed */
 	if (ts->tick_stopped && (expires == ts->next_tick)) {
 		/* Sanity check: make sure clockevent is actually programmed */
@@ -911,7 +929,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
In the context of the idle task and an adaptive-tick mode or a nohz_full
CPU, quiet_vmstat() can be called: before stopping the idle tick, entering
an idle state and on exit. In particular, for the latter case, when the
idle task is required to reschedule, the idle tick can remain stopped and
the timer expiration time endless i.e., KTIME_MAX. Now, indeed before a
nohz_full CPU enters an idle state, CPU-specific vmstat counters should be
processed to ensure the respective values have been reset and folded into
the zone specific 'vm_stat[]'. That being said, it can only occur when:
the idle tick was previously stopped, and reprogramming of the timer is
not required.

A customer provided some evidence which indicates that the idle tick was
stopped; albeit, CPU-specific vmstat counters still remained populated.
Thus one can only assume quiet_vmstat() was not invoked on return to the
idle loop.

If I understand correctly, I suspect this divergence might erroneously
prevent a reclaim attempt by kswapd. If the number of zone specific free
pages is below the per-cpu drift value then zone_page_state_snapshot() is
used to compute a more accurate view of the aforementioned statistic.
Thus any task blocked on the NUMA node specific pfmemalloc_wait queue will
be unable to make significant progress via direct reclaim unless it is
killed after being woken up by kswapd (see throttle_direct_reclaim()).

Consider the following theoretical scenario:

 1. CPU Y migrated running task A to CPU X that was in an idle state
    i.e. waiting for an IRQ - not polling; marked the current task on
    CPU X as needing a reschedule i.e., set TIF_NEED_RESCHED and invoked
    a reschedule IPI to CPU X (see sched_move_task())

 2. CPU X acknowledged the reschedule IPI from CPU Y; generic idle loop
    code noticed the TIF_NEED_RESCHED flag against the idle task and
    attempts to exit the loop and calls the main scheduler function
    i.e. __schedule().

    Since the idle tick was previously stopped no scheduling-clock tick
    would occur. So, no deferred timers would be handled

 3. Post transition to kernel execution Task A running on CPU Y,
    indirectly released a few pages (e.g. see __free_one_page()); CPU Y's
    'vm_stat_diff[NR_FREE_PAGES]' was updated and zone specific 'vm_stat[]'
    update was deferred as per the CPU-specific stat threshold

 4. Task A does invoke exit(2) and the kernel does remove the task from
    the run-queue; the idle task was selected to execute next since there
    are no other runnable tasks assigned to the given CPU (see
    pick_next_task() and pick_next_task_idle())

 5. On return to the idle loop since the idle tick was already stopped
    and can remain so (see [1] below) e.g. no pending soft IRQs, no
    attempt is made to zero and fold CPU Y's vmstat counters since
    reprogramming of the scheduling-clock tick is not required
    (see [2])

      ...
        do_idle
        {

          __current_set_polling()
          tick_nohz_idle_enter()

          while (!need_resched()) {

            local_irq_disable()

            ...

            /* No polling or broadcast event */
            cpuidle_idle_call()
            {

              if (cpuidle_not_available(drv, dev)) {
                tick_nohz_idle_stop_tick()
                  __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched))
                  {
                    int cpu = smp_processor_id()

                    if (ts->timer_expires_base)
                      expires = ts->timer_expires
                    else if (can_stop_idle_tick(cpu, ts))
      (1) ------->    expires = tick_nohz_next_event(ts, cpu)
                    else
                      return

                    ts->idle_calls++

                    if (expires > 0LL) {

                      tick_nohz_stop_tick(ts, cpu)
                      {

                        if (ts->tick_stopped && (expires == ts->next_tick)) {
      (2) ------->        if (tick == KTIME_MAX || ts->next_tick ==
                              hrtimer_get_expires(&ts->sched_timer))
                            return
                        }
                        ...
                      }

So the idea with this patch is to ensure refresh_cpu_vm_stats(false) is
called, when it is appropriate, on return to the idle loop when the idle
tick was previously stopped too. Additionally, in the context of nohz_full,
when the scheduling-tick is stopped and before exiting to user-mode, ensure
no CPU-specific vmstat differentials remain.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 include/linux/tick.h     |  9 ++-------
 kernel/time/tick-sched.c | 19 ++++++++++++++++++-
 2 files changed, 20 insertions(+), 8 deletions(-)