From patchwork Tue Jun 21 17:22:07 2022
X-Patchwork-Submitter: Aaron Tomlin
X-Patchwork-Id: 12889569
From: Aaron Tomlin <atomlin@redhat.com>
To: frederic@kernel.org, mtosatti@redhat.com
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org,
 pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name,
 atomlin@atomlin.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v4] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too
Date: Tue, 21 Jun 2022 18:22:07 +0100
Message-Id: <20220621172207.1501641-1-atomlin@redhat.com>
X-Mailer: git-send-email 2.34.3

Hi Frederic and Marcelo,

I have incorporated an idea from Marcelo's patch [1] where a CPU-specific
variable is used to indicate whether a vmstat differential (or imbalance)
is present for a given CPU, so that vmstat processing can be initiated at
the appropriate time. The hope is that this approach is cheaper than
need_update(), which is used currently: in the context of nohz_full, with
the scheduling-clock tick stopped, this patch checks for a CPU-specific
vmstat imbalance before exiting user-mode (see
tick_nohz_user_enter_prepare()).

A trivial test program [2] was used to measure the impact on vanilla and
with the proposed changes; mlock(2) and munlock(2) were used solely to
modify the vmstat item 'NR_MLOCK'.
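As an aside, the intuition for why the flag should be cheaper than
need_update(): writers pay one extra per-CPU store at modification time,
and the exit-to-user path then tests a single boolean rather than scanning
every per-CPU differential with memchr_inv(). A minimal, single-threaded
userspace analogue of the pattern follows (illustrative names only, not
the kernel code; the real change is in the diff below):

  /* Userspace analogue of the per-CPU "vmstat_dirty" idea: writers mark a
   * flag when they touch a counter; the fast path tests the flag instead
   * of scanning all differentials. All names here are illustrative. */
  #include <stdbool.h>
  #include <stdio.h>

  #define NR_ITEMS 64

  static _Thread_local long stat_diff[NR_ITEMS]; /* stands in for per-CPU diffs */
  static _Thread_local bool vmstat_dirty;        /* stands in for the new flag  */

  static void mod_stat(int item, long delta)
  {
          stat_diff[item] += delta;
          vmstat_dirty = true;            /* cheap mark at modification time */
  }

  /* Old scheme: O(NR_ITEMS) scan, akin to need_update()'s memchr_inv() */
  static bool need_update(void)
  {
          for (int i = 0; i < NR_ITEMS; i++)
                  if (stat_diff[i])
                          return true;
          return false;
  }

  /* New scheme: a single boolean read, akin to __this_cpu_read(vmstat_dirty) */
  static bool need_update_fast(void)
  {
          return vmstat_dirty;
  }

  int main(void)
  {
          mod_stat(3, 1);
          printf("scan: %d, flag: %d\n", need_update(), need_update_fast());
          return 0;
  }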
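Roughly, the measurement amounts to something like the sketch below (see
[2] for the actual program; the iteration count, the rdtsc-based timing
and the x86-only assumption here are illustrative, not taken from [2]):

  /* Rough sketch of a cycle-count harness around mlock()/munlock(); a
   * guess at the shape of [2], not the actual test program. x86-only. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <x86intrin.h>  /* __rdtsc() */

  #define ITERATIONS 100000

  int main(void)
  {
          long page_size = sysconf(_SC_PAGESIZE);
          void *buf = aligned_alloc(page_size, page_size);
          unsigned long long start, total = 0;

          if (!buf)
                  return 1;

          for (int i = 0; i < ITERATIONS; i++) {
                  start = __rdtsc();
                  mlock(buf, page_size);         /* bumps the NR_MLOCK diff */
                  munlock(buf, page_size);       /* ...and undoes it        */
                  total += __rdtsc() - start;
          }

          /* Two syscalls per iteration, so halve for a per-syscall figure */
          printf("cycles per syscall: %llu\n", total / (2ULL * ITERATIONS));
          free(buf);
          return 0;
  }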
The following is an average count of CPU-cycles across the aforementioned
system calls and the idle loop, respectively. I believe these results are
negligible:

  Modified                       |  Vanilla
                                 |
  cycles per syscall:    7399    |  cycles per syscall:    4150
  cycles per idle loop:  141048  |  cycles per idle loop:  144730

Any feedback would be appreciated. Thanks.

Changes since v3 [3]:

 - Used EXPORT_SYMBOL() on tick_nohz_user_enter_prepare()
 - Replaced need_update()
 - Introduced a CPU-specific variable, namely 'vmstat_dirty', and
   mark_vmstat_dirty()

[1]: https://lore.kernel.org/lkml/20220204173554.763888172@fedora.localdomain/
[2]: https://pastebin.com/8AtzSAuK
[3]: https://lore.kernel.org/lkml/20220422193647.3808657-1-atomlin@redhat.com/

diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfd571f18cfd..4c576c9ca0a2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -11,7 +11,6 @@
 #include <linux/context_tracking_state.h>
 #include <linux/cpumask.h>
 #include <linux/sched.h>
-#include <linux/rcupdate.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -123,6 +122,8 @@ enum tick_dep_bits {
 #define TICK_DEP_MASK_RCU        (1 << TICK_DEP_BIT_RCU)
 #define TICK_DEP_MASK_RCU_EXP    (1 << TICK_DEP_BIT_RCU_EXP)
 
+void tick_nohz_user_enter_prepare(void);
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern bool tick_nohz_enabled;
 extern bool tick_nohz_tick_stopped(void);
@@ -305,10 +306,4 @@ static inline void tick_nohz_task_switch(void)
                 __tick_nohz_task_switch();
 }
 
-static inline void tick_nohz_user_enter_prepare(void)
-{
-        if (tick_nohz_full_cpu(smp_processor_id()))
-                rcu_nocb_flush_deferred_wakeup();
-}
-
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d257721c68b8..4cdd71cc292f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include <linux/posix-timers.h>
 #include <linux/context_tracking.h>
 #include <linux/mm.h>
+#include <linux/rcupdate.h>
 
 #include <asm/irq_regs.h>
 
@@ -43,6 +44,20 @@ struct tick_sched *tick_get_tick_sched(int cpu)
         return &per_cpu(tick_cpu_sched, cpu);
 }
 
+void tick_nohz_user_enter_prepare(void)
+{
+        struct tick_sched *ts;
+
+        if (tick_nohz_full_cpu(smp_processor_id())) {
+                ts = this_cpu_ptr(&tick_cpu_sched);
+
+                if (ts->tick_stopped)
+                        quiet_vmstat();
+                rcu_nocb_flush_deferred_wakeup();
+        }
+}
+EXPORT_SYMBOL(tick_nohz_user_enter_prepare);
+
 #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
 /*
  * The time, when the last jiffy update happened. Write access must hold
@@ -891,6 +906,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
                 ts->do_timer_last = 0;
         }
 
+        /* Attempt to fold when the idle tick is stopped or not */
+        quiet_vmstat();
+
         /* Skip reprogram of event if its not changed */
         if (ts->tick_stopped && (expires == ts->next_tick)) {
                 /* Sanity check: make sure clockevent is actually programmed */
@@ -912,7 +930,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
          */
         if (!ts->tick_stopped) {
                 calc_load_nohz_start();
-                quiet_vmstat();
                 ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 
                 ts->tick_stopped = 1;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b75b1a64b54c..7bfcafafe8f7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -195,6 +195,12 @@ void fold_vm_numa_events(void)
 #endif
 
 #ifdef CONFIG_SMP
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void mark_vmstat_dirty(void)
+{
+        this_cpu_write(vmstat_dirty, true);
+}
 
 int calculate_pressure_threshold(struct zone *zone)
 {
@@ -367,6 +373,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
                 x = 0;
         }
         __this_cpu_write(*p, x);
+        mark_vmstat_dirty();
 
         if (IS_ENABLED(CONFIG_PREEMPT_RT))
                 preempt_enable();
@@ -405,6 +412,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
                 x = 0;
         }
         __this_cpu_write(*p, x);
+        mark_vmstat_dirty();
 
         if (IS_ENABLED(CONFIG_PREEMPT_RT))
                 preempt_enable();
@@ -603,6 +611,7 @@ static inline void mod_zone_state(struct zone *zone,
 
         if (z)
                 zone_page_state_add(z, zone, item);
+        mark_vmstat_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -671,6 +680,7 @@ static inline void mod_node_state(struct pglist_data *pgdat,
 
         if (z)
                 node_page_state_add(z, pgdat, item);
+        mark_vmstat_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -1866,6 +1876,7 @@ int sysctl_stat_interval __read_mostly = HZ;
 static void refresh_vm_stats(struct work_struct *work)
 {
         refresh_cpu_vm_stats(true);
+        this_cpu_write(vmstat_dirty, false);
 }
 
 int vmstat_refresh(struct ctl_table *table, int write,
@@ -1930,6 +1941,7 @@ int vmstat_refresh(struct ctl_table *table, int write,
 static void vmstat_update(struct work_struct *w)
 {
         if (refresh_cpu_vm_stats(true)) {
+                this_cpu_write(vmstat_dirty, false);
                 /*
                  * Counters were updated so we expect more updates
                  * to occur in the future. Keep on running the
@@ -1941,35 +1953,6 @@ static void vmstat_update(struct work_struct *w)
         }
 }
 
-/*
- * Check if the diffs for a certain cpu indicate that
- * an update is needed.
- */
-static bool need_update(int cpu)
-{
-        pg_data_t *last_pgdat = NULL;
-        struct zone *zone;
-
-        for_each_populated_zone(zone) {
-                struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
-                struct per_cpu_nodestat *n;
-
-                /*
-                 * The fast way of checking if there are any vmstat diffs.
-                 */
-                if (memchr_inv(pzstats->vm_stat_diff, 0, sizeof(pzstats->vm_stat_diff)))
-                        return true;
-
-                if (last_pgdat == zone->zone_pgdat)
-                        continue;
-                last_pgdat = zone->zone_pgdat;
-                n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
-                if (memchr_inv(n->vm_node_stat_diff, 0, sizeof(n->vm_node_stat_diff)))
-                        return true;
-        }
-        return false;
-}
-
 /*
  * Switch off vmstat processing and then fold all the remaining differentials
  * until the diffs stay at zero. The function is used by NOHZ and can only be
@@ -1983,7 +1966,7 @@ void quiet_vmstat(void)
         if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
                 return;
 
-        if (!need_update(smp_processor_id()))
+        if (!__this_cpu_read(vmstat_dirty))
                 return;
 
         /*
@@ -1993,6 +1976,7 @@
          * vmstat_shepherd will take care about that for us.
          */
         refresh_cpu_vm_stats(false);
+        __this_cpu_write(vmstat_dirty, false);
 }
 
 /*
@@ -2014,7 +1998,7 @@ static void vmstat_shepherd(struct work_struct *w)
         for_each_online_cpu(cpu) {
                 struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
-                if (!delayed_work_pending(dw) && need_update(cpu))
+                if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu))
                         queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
 
                 cond_resched();