From patchwork Wed Dec 21 16:58:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 13078957 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97612C4167B for ; Wed, 21 Dec 2022 17:10:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 570528E0007; Wed, 21 Dec 2022 12:10:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3EEAC8E0009; Wed, 21 Dec 2022 12:10:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DA60A8E0007; Wed, 21 Dec 2022 12:10:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id C1E2B8E0005 for ; Wed, 21 Dec 2022 12:10:53 -0500 (EST) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 81A451208A5 for ; Wed, 21 Dec 2022 17:10:53 +0000 (UTC) X-FDA: 80266953186.08.4D113FD Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf17.hostedemail.com (Postfix) with ESMTP id BC4744000C for ; Wed, 21 Dec 2022 17:10:51 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=hIXCUIHH; spf=pass (imf17.hostedemail.com: domain of mtosatti@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1671642651; h=from:from:sender:reply-to:subject:subject:date:date: 
message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding:in-reply-to: references:references:dkim-signature; bh=A8ptA+mtJpfx4WDFilLLKvFD1fvtEmfC0YGkopBqCFA=; b=Ds6dJFAiAp9MuWHY0UPob+334MbYbuE1QKM718WTe44JHq7t92K0J6ZigG8YThG5cw6yXB 9elMS/PQaQbz/oBVaLtsMY0YyRpgRmhVvvx48vnGc9FeZWNjJqJMVAu3OMHHQ7eleuAxNN Ctepy9IFpb6WSojJltSJNMm9XHdWWok= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=hIXCUIHH; spf=pass (imf17.hostedemail.com: domain of mtosatti@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1671642651; a=rsa-sha256; cv=none; b=P4bhWbomdtYP0zSz8Lv27GddJUqe76s+FkZycL9uVbF5EeVW4rWXgsnaBaV0FBPbSwWQZh N79OPvRZZBGqg+7ZlWr0UZH3+L/z2D+M3ypFjfwQPtsbLPFihuPyCl9Cxipo2lo3EzbGy7 a7TYw4vhEHTniYQfbj/P2ooejnoBko4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671642651; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=A8ptA+mtJpfx4WDFilLLKvFD1fvtEmfC0YGkopBqCFA=; b=hIXCUIHHzGe+MD85x6MH7egkv0pTriE1bsa+PbJQ98slfmBirjcq1dNYlTdFB3vckdu95h v1u4I+1teHDkgOI/7sRxXBoOqiYqj4vydI9BjRChqvKhM/wg8Td7UYerx/3sLk1XIwmQAK yXY5kG81S68iKoeu2OECmtepKTMsRNM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-591-yibgqVRuNwCx0j0WJvOmOg-1; Wed, 21 Dec 2022 12:10:47 -0500 X-MC-Unique: yibgqVRuNwCx0j0WJvOmOg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with 
ESMTPS id C9029882821; Wed, 21 Dec 2022 17:10:46 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 9824B40C945A; Wed, 21 Dec 2022 17:10:46 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 9FC4340408D42; Wed, 21 Dec 2022 14:09:34 -0300 (-03) Message-ID: <20221221170436.252896271@redhat.com> User-Agent: quilt/0.66 Date: Wed, 21 Dec 2022 13:58:02 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v11 1/6] mm/vmstat: Add CPU-specific variable to track a vmstat discrepancy References: <20221221165801.362118576@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Stat-Signature: f4x9ea8xfk3j64n343ogiejawiyf1x4x X-Rspam-User: X-Rspamd-Queue-Id: BC4744000C X-Rspamd-Server: rspam06 X-HE-Tag: 1671642651-248433 X-HE-Meta: U2FsdGVkX19HNDSV8liTxbXa+Z54Wdma2FnootABlwtkHX0rhdSJ9ho2lZEdKOObefqjhEUZD3+jr2lel6b1T+V5vrNJU0H0U77CMjsibU4g4P3b/hrAKrZQEFEsjzSeqWyYHsAZtM5GBmZyE/ZEs1siFbuupO5WGQbaSMVkiyeLMo0Foez5AdWx5UpmQSMEOQsLR1GmkqGb8JzqYrIvU59OxUb++NSWjdm/SPGhTcKukA2JIksrJCBKE0fJJregTMU7g7R0W89DVgLVLAMcIzIq1LD9/5Nh6zxR7zxyHty9XkrfegRWKYRHQagpYX9Ixv4OO/ncpPr38cWljsLDm23/J8AXlWU7OTzyMHiTnbwd1NO5v6A+7RBCoQ1C4QRieyjTR927KfQCAND9QawUGJvoy9y6xXa+8QBf7vO6qJ3EuW47Vt8k2ggTjZFQJ6uhYZDMUYEbP+sb8T4a5UJMHqTKIHMviYAdtcANrmqA4pXvuzLmJMMOtciuXWQ9prK4XOngl7b3O4N6s+f9KhWp00d3+zvpkPyNeb/74ql4HmdI2xnw9VdI69SIB6PATooN4BxNoq517KThr5DsgRM2HvmG3rB+X2Rq7/c3Djzxuov/FM6nn0IdaIyGsjbLabvMdgvhxYm14Ab9JdnTrGTf/dnhPV+YmWmuj2oS7n0Zp8d0w+xQbGQDPoWUsD0YUDsgDyyEcMKp4EWYk9PZ2z/cUdvb/QuS5Gz6/JZjCxmYp382SabeBY6iwloUrJgvvb6DEGRoIKON4wVRa1B7uEobr2cBQYfgY1X1THEHpFSlEPQQr6QqulbF17kqch57zldqhSDQ+eNlaOSg/rWpXSOfh1vXzxmugqs82OQik4rGWbQs9PYdBaF63y+bJ780LsWb 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Aaron Tomlin Introduce a CPU-specific variable namely vmstat_dirty to indicate if a vmstat imbalance is present for a given CPU. Therefore, at the appropriate time, we can fold all the remaining differentials. This patch also provides trivial helpers for modification and testing. Signed-off-by: Aaron Tomlin Signed-off-by: Marcelo Tosatti --- mm/vmstat.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -194,6 +194,22 @@ void fold_vm_numa_events(void) #endif #ifdef CONFIG_SMP +static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty); + +static inline void vmstat_mark_dirty(void) +{ + this_cpu_write(vmstat_dirty, true); +} + +static inline void vmstat_clear_dirty(void) +{ + this_cpu_write(vmstat_dirty, false); +} + +static inline bool is_vmstat_dirty(void) +{ + return this_cpu_read(vmstat_dirty); +} int calculate_pressure_threshold(struct zone *zone) { From patchwork Wed Dec 21 16:58:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 13078954 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56184C4332F for ; Wed, 21 Dec 2022 17:10:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0AEFD8E0006; Wed, 21 Dec 2022 12:10:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0600A8E0005; Wed, 21 Dec 2022 12:10:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DCD718E0006; Wed, 21 Dec 
2022 12:10:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id BBDFB8E0005 for ; Wed, 21 Dec 2022 12:10:51 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 92A77A09D3 for ; Wed, 21 Dec 2022 17:10:51 +0000 (UTC) X-FDA: 80266953102.02.BC69C7C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf13.hostedemail.com (Postfix) with ESMTP id 050472001B for ; Wed, 21 Dec 2022 17:10:49 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=X763BN3c; spf=pass (imf13.hostedemail.com: domain of mtosatti@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1671642650; a=rsa-sha256; cv=none; b=5BL3sWKbSL7oCsoSDA5HCHVXUbyH40YD1yFmABohFcmnJgm/cAW/6VN2ny+hDD5LYJVN3R 0Wt3IeDe/W7qzwkfQzEM7IoFD2yoqDqNWFwAtWOQd8jEVCtK+Gxkwitsjr08EgO4/VOzLg hzRa/hmQqfMEEcIMgQMW/46fxBxxAZE= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=X763BN3c; spf=pass (imf13.hostedemail.com: domain of mtosatti@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1671642650; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding:in-reply-to: references:references:dkim-signature; bh=r/LpF1VkdRdOhfpQs0+SAaS5Z0AIHcjMWCSMDWY5FIc=; 
b=VI6MIe7J4siSLdOTWMu+rIYymgB6gUPSWoDIx1KaQTWAM5yNZSOsNd2XMEgFzJSM08nCjd 9SJ0boEWBvTig/abzsyFYD6yb6qaDoktI4dyfWADODE8kpR7JCIIWWyjWVnzgJ+SmZ+dOP hcm+aWIKXhRDA/2LJtF414QxvhtxamA= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671642649; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=r/LpF1VkdRdOhfpQs0+SAaS5Z0AIHcjMWCSMDWY5FIc=; b=X763BN3c/aOUh/7mUs/XErkbo0L4UIjQvW8GhbYDWQ/2krFd8CLLXs96VNPChgY+kTkJfa fSdV1ur//66FSxm06P861vW4cdlDZlNQW2ERy1vG+v67ysUY8sm8aJttiPEWrLtMVIRo8n CCQsY5XKJ25ue4gvKzNAIBrMXqxO1Lw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-652-hd3noGltNP29S8NfNyWmVw-1; Wed, 21 Dec 2022 12:10:45 -0500 X-MC-Unique: hd3noGltNP29S8NfNyWmVw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4FF058F6E87; Wed, 21 Dec 2022 17:10:45 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id CDD132026D76; Wed, 21 Dec 2022 17:10:44 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id A32B840408D43; Wed, 21 Dec 2022 14:09:34 -0300 (-03) Message-ID: <20221221170436.292370701@redhat.com> User-Agent: quilt/0.66 Date: Wed, 21 Dec 2022 13:58:03 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v11 2/6] mm/vmstat: Use vmstat_dirty to track 
CPU-specific vmstat discrepancies References: <20221221165801.362118576@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Rspam-User: X-Rspamd-Queue-Id: 050472001B X-Rspamd-Server: rspam01 X-Stat-Signature: hdn1aqbebcm1xfdp3ofx9owu83fxksen X-HE-Tag: 1671642649-781686 X-HE-Meta: U2FsdGVkX18wjLUUNKjyzoGeOphz6kMjAsml/wB6C5Akmpgsan/dfYql0Ouh00BWW0PFy4MqPXEbgC5YPjfP2wxPkHOMtb2Hra7AVkTReF6njUH9U474O/u7OCDYi/9lIXPXX+krkkNQOS2Vfp3RQ3T1y5/z/2tHfhjB5cvzxq+qW4lkF+Iy+YB2lhqEHhwRxfabBZGv/zQRqEwTsLWMHWXyz5mVf+yTbb2oS3QTOmpZO5MMRZy6wAcoCB7mppogcexpNDmL4BBEngPBXXPMDpYL2seSUMLGbcIlUzry3tBy1So8+wunSD0cHRqpwoR+ZYVPz4HBgLhnve9Y5pIxX+aYGAUGa6kDY6q+w9d7w50JIJnBlOyg6T7j3Fcu5YpmZkHmhs0oy3pe9nj8T9YfGdqa7ORjl9ePzQOAzZYWwenkw1fYz681tqu1Pn8+h+mkxP9WU9sYlrGyOOSTNSKbvOavvFd+32/ho997RABZBlAddBt2ZTQwCrNap+vA0Z16BjwJglkK69GhBwzEEsxugqA+FdrsQFMniIvljZQQAdtXgNnan4QalNn9WVJ84Cb/f84mlGdPh/UE2SAKzhZGQSSsxiJ/LHsE0FtsnHjLbH6qD0UKNeQ1vVcvYi7c9Cdp50NU4avw7jajt61A+pq3fiw7FTVQZgTXtDf2lPq75o7nP9ZXIOB/0VBQws5psS5mmyKtsyY8HxMGR+XI22B6F7C9Yy/bvkhDw93HhSqR5gFLFzEVA+ZF91sJEUguH8Nc1lYLKF0qYiVOllOwROFdDowPC5QbkXJo/ss/Te94bgZd1EV0pkiB/l1XiuOFUMtLcu/LH/JKZAIj9toT9wtgloCumRJQW+oUBXNgzYtT65CvmeZzTAtQd9t2DanDD+W8kpW2D0lKLT8Y32prl3I7Qhu75rEp4DaPuScbJdXoI9Q= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Aaron Tomlin This patch uses the previously introduced CPU-specific variable, vmstat_dirty, to indicate whether a vmstat differential or imbalance is present for a given CPU, so that vmstat processing can be initiated at the appropriate time. The hope is that this approach is "cheaper" than need_update(). The idea is based on Marcelo's patch [1]. 
[1]: https://lore.kernel.org/lkml/20220204173554.763888172@fedora.localdomain/ Signed-off-by: Aaron Tomlin Signed-off-by: Marcelo Tosatti --- mm/vmstat.c | 48 ++++++++++++++---------------------------------- 1 file changed, 14 insertions(+), 34 deletions(-) Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -381,6 +381,7 @@ void __mod_zone_page_state(struct zone * x = 0; } __this_cpu_write(*p, x); + vmstat_mark_dirty(); preempt_enable_nested(); } @@ -417,6 +418,7 @@ void __mod_node_page_state(struct pglist x = 0; } __this_cpu_write(*p, x); + vmstat_mark_dirty(); preempt_enable_nested(); } @@ -606,6 +608,7 @@ static inline void mod_zone_state(struct if (z) zone_page_state_add(z, zone, item); + vmstat_mark_dirty(); } void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, @@ -674,6 +677,7 @@ static inline void mod_node_state(struct if (z) node_page_state_add(z, pgdat, item); + vmstat_mark_dirty(); } void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item, @@ -828,6 +832,14 @@ static int refresh_cpu_vm_stats(bool do_ int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, }; int changes = 0; + /* + * Clear vmstat_dirty before clearing the percpu vmstats. + * If interrupts are enabled, it is possible that an interrupt + * or another task modifies a percpu vmstat, which will + * set vmstat_dirty to true. + */ + vmstat_clear_dirty(); + for_each_populated_zone(zone) { struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; #ifdef CONFIG_NUMA @@ -1957,35 +1969,6 @@ static void vmstat_update(struct work_st } /* - * Check if the diffs for a certain cpu indicate that - * an update is needed. 
- */ -static bool need_update(int cpu) -{ - pg_data_t *last_pgdat = NULL; - struct zone *zone; - - for_each_populated_zone(zone) { - struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); - struct per_cpu_nodestat *n; - - /* - * The fast way of checking if there are any vmstat diffs. - */ - if (memchr_inv(pzstats->vm_stat_diff, 0, sizeof(pzstats->vm_stat_diff))) - return true; - - if (last_pgdat == zone->zone_pgdat) - continue; - last_pgdat = zone->zone_pgdat; - n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu); - if (memchr_inv(n->vm_node_stat_diff, 0, sizeof(n->vm_node_stat_diff))) - return true; - } - return false; -} - -/* * Switch off vmstat processing and then fold all the remaining differentials * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. @@ -1995,10 +1978,7 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - if (!delayed_work_pending(this_cpu_ptr(&vmstat_work))) - return; - - if (!need_update(smp_processor_id())) + if (!is_vmstat_dirty()) return; /* @@ -2029,7 +2009,7 @@ static void vmstat_shepherd(struct work_ for_each_online_cpu(cpu) { struct delayed_work *dw = &per_cpu(vmstat_work, cpu); - if (!delayed_work_pending(dw) && need_update(cpu)) + if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu)) queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0); cond_resched(); From patchwork Wed Dec 21 16:58:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 13078953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6ADD8C41535 for ; Wed, 21 Dec 2022 17:10:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 347898E0001; Wed, 21 Dec 2022 12:10:51 -0500 
(EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2F7898E0003; Wed, 21 Dec 2022 12:10:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 173998E0001; Wed, 21 Dec 2022 12:10:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id F0FB48E0003 for ; Wed, 21 Dec 2022 12:10:50 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 85FCB1602C2 for ; Wed, 21 Dec 2022 17:10:50 +0000 (UTC) X-FDA: 80266953060.17.A7C0F1F Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf28.hostedemail.com (Postfix) with ESMTP id EBE3CC0018 for ; Wed, 21 Dec 2022 17:10:47 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=LKMOj+Wg; spf=pass (imf28.hostedemail.com: domain of mtosatti@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1671642648; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding:in-reply-to: references:references:dkim-signature; bh=3gD3dwT2L6CCCjRN92vyud3vtNAmOazkyM6y6Mm+R8Y=; b=nhM+TWKx7fAsh/nkpqtchG0IVzx7MlljFWh7KWW8wjfF1HHEaLN0zkCbRxGjg6fuilJDFz pdtot2XOSo8ffSNud+k9ZPYRQy2mtGV45bR70N7NbcBGbxosYt9uDLW2cROAshXSYHnz5L H4xGEsALk04v0mAWFG4bWM32ARxjzCA= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=LKMOj+Wg; spf=pass (imf28.hostedemail.com: domain of mtosatti@redhat.com designates 
170.10.129.124 as permitted sender) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1671642648; a=rsa-sha256; cv=none; b=NBIh59U3ZeuBWnQ60n2/3tgyIlVS8fEWuBuZ6ffrHX3UlzDfLJHgkWmq+9mNfvjkEOCp2G wn/HFxtnHqA0ykGajjQsiuSXo1UeLc4pXxmYwcc4CHEwM30g81Jw+Tf4+W6tNxGo1y/o6W FOly2atcOB1CeZMyJndlr7fak0jTs0Q= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671642647; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=3gD3dwT2L6CCCjRN92vyud3vtNAmOazkyM6y6Mm+R8Y=; b=LKMOj+WgB7ard1dd3TpUGbeaKWeUKtE144wJQ04o45RVYMHi08ClxTyhepu6ppvrwiKabF 8TnyFRj/5EjpcsJ/NDeOfG4sQaeyPdBswuVvy0n+q6EM/UBUyTkGWRKRa3dyVwK4r+K95n 4nIG+c/880HvrJxDi4a0/aqhzVheirI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-517-P822zIk2PAq546CNbWl-3w-1; Wed, 21 Dec 2022 12:10:45 -0500 X-MC-Unique: P822zIk2PAq546CNbWl-3w-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3BF6918E0046; Wed, 21 Dec 2022 17:10:45 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id AF8E34014EBD; Wed, 21 Dec 2022 17:10:44 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id A67D440408D49; Wed, 21 Dec 2022 14:09:34 -0300 (-03) Message-ID: <20221221170436.330627967@redhat.com> User-Agent: quilt/0.66 Date: Wed, 21 Dec 2022 13:58:04 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, 
peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v11 3/6] mm/vmstat: manage per-CPU stats from CPU context when NOHZ full References: <20221221165801.362118576@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Stat-Signature: x4aadsz7safr4rgw44pdg8r8nwnmud96 X-Rspam-User: X-Rspamd-Queue-Id: EBE3CC0018 X-Rspamd-Server: rspam06 X-HE-Tag: 1671642647-645488 X-HE-Meta: U2FsdGVkX1/Q+C85uLTmtkttzeahHoyI2+Q+06Jig97TgJHKb7ipW/RiAXWLBifkaMPJoeFb4IqWPk7t9G9lCxqFcb84wlutl37l8HDMdQLAG93H27uuzTf0GJJlIxB8zp0viCHR8Bw7jBz9mTtSntbiTRPfW+S6Vg/T0n8e59W5IjLtq45rnSGQq54W56ekUjheFCFEMo71DvF5ykfIbc9G+uo5WOZkXeXdfjiFIB2QXFtP3upJyQyMS+3yuh/d9Aw98PDwo557valfwHcucWH4u+ebjPEr8sXQ78TMl+T/WvzehMagEYZ479SVKemFSaW56vPywgsNafEiSRfKI7eIsE8ZAjdBB1/X/t1g7AWVADBd2640xxrbvxQjpB3E1OH1kvtsHRl0arbKykRceHCQL0KTncchtZzjZ0RRryrnF239eAEfKq0IVGHBAiu3pk6a6RftPfacqTpO/UFtFuXv2FHK8Zf1TrCC415p4o6rbsxAKuuLU01yt1ITtlfxP2KrN3f95G+RAeaIiHbgRtt4ZG2XyRVQe+Flc7naxDEx1xwmsmjLOqAVTsl4T4cjLQAwmoWOlHXMV7oWZfsXt2zvLAfz892antQA2iU3hAVdzKowlg6WWg+z2pkycYwVulQSrwJd//xzPTQ2izWQnCNMnav3plRyykHTDcrMFyNbwwZCbBbXMiWsVkoJt9K8aAy42Nvm/ZSoZJC18FdnM9Yr1nqRaRMBHCuaHNRjIS9s/KHvO78AyDvBi0PL52LUpsQYbvHv3YH+VjsdZfP67Z7sqbCYor9AEQya68ElvvoHKGAikStVkkT6OomgFr0pgFD2Wm/52cGuNa1VppqV+suSlgYWy8ljPmSe0R0lgrhzvvR74PoZFZZXv+udwiev X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For nohz full CPUs, we'd like the per-CPU vm statistics to be synchronized when userspace is executing. Otherwise, the vmstat_shepherd might queue a work item to synchronize them, which is undesired interference for isolated CPUs. This means that it is necessary to check for, and possibly sync, the statistics when returning to userspace. 
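The check-and-sync step described above can be pictured with a small userspace model in C. This is only a sketch under stated assumptions, not kernel code and not the implementation in this series: the names model_mod_state(), model_fold() and model_return_to_user(), the array sizes, and the use of plain arrays in place of per-CPU variables are all hypothetical. It shows the intended ordering: a counter update marks the CPU dirty, and the return-to-user path folds the local differentials only when the flag is set, clearing the flag before folding.

```c
#include <stdbool.h>

#define NR_CPUS  4	/* hypothetical size, for illustration only */
#define NR_ITEMS 3	/* hypothetical number of counters */

static long global_stat[NR_ITEMS];	/* models the global counters */
static int  cpu_diff[NR_CPUS][NR_ITEMS];	/* models per-CPU differentials */
static bool cpu_dirty[NR_CPUS];		/* models the per-CPU dirty flag */

/* Models a counter update: record the delta locally, then mark the CPU dirty. */
static void model_mod_state(int cpu, int item, int delta)
{
	cpu_diff[cpu][item] += delta;
	cpu_dirty[cpu] = true;
}

/*
 * Models folding: the flag is cleared first, so an update that races
 * with the fold sets it again and is not lost. Returns the number of
 * counters that carried a non-zero differential.
 */
static int model_fold(int cpu)
{
	int changes = 0;

	cpu_dirty[cpu] = false;
	for (int i = 0; i < NR_ITEMS; i++) {
		if (cpu_diff[cpu][i]) {
			global_stat[i] += cpu_diff[cpu][i];
			cpu_diff[cpu][i] = 0;
			changes++;
		}
	}
	return changes;
}

/* Models the return-to-user check: a single flag test on the fast path. */
static int model_return_to_user(int cpu)
{
	if (!cpu_dirty[cpu])
		return 0;
	return model_fold(cpu);
}
```

In this model a clean CPU pays only for one flag read on its way back to userspace, which is the property that makes the check affordable on that path.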
This means that there are now two execution contexts, on different CPUs, which require awareness about each other: context switch and the vmstat shepherd kernel thread. To avoid sharing variables between these two contexts (which would require atomic accesses), delegate the responsibility for statistics synchronization from vmstat_shepherd to the local CPU context, for nohz_full CPUs. Do that by queueing a delayed work when marking per-CPU vmstat dirty. When returning to userspace, fold the stats and cancel the delayed work. When entering idle, only fold the stats. Signed-off-by: Marcelo Tosatti --- include/linux/vmstat.h | 4 ++-- kernel/time/tick-sched.c | 2 +- mm/vmstat.c | 41 ++++++++++++++++++++++++++++++++--------- 3 files changed, 35 insertions(+), 12 deletions(-) Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "internal.h" @@ -194,21 +195,50 @@ void fold_vm_numa_events(void) #endif #ifdef CONFIG_SMP -static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty); + +struct vmstat_dirty { + bool dirty; + bool cpuhotplug; +}; + +static DEFINE_PER_CPU_ALIGNED(struct vmstat_dirty, vmstat_dirty_pcpu); +static DEFINE_PER_CPU(struct delayed_work, vmstat_work); +int sysctl_stat_interval __read_mostly = HZ; static inline void vmstat_mark_dirty(void) { - this_cpu_write(vmstat_dirty, true); + struct vmstat_dirty *vms = this_cpu_ptr(&vmstat_dirty_pcpu); + +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER + int cpu = smp_processor_id(); + + if (tick_nohz_full_cpu(cpu) && !vms->dirty) { + struct delayed_work *dw; + + dw = this_cpu_ptr(&vmstat_work); + if (!delayed_work_pending(dw) && !vms->cpuhotplug) { + unsigned long delay; + + delay = round_jiffies_relative(sysctl_stat_interval); + queue_delayed_work_on(cpu, mm_percpu_wq, dw, delay); + } + } +#endif + vms->dirty = true; } static inline void vmstat_clear_dirty(void) { - 
this_cpu_write(vmstat_dirty, false); + struct vmstat_dirty *vms = this_cpu_ptr(&vmstat_dirty_pcpu); + + vms->dirty = false; } static inline bool is_vmstat_dirty(void) { - return this_cpu_read(vmstat_dirty); + struct vmstat_dirty *vms = this_cpu_ptr(&vmstat_dirty_pcpu); + + return vms->dirty; } int calculate_pressure_threshold(struct zone *zone) @@ -1886,9 +1916,6 @@ static const struct seq_operations vmsta #endif /* CONFIG_PROC_FS */ #ifdef CONFIG_SMP -static DEFINE_PER_CPU(struct delayed_work, vmstat_work); -int sysctl_stat_interval __read_mostly = HZ; - #ifdef CONFIG_PROC_FS static void refresh_vm_stats(struct work_struct *work) { @@ -1973,7 +2000,7 @@ static void vmstat_update(struct work_st * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. */ -void quiet_vmstat(void) +void quiet_vmstat(bool user) { if (system_state != SYSTEM_RUNNING) return; @@ -1981,13 +2008,18 @@ void quiet_vmstat(void) if (!is_vmstat_dirty()) return; + refresh_cpu_vm_stats(false); + +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER + if (!user) + return; /* - * Just refresh counters and do not care about the pending delayed - * vmstat_update. It doesn't fire that often to matter and canceling - * it would be too expensive from this path. - * vmstat_shepherd will take care about that for us. + * If the tick is stopped, cancel any delayed work to avoid + * interruptions to this CPU in the future. 
*/ - refresh_cpu_vm_stats(false); + if (delayed_work_pending(this_cpu_ptr(&vmstat_work))) + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); +#endif } /* @@ -2008,8 +2040,15 @@ static void vmstat_shepherd(struct work_ /* Check processors whose vmstat worker threads have been disabled */ for_each_online_cpu(cpu) { struct delayed_work *dw = &per_cpu(vmstat_work, cpu); + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); - if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu)) +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER + /* NOHZ full CPUs manage their own vmstat flushing */ + if (tick_nohz_full_cpu(cpu)) + continue; +#endif + + if (!delayed_work_pending(dw) && vms->dirty) queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0); cond_resched(); @@ -2044,6 +2083,25 @@ static void __init init_cpu_node_state(v static int vmstat_cpu_online(unsigned int cpu) { +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); + + if (tick_nohz_full_cpu(cpu)) { + struct delayed_work *dw; + + vms->cpuhotplug = false; + vms->dirty = true; + + dw = this_cpu_ptr(&vmstat_work); + if (!delayed_work_pending(dw)) { + unsigned long delay; + + delay = round_jiffies_relative(sysctl_stat_interval); + queue_delayed_work_on(cpu, mm_percpu_wq, dw, delay); + } + } +#endif + refresh_zone_stat_thresholds(); if (!node_state(cpu_to_node(cpu), N_CPU)) { @@ -2053,8 +2111,15 @@ static int vmstat_cpu_online(unsigned in return 0; } +/* + * ONLINE: The callbacks are invoked on the hotplugged CPU from the per CPU + * hotplug thread with interrupts and preemption enabled. 
+ */ static int vmstat_cpu_down_prep(unsigned int cpu) { + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); + + vms->cpuhotplug = true; cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); return 0; } Index: linux-2.6/include/linux/vmstat.h =================================================================== --- linux-2.6.orig/include/linux/vmstat.h +++ linux-2.6/include/linux/vmstat.h @@ -290,7 +290,7 @@ extern void dec_zone_state(struct zone * extern void __dec_zone_state(struct zone *, enum zone_stat_item); extern void __dec_node_state(struct pglist_data *, enum node_stat_item); -void quiet_vmstat(void); +void quiet_vmstat(bool user); void cpu_vm_stats_fold(int cpu); void refresh_zone_stat_thresholds(void); @@ -403,7 +403,7 @@ static inline void __dec_node_page_state static inline void refresh_zone_stat_thresholds(void) { } static inline void cpu_vm_stats_fold(int cpu) { } -static inline void quiet_vmstat(void) { } +static inline void quiet_vmstat(bool user) { } static inline void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats) { } Index: linux-2.6/kernel/time/tick-sched.c =================================================================== --- linux-2.6.orig/kernel/time/tick-sched.c +++ linux-2.6/kernel/time/tick-sched.c @@ -911,7 +911,7 @@ static void tick_nohz_stop_tick(struct t */ if (!ts->tick_stopped) { calc_load_nohz_start(); - quiet_vmstat(); + quiet_vmstat(false); ts->last_tick = hrtimer_get_expires(&ts->sched_timer); ts->tick_stopped = 1; Index: linux-2.6/mm/Kconfig =================================================================== --- linux-2.6.orig/mm/Kconfig +++ linux-2.6/mm/Kconfig @@ -1124,6 +1124,19 @@ config PTE_MARKER_UFFD_WP purposes. It is required to enable userfaultfd write protection on file-backed memory types like shmem and hugetlbfs. 
+config FLUSH_WORK_ON_RESUME_USER
+	bool "Flush per-CPU vmstats on user return (for nohz full CPUs)"
+	depends on NO_HZ_FULL
+	default y
+
+	help
+	  By default, nohz full CPUs flush per-CPU vm statistics on return
+	  to userspace (to avoid additional interferences when executing
+	  userspace code). This has a small but measurable impact on
+	  system call performance. You can disable this to improve system call
+	  performance, at the expense of potential interferences to userspace
+	  execution.
+
 # multi-gen LRU {
 config LRU_GEN
 	bool "Multi-Gen LRU"

From patchwork Wed Dec 21 16:58:05 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13078952
Message-ID: <20221221170436.370028855@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:05 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 4/6] tick/nohz_full: Ensure quiet_vmstat() is called on exit to user-mode when the idle tick is stopped
References: <20221221165801.362118576@redhat.com>

From: Aaron Tomlin

For nohz full CPUs, we'd like the per-CPU vm statistics to be
synchronized when userspace is executing. Otherwise, the
vmstat_shepherd might queue a work item to synchronize them, which is
undesired interference for isolated CPUs.

This patch syncs CPU-specific vmstat differentials, on return to
userspace, if CONFIG_FLUSH_WORK_ON_RESUME_USER is enabled and the tick
is stopped.

A trivial test program was used to compare the proposed changes against
a vanilla kernel. The mlock(2) and munlock(2) system calls were used
solely to modify vmstat item 'NR_MLOCK'.
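The test program is not included in this posting. As a rough
illustration only, a measurement of this kind could be sketched as
below. The helper names now_ns() and avg_mlock_pair_ns() are
hypothetical, and the sketch reports wall-clock nanoseconds from
CLOCK_MONOTONIC rather than the CPU cycles quoted in this message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Average wall-clock cost, in nanoseconds, of one mlock()/munlock()
 * pair on a single page. Each pair changes NR_MLOCK and then restores it.
 * Returns -1 if setup or either system call fails, for example when
 * RLIMIT_MEMLOCK is too low.
 */
static long long avg_mlock_pair_ns(long iterations)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *page = NULL;
	long long start, end;
	long i;

	if (iterations <= 0 || page_size <= 0)
		return -1;
	if (posix_memalign(&page, (size_t)page_size, (size_t)page_size) != 0)
		return -1;

	start = now_ns();
	for (i = 0; i < iterations; i++) {
		if (mlock(page, (size_t)page_size) != 0 ||
		    munlock(page, (size_t)page_size) != 0) {
			free(page);
			return -1;
		}
	}
	end = now_ns();
	assert(end >= start);	/* CLOCK_MONOTONIC never goes backwards */

	free(page);
	return (end - start) / iterations;
}
```

Calling avg_mlock_pair_ns() from a main() that is pinned to a nohz_full
CPU, once on each kernel, would give two figures to compare; pinning
and cycle counting are left out of the sketch.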
The following is an average count of CPU-cycles across the
aforementioned system calls:

                        Vanilla     Modified

  Cycles per syscall    8461        8690 (+2.6%)

Signed-off-by: Aaron Tomlin
Signed-off-by: Marcelo Tosatti
---
 include/linux/tick.h     |  5 +++--
 kernel/time/tick-sched.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -11,7 +11,6 @@
 #include
 #include
 #include
-#include
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -272,6 +271,7 @@ static inline void tick_dep_clear_signal
 
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void __tick_nohz_task_switch(void);
+void __tick_nohz_user_enter_prepare(void);
 extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
 #else
 static inline bool tick_nohz_full_enabled(void) { return false; }
@@ -296,6 +296,7 @@ static inline void tick_dep_clear_signal
 
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void __tick_nohz_task_switch(void) { }
+static inline void __tick_nohz_user_enter_prepare(void) { }
 static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
 #endif
 
@@ -308,7 +309,7 @@ static inline void tick_nohz_task_switch
 static inline void tick_nohz_user_enter_prepare(void)
 {
 	if (tick_nohz_full_cpu(smp_processor_id()))
-		rcu_nocb_flush_deferred_wakeup();
+		__tick_nohz_user_enter_prepare();
 }
 
 #endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -519,6 +520,22 @@ void __tick_nohz_task_switch(void)
 	}
 }
 
+void __tick_nohz_user_enter_prepare(void)
+{
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER
+		struct tick_sched *ts;
+
+		ts = this_cpu_ptr(&tick_cpu_sched);
+
+		if (ts->tick_stopped)
+			quiet_vmstat(true);
+#endif
+		rcu_nocb_flush_deferred_wakeup();
+	}
+}
+EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
+
 /* Get the boot-time nohz CPU list from the kernel parameters. */
 void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 {

From patchwork Wed Dec 21 16:58:06 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13078951
Message-ID: <20221221170436.409732339@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:06 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 5/6] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too
References: <20221221165801.362118576@redhat.com>

From: Aaron Tomlin

In the context of the idle task on an adaptive-tick (nohz_full) CPU,
quiet_vmstat() can be called before stopping the idle tick, on entering
an idle state, and on exit. In the last case, when the idle task is
required to reschedule, the idle tick can remain stopped and the timer
expiration time can remain infinite, i.e. KTIME_MAX. Before a nohz_full
CPU enters an idle state, CPU-specific vmstat counters should be
processed so that the respective values are reset and folded into the
zone-specific 'vm_stat[]'. That said, the fold can be missed when the
idle tick was previously stopped and reprogramming of the timer is not
required.

A customer provided evidence indicating that the idle tick was stopped
while CPU-specific vmstat counters still remained populated.
Thus one can only assume quiet_vmstat() was not invoked on return to
the idle loop. If I understand correctly, I suspect this divergence
might erroneously prevent a reclaim attempt by kswapd. If the number of
zone-specific free pages is below the per-CPU drift value, then
zone_page_state_snapshot() is used to compute a more accurate view of
that statistic. Thus any task blocked on the NUMA node specific
pfmemalloc_wait queue will be unable to make significant progress via
direct reclaim unless it is killed after being woken up by kswapd
(see throttle_direct_reclaim()).

Consider the following theoretical scenario:

 - Note: CPU X is part of 'tick_nohz_full_mask'

    1.      CPU Y migrated running task A to CPU X, which was in an
            idle state, i.e. waiting for an IRQ; marked the current
            task on CPU X as needing a reschedule, i.e. set
            TIF_NEED_RESCHED; and sent a reschedule IPI to CPU X
            (see sched_move_task())

    2.      CPU X acknowledged the reschedule IPI. Generic idle loop
            code noticed the TIF_NEED_RESCHED flag against the idle
            task, exited the loop and called the main scheduler
            function, i.e. __schedule().

            Since the idle tick was previously stopped, no
            scheduling-clock tick would occur, so no deferred timers
            would be handled

    3.      After the transition to kernel execution, task A, running
            on CPU X, indirectly released a few pages (e.g. see
            __free_one_page()); CPU X's 'vm_stat_diff[NR_FREE_PAGES]'
            was updated and the zone-specific 'vm_stat[]' update was
            deferred as per the CPU-specific stat threshold

    4.      Task A invoked exit(2) and the kernel removed the task
            from the run-queue; the idle task was selected to execute
            next since there were no other runnable tasks assigned to
            the given CPU (see pick_next_task() and
            pick_next_task_idle())

    5.      On return to the idle loop, since the idle tick was already
            stopped and can remain so (see [1] below), e.g. no pending
            soft IRQs, no attempt is made to zero and fold CPU X's
            vmstat counters because reprogramming of the
            scheduling-clock tick is not required (see [2])

  ...

  do_idle
  {

    __current_set_polling()
    tick_nohz_idle_enter()

    while (!need_resched()) {

      local_irq_disable()

      ...

      /* No polling or broadcast event */
      cpuidle_idle_call()
      {

        if (cpuidle_not_available(drv, dev)) {
          tick_nohz_idle_stop_tick()

            __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched))
            {
              int cpu = smp_processor_id()

              if (ts->timer_expires_base)
                expires = ts->timer_expires
              else if (can_stop_idle_tick(cpu, ts))
    (1) ------->  expires = tick_nohz_next_event(ts, cpu)
              else
                return

              ts->idle_calls++

              if (expires > 0LL) {

                tick_nohz_stop_tick(ts, cpu)
                {

                  if (ts->tick_stopped && (expires == ts->next_tick)) {
    (2) ------->    if (tick == KTIME_MAX || ts->next_tick ==
                        hrtimer_get_expires(&ts->sched_timer))
                      return
                  }
                  ...
                }

So, the idea of this patch is to ensure refresh_cpu_vm_stats(false) is
called, when it is appropriate, on return to the idle loop if the idle
tick was previously stopped too.

A trivial test program was used to compare the proposed changes against
a vanilla kernel. The nanosleep(2) system call was used several times
to suspend execution for a period of time, to approximate the number of
CPU-cycles spent in the idle code path.
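The test program is not part of this posting. A minimal sketch of how
repeated nanosleep(2) calls could be timed is shown below, purely as an
illustration. The helper names now_ns() and avg_sleep_ns() are
hypothetical, and the sketch measures elapsed nanoseconds per call
(sleep time included) rather than the CPU cycles reported in this
message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Sleep 'iterations' times for 'sleep_ns' nanoseconds each, so that the
 * CPU repeatedly enters and leaves the idle loop if nothing else is
 * runnable on it. Returns the average elapsed time per call in
 * nanoseconds, or -1 on invalid arguments or if a sleep is interrupted.
 */
static long long avg_sleep_ns(int iterations, long sleep_ns)
{
	struct timespec req;
	long long start, end;
	int i;

	if (iterations <= 0 || sleep_ns < 0 || sleep_ns >= 1000000000L)
		return -1;

	req.tv_sec = 0;
	req.tv_nsec = sleep_ns;

	start = now_ns();
	for (i = 0; i < iterations; i++) {
		if (nanosleep(&req, NULL) != 0)
			return -1;
	}
	end = now_ns();
	assert(end >= start);	/* CLOCK_MONOTONIC never goes backwards */

	return (end - start) / iterations;
}
```

Subtracting the requested sleep time from the returned average leaves
an estimate of the per-iteration overhead, of which the idle entry and
exit path is one part.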
The following is an average count of CPU-cycles:

                          Vanilla     Modified

  Cycles per idle loop    151858      153258 (+1.0%)

Signed-off-by: Aaron Tomlin
Signed-off-by: Marcelo Tosatti
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -928,13 +928,14 @@ static void tick_nohz_stop_tick(struct t
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat(false);
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
+	/* Attempt to fold when the idle tick is stopped or not */
+	quiet_vmstat(false);
 	ts->next_tick = tick;
 
 	/*

From patchwork Wed Dec 21 16:58:07 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13078955
Message-ID: <20221221170436.449941687@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:07 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 6/6] mm/vmstat: avoid queueing work item if cpu stats are clean
References: <20221221165801.362118576@redhat.com>

It is not necessary to queue a work item to run refresh_vm_stats on a
remote CPU if that CPU has no dirty stats and no per-CPU allocations
for remote nodes.

This fixes a sosreport hang (sosreport uses vmstat_refresh) seen with a
spinning SCHED_FIFO process.
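The check described above can be illustrated with a small user-space
model. This is only a sketch and not the kernel code: struct cpu_model,
mod_stat(), fold_cpu(), refresh_all() and MODEL_NR_CPUS are
hypothetical names, 'remote_pages' stands in for the result of
need_drain_remote_zones(), and queueing a work item is collapsed into
a direct call.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4		/* hypothetical CPU count for the model */

struct cpu_model {
	long diff;		/* pending per-CPU differential */
	bool dirty;		/* diff was modified since the last fold */
	bool remote_pages;	/* stand-in for need_drain_remote_zones() */
};

static struct cpu_model cpus[MODEL_NR_CPUS];
static long global_stat;

/* Update a counter locally and mark the CPU dirty. */
static void mod_stat(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	cpus[cpu].diff += delta;
	cpus[cpu].dirty = true;
}

/* Fold one CPU's differential into the global counter. */
static void fold_cpu(int cpu)
{
	global_stat += cpus[cpu].diff;
	cpus[cpu].diff = 0;
	cpus[cpu].dirty = false;
	cpus[cpu].remote_pages = false;
}

/*
 * Refresh all CPUs, skipping those that have nothing to fold or drain.
 * Returns how many CPUs were disturbed.
 */
static int refresh_all(void)
{
	int cpu, queued = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!cpus[cpu].dirty && !cpus[cpu].remote_pages)
			continue;	/* clean CPU: leave it alone */
		fold_cpu(cpu);
		queued++;
	}
	return queued;
}
```

In the model, a CPU that has not touched its counters since the last
fold is never visited, which is the property that keeps a spinning
SCHED_FIFO task on an otherwise clean CPU from blocking the refresh.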
Signed-off-by: Marcelo Tosatti

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -1917,6 +1917,31 @@ static const struct seq_operations vmsta
 
 #ifdef CONFIG_SMP
 #ifdef CONFIG_PROC_FS
+static bool need_drain_remote_zones(int cpu)
+{
+#ifdef CONFIG_NUMA
+	struct zone *zone;
+
+	for_each_populated_zone(zone) {
+		struct per_cpu_pages *pcp;
+
+		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+		if (!pcp->count)
+			continue;
+
+		if (!pcp->expire)
+			continue;
+
+		if (zone_to_nid(zone) == cpu_to_node(cpu))
+			continue;
+
+		return true;
+	}
+#endif
+
+	return false;
+}
+
 static void refresh_vm_stats(struct work_struct *work)
 {
 	refresh_cpu_vm_stats(true);
@@ -1926,8 +1951,12 @@ int vmstat_refresh(struct ctl_table *tab
 		   void *buffer, size_t *lenp, loff_t *ppos)
 {
 	long val;
-	int err;
-	int i;
+	int i, cpu;
+	struct work_struct __percpu *works;
+
+	works = alloc_percpu(struct work_struct);
+	if (!works)
+		return -ENOMEM;
 
 	/*
 	 * The regular update, every sysctl_stat_interval, may come later
@@ -1941,9 +1970,21 @@ int vmstat_refresh(struct ctl_table *tab
 	 * transiently negative values, report an error here if any of
 	 * the stats is negative, so we know to go looking for imbalance.
 	 */
-	err = schedule_on_each_cpu(refresh_vm_stats);
-	if (err)
-		return err;
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct work_struct *work = per_cpu_ptr(works, cpu);
+		struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu);
+
+		INIT_WORK(work, refresh_vm_stats);
+
+		if (vms->dirty || need_drain_remote_zones(cpu))
+			schedule_work_on(cpu, work);
+	}
+	for_each_online_cpu(cpu)
+		flush_work(per_cpu_ptr(works, cpu));
+	cpus_read_unlock();
+	free_percpu(works);
+
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
 		/*
 		 * Skip checking stats known to go negative occasionally.