From patchwork Thu Mar 30 19:17:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13194754
Date: Thu, 30 Mar 2023 19:17:57 +0000
In-Reply-To: <20230330191801.1967435-1-yosryahmed@google.com>
References: <20230330191801.1967435-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog
Message-ID: <20230330191801.1967435-5-yosryahmed@google.com>
Subject: [PATCH v3 4/8] memcg: replace stats_flush_lock with an atomic
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed, Michal Hocko
Sender: owner-linux-mm@kvack.org

As Johannes notes in [1], stats_flush_lock is currently used to:
(a) Protect updates to stats_flush_threshold.
(b) Protect updates to flush_next_time.
(c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.

However:

1. stats_flush_threshold is already an atomic.

2. flush_next_time is not atomic. The writer is locked, but the reader
   is lockless. If the reader races with a flush, you could see this:

                                        if (time_after(jiffies, flush_next_time))
        spin_trylock()
        flush_next_time = now + delay
        flush()
        spin_unlock()
                                        spin_trylock()
                                        flush_next_time = now + delay
                                        flush()
                                        spin_unlock()

   which means we can already get flushes at a higher frequency than
   FLUSH_TIME during races. But it isn't really a problem.

   The reader could also see garbled partial updates if the compiler
   decides to split the write, so it needs at least READ_ONCE and
   WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit
   factors is currently broken because of the race in 2. But the race
   is actually harmless; all we might get is the occasional earlier
   flush. If there is no delta, the flush won't do much. And if there
   is, the flush is justified.

So the lock can be removed altogether. However, the lock also served
the purpose of preventing a thundering herd problem for concurrent
flushers, see [2]. Use an atomic instead to serve the purpose of
unifying concurrent flushers.

[1] https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
[2] https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/

Signed-off-by: Yosry Ahmed
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ff39f78f962e..65750f8b8259 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -585,8 +585,8 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 static u64 flush_next_time;
 
@@ -636,15 +636,19 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 
 static void __mem_cgroup_flush_stats(void)
 {
-	unsigned long flag;
-
-	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
+	/*
+	 * We always flush the entire tree, so concurrent flushers can just
+	 * skip. This avoids a thundering herd problem on the rstat global lock
+	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 */
+	if (atomic_read(&stats_flush_ongoing) ||
+	    atomic_xchg(&stats_flush_ongoing, 1))
 		return;
 
-	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
+	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
 	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock_irqrestore(&stats_flush_lock, flag);
+	atomic_set(&stats_flush_ongoing, 0);
 }
 
 void mem_cgroup_flush_stats(void)
@@ -655,7 +659,7 @@ void mem_cgroup_flush_stats(void)
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, flush_next_time))
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
 		mem_cgroup_flush_stats();
 }
 
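
For readers who want to experiment with the pattern outside the kernel, here is a minimal userspace sketch in C11, not the kernel code itself. It assumes <stdatomic.h> as a stand-in for the kernel's atomic_t, READ_ONCE and WRITE_ONCE; the names flush_ongoing, flush_count, FLUSH_DELAY, do_flush(), try_flush() and try_flush_ratelimited() are hypothetical and exist only for illustration. A plain integer comparison replaces time_after64(), so counter wraparound is not handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel state touched by the patch. */
static atomic_int flush_ongoing;          /* 0 = idle, 1 = a flush is running */
static _Atomic uint64_t flush_next_time;  /* earliest time for the next ratelimited flush */
static int flush_count;                   /* how many flushes actually ran */

#define FLUSH_DELAY 2u                    /* illustrative, not the kernel's 2*FLUSH_TIME */

/* Placeholder for the expensive whole-tree flush. */
static void do_flush(void)
{
	flush_count++;
}

/* Returns true if this caller performed the flush, false if it skipped. */
static bool try_flush(uint64_t now)
{
	/*
	 * Cheap read first, then an exchange to claim the flag. A caller
	 * that finds the flag already set skips, because the flusher that
	 * owns it covers the whole tree anyway.
	 */
	if (atomic_load_explicit(&flush_ongoing, memory_order_relaxed) ||
	    atomic_exchange(&flush_ongoing, 1))
		return false;

	/* Relaxed store: the value is only a hint for the ratelimited path. */
	atomic_store_explicit(&flush_next_time, now + FLUSH_DELAY,
			      memory_order_relaxed);
	do_flush();
	atomic_store(&flush_ongoing, 0);
	return true;
}

/* Lockless reader: flush only once the ratelimit window has passed. */
static bool try_flush_ratelimited(uint64_t now)
{
	if (now > atomic_load_explicit(&flush_next_time, memory_order_relaxed))
		return try_flush(now);
	return false;
}
```

Calling try_flush_ratelimited() with increasing timestamps flushes only after the window set by the previous flush has elapsed, and any caller that finds flush_ongoing set returns without doing work. The relaxed load before the exchange mirrors the atomic_read() check in the patch: it lets skipping callers avoid a read-modify-write on the shared flag.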