From patchwork Tue Mar 28 06:16:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13190558
Date: Tue, 28 Mar 2023 06:16:34 +0000
In-Reply-To: <20230328061638.203420-1-yosryahmed@google.com>
References: <20230328061638.203420-1-yosryahmed@google.com>
Message-ID: <20230328061638.203420-6-yosryahmed@google.com>
Subject: [PATCH v1 5/9] memcg: replace stats_flush_lock with an atomic
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed

As
Johannes notes in [1], stats_flush_lock is currently used to:
(a) Protect updates to stats_flush_threshold.
(b) Protect updates to flush_next_time.
(c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.

However:

1. stats_flush_threshold is already an atomic.

2. flush_next_time is not atomic. The writer is locked, but the reader
   is lockless. If the reader races with a flush, you could see this:

                                        if (time_after(jiffies, flush_next_time))
        spin_trylock()
        flush_next_time = now + delay
        flush()
        spin_unlock()
                                        spin_trylock()
                                        flush_next_time = now + delay
                                        flush()
                                        spin_unlock()

   which means we already can get flushes at a higher frequency than
   FLUSH_TIME during races. But it isn't really a problem.

   The reader could also see garbled partial updates, so it needs at
   least READ_ONCE and WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit
   factors is currently broken because of the race in 2. But the race
   is actually harmless; all we might get is the occasional earlier
   flush. If there is no delta, the flush won't do much. And if there
   is, the flush is justified.

So the lock can be removed altogether. However, the lock also served
the purpose of preventing a thundering herd problem for concurrent
flushers, see [2]. Use an atomic instead to serve the purpose of
unifying concurrent flushers.

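For illustration only, the lockless ratelimit check from point 2 can be
sketched in userspace C11. This is not the kernel code: every sketch_*
name and SKETCH_FLUSH_TIME are hypothetical, and relaxed atomic loads and
stores are assumed here as a rough stand-in for READ_ONCE() and
WRITE_ONCE(). The comparison follows the same wrap-safe idea as
time_after64().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's FLUSH_TIME, in ticks. */
#define SKETCH_FLUSH_TIME 2000u

/* Stands in for flush_next_time: earliest tick for a ratelimited flush. */
static _Atomic uint64_t sketch_flush_next_time;

/* Wrap-safe "a is after b", the same idea as the kernel's time_after64(). */
static bool sketch_time_after64(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Writer side: a relaxed store plays the role of WRITE_ONCE(). */
static void sketch_arm_ratelimit(uint64_t now)
{
	atomic_store_explicit(&sketch_flush_next_time,
			      now + 2 * SKETCH_FLUSH_TIME,
			      memory_order_relaxed);
}

/* Reader side: a relaxed load plays the role of READ_ONCE(). */
static bool sketch_flush_due(uint64_t now)
{
	uint64_t next = atomic_load_explicit(&sketch_flush_next_time,
					     memory_order_relaxed);

	return sketch_time_after64(now, next);
}

/* Walk-through: arm at tick 100, so the deadline is tick 4100. */
static void sketch_ratelimit_demo(void)
{
	sketch_arm_ratelimit(100);
	assert(!sketch_flush_due(4100));	/* not yet past the deadline */
	assert(sketch_flush_due(4101));		/* strictly after: flush is due */
}
```

The reader never takes a lock, so it may act on a slightly stale
deadline. As argued above, the worst case is an occasional early flush.
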
[1] https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
[2] https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/

Signed-off-by: Yosry Ahmed
Acked-by: Johannes Weiner
---
 mm/memcontrol.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ff39f78f962e..64ff33e02c96 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -585,8 +585,8 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 static u64 flush_next_time;

@@ -636,15 +636,18 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)

 static void __mem_cgroup_flush_stats(void)
 {
-	unsigned long flag;
-
-	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
+	/*
+	 * We always flush the entire tree, so concurrent flushers can just
+	 * skip. This avoids a thundering herd problem on the rstat global lock
+	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 */
+	if (atomic_xchg(&stats_flush_ongoing, 1))
 		return;

-	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
+	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
 	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock_irqrestore(&stats_flush_lock, flag);
+	atomic_set(&stats_flush_ongoing, 0);
 }

 void mem_cgroup_flush_stats(void)
@@ -655,7 +658,7 @@ void mem_cgroup_flush_stats(void)

 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, flush_next_time))
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
 		mem_cgroup_flush_stats();
 }
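
As a rough userspace illustration of the "concurrent flushers just skip"
pattern in the patch above, here is a minimal C11 sketch. It is not the
kernel implementation: the sketch_* names are hypothetical,
sketch_do_flush() is a placeholder for cgroup_rstat_flush_atomic(), and
the C11 default (sequentially consistent) ordering is assumed rather than
an exact mapping of the kernel's atomic_xchg() and atomic_set() semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stands in for stats_flush_ongoing: nonzero while a flush is running. */
static atomic_int sketch_flush_ongoing;

/* Counts how many times the placeholder flush actually ran. */
static int sketch_flush_count;

/* Hypothetical placeholder for the expensive whole-tree flush. */
static void sketch_do_flush(void)
{
	sketch_flush_count++;
}

/*
 * Returns true if this caller performed the flush, false if it skipped
 * because another flusher was already in progress.
 */
static bool sketch_try_flush(void)
{
	/* atomic_exchange() returns the old value: nonzero means "busy". */
	if (atomic_exchange(&sketch_flush_ongoing, 1))
		return false;

	sketch_do_flush();
	atomic_store(&sketch_flush_ongoing, 0);
	return true;
}

/* Single-threaded walk-through of both outcomes. */
static void sketch_flush_demo(void)
{
	bool flushed;

	flushed = sketch_try_flush();		/* idle: this caller flushes */
	assert(flushed);

	atomic_store(&sketch_flush_ongoing, 1);	/* pretend a flush is running */
	flushed = sketch_try_flush();		/* busy: this caller skips */
	assert(!flushed);
	atomic_store(&sketch_flush_ongoing, 0);

	assert(sketch_flush_count == 1);
}
```

Unlike a trylock, the exchange carries no owner or IRQ state, so a caller
that loses the race returns immediately without waiting. That is
sufficient here because the winner flushes the whole tree anyway.
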