From patchwork Tue Mar 28 22:16:40 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13191618
Date: Tue, 28 Mar 2023 22:16:40 +0000
In-Reply-To: <20230328221644.803272-1-yosryahmed@google.com>
References: <20230328221644.803272-1-yosryahmed@google.com>
Message-ID: <20230328221644.803272-6-yosryahmed@google.com>
Subject: [PATCH v2 5/9] memcg: replace stats_flush_lock with an atomic
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed

As
Johannes notes in [1], stats_flush_lock is currently used to:
(a) Protect updates to stats_flush_threshold.
(b) Protect updates to flush_next_time.
(c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.

However:

1. stats_flush_threshold is already an atomic.

2. flush_next_time is not atomic. The writer is locked, but the reader
   is lockless. If the reader races with a flush, you could see this:

                                        if (time_after(jiffies, flush_next_time))
        spin_trylock()
        flush_next_time = now + delay
        flush()
        spin_unlock()
                                        spin_trylock()
                                        flush_next_time = now + delay
                                        flush()
                                        spin_unlock()

   which means we can already get flushes at a higher frequency than
   FLUSH_TIME during races. But it isn't really a problem.

   The reader could also see garbled partial updates, so it needs at
   least READ_ONCE and WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit
   factors is currently broken because of the race in 2. But the race
   is actually harmless: all we might get is the occasional earlier
   flush. If there is no delta, the flush won't do much. And if there
   is, the flush is justified.

So the lock can be removed altogether. However, the lock also served
the purpose of preventing a thundering herd problem for concurrent
flushers, see [2]. Use an atomic instead to serve the purpose of
unifying concurrent flushers.
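
For readers who want to see the "unify concurrent flushers" idea in isolation, here is a minimal userspace sketch using C11 atomics. It is not the kernel code: flush_ongoing, flushes_done, do_flush(), flush_begin(), flush_end() and try_flush() are hypothetical names, and atomic_load()/atomic_exchange() only stand in for the kernel's atomic_read()/atomic_xchg(). The point is that a caller that finds a flush already in progress skips instead of queueing up behind a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state; names are illustrative. */
static atomic_int flush_ongoing;	/* 0 = idle, 1 = a flush is running */
static int flushes_done;		/* how often the flush body actually ran */

/* Stand-in for the expensive whole-tree flush. */
static void do_flush(void)
{
	flushes_done++;
}

/* Try to become the flusher; returns false if someone else already is. */
static bool flush_begin(void)
{
	/* Cheap read first, so skipping callers do not write the flag. */
	if (atomic_load(&flush_ongoing))
		return false;
	/* The exchange decides the winner if several callers saw 0. */
	return atomic_exchange(&flush_ongoing, 1) == 0;
}

/* Release the flusher role; only the caller that won flush_begin() may call this. */
static void flush_end(void)
{
	assert(atomic_load(&flush_ongoing) == 1);
	atomic_store(&flush_ongoing, 0);
}

/* Returns true if this caller performed the flush, false if it skipped. */
static bool try_flush(void)
{
	if (!flush_begin())
		return false;
	do_flush();
	flush_end();
	return true;
}
```

Skipping is only acceptable because every flush covers the whole tree, so the caller that lost the race gains nothing by flushing again. Note that a skipping caller returns while the winner's flush may still be running.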
[1] https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
[2] https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/

Signed-off-by: Yosry Ahmed
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ff39f78f962e..65750f8b8259 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -585,8 +585,8 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 static u64 flush_next_time;
 
@@ -636,15 +636,19 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 
 static void __mem_cgroup_flush_stats(void)
 {
-	unsigned long flag;
-
-	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
+	/*
+	 * We always flush the entire tree, so concurrent flushers can just
+	 * skip. This avoids a thundering herd problem on the rstat global lock
+	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 */
+	if (atomic_read(&stats_flush_ongoing) ||
+	    atomic_xchg(&stats_flush_ongoing, 1))
 		return;
 
-	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
+	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
 	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock_irqrestore(&stats_flush_lock, flag);
+	atomic_set(&stats_flush_ongoing, 0);
 }
 
 void mem_cgroup_flush_stats(void)
@@ -655,7 +659,7 @@ void mem_cgroup_flush_stats(void)
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, flush_next_time))
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
 		mem_cgroup_flush_stats();
 }
 
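
The lockless ratelimit check in the last hunk can also be tried outside the kernel. The sketch below is a userspace approximation, assuming C11 relaxed atomic loads and stores as the nearest analog of READ_ONCE()/WRITE_ONCE() (they are not the same primitive). FLUSH_PERIOD, next_flush_time, ticks_after(), note_flush() and flush_due() are hypothetical names, and the period value is arbitrary. ticks_after() uses the same wrap-safe signed-difference comparison as the kernel's time_after64().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_PERIOD 200u	/* hypothetical period, in ticks */

/* Written by whoever flushes, read locklessly by the ratelimited path. */
static _Atomic uint64_t next_flush_time;

/*
 * Wrap-safe "a is after b". The cast relies on the usual two's-complement
 * conversion, as the kernel's time_after64() does.
 */
static bool ticks_after(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Writer side: push the deadline out, as the flusher does before flushing. */
static void note_flush(uint64_t now)
{
	atomic_store_explicit(&next_flush_time, now + 2 * FLUSH_PERIOD,
			      memory_order_relaxed);
}

/* Reader side: one untorn load of the deadline, then compare. */
static bool flush_due(uint64_t now)
{
	uint64_t deadline = atomic_load_explicit(&next_flush_time,
						 memory_order_relaxed);

	return ticks_after(now, deadline);
}
```

A stale deadline read here costs at most one early or one skipped flush, which matches the commit message's argument that the race is harmless once torn reads are ruled out.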