From patchwork Thu Feb 27 21:55:42 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13995243
From: inwardvessel
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
 mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/4 v2] cgroup: separate rstat locks for subsystems
Date: Thu, 27 Feb 2025 13:55:42 -0800
Message-ID: <20250227215543.49928-4-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250227215543.49928-1-inwardvessel@gmail.com>
References: <20250227215543.49928-1-inwardvessel@gmail.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: JP Kobryn

Let the existing locks be dedicated to the base stats and rename them as
such. Also add new rstat locks for each enabled subsystem.
When handling cgroup subsystem states, distinguish between formal
subsystems (memory, io, etc.) and the base stats subsystem state
(represented by cgroup::self) to decide which locks to take. This
change is made to prevent contention between subsystems when
updating/flushing stats.

Signed-off-by: JP Kobryn
Reviewed-by: Shakeel Butt
---
 kernel/cgroup/rstat.c | 93 +++++++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 88908ef9212d..b3eaefc1fd07 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,12 @@
 
 #include <trace/events/cgroup.h>
 
-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(cgroup_rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_base_cpu_lock);
+
+static spinlock_t cgroup_rstat_subsys_lock[CGROUP_SUBSYS_COUNT];
+static DEFINE_PER_CPU(raw_spinlock_t,
+		cgroup_rstat_subsys_cpu_lock[CGROUP_SUBSYS_COUNT]);
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
 
@@ -20,8 +24,13 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }
 
+static inline bool is_base_css(struct cgroup_subsys_state *css)
+{
+	return css->ss == NULL;
+}
+
 /*
- * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ * Helper functions for rstat per CPU locks.
  *
  * This makes it easier to diagnose locking issues and contention in
  * production environments. The parameter @fast_path determine the
@@ -36,12 +45,12 @@ unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
 	bool contended;
 
 	/*
-	 * The _irqsave() is needed because cgroup_rstat_lock is
-	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
-	 * this lock with the _irq() suffix only disables interrupts on
-	 * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
-	 * interrupts on both configurations. The _irqsave() ensures
-	 * that interrupts are always disabled and later restored.
+	 * The _irqsave() is needed because the locks used for flushing are
+	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock
+	 * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT
+	 * kernel. The raw_spinlock_t below disables interrupts on both
+	 * configurations. The _irqsave() ensures that interrupts are always
+	 * disabled and later restored.
 	 */
 	contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
 	if (contended) {
@@ -87,7 +96,7 @@ __bpf_kfunc void cgroup_rstat_updated(
 		struct cgroup_subsys_state *css, int cpu)
 {
 	struct cgroup *cgrp = css->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 
 	/*
@@ -101,6 +110,12 @@ __bpf_kfunc void cgroup_rstat_updated(
 	if (data_race(cgroup_rstat_cpu(css, cpu)->updated_next))
 		return;
 
+	if (is_base_css(css))
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+	else
+		cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) +
+			css->ss->id;
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
 
 	/* put @css and all ancestors on the corresponding updated lists */
@@ -208,11 +223,17 @@ static struct cgroup_subsys_state *cgroup_rstat_updated_list(
 		struct cgroup_subsys_state *root, int cpu)
 {
 	struct cgroup *cgrp = root->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu);
 	struct cgroup_subsys_state *head = NULL, *parent, *child;
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 
+	if (is_base_css(root))
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+	else
+		cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) +
+			root->ss->id;
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false);
 
 	/* Return NULL if this subtree is not on-list */
@@ -315,7 +336,7 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css,
 	struct cgroup *cgrp = css->cgroup;
 	int cpu;
 
-	lockdep_assert_held(&cgroup_rstat_lock);
+	lockdep_assert_held(lock);
 
 	for_each_possible_cpu(cpu) {
 		struct cgroup_subsys_state *pos;
@@ -356,12 +377,18 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css,
 __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
+
+	if (is_base_css(css))
+		lock = &cgroup_rstat_base_lock;
+	else
+		lock = &cgroup_rstat_subsys_lock[css->ss->id];
 
 	might_sleep();
 
-	__cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1);
-	cgroup_rstat_flush_locked(css, &cgroup_rstat_lock);
-	__cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1);
+	__cgroup_rstat_lock(lock, cgrp, -1);
+	cgroup_rstat_flush_locked(css, lock);
+	__cgroup_rstat_unlock(lock, cgrp, -1);
 }
 
 /**
@@ -376,10 +403,16 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
+
+	if (is_base_css(css))
+		lock = &cgroup_rstat_base_lock;
+	else
+		lock = &cgroup_rstat_subsys_lock[css->ss->id];
 
 	might_sleep();
-	__cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1);
-	cgroup_rstat_flush_locked(css, &cgroup_rstat_lock);
+	__cgroup_rstat_lock(lock, cgrp, -1);
+	cgroup_rstat_flush_locked(css, lock);
 }
 
 /**
@@ -389,7 +422,14 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
-	__cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1);
+	spinlock_t *lock;
+
+	if (is_base_css(css))
+		lock = &cgroup_rstat_base_lock;
+	else
+		lock = &cgroup_rstat_subsys_lock[css->ss->id];
+
+	__cgroup_rstat_unlock(lock, cgrp, -1);
 }
 
 int cgroup_rstat_init(struct cgroup_subsys_state *css)
@@ -435,10 +475,21 @@ void cgroup_rstat_exit(struct cgroup_subsys_state *css)
 
 void __init cgroup_rstat_boot(void)
 {
-	int cpu;
+	struct cgroup_subsys *ss;
+	int cpu, ssid;
 
-	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+	for_each_subsys(ss, ssid) {
+		spin_lock_init(&cgroup_rstat_subsys_lock[ssid]);
+	}
+
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu));
+
+		for_each_subsys(ss, ssid) {
+			raw_spin_lock_init(
+				per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + ssid);
+		}
+	}
 }
 
 /*
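
Note on the lock selection: the commit message describes choosing the
base lock when the css has no subsystem (css->ss == NULL) and a
per-subsystem lock otherwise. As an illustration only, the userspace C
sketch below models that rule in a single helper. It is not kernel code
and not part of the patch: SUBSYS_COUNT, struct subsys, struct css,
css_lock(), lock_try() and lock_release() are hypothetical stand-ins,
and a C11 atomic_flag replaces spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CGROUP_SUBSYS_COUNT. */
#define SUBSYS_COUNT 3

/* Simplified models of cgroup_subsys and cgroup_subsys_state. */
struct subsys { int id; };
struct css { const struct subsys *ss; /* NULL for the base stats css */ };

/* One lock for base stats, one lock per subsystem. */
static atomic_flag base_lock = ATOMIC_FLAG_INIT;
static atomic_flag subsys_lock[SUBSYS_COUNT];

/* Models the boot-time loop that initializes every per-subsystem lock. */
static void locks_init(void)
{
	for (int i = 0; i < SUBSYS_COUNT; i++)
		atomic_flag_clear(&subsys_lock[i]);
}

static bool is_base_css(const struct css *css)
{
	return css->ss == NULL;
}

/* Pick the lock that guards this css: base lock or its subsystem's lock. */
static atomic_flag *css_lock(const struct css *css)
{
	if (is_base_css(css))
		return &base_lock;
	assert(css->ss->id >= 0 && css->ss->id < SUBSYS_COUNT);
	return &subsys_lock[css->ss->id];
}

/* Returns true if the lock was free and is now held by the caller. */
static bool lock_try(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void lock_release(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}
```

In this model, a css for subsystem id 0 and a css for subsystem id 1
resolve to different locks, so taking one does not block the other,
which is the contention the patch aims to avoid. A css with ss == NULL
always resolves to base_lock.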