From patchwork Fri Jan 3 01:50:16 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13925087
From: JP Kobryn
To: shakeel.butt@linux.dev, tj@kernel.org, mhocko@kernel.org, hannes@cmpxchg.org, yosryahmed@google.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: [RFC PATCH 5/9 v2] cgroup: separate locking between base css and others
Date: Thu, 2 Jan 2025 17:50:16 -0800
Message-ID: <20250103015020.78547-6-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250103015020.78547-1-inwardvessel@gmail.com>
References: <20250103015020.78547-1-inwardvessel@gmail.com>

Separate locks can be used to eliminate contention between subsystems
that make use of rstat. The base stats also get their own lock. Where
applicable, check for the existence of a subsystem pointer to determine
if the given cgroup_subsys_state is the base css or not for deciding
which lock to take.

Signed-off-by: JP Kobryn
---
 include/linux/cgroup-defs.h |  2 +
 kernel/cgroup/rstat.c       | 92 +++++++++++++++++++++++++------------
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1932f8ae7995..4d87519ff023 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -790,6 +790,8 @@ struct cgroup_subsys {
 	 * specifies the mask of subsystems that this one depends on.
 	 */
 	unsigned int depends_on;
+
+	spinlock_t rstat_lock;
 };

 extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 4381eb9ac426..958bdccf0359 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,9 @@

 #include

-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(cgroup_rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_base_cpu_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock[CGROUP_SUBSYS_COUNT]);

 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);

@@ -86,7 +87,7 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
 __bpf_kfunc void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
 	struct cgroup *cgrp = css->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;

 	/*
@@ -100,6 +101,11 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
 		return;

+	if (css->ss)
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock[css->ss->id], cpu);
+	else
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);

 	/* put @cgrp and all ancestors on the corresponding updated lists */
@@ -207,11 +213,16 @@ static struct cgroup_subsys_state *cgroup_rstat_push_children(
 static struct cgroup_subsys_state *cgroup_rstat_updated_list(
 				struct cgroup_subsys_state *root, int cpu)
 {
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct cgroup_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
 	struct cgroup_subsys_state *head = NULL, *parent, *child;
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;

+	if (root->ss)
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock[root->ss->id], cpu);
+	else
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root->cgroup, false);

 	/* Return NULL if this subtree is not on-list */
@@ -285,37 +296,44 @@ __bpf_hook_end();
  * number processed last.
  */
 static inline void __cgroup_rstat_lock(struct cgroup_subsys_state *css,
-		int cpu_in_loop)
-	__acquires(&cgroup_rstat_lock)
+		spinlock_t *lock, int cpu_in_loop)
+	__acquires(lock)
 {
 	struct cgroup *cgrp = css->cgroup;
 	bool contended;

-	contended = !spin_trylock_irq(&cgroup_rstat_lock);
+	contended = !spin_trylock_irq(lock);
 	if (contended) {
 		trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
-		spin_lock_irq(&cgroup_rstat_lock);
+		spin_lock_irq(lock);
 	}
 	trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
 }

 static inline void __cgroup_rstat_unlock(struct cgroup_subsys_state *css,
-		int cpu_in_loop)
-	__releases(&cgroup_rstat_lock)
+		spinlock_t *lock, int cpu_in_loop)
+	__releases(lock)
 {
 	struct cgroup *cgrp = css->cgroup;

 	trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock_irq(lock);
 }

 /* see cgroup_rstat_flush() */
 static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
+	__releases(&css->ss->rstat_lock) __acquires(&css->ss->rstat_lock)
 {
+	spinlock_t *lock;
 	int cpu;

-	lockdep_assert_held(&cgroup_rstat_lock);
+	if (!css->ss) {
+		pr_warn("cannot use generic flush on base subsystem\n");
+		return;
+	}
+
+	lock = &css->ss->rstat_lock;
+	lockdep_assert_held(lock);

 	for_each_possible_cpu(cpu) {
 		struct cgroup_subsys_state *pos = cgroup_rstat_updated_list(css, cpu);
@@ -334,11 +352,11 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
 		}

 		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(css, cpu);
+		if (need_resched() || spin_needbreak(lock)) {
+			__cgroup_rstat_unlock(css, lock, cpu);
 			if (!cond_resched())
 				cpu_relax();
-			__cgroup_rstat_lock(css, cpu);
+			__cgroup_rstat_lock(css, lock, cpu);
 		}
 	}
 }
@@ -358,11 +376,22 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
  */
 __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 {
+	spinlock_t *lock;
+
+	if (!css->ss) {
+		int cpu;
+
+		for_each_possible_cpu(cpu)
+			cgroup_base_stat_flush(css->cgroup, cpu);
+		return;
+	}
+
 	might_sleep();

-	__cgroup_rstat_lock(css, -1);
+	lock = &css->ss->rstat_lock;
+	__cgroup_rstat_lock(css, lock, -1);
 	cgroup_rstat_flush_locked(css);
-	__cgroup_rstat_unlock(css, -1);
+	__cgroup_rstat_unlock(css, lock, -1);
 }

 /**
@@ -374,11 +403,11 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
  *
  * This function may block.
  */
-void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
-	__acquires(&cgroup_rstat_lock)
+static void cgroup_rstat_base_flush_hold(struct cgroup_subsys_state *css)
+	__acquires(&cgroup_rstat_base_lock)
 {
 	might_sleep();
-	__cgroup_rstat_lock(css, -1);
+	__cgroup_rstat_lock(css, &cgroup_rstat_base_lock, -1);
 	cgroup_rstat_flush_locked(css);
 }

@@ -386,10 +415,10 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
  * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
  * @cgrp: cgroup used by tracepoint
  */
-void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock)
+static void cgroup_rstat_base_flush_release(struct cgroup_subsys_state *css)
+	__releases(&cgroup_rstat_base_lock)
 {
-	__cgroup_rstat_unlock(css, -1);
+	__cgroup_rstat_unlock(css, &cgroup_rstat_base_lock, -1);
 }

 int cgroup_rstat_init(struct cgroup_subsys_state *css)
@@ -435,10 +464,15 @@ void cgroup_rstat_exit(struct cgroup_subsys_state *css)

 void __init cgroup_rstat_boot(void)
 {
-	int cpu;
+	struct cgroup_subsys *ss;
+	int cpu, ssid;
+
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu));

-	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+		for_each_subsys(ss, ssid)
+			raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock[ssid], cpu));
+	}
 }

 /*
@@ -629,12 +663,12 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 	u64 usage, utime, stime, ntime;

 	if (cgroup_parent(cgrp)) {
-		cgroup_rstat_flush_hold(css);
+		cgroup_rstat_base_flush_hold(css);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
 		ntime = cgrp->bstat.ntime;
-		cgroup_rstat_flush_release(css);
+		cgroup_rstat_base_flush_release(css);
 	} else {
 		/* cgrp->bstat of root is not actually used, reuse it */
 		root_cgroup_cputime(&cgrp->bstat);
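
For readers who want the lock-selection rule from the commit message in isolation, here is a minimal userspace sketch. It is not kernel code and is not part of the patch: every toy_* name is hypothetical, and a C11 atomic_flag stands in for spinlock_t. It only illustrates the idea that a NULL subsystem pointer selects the base lock while any other css selects its own subsystem's lock, so holding one lock should not block the others.

```c
/* Userspace sketch only: all toy_* names are hypothetical, not kernel APIs. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SUBSYS_COUNT 2	/* stand-in for CGROUP_SUBSYS_COUNT */

/* Minimal test-and-set lock standing in for spinlock_t. */
struct toy_lock {
	atomic_flag held;
};

static bool toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we took it */
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

struct toy_subsys {
	int id;
	struct toy_lock rstat_lock;	/* mirrors the field added to cgroup_subsys */
};

struct toy_css {
	struct toy_subsys *ss;		/* NULL marks the base css, as in the patch */
};

/* One lock for the base stats, one per subsystem. */
static struct toy_lock toy_base_lock = { ATOMIC_FLAG_INIT };

static struct toy_subsys toy_subsystems[TOY_SUBSYS_COUNT] = {
	{ .id = 0, .rstat_lock = { ATOMIC_FLAG_INIT } },
	{ .id = 1, .rstat_lock = { ATOMIC_FLAG_INIT } },
};

static struct toy_css toy_base_css = { .ss = NULL };
static struct toy_css toy_css_a = { .ss = &toy_subsystems[0] };
static struct toy_css toy_css_b = { .ss = &toy_subsystems[1] };

/* Choose the flush lock by testing the subsystem pointer. */
static struct toy_lock *toy_rstat_lock(const struct toy_css *css)
{
	return css->ss ? &css->ss->rstat_lock : &toy_base_lock;
}
```

With these definitions, taking toy_rstat_lock(&toy_css_a) leaves toy_trylock() on the locks for toy_css_b and toy_base_css free to succeed, which is the contention the commit message aims to remove. toy_unlock() releases a lock taken with toy_trylock().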