From patchwork Tue Dec 24 01:13:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13919572
From: JP Kobryn
To: shakeel.butt@linux.dev, hannes@cmpxchg.org, yosryahmed@google.com,
 akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: [PATCH 5/9 RFC] cgroup: separate locking between base css and others
Date: Mon, 23 Dec 2024 17:13:58 -0800
Message-ID: <20241224011402.134009-6-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241224011402.134009-1-inwardvessel@gmail.com>
References: <20241224011402.134009-1-inwardvessel@gmail.com>

Separate locks can be used to eliminate contention across subsystems that make
use of rstat. The base stats also get their own lock. Where applicable, check
for the existence of a subsystem pointer to determine if the given
cgroup_subsys_state is the base css or not.

Signed-off-by: JP Kobryn
---
 include/linux/cgroup-defs.h |  2 +
 kernel/cgroup/rstat.c       | 92 +++++++++++++++++++++++++------------
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1932f8ae7995..4d87519ff023 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -790,6 +790,8 @@ struct cgroup_subsys {
 	 * specifies the mask of subsystems that this one depends on.
 	 */
 	unsigned int depends_on;
+
+	spinlock_t rstat_lock;
 };
 
 extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 4381eb9ac426..958bdccf0359 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,9 @@
 
 #include
 
-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(cgroup_rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_base_cpu_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock[CGROUP_SUBSYS_COUNT]);
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
 
@@ -86,7 +87,7 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
 __bpf_kfunc void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
 	struct cgroup *cgrp = css->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 
 	/*
@@ -100,6 +101,11 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
 		return;
 
+	if (css->ss)
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock[css->ss->id], cpu);
+	else
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
 
 	/* put @cgrp and all ancestors on the corresponding updated lists */
@@ -207,11 +213,16 @@ static struct cgroup_subsys_state *cgroup_rstat_push_children(
 static struct cgroup_subsys_state *cgroup_rstat_updated_list(
 		struct cgroup_subsys_state *root, int cpu)
 {
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct cgroup_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
 	struct cgroup_subsys_state *head = NULL, *parent, *child;
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 
+	if (root->ss)
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock[root->ss->id], cpu);
+	else
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root->cgroup, false);
 
 	/* Return NULL if this subtree is not on-list */
@@ -285,37 +296,44 @@ __bpf_hook_end();
  * number processed last.
  */
 static inline void __cgroup_rstat_lock(struct cgroup_subsys_state *css,
-		int cpu_in_loop)
-	__acquires(&cgroup_rstat_lock)
+		spinlock_t *lock, int cpu_in_loop)
+	__acquires(lock)
 {
 	struct cgroup *cgrp = css->cgroup;
 	bool contended;
 
-	contended = !spin_trylock_irq(&cgroup_rstat_lock);
+	contended = !spin_trylock_irq(lock);
 	if (contended) {
 		trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
-		spin_lock_irq(&cgroup_rstat_lock);
+		spin_lock_irq(lock);
 	}
 	trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
 }
 
 static inline void __cgroup_rstat_unlock(struct cgroup_subsys_state *css,
-		int cpu_in_loop)
-	__releases(&cgroup_rstat_lock)
+		spinlock_t *lock, int cpu_in_loop)
+	__releases(lock)
 {
 	struct cgroup *cgrp = css->cgroup;
 
 	trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock_irq(lock);
 }
 
 /* see cgroup_rstat_flush() */
 static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
+	__releases(&css->ss->rstat_lock) __acquires(&css->ss->rstat_lock)
 {
+	spinlock_t *lock;
 	int cpu;
 
-	lockdep_assert_held(&cgroup_rstat_lock);
+	if (!css->ss) {
+		pr_warn("cannot use generic flush on base subsystem\n");
+		return;
+	}
+
+	lock = &css->ss->rstat_lock;
+	lockdep_assert_held(lock);
 
 	for_each_possible_cpu(cpu) {
 		struct cgroup_subsys_state *pos = cgroup_rstat_updated_list(css, cpu);
@@ -334,11 +352,11 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
 		}
 
 		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(css, cpu);
+		if (need_resched() || spin_needbreak(lock)) {
+			__cgroup_rstat_unlock(css, lock, cpu);
 			if (!cond_resched())
 				cpu_relax();
-			__cgroup_rstat_lock(css, cpu);
+			__cgroup_rstat_lock(css, lock, cpu);
 		}
 	}
 }
@@ -358,11 +376,22 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
  */
 __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 {
+	spinlock_t *lock;
+
+	if (!css->ss) {
+		int cpu;
+
+		for_each_possible_cpu(cpu)
+			cgroup_base_stat_flush(css->cgroup, cpu);
+		return;
+	}
+
 	might_sleep();
 
-	__cgroup_rstat_lock(css, -1);
+	lock = &css->ss->rstat_lock;
+	__cgroup_rstat_lock(css, lock, -1);
 	cgroup_rstat_flush_locked(css);
-	__cgroup_rstat_unlock(css, -1);
+	__cgroup_rstat_unlock(css, lock, -1);
 }
 
 /**
@@ -374,11 +403,11 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
  *
  * This function may block.
  */
-void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
-	__acquires(&cgroup_rstat_lock)
+static void cgroup_rstat_base_flush_hold(struct cgroup_subsys_state *css)
+	__acquires(&cgroup_rstat_base_lock)
 {
 	might_sleep();
-	__cgroup_rstat_lock(css, -1);
+	__cgroup_rstat_lock(css, &cgroup_rstat_base_lock, -1);
 	cgroup_rstat_flush_locked(css);
 }
 
@@ -386,10 +415,10 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
  * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
  * @cgrp: cgroup used by tracepoint
  */
-void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock)
+static void cgroup_rstat_base_flush_release(struct cgroup_subsys_state *css)
+	__releases(&cgroup_rstat_base_lock)
 {
-	__cgroup_rstat_unlock(css, -1);
+	__cgroup_rstat_unlock(css, &cgroup_rstat_base_lock, -1);
 }
 
 int cgroup_rstat_init(struct cgroup_subsys_state *css)
@@ -435,10 +464,15 @@ void cgroup_rstat_exit(struct cgroup_subsys_state *css)
 
 void __init cgroup_rstat_boot(void)
 {
-	int cpu;
+	struct cgroup_subsys *ss;
+	int cpu, ssid;
+
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu));
 
-	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+		for_each_subsys(ss, ssid)
+			raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock[ssid], cpu));
+	}
 }
 
 /*
@@ -629,12 +663,12 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 	u64 usage, utime, stime, ntime;
 
 	if (cgroup_parent(cgrp)) {
-		cgroup_rstat_flush_hold(css);
+		cgroup_rstat_base_flush_hold(css);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
 		ntime = cgrp->bstat.ntime;
-		cgroup_rstat_flush_release(css);
+		cgroup_rstat_base_flush_release(css);
 	} else {
 		/* cgrp->bstat of root is not actually used, reuse it */
 		root_cgroup_cputime(&cgrp->bstat);
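
For readers who want the lock selection rule from the commit message in
isolation, here is a rough userspace sketch. It is not kernel code and not
part of the patch: pthread mutexes stand in for spinlock_t, and the names
struct subsys, struct css, rstat_base_lock, css_rstat_lock(),
css_share_lock() and css_flush() are hypothetical stand-ins chosen for
illustration. The only thing it tries to show is that a css with a
subsystem pointer uses that subsystem's lock, while a css without one (the
base css) falls back to the dedicated base lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct cgroup_subsys. */
struct subsys {
	int id;
	pthread_mutex_t rstat_lock;	/* one lock per subsystem */
};

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct css {
	struct subsys *ss;		/* NULL means this is the base css */
};

/* Lock used only for the base stats (assumed name). */
static pthread_mutex_t rstat_base_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pick the lock protecting the stats of @css. */
static pthread_mutex_t *css_rstat_lock(struct css *css)
{
	if (css->ss)
		return &css->ss->rstat_lock;
	return &rstat_base_lock;
}

/* Return 1 if @a and @b would contend on the same lock, 0 otherwise. */
static int css_share_lock(struct css *a, struct css *b)
{
	return css_rstat_lock(a) == css_rstat_lock(b);
}

/* Take and drop the selected lock around a placeholder flush step. */
static void css_flush(struct css *css)
{
	pthread_mutex_t *lock = css_rstat_lock(css);

	pthread_mutex_lock(lock);
	/* the per-css flush work would run here */
	pthread_mutex_unlock(lock);
}
```

With this rule, two css objects that belong to different subsystems never
serialize on each other, and neither of them serializes against a flush of
the base stats.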