From patchwork Thu Feb 27 21:55:40 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13995241
From: inwardvessel
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/4 v2] cgroup: move cgroup_rstat from cgroup to cgroup_subsys_state
Date: Thu, 27 Feb 2025 13:55:40 -0800
Message-ID: <20250227215543.49928-2-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250227215543.49928-1-inwardvessel@gmail.com>
References:
<20250227215543.49928-1-inwardvessel@gmail.com>
MIME-Version: 1.0

From: JP Kobryn

Each cgroup owns rstat pointers. This means that a tree of pending
rstat updates can contain changes from different subsystems.
Because of this arrangement, when one subsystem is flushed via the
public API cgroup_rstat_flush(), all other subsystems with pending
updates will also be flushed.

Remove the rstat pointers from the cgroup and instead give them to each
cgroup_subsys_state. Separate rstat trees will now exist for each unique
subsystem. This separation allows subsystems to make updates and flushes
without the side effects of other subsystems, e.g. flushing the cpu
stats does not cause the memory stats to be flushed and vice versa.

The change in pointer ownership from cgroup to cgroup_subsys_state
allows for direct flushing of the css, so the RCU list management
entities and operations previously tied to the cgroup, which were used
for managing a list of subsystem states with pending flushes, are
removed.

In terms of client code, public API calls were changed to accept a
reference to the cgroup_subsys_state so that when flushing or updating,
a specific subsystem is associated with the call.

Signed-off-by: JP Kobryn
Reviewed-by: Shakeel Butt
---
 block/blk-cgroup.c                            |   4 +-
 include/linux/cgroup-defs.h                   |  36 ++--
 include/linux/cgroup.h                        |   8 +-
 kernel/cgroup/cgroup-internal.h               |   4 +-
 kernel/cgroup/cgroup.c                        |  53 +++---
 kernel/cgroup/rstat.c                         | 159 +++++++++---------
 mm/memcontrol.c                               |   4 +-
 .../selftests/bpf/progs/btf_type_tag_percpu.c |   5 +-
 .../bpf/progs/cgroup_hierarchical_stats.c     |   8 +-
 9 files changed, 140 insertions(+), 141 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9ed93d91d754..6a0680d8ce6a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1201,7 +1201,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 	if (!seq_css(sf)->parent)
 		blkcg_fill_root_iostats();
 	else
-		cgroup_rstat_flush(blkcg->css.cgroup);
+		cgroup_rstat_flush(&blkcg->css);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
@@ -2186,7 +2186,7 @@ void blk_cgroup_bio_start(struct bio *bio)
 	}
 	u64_stats_update_end_irqrestore(&bis->sync, flags);
 
-	cgroup_rstat_updated(blkcg->css.cgroup, cpu);
+	cgroup_rstat_updated(&blkcg->css, cpu);
 	put_cpu();
 }
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 17960a1e858d..1598e1389615 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -169,6 +169,9 @@ struct cgroup_subsys_state {
 	/* reference count - access via css_[try]get() and css_put() */
 	struct percpu_ref refcnt;
 
+	/* per-cpu recursive resource statistics */
+	struct cgroup_rstat_cpu __percpu *rstat_cpu;
+
 	/*
 	 * siblings list anchored at the parent's ->children
 	 *
@@ -177,9 +180,6 @@ struct cgroup_subsys_state {
 	struct list_head sibling;
 	struct list_head children;
 
-	/* flush target list anchored at cgrp->rstat_css_list */
-	struct list_head rstat_css_node;
-
 	/*
 	 * PI: Subsys-unique ID. 0 is unused and root is always 1. The
 	 * matching css can be looked up using css_from_id().
@@ -219,6 +219,14 @@ struct cgroup_subsys_state {
 	 * Protected by cgroup_mutex.
 	 */
 	int nr_descendants;
+
+	/*
+	 * A singly-linked list of css structures to be rstat flushed.
+	 * This is a scratch field to be used exclusively by
+	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
+	 */
+	struct cgroup_subsys_state *rstat_flush_next;
+
 };
 
 /*
@@ -386,8 +394,8 @@ struct cgroup_rstat_cpu {
 	 *
	 * Protected by per-cpu cgroup_rstat_cpu_lock.
	 */
-	struct cgroup *updated_children;	/* terminated by self cgroup */
-	struct cgroup *updated_next;		/* NULL iff not on the list */
+	struct cgroup_subsys_state *updated_children;	/* terminated by self */
+	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
 };
 
 struct cgroup_freezer_state {
@@ -516,24 +524,6 @@ struct cgroup {
 	struct cgroup *dom_cgrp;
 	struct cgroup *old_dom_cgrp;		/* used while enabling threaded */
 
-	/* per-cpu recursive resource statistics */
-	struct cgroup_rstat_cpu __percpu *rstat_cpu;
-	struct list_head rstat_css_list;
-
-	/*
-	 * Add padding to separate the read mostly rstat_cpu and
-	 * rstat_css_list into a different cacheline from the following
-	 * rstat_flush_next and *bstat fields which can have frequent updates.
-	 */
-	CACHELINE_PADDING(_pad_);
-
-	/*
-	 * A singly-linked list of cgroup structures to be rstat flushed.
-	 * This is a scratch field to be used exclusively by
-	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
-	 */
-	struct cgroup *rstat_flush_next;
-
 	/* cgroup basic resource statistics */
 	struct cgroup_base_stat last_bstat;
 	struct cgroup_base_stat bstat;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8ef47f8a634..eec970622419 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -687,10 +687,10 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 /*
  * cgroup scalable recursive statistics.
  */
-void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
-void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_hold(struct cgroup *cgrp);
-void cgroup_rstat_flush_release(struct cgroup *cgrp);
+void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu);
+void cgroup_rstat_flush(struct cgroup_subsys_state *css);
+void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css);
+void cgroup_rstat_flush_release(struct cgroup_subsys_state *css);
 
 /*
  * Basic resource stats.
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index c964dd7ff967..87d062baff90 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -269,8 +269,8 @@ int cgroup_task_count(const struct cgroup *cgrp); /* * rstat.c */ -int cgroup_rstat_init(struct cgroup *cgrp); -void cgroup_rstat_exit(struct cgroup *cgrp); +int cgroup_rstat_init(struct cgroup_subsys_state *css); +void cgroup_rstat_exit(struct cgroup_subsys_state *css); void cgroup_rstat_boot(void); void cgroup_base_stat_cputime_show(struct seq_file *seq); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index afc665b7b1fe..31b3bfebf7ba 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -164,7 +164,9 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = { static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu); /* the default hierarchy */ -struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu }; +struct cgroup_root cgrp_dfl_root = { + .cgrp.self.rstat_cpu = &cgrp_dfl_root_rstat_cpu +}; EXPORT_SYMBOL_GPL(cgrp_dfl_root); /* @@ -1358,7 +1360,7 @@ static void cgroup_destroy_root(struct cgroup_root *root) cgroup_unlock(); - cgroup_rstat_exit(cgrp); + cgroup_rstat_exit(&cgrp->self); kernfs_destroy_root(root->kf_root); cgroup_free_root(root); } @@ -1863,13 +1865,6 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) } spin_unlock_irq(&css_set_lock); - if (ss->css_rstat_flush) { - list_del_rcu(&css->rstat_css_node); - synchronize_rcu(); - list_add_rcu(&css->rstat_css_node, - &dcgrp->rstat_css_list); - } - /* default hierarchy doesn't enable controllers by default */ dst_root->subsys_mask |= 1 << ssid; if (dst_root == &cgrp_dfl_root) { @@ -2052,7 +2047,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) cgrp->dom_cgrp = cgrp; cgrp->max_descendants = INT_MAX; cgrp->max_depth = INT_MAX; - INIT_LIST_HEAD(&cgrp->rstat_css_list); 
prev_cputime_init(&cgrp->prev_cputime); for_each_subsys(ss, ssid) @@ -2132,7 +2126,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) if (ret) goto destroy_root; - ret = cgroup_rstat_init(root_cgrp); + ret = cgroup_rstat_init(&root_cgrp->self); if (ret) goto destroy_root; @@ -2174,7 +2168,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) goto out; exit_stats: - cgroup_rstat_exit(root_cgrp); + cgroup_rstat_exit(&root_cgrp->self); destroy_root: kernfs_destroy_root(root->kf_root); root->kf_root = NULL; @@ -5407,7 +5401,11 @@ static void css_free_rwork_fn(struct work_struct *work) struct cgroup_subsys_state *parent = css->parent; int id = css->id; + if (css->ss->css_rstat_flush) + cgroup_rstat_exit(css); + ss->css_free(css); + cgroup_idr_remove(&ss->css_idr, id); cgroup_put(cgrp); @@ -5431,7 +5429,7 @@ static void css_free_rwork_fn(struct work_struct *work) cgroup_put(cgroup_parent(cgrp)); kernfs_put(cgrp->kn); psi_cgroup_free(cgrp); - cgroup_rstat_exit(cgrp); + cgroup_rstat_exit(&cgrp->self); kfree(cgrp); } else { /* @@ -5459,11 +5457,7 @@ static void css_release_work_fn(struct work_struct *work) if (ss) { struct cgroup *parent_cgrp; - /* css release path */ - if (!list_empty(&css->rstat_css_node)) { - cgroup_rstat_flush(cgrp); - list_del_rcu(&css->rstat_css_node); - } + cgroup_rstat_flush(css); cgroup_idr_replace(&ss->css_idr, NULL, css->id); if (ss->css_released) @@ -5489,7 +5483,7 @@ static void css_release_work_fn(struct work_struct *work) /* cgroup release path */ TRACE_CGROUP_PATH(release, cgrp); - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(&cgrp->self); spin_lock_irq(&css_set_lock); for (tcgrp = cgroup_parent(cgrp); tcgrp; @@ -5537,7 +5531,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css, css->id = -1; INIT_LIST_HEAD(&css->sibling); INIT_LIST_HEAD(&css->children); - INIT_LIST_HEAD(&css->rstat_css_node); css->serial_nr = css_serial_nr_next++; atomic_set(&css->online_cnt, 0); @@ -5546,9 +5539,6 @@ static void 
init_and_link_css(struct cgroup_subsys_state *css, css_get(css->parent); } - if (ss->css_rstat_flush) - list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list); - BUG_ON(cgroup_css(cgrp, ss)); } @@ -5641,6 +5631,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, goto err_free_css; css->id = err; + if (css->ss->css_rstat_flush) { + err = cgroup_rstat_init(css); + if (err) + goto err_free_css; + } + /* @css is ready to be brought online now, make it visible */ list_add_tail_rcu(&css->sibling, &parent_css->children); cgroup_idr_replace(&ss->css_idr, css, css->id); @@ -5654,7 +5650,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, err_list_del: list_del_rcu(&css->sibling); err_free_css: - list_del_rcu(&css->rstat_css_node); INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); return ERR_PTR(err); @@ -5682,7 +5677,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name, if (ret) goto out_free_cgrp; - ret = cgroup_rstat_init(cgrp); + ret = cgroup_rstat_init(&cgrp->self); if (ret) goto out_cancel_ref; @@ -5775,7 +5770,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name, out_kernfs_remove: kernfs_remove(cgrp->kn); out_stat_exit: - cgroup_rstat_exit(cgrp); + cgroup_rstat_exit(&cgrp->self); out_cancel_ref: percpu_ref_exit(&cgrp->self.refcnt); out_free_cgrp: @@ -6087,6 +6082,9 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early) } else { css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL); BUG_ON(css->id < 0); + + if (css->ss && css->ss->css_rstat_flush) + BUG_ON(cgroup_rstat_init(css)); } /* Update the init_css_set to contain a subsys @@ -6188,6 +6186,9 @@ int __init cgroup_init(void) css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL); BUG_ON(css->id < 0); + + if (css->ss && css->ss->css_rstat_flush) + BUG_ON(cgroup_rstat_init(css)); } else { cgroup_init_subsys(ss, 
false); } diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index aac91466279f..9976f9acd62b 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -14,9 +14,10 @@ static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock); static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu); -static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu) +static struct cgroup_rstat_cpu *cgroup_rstat_cpu( + struct cgroup_subsys_state *css, int cpu) { - return per_cpu_ptr(cgrp->rstat_cpu, cpu); + return per_cpu_ptr(css->rstat_cpu, cpu); } /* @@ -75,15 +76,17 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu, /** * cgroup_rstat_updated - keep track of updated rstat_cpu - * @cgrp: target cgroup + * @css: target cgroup subsystem state * @cpu: cpu on which rstat_cpu was updated * - * @cgrp's rstat_cpu on @cpu was updated. Put it on the parent's matching + * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching * rstat_cpu->updated_children list. See the comment on top of * cgroup_rstat_cpu definition for details. */ -__bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) +__bpf_kfunc void cgroup_rstat_updated( + struct cgroup_subsys_state *css, int cpu) { + struct cgroup *cgrp = css->cgroup; raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); unsigned long flags; @@ -92,18 +95,18 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) * temporary inaccuracies, which is fine. * * Because @parent's updated_children is terminated with @parent - * instead of NULL, we can tell whether @cgrp is on the list by + * instead of NULL, we can tell whether @css is on the list by * testing the next pointer for NULL. 
*/ - if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next)) + if (data_race(cgroup_rstat_cpu(css, cpu)->updated_next)) return; flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true); - /* put @cgrp and all ancestors on the corresponding updated lists */ + /* put @css and all ancestors on the corresponding updated lists */ while (true) { - struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu); - struct cgroup *parent = cgroup_parent(cgrp); + struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(css, cpu); + struct cgroup_subsys_state *parent = css->parent; struct cgroup_rstat_cpu *prstatc; /* @@ -115,15 +118,15 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) /* Root has no parent to link it to, but mark it busy */ if (!parent) { - rstatc->updated_next = cgrp; + rstatc->updated_next = css; break; } prstatc = cgroup_rstat_cpu(parent, cpu); rstatc->updated_next = prstatc->updated_children; - prstatc->updated_children = cgrp; + prstatc->updated_children = css; - cgrp = parent; + css = parent; } _cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true); @@ -141,12 +144,13 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) * into a singly linked list built from the tail backward like "pushing" * cgroups into a stack. The root is pushed by the caller. 
*/ -static struct cgroup *cgroup_rstat_push_children(struct cgroup *head, - struct cgroup *child, int cpu) +static struct cgroup_subsys_state *cgroup_rstat_push_children( + struct cgroup_subsys_state *head, + struct cgroup_subsys_state *child, int cpu) { - struct cgroup *chead = child; /* Head of child cgroup level */ - struct cgroup *ghead = NULL; /* Head of grandchild cgroup level */ - struct cgroup *parent, *grandchild; + struct cgroup_subsys_state *chead = child; /* Head of child css level */ + struct cgroup_subsys_state *ghead = NULL; /* Head of grandchild css level */ + struct cgroup_subsys_state *parent, *grandchild; struct cgroup_rstat_cpu *crstatc; child->rstat_flush_next = NULL; @@ -155,7 +159,7 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head, while (chead) { child = chead; chead = child->rstat_flush_next; - parent = cgroup_parent(child); + parent = child->parent; /* updated_next is parent cgroup terminated */ while (child != parent) { @@ -184,30 +188,32 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head, /** * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed - * @root: root of the cgroup subtree to traverse + * @root: root of the css subtree to traverse * @cpu: target cpu * Return: A singly linked list of cgroups to be flushed * * Walks the updated rstat_cpu tree on @cpu from @root. During traversal, - * each returned cgroup is unlinked from the updated tree. + * each returned css is unlinked from the updated tree. * * The only ordering guarantee is that, for a parent and a child pair * covered by a given traversal, the child is before its parent in * the list. * * Note that updated_children is self terminated and points to a list of - * child cgroups if not empty. Whereas updated_next is like a sibling link - * within the children list and terminated by the parent cgroup. An exception + * child css's if not empty. 
Whereas updated_next is like a sibling link + * within the children list and terminated by the parent css. An exception * here is the cgroup root whose updated_next can be self terminated. */ -static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu) +static struct cgroup_subsys_state *cgroup_rstat_updated_list( + struct cgroup_subsys_state *root, int cpu) { + struct cgroup *cgrp = root->cgroup; raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu); - struct cgroup *head = NULL, *parent, *child; + struct cgroup_subsys_state *head = NULL, *parent, *child; unsigned long flags; - flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root, false); + flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false); /* Return NULL if this subtree is not on-list */ if (!rstatc->updated_next) @@ -217,10 +223,10 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu) * Unlink @root from its parent. As the updated_children list is * singly linked, we have to walk it to find the removal point. 
*/ - parent = cgroup_parent(root); + parent = root->parent; if (parent) { struct cgroup_rstat_cpu *prstatc; - struct cgroup **nextp; + struct cgroup_subsys_state **nextp; prstatc = cgroup_rstat_cpu(parent, cpu); nextp = &prstatc->updated_children; @@ -244,7 +250,7 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu) if (child != root) head = cgroup_rstat_push_children(head, child, cpu); unlock_ret: - _cgroup_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false); + _cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, false); return head; } @@ -300,27 +306,25 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop) } /* see cgroup_rstat_flush() */ -static void cgroup_rstat_flush_locked(struct cgroup *cgrp) +static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css) __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock) { + struct cgroup *cgrp = css->cgroup; int cpu; lockdep_assert_held(&cgroup_rstat_lock); for_each_possible_cpu(cpu) { - struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu); + struct cgroup_subsys_state *pos; + pos = cgroup_rstat_updated_list(css, cpu); for (; pos; pos = pos->rstat_flush_next) { - struct cgroup_subsys_state *css; + if (!pos->ss) + cgroup_base_stat_flush(pos->cgroup, cpu); + else + pos->ss->css_rstat_flush(pos, cpu); - cgroup_base_stat_flush(pos, cpu); - bpf_rstat_flush(pos, cgroup_parent(pos), cpu); - - rcu_read_lock(); - list_for_each_entry_rcu(css, &pos->rstat_css_list, - rstat_css_node) - css->ss->css_rstat_flush(css, cpu); - rcu_read_unlock(); + bpf_rstat_flush(pos->cgroup, cgroup_parent(pos->cgroup), cpu); } /* play nice and yield if necessary */ @@ -334,93 +338,96 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp) } /** - * cgroup_rstat_flush - flush stats in @cgrp's subtree - * @cgrp: target cgroup + * cgroup_rstat_flush - flush stats in @css's rstat subtree + * @css: target cgroup subsystem state * - * Collect all per-cpu stats in @cgrp's 
subtree into the global counters - * and propagate them upwards. After this function returns, all cgroups in - * the subtree have up-to-date ->stat. + * Collect all per-cpu stats in @css's subtree into the global counters + * and propagate them upwards. After this function returns, all rstat + * nodes in the subtree have up-to-date ->stat. * - * This also gets all cgroups in the subtree including @cgrp off the + * This also gets all rstat nodes in the subtree including @css off the * ->updated_children lists. * * This function may block. */ -__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) +__bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css) { + struct cgroup *cgrp = css->cgroup; + might_sleep(); __cgroup_rstat_lock(cgrp, -1); - cgroup_rstat_flush_locked(cgrp); + cgroup_rstat_flush_locked(css); __cgroup_rstat_unlock(cgrp, -1); } /** - * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold - * @cgrp: target cgroup + * cgroup_rstat_flush_hold - flush stats in @css's rstat subtree and hold + * @css: target subsystem state * - * Flush stats in @cgrp's subtree and prevent further flushes. Must be + * Flush stats in @css's rstat subtree and prevent further flushes. Must be * paired with cgroup_rstat_flush_release(). * * This function may block. 
 */
-void cgroup_rstat_flush_hold(struct cgroup *cgrp)
-	__acquires(&cgroup_rstat_lock)
+void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	might_sleep();
 	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(cgrp);
+	cgroup_rstat_flush_locked(css);
 }
 
 /**
  * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
- * @cgrp: cgroup used by tracepoint
+ * @css: css that was previously used for the call to flush hold
  */
-void cgroup_rstat_flush_release(struct cgroup *cgrp)
-	__releases(&cgroup_rstat_lock)
+void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
-int cgroup_rstat_init(struct cgroup *cgrp)
+int cgroup_rstat_init(struct cgroup_subsys_state *css)
 {
 	int cpu;
 
-	/* the root cgrp has rstat_cpu preallocated */
-	if (!cgrp->rstat_cpu) {
-		cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
-		if (!cgrp->rstat_cpu)
+	/* the root cgrp's self css has rstat_cpu preallocated */
+	if (!css->rstat_cpu) {
+		css->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
+		if (!css->rstat_cpu)
 			return -ENOMEM;
 	}
 
 	/* ->updated_children list is self terminated */
 	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(css, cpu);
 
-		rstatc->updated_children = cgrp;
+		rstatc->updated_children = css;
 		u64_stats_init(&rstatc->bsync);
 	}
 
 	return 0;
 }
 
-void cgroup_rstat_exit(struct cgroup *cgrp)
+void cgroup_rstat_exit(struct cgroup_subsys_state *css)
 {
 	int cpu;
 
-	cgroup_rstat_flush(cgrp);
+	cgroup_rstat_flush(css);
 
 	/* sanity check */
 	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(css, cpu);
 
-		if (WARN_ON_ONCE(rstatc->updated_children != cgrp) ||
+		if (WARN_ON_ONCE(rstatc->updated_children != css) ||
 		    WARN_ON_ONCE(rstatc->updated_next))
 			return;
 	}
 
-	free_percpu(cgrp->rstat_cpu);
-	cgrp->rstat_cpu = NULL;
+	free_percpu(css->rstat_cpu);
+	css->rstat_cpu = NULL;
 }
 
 void __init cgroup_rstat_boot(void)
@@ -461,7 +468,7 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(&cgrp->self, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
 	struct cgroup_rstat_cpu *prstatc;
 	struct cgroup_base_stat delta;
@@ -491,7 +498,7 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
 
 		delta = rstatc->subtree_bstat;
-		prstatc = cgroup_rstat_cpu(parent, cpu);
+		prstatc = cgroup_rstat_cpu(&parent->self, cpu);
 		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
 		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
 		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
@@ -503,7 +510,7 @@ cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags
 {
 	struct cgroup_rstat_cpu *rstatc;
 
-	rstatc = get_cpu_ptr(cgrp->rstat_cpu);
+	rstatc = get_cpu_ptr(cgrp->self.rstat_cpu);
 	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
 	return rstatc;
 }
@@ -513,7 +520,7 @@ static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
 						 unsigned long flags)
 {
 	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
-	cgroup_rstat_updated(cgrp, smp_processor_id());
+	cgroup_rstat_updated(&cgrp->self, smp_processor_id());
 	put_cpu_ptr(rstatc);
 }
@@ -615,12 +622,12 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 	u64 usage, utime, stime, ntime;
 
 	if (cgroup_parent(cgrp)) {
-		cgroup_rstat_flush_hold(cgrp);
+		cgroup_rstat_flush_hold(&cgrp->self);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
 		ntime = cgrp->bstat.ntime;
-		cgroup_rstat_flush_release(cgrp);
+		cgroup_rstat_flush_release(&cgrp->self);
 	} else {
 		/* cgrp->bstat of root is not actually used, reuse it */
 		root_cgroup_cputime(&cgrp->bstat);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..88c2c8e610b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -579,7 +579,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	if (!val)
 		return;
 
-	cgroup_rstat_updated(memcg->css.cgroup, cpu);
+	cgroup_rstat_updated(&memcg->css, cpu);
 	statc = this_cpu_ptr(memcg->vmstats_percpu);
 	for (; statc; statc = statc->parent) {
 		stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
@@ -611,7 +611,7 @@ static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
 	if (mem_cgroup_is_root(memcg))
 		WRITE_ONCE(flush_last_time, jiffies_64);
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	cgroup_rstat_flush(&memcg->css);
 }
 
 /*
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
index 38f78d9345de..f362f7d41b9e 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -45,7 +45,7 @@ int BPF_PROG(test_percpu2, struct bpf_testmod_btf_type_tag_2 *arg)
 SEC("tp_btf/cgroup_mkdir")
 int BPF_PROG(test_percpu_load, struct cgroup *cgrp, const char *path)
 {
-	g = (__u64)cgrp->rstat_cpu->updated_children;
+	g = (__u64)cgrp->self.rstat_cpu->updated_children;
 	return 0;
 }
@@ -56,7 +56,8 @@ int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
 	__u32 cpu;
 
 	cpu = bpf_get_smp_processor_id();
-	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
+	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(
+			cgrp->self.rstat_cpu, cpu);
 	if (rstat) {
 		/* READ_ONCE */
 		*(volatile int *)rstat;
diff --git a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
index c74362854948..10c803c8dc70 100644
---
a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
+++ b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
@@ -37,8 +37,8 @@ struct {
 	__type(value, struct attach_counter);
 } attach_counters SEC(".maps");
 
-extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
-extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
+extern void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu) __ksym;
+extern void cgroup_rstat_flush(struct cgroup_subsys_state *css) __ksym;
 
 static uint64_t cgroup_id(struct cgroup *cgrp)
 {
@@ -75,7 +75,7 @@ int BPF_PROG(counter, struct cgroup *dst_cgrp, struct task_struct *leader,
 	else if (create_percpu_attach_counter(cg_id, 1))
 		return 0;
 
-	cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
+	cgroup_rstat_updated(&dst_cgrp->self, bpf_get_smp_processor_id());
 	return 0;
 }
@@ -141,7 +141,7 @@ int BPF_PROG(dumper, struct bpf_iter_meta *meta, struct cgroup *cgrp)
 		return 1;
 
 	/* Flush the stats to make sure we get the most updated numbers */
-	cgroup_rstat_flush(cgrp);
+	cgroup_rstat_flush(&cgrp->self);
 
 	total_counter = bpf_map_lookup_elem(&attach_counters, &cg_id);
 	if (!total_counter) {

From patchwork Thu Feb 27 21:55:41 2025
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13995242
From: inwardvessel
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 2/4 v2] cgroup: rstat lock indirection
Date: Thu, 27 Feb 2025 13:55:41 -0800
Message-ID: <20250227215543.49928-3-inwardvessel@gmail.com>
In-Reply-To: <20250227215543.49928-1-inwardvessel@gmail.com>
References: <20250227215543.49928-1-inwardvessel@gmail.com>
From: JP Kobryn

Instead of accessing the target lock directly via global var, access it
indirectly in the form of a new parameter. Also change the ordering of
the parameters to be consistent with the related per-cpu locking
function _cgroup_rstat_cpu_lock().
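[Editorial note] The indirection described above can be sketched outside the kernel with a minimal user-space analogue. This is not kernel code: the names (`struct stat_domain`, `stat_flush`, `stat_lock_hold`/`stat_lock_release`) are invented for illustration, and plain pthread mutexes stand in for the kernel's `spinlock_t`. The point is only the shape of the change: the helpers receive the lock as a parameter rather than naming one global, so the same flush path can later serve several distinct locks.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a stats domain with its own lock. */
struct stat_domain {
	pthread_mutex_t lock;
	long value;
};

/* Before the change, helpers like these would have referenced a
 * single global lock directly; now the caller passes the lock in. */
static void stat_lock_hold(pthread_mutex_t *lock)
{
	pthread_mutex_lock(lock);
}

static void stat_lock_release(pthread_mutex_t *lock)
{
	pthread_mutex_unlock(lock);
}

/* Flush under whichever lock the caller's domain owns. */
long stat_flush(struct stat_domain *dom)
{
	long v;

	stat_lock_hold(&dom->lock);
	v = dom->value;
	dom->value = 0;
	stat_lock_release(&dom->lock);
	return v;
}
```

With the lock threaded through as a parameter, two domains can flush concurrently without contending on a shared global lock, which is exactly what the following patch in the series exploits.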
Reviewed-by: Shakeel Butt
Signed-off-by: JP Kobryn
---
 kernel/cgroup/rstat.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9976f9acd62b..88908ef9212d 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -277,7 +277,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
 __bpf_hook_end();
 
 /*
- * Helper functions for locking cgroup_rstat_lock.
+ * Helper functions for locking.
  *
  * This makes it easier to diagnose locking issues and contention in
  * production environments. The parameter @cpu_in_loop indicate lock
@@ -285,29 +285,32 @@ __bpf_hook_end();
  * value -1 is used when obtaining the main lock else this is the CPU
  * number processed last.
  */
-static inline void __cgroup_rstat_lock(struct cgroup *cgrp, int cpu_in_loop)
-	__acquires(&cgroup_rstat_lock)
+static inline void __cgroup_rstat_lock(spinlock_t *lock,
+		struct cgroup *cgrp, int cpu_in_loop)
+	__acquires(lock)
 {
 	bool contended;
 
-	contended = !spin_trylock_irq(&cgroup_rstat_lock);
+	contended = !spin_trylock_irq(lock);
 	if (contended) {
 		trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
-		spin_lock_irq(&cgroup_rstat_lock);
+		spin_lock_irq(lock);
 	}
 	trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
 }
 
-static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
-	__releases(&cgroup_rstat_lock)
+static inline void __cgroup_rstat_unlock(spinlock_t *lock,
+		struct cgroup *cgrp, int cpu_in_loop)
+	__releases(lock)
 {
 	trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock_irq(lock);
 }
 
 /* see cgroup_rstat_flush() */
-static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
+static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css,
+		spinlock_t *lock)
+	__releases(lock) __acquires(lock)
 {
 	struct cgroup *cgrp = css->cgroup;
 	int cpu;
@@ -328,11 +331,11 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
 		}
 
 		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(cgrp, cpu);
+		if (need_resched() || spin_needbreak(lock)) {
+			__cgroup_rstat_unlock(lock, cgrp, cpu);
 			if (!cond_resched())
 				cpu_relax();
-			__cgroup_rstat_lock(cgrp, cpu);
+			__cgroup_rstat_lock(lock, cgrp, cpu);
 		}
 	}
 }
@@ -356,9 +359,9 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 
 	might_sleep();
 
-	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(css);
-	__cgroup_rstat_unlock(cgrp, -1);
+	__cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1);
+	cgroup_rstat_flush_locked(css, &cgroup_rstat_lock);
+	__cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1);
 }
 
 /**
@@ -375,8 +378,8 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 	struct cgroup *cgrp = css->cgroup;
 
 	might_sleep();
-	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(css);
+	__cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1);
+	cgroup_rstat_flush_locked(css, &cgroup_rstat_lock);
 }
 
 /**
@@ -386,7 +389,7 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
-	__cgroup_rstat_unlock(cgrp, -1);
+	__cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1);
 }
 
 int cgroup_rstat_init(struct cgroup_subsys_state *css)

From patchwork Thu Feb 27 21:55:42 2025
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13995243
From: inwardvessel
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/4 v2] cgroup: separate rstat locks for subsystems
Date: Thu, 27 Feb 2025 13:55:42 -0800
Message-ID: <20250227215543.49928-4-inwardvessel@gmail.com>
In-Reply-To: <20250227215543.49928-1-inwardvessel@gmail.com>
References: <20250227215543.49928-1-inwardvessel@gmail.com>
From: JP Kobryn

Let the existing locks be dedicated to the base stats and rename them as
such. Also add new rstat locks for each enabled subsystem. When handling
cgroup subsystem states, distinguish between formal subsystems (memory,
io, etc) and the base stats subsystem state (represented by cgroup::self)
to decide on which locks to take.
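[Editorial note] The lock-selection rule described above can be sketched as a small user-space analogue. This is not the kernel implementation: `pick_lock`, `struct css`, and the plain `int` lock stand-ins are invented for illustration. It only shows the dispatch the patch introduces: a css with no subsystem pointer (the base stats css, `cgroup::self`) maps to the base lock, while each real subsystem maps to its own slot in a per-subsystem array.

```c
#include <assert.h>
#include <stddef.h>

#define SUBSYS_COUNT 4	/* stand-in for CGROUP_SUBSYS_COUNT */

struct subsys {
	int id;
};

/* ss == NULL plays the role of is_base_css(): the self/base css. */
struct css {
	struct subsys *ss;
};

/* Plain ints stand in for spinlock_t here. */
static int base_lock;
static int subsys_lock[SUBSYS_COUNT];

/* Choose the lock the way the patch does for flush paths. */
static int *pick_lock(struct css *css)
{
	if (css->ss == NULL)
		return &base_lock;
	return &subsys_lock[css->ss->id];
}
```

Because each subsystem resolves to a different lock, a memcg flush and an io flush no longer serialize against each other, which is the contention the patch aims to remove.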
This change is made to prevent contention between subsystems when
updating/flushing stats.

Signed-off-by: JP Kobryn
Reviewed-by: Shakeel Butt
---
 kernel/cgroup/rstat.c | 93 +++++++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 88908ef9212d..b3eaefc1fd07 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,12 @@
 #include
 
-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(cgroup_rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_base_cpu_lock);
+
+static spinlock_t cgroup_rstat_subsys_lock[CGROUP_SUBSYS_COUNT];
+static DEFINE_PER_CPU(raw_spinlock_t,
+		cgroup_rstat_subsys_cpu_lock[CGROUP_SUBSYS_COUNT]);
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
 
@@ -20,8 +24,13 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }
 
+static inline bool is_base_css(struct cgroup_subsys_state *css)
+{
+	return css->ss == NULL;
+}
+
 /*
- * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ * Helper functions for rstat per CPU locks.
  *
  * This makes it easier to diagnose locking issues and contention in
  * production environments. The parameter @fast_path determine the
@@ -36,12 +45,12 @@ unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
 	bool contended;
 
 	/*
-	 * The _irqsave() is needed because cgroup_rstat_lock is
-	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
-	 * this lock with the _irq() suffix only disables interrupts on
-	 * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
-	 * interrupts on both configurations. The _irqsave() ensures
-	 * that interrupts are always disabled and later restored.
+	 * The _irqsave() is needed because the locks used for flushing are
+	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock
+	 * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT
+	 * kernel. The raw_spinlock_t below disables interrupts on both
+	 * configurations. The _irqsave() ensures that interrupts are always
+	 * disabled and later restored.
 	 */
 	contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
 	if (contended) {
@@ -87,7 +96,7 @@ __bpf_kfunc void cgroup_rstat_updated(
 		struct cgroup_subsys_state *css, int cpu)
 {
 	struct cgroup *cgrp = css->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 
 	/*
@@ -101,6 +110,12 @@ __bpf_kfunc void cgroup_rstat_updated(
 	if (data_race(cgroup_rstat_cpu(css, cpu)->updated_next))
 		return;
 
+	if (is_base_css(css))
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+	else
+		cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) +
+			css->ss->id;
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
 
 	/* put @css and all ancestors on the corresponding updated lists */
@@ -208,11 +223,17 @@ static struct cgroup_subsys_state *cgroup_rstat_updated_list(
 		struct cgroup_subsys_state *root, int cpu)
 {
 	struct cgroup *cgrp = root->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu);
 	struct cgroup_subsys_state *head = NULL, *parent, *child;
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 
+	if (is_base_css(root))
+		cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu);
+	else
+		cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) +
+			root->ss->id;
+
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false);
 
 	/* Return NULL if this subtree is not on-list */
@@ -315,7 +336,7 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css,
 	struct cgroup *cgrp = css->cgroup;
 	int cpu;
 
-	lockdep_assert_held(&cgroup_rstat_lock);
+	lockdep_assert_held(&lock);
 
 	for_each_possible_cpu(cpu) {
 		struct cgroup_subsys_state *pos;
@@ -356,12 +377,18 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css,
 __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
+
+	if (is_base_css(css))
+		lock = &cgroup_rstat_base_lock;
+	else
+		lock = &cgroup_rstat_subsys_lock[css->ss->id];
 
 	might_sleep();
-	__cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1);
-	cgroup_rstat_flush_locked(css, &cgroup_rstat_lock);
-	__cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1);
+	__cgroup_rstat_lock(lock, cgrp, -1);
+	cgroup_rstat_flush_locked(css, lock);
+	__cgroup_rstat_unlock(lock, cgrp, -1);
 }
 
 /**
@@ -376,10 +403,16 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
+
+	if (is_base_css(css))
+		lock = &cgroup_rstat_base_lock;
+	else
+		lock = &cgroup_rstat_subsys_lock[css->ss->id];
 
 	might_sleep();
-	__cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1);
-	cgroup_rstat_flush_locked(css, &cgroup_rstat_lock);
+	__cgroup_rstat_lock(lock, cgrp, -1);
+	cgroup_rstat_flush_locked(css, lock);
 }
 
 /**
@@ -389,7 +422,14 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
-	__cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1);
+	spinlock_t *lock;
+
+	if (is_base_css(css))
+		lock = &cgroup_rstat_base_lock;
+	else
+		lock = &cgroup_rstat_subsys_lock[css->ss->id];
+
+	__cgroup_rstat_unlock(lock, cgrp, -1);
 }
 
 int cgroup_rstat_init(struct cgroup_subsys_state *css)
@@ -435,10 +475,21 @@ void cgroup_rstat_exit(struct cgroup_subsys_state *css)
 
 void __init cgroup_rstat_boot(void)
 {
-	int cpu;
+	struct cgroup_subsys *ss;
+	int cpu, ssid;
 
-	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+	for_each_subsys(ss, ssid) {
+		spin_lock_init(&cgroup_rstat_subsys_lock[ssid]);
+	}
+
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu));
+
+		for_each_subsys(ss, ssid) {
+			raw_spin_lock_init(
+				per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + ssid);
+		}
+	}
 }
 
 /*

From patchwork Thu Feb 27 21:55:43 2025
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13995244
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740693362; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IvG0OX+5jTFPX8q84psW6K+/pHLcZfB+bAlB9DXTHr4=; b=Ko+XUe6f1Jo9a556a7zxnOAB+Hdlt2RvJ7WJlcyJRGaOWvF1Pu82GjRTWoHy7WJOnc0I/F il6ACryrpOIACx6gdwIHpMIvo0msUVZquC94OBL7Nl1V9gqK7SPoOSsf333v9K/Jkhb2rK Kd28WolQRmFhIf0gH9k/PetUc8Afb7I= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=EQ4rubGD; spf=pass (imf22.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.180 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740693362; a=rsa-sha256; cv=none; b=QFHLmENz5YaBIzHXRrXhJvsuJmeElKeXw43LTw8PuzHZtmWGPigGAofFpD/mvJ8V4rI221 6c66/vHvcp3dL8CEQyGLIZRPDrwCuHOHc2aPHELoPcv5BAwFXB56ZglnUaGi1hLA7SPdgB 1Db2tLQbX51R933jdpShO1T+edrAcF8= Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-22185cddbffso45831315ad.1 for ; Thu, 27 Feb 2025 13:56:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1740693361; x=1741298161; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=IvG0OX+5jTFPX8q84psW6K+/pHLcZfB+bAlB9DXTHr4=; b=EQ4rubGDBaw6U9TMdmTW6uukpz1Ygz3uOsVAd39Fgu92/YUvzxBu+I0nbQbrCo3SNK VOGY2GmPlGEavUYhyEXddB/0/ueGtquu901g4CHkcVLvJNY3HfgCfJWjJ4J4gJgkHR3T MMnw9AROxy92SJpO2d2vOyUE5II3OZ9bmWiLNk5wwtkcaSVuGeBycy+EX/b1Uhu04+I1 gDC01gODUpXqnDy85gpT2cxgG36gAXSSplam9HkFfGiPLNW6PjjAGErFZmjAoO9vQL9W jjQgSUxaZhzg48STO67iOfs24gy7MHY5NTR4cCN7vitY1aUjWzF7k0M92v32bsUsHMx6 bZrg== X-Google-DKIM-Signature: v=1; 
From: inwardvessel
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 4/4 v2] cgroup: separate rstat list pointers from base stats
Date: Thu, 27 Feb 2025 13:55:43 -0800
Message-ID: <20250227215543.49928-5-inwardvessel@gmail.com>
In-Reply-To: <20250227215543.49928-1-inwardvessel@gmail.com>
References: <20250227215543.49928-1-inwardvessel@gmail.com>
MIME-Version: 1.0

From: JP Kobryn

A majority of the cgroup_rstat_cpu struct size is made up of the base
stat entities. Since only the "self" subsystem state makes use of these,
move them into a struct of their own. This allows for a new compact
cgroup_rstat_cpu struct that the formal subsystems can make use of.
Where applicable, code paths now decide whether to allocate the compact
struct or the full one that includes the base stats.

Signed-off-by: JP Kobryn
Reviewed-by: Shakeel Butt
---
 include/linux/cgroup-defs.h | 37 ++++++++++++++----------
 kernel/cgroup/rstat.c       | 57 +++++++++++++++++++++++++------------
 2 files changed, 61 insertions(+), 33 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1598e1389615..b0a07c63fd46 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -170,7 +170,10 @@ struct cgroup_subsys_state {
 	struct percpu_ref refcnt;

 	/* per-cpu recursive resource statistics */
-	struct cgroup_rstat_cpu __percpu *rstat_cpu;
+	union {
+		struct cgroup_rstat_cpu __percpu *rstat_cpu;
+		struct cgroup_rstat_base_cpu __percpu *rstat_base_cpu;
+	};

 	/*
 	 * siblings list anchored at the parent's ->children
@@ -356,6 +359,24 @@ struct cgroup_base_stat {
  * resource statistics on top of it - bsync, bstat and last_bstat.
  */
 struct cgroup_rstat_cpu {
+	/*
+	 * Child cgroups with stat updates on this cpu since the last read
+	 * are linked on the parent's ->updated_children through
+	 * ->updated_next.
+	 *
+	 * In addition to being more compact, singly-linked list pointing
+	 * to the cgroup makes it unnecessary for each per-cpu struct to
+	 * point back to the associated cgroup.
+	 *
+	 * Protected by per-cpu cgroup_rstat_cpu_lock.
+	 */
+	struct cgroup_subsys_state *updated_children;	/* terminated by self */
+	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
+};
+
+struct cgroup_rstat_base_cpu {
+	struct cgroup_rstat_cpu self;
+
 	/*
 	 * ->bsync protects ->bstat. These are the only fields which get
 	 * updated in the hot path.
@@ -382,20 +403,6 @@ struct cgroup_rstat_cpu {
 	 * deltas to propagate to the per-cpu subtree_bstat.
 	 */
 	struct cgroup_base_stat last_subtree_bstat;
-
-	/*
-	 * Child cgroups with stat updates on this cpu since the last read
-	 * are linked on the parent's ->updated_children through
-	 * ->updated_next.
-	 *
-	 * In addition to being more compact, singly-linked list pointing
-	 * to the cgroup makes it unnecessary for each per-cpu struct to
-	 * point back to the associated cgroup.
-	 *
-	 * Protected by per-cpu cgroup_rstat_cpu_lock.
-	 */
-	struct cgroup_subsys_state *updated_children;	/* terminated by self */
-	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
 };

 struct cgroup_freezer_state {
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index b3eaefc1fd07..c08ebe2f9568 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -24,6 +24,12 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }

+static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
+		struct cgroup_subsys_state *css, int cpu)
+{
+	return per_cpu_ptr(css->rstat_base_cpu, cpu);
+}
+
 static inline bool is_base_css(struct cgroup_subsys_state *css)
 {
 	return css->ss == NULL;
@@ -438,17 +444,31 @@ int cgroup_rstat_init(struct cgroup_subsys_state *css)

 	/* the root cgrp's self css has rstat_cpu preallocated */
 	if (!css->rstat_cpu) {
-		css->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
-		if (!css->rstat_cpu)
-			return -ENOMEM;
-	}
+		if (is_base_css(css)) {
+			css->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
+			if (!css->rstat_base_cpu)
+				return -ENOMEM;

-	/* ->updated_children list is self terminated */
-	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(css, cpu);
+			for_each_possible_cpu(cpu) {
+				struct cgroup_rstat_base_cpu *rstatc;
+
+				rstatc = cgroup_rstat_base_cpu(css, cpu);
+				rstatc->self.updated_children = css;
+				u64_stats_init(&rstatc->bsync);
+			}
+		} else {
+			css->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
+			if (!css->rstat_cpu)
+				return -ENOMEM;
+
+			for_each_possible_cpu(cpu) {
+				struct cgroup_rstat_cpu *rstatc;
+
+				rstatc = cgroup_rstat_cpu(css, cpu);
+				rstatc->updated_children = css;
+			}
+		}

-		rstatc->updated_children = css;
-		u64_stats_init(&rstatc->bsync);
 	}

 	return 0;
@@ -522,9 +542,10 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,

 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(&cgrp->self, cpu);
+	struct cgroup_rstat_base_cpu *rstatc = cgroup_rstat_base_cpu(
+			&cgrp->self, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_rstat_cpu *prstatc;
+	struct cgroup_rstat_base_cpu *prstatc;
 	struct cgroup_base_stat delta;
 	unsigned seq;

@@ -552,25 +573,25 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);

 		delta = rstatc->subtree_bstat;
-		prstatc = cgroup_rstat_cpu(&parent->self, cpu);
+		prstatc = cgroup_rstat_base_cpu(&parent->self, cpu);
 		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
 		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
 		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
 	}
 }

-static struct cgroup_rstat_cpu *
+static struct cgroup_rstat_base_cpu *
 cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp,
 				       unsigned long *flags)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct cgroup_rstat_base_cpu *rstatc;

-	rstatc = get_cpu_ptr(cgrp->self.rstat_cpu);
+	rstatc = get_cpu_ptr(cgrp->self.rstat_base_cpu);
 	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
 	return rstatc;
 }

 static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
-						 struct cgroup_rstat_cpu *rstatc,
+						 struct cgroup_rstat_base_cpu *rstatc,
 						 unsigned long flags)
 {
 	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
@@ -580,7 +601,7 @@ static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,

 void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct cgroup_rstat_base_cpu *rstatc;
 	unsigned long flags;

 	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
@@ -591,7 +612,7 @@ void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 void __cgroup_account_cputime_field(struct cgroup *cgrp,
 				    enum cpu_usage_stat index, u64 delta_exec)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct cgroup_rstat_base_cpu *rstatc;
 	unsigned long flags;

 	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);