From patchwork Wed Oct 30 22:28:18 2024
X-Patchwork-Submitter: Namhyung Kim
X-Patchwork-Id: 13857270
From: Namhyung Kim
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
 LKML, bpf@vger.kernel.org, Andrew Morton, Christoph Lameter,
 Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
 Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 linux-mm@kvack.org, Arnaldo Carvalho de Melo, Kees Cook
Subject: [PATCH bpf-next v3 1/2] bpf: Add open coded version of kmem_cache iterator
Date: Wed, 30 Oct 2024 15:28:18 -0700
Message-ID: <20241030222819.1800667-1-namhyung@kernel.org>

Add a new open coded iterator for kmem_cache which can be called from a
BPF program like below.  It doesn't take any argument and traverses all
kmem_cache entries.

  struct kmem_cache *pos;

  bpf_for_each(kmem_cache, pos) {
      ...
  }

As it needs to grab slab_mutex, it should be called from sleepable BPF
programs only.

Also update the existing iterator code to use the open coded version
internally as suggested by Andrii.

Signed-off-by: Namhyung Kim
---
v2)
 * prevent restart after the last element (Martin)
 * update existing code to use the open coded version (Andrii)

 kernel/bpf/helpers.c         |   3 +
 kernel/bpf/kmem_cache_iter.c | 151 +++++++++++++++++++++++++----------
 2 files changed, 110 insertions(+), 44 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2e82f8d3a76fb9ca..395221e53832e10e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3112,6 +3112,9 @@ BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_get_kmem_cache)
+BTF_ID_FLAGS(func, bpf_iter_kmem_cache_new, KF_ITER_NEW | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/kmem_cache_iter.c b/kernel/bpf/kmem_cache_iter.c
index ebc101d7da51b57c..3ae2158d767f4526 100644
--- a/kernel/bpf/kmem_cache_iter.c
+++ b/kernel/bpf/kmem_cache_iter.c
@@ -8,16 +8,116 @@
 
 #include "../../mm/slab.h" /* kmem_cache, slab_caches and slab_mutex */
 
+/* open-coded version */
+struct bpf_iter_kmem_cache {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_kmem_cache_kern {
+	struct kmem_cache *pos;
+} __attribute__((aligned(8)));
+
+#define KMEM_CACHE_POS_START ((void *)1L)
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it)
+{
+	struct bpf_iter_kmem_cache_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(*kit) > sizeof(*it));
+	BUILD_BUG_ON(__alignof__(*kit) != __alignof__(*it));
+
+	kit->pos = KMEM_CACHE_POS_START;
+	return 0;
+}
+
+__bpf_kfunc struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it)
+{
+	struct bpf_iter_kmem_cache_kern *kit = (void *)it;
+	struct kmem_cache *prev = kit->pos;
+	struct kmem_cache *next;
+	bool destroy = false;
+
+	if (!prev)
+		return NULL;
+
+	mutex_lock(&slab_mutex);
+
+	if (list_empty(&slab_caches)) {
+		mutex_unlock(&slab_mutex);
+		return NULL;
+	}
+
+	if (prev == KMEM_CACHE_POS_START)
+		next = list_first_entry(&slab_caches, struct kmem_cache, list);
+	else if (list_last_entry(&slab_caches, struct kmem_cache, list) == prev)
+		next = NULL;
+	else
+		next = list_next_entry(prev, list);
+
+	/* boot_caches have negative refcount, don't touch them */
+	if (next && next->refcount > 0)
+		next->refcount++;
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (prev && prev != KMEM_CACHE_POS_START) {
+		if (prev->refcount > 1)
+			prev->refcount--;
+		else if (prev->refcount == 1)
+			destroy = true;
+	}
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(prev);
+
+	kit->pos = next;
+	return next;
+}
+
+__bpf_kfunc void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it)
+{
+	struct bpf_iter_kmem_cache_kern *kit = (void *)it;
+	struct kmem_cache *s = kit->pos;
+	bool destroy = false;
+
+	if (s == NULL || s == KMEM_CACHE_POS_START)
+		return;
+
+	mutex_lock(&slab_mutex);
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (s->refcount > 1)
+		s->refcount--;
+	else if (s->refcount == 1)
+		destroy = true;
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(s);
+}
+
+__bpf_kfunc_end_defs();
+
 struct bpf_iter__kmem_cache {
 	__bpf_md_ptr(struct bpf_iter_meta *, meta);
 	__bpf_md_ptr(struct kmem_cache *, s);
 };
 
+union kmem_cache_iter_priv {
+	struct bpf_iter_kmem_cache it;
+	struct bpf_iter_kmem_cache_kern kit;
+};
+
 static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	loff_t cnt = 0;
 	bool found = false;
 	struct kmem_cache *s;
+	union kmem_cache_iter_priv *p = seq->private;
 
 	mutex_lock(&slab_mutex);
 
@@ -43,8 +143,9 @@ static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
 	mutex_unlock(&slab_mutex);
 
 	if (!found)
-		return NULL;
+		s = NULL;
 
+	p->kit.pos = s;
 	return s;
 }
 
@@ -55,63 +156,24 @@ static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
 		.meta = &meta,
 		.s = v,
 	};
+	union kmem_cache_iter_priv *p = seq->private;
 	struct bpf_prog *prog;
-	bool destroy = false;
 
 	meta.seq = seq;
 	prog = bpf_iter_get_info(&meta, true);
 	if (prog && !ctx.s)
 		bpf_iter_run_prog(prog, &ctx);
 
-	if (ctx.s == NULL)
-		return;
-
-	mutex_lock(&slab_mutex);
-
-	/* Skip kmem_cache_destroy() for active entries */
-	if (ctx.s->refcount > 1)
-		ctx.s->refcount--;
-	else if (ctx.s->refcount == 1)
-		destroy = true;
-
-	mutex_unlock(&slab_mutex);
-
-	if (destroy)
-		kmem_cache_destroy(ctx.s);
+	bpf_iter_kmem_cache_destroy(&p->it);
 }
 
 static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct kmem_cache *s = v;
-	struct kmem_cache *next = NULL;
-	bool destroy = false;
+	union kmem_cache_iter_priv *p = seq->private;
 
 	++*pos;
 
-	mutex_lock(&slab_mutex);
-
-	if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
-		next = list_next_entry(s, list);
-
-		WARN_ON_ONCE(next->refcount == 0);
-
-		/* boot_caches have negative refcount, don't touch them */
-		if (next->refcount > 0)
-			next->refcount++;
-	}
-
-	/* Skip kmem_cache_destroy() for active entries */
-	if (s->refcount > 1)
-		s->refcount--;
-	else if (s->refcount == 1)
-		destroy = true;
-
-	mutex_unlock(&slab_mutex);
-
-	if (destroy)
-		kmem_cache_destroy(s);
-
-	return next;
+	return bpf_iter_kmem_cache_next(&p->it);
 }
 
 static int kmem_cache_iter_seq_show(struct seq_file *seq, void *v)
@@ -143,6 +205,7 @@ BTF_ID_LIST_GLOBAL_SINGLE(bpf_kmem_cache_btf_id, struct, kmem_cache)
 
 static const struct bpf_iter_seq_info kmem_cache_iter_seq_info = {
 	.seq_ops = &kmem_cache_iter_seq_ops,
+	.seq_priv_size = sizeof(union kmem_cache_iter_priv),
 };
 
 static void bpf_iter_kmem_cache_show_fdinfo(const struct bpf_iter_aux_info *aux,
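
Illustration only, not part of the patch: the new/next/destroy protocol used by the open coded iterator (a start sentinel, NULL once the list is exhausted, and no restart after the last element) can be modeled in user space. This is a simplified sketch under stated assumptions. The names model_cache, model_iter, MODEL_POS_START and model_walk are hypothetical, the list is a plain singly linked list passed in by the caller rather than the global slab_caches, and the slab_mutex locking and refcount handling done by the kernel code are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache: one node of a linked list. */
struct model_cache {
	const char *name;
	struct model_cache *next;
};

/* Sentinel meaning "not started yet"; plays the role of KMEM_CACHE_POS_START. */
#define MODEL_POS_START ((struct model_cache *)1L)

struct model_iter {
	struct model_cache *head;	/* list to walk (assumption: passed in explicitly) */
	struct model_cache *pos;	/* sentinel, current element, or NULL when done */
};

static int model_iter_new(struct model_iter *it, struct model_cache *head)
{
	it->head = head;
	it->pos = MODEL_POS_START;
	return 0;
}

static struct model_cache *model_iter_next(struct model_iter *it)
{
	struct model_cache *prev = it->pos;
	struct model_cache *next;

	/* Once NULL has been stored, every later call keeps returning NULL. */
	if (!prev)
		return NULL;

	if (prev == MODEL_POS_START)
		next = it->head;	/* first call: begin at the list head */
	else
		next = prev->next;	/* later calls: advance by one element */

	it->pos = next;
	return next;
}

static void model_iter_destroy(struct model_iter *it)
{
	/* The kernel version drops a reference here; the model only resets. */
	it->pos = NULL;
}

/*
 * Walk the whole list and return the number of elements visited.  Then call
 * next() extra_calls more times; return -1 if the iterator ever restarts.
 */
static int model_walk(struct model_cache *head, int extra_calls)
{
	struct model_iter it;
	int n = 0;

	model_iter_new(&it, head);
	while (model_iter_next(&it))
		n++;

	while (extra_calls-- > 0) {
		if (model_iter_next(&it))
			n = -1;
	}

	model_iter_destroy(&it);
	return n;
}
```

Calling model_walk() on a three-element list visits three entries, and the extra next() calls afterwards keep returning NULL, which is the "prevent restart after the last element" behavior noted in the changelog.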