From patchwork Thu Oct 24 07:48:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Namhyung Kim
X-Patchwork-Id: 13848477
From: Namhyung Kim
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, LKML,
 bpf@vger.kernel.org, Andrew Morton, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org,
 Arnaldo Carvalho de Melo, Kees Cook
Subject: [PATCH v2 bpf-next 1/2] bpf: Add open coded version of kmem_cache iterator
Date: Thu, 24 Oct 2024 00:48:14 -0700
Message-ID: <20241024074815.1255066-1-namhyung@kernel.org>
X-Mailer: git-send-email 2.47.0.105.g07ac214952-goog

Add a new open coded iterator for kmem_cache which can be called from a
BPF program like below.  It doesn't take any argument and traverses all
kmem_cache entries.

  struct kmem_cache *pos;

  bpf_for_each(kmem_cache, pos) {
      ...
  }

As it needs to grab slab_mutex, it should be called from sleepable BPF
programs only.

Also update the existing iterator code to use the open coded version
internally as suggested by Andrii.

Signed-off-by: Namhyung Kim
---
v2)
 * prevent restart after the last element (Martin)
 * update existing code to use the open coded version (Andrii)

 kernel/bpf/helpers.c         |   3 +
 kernel/bpf/kmem_cache_iter.c | 151 +++++++++++++++++++++++++----------
 2 files changed, 110 insertions(+), 44 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5c3fdb29c1b1fe53..ddddb060835bac4b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3112,6 +3112,9 @@ BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_get_kmem_cache)
+BTF_ID_FLAGS(func, bpf_iter_kmem_cache_new, KF_ITER_NEW | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/kmem_cache_iter.c b/kernel/bpf/kmem_cache_iter.c
index ebc101d7da51b57c..3ae2158d767f4526 100644
--- a/kernel/bpf/kmem_cache_iter.c
+++ b/kernel/bpf/kmem_cache_iter.c
@@ -8,16 +8,116 @@
 
 #include "../../mm/slab.h" /* kmem_cache, slab_caches and slab_mutex */
 
+/* open-coded version */
+struct bpf_iter_kmem_cache {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_kmem_cache_kern {
+	struct kmem_cache *pos;
+} __attribute__((aligned(8)));
+
+#define KMEM_CACHE_POS_START	((void *)1L)
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it)
+{
+	struct bpf_iter_kmem_cache_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(*kit) > sizeof(*it));
+	BUILD_BUG_ON(__alignof__(*kit) != __alignof__(*it));
+
+	kit->pos = KMEM_CACHE_POS_START;
+	return 0;
+}
+
+__bpf_kfunc struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it)
+{
+	struct bpf_iter_kmem_cache_kern *kit = (void *)it;
+	struct kmem_cache *prev = kit->pos;
+	struct kmem_cache *next;
+	bool destroy = false;
+
+	if (!prev)
+		return NULL;
+
+	mutex_lock(&slab_mutex);
+
+	if (list_empty(&slab_caches)) {
+		mutex_unlock(&slab_mutex);
+		return NULL;
+	}
+
+	if (prev == KMEM_CACHE_POS_START)
+		next = list_first_entry(&slab_caches, struct kmem_cache, list);
+	else if (list_last_entry(&slab_caches, struct kmem_cache, list) == prev)
+		next = NULL;
+	else
+		next = list_next_entry(prev, list);
+
+	/* boot_caches have negative refcount, don't touch them */
+	if (next && next->refcount > 0)
+		next->refcount++;
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (prev && prev != KMEM_CACHE_POS_START) {
+		if (prev->refcount > 1)
+			prev->refcount--;
+		else if (prev->refcount == 1)
+			destroy = true;
+	}
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(prev);
+
+	kit->pos = next;
+	return next;
+}
+
+__bpf_kfunc void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it)
+{
+	struct bpf_iter_kmem_cache_kern *kit = (void *)it;
+	struct kmem_cache *s = kit->pos;
+	bool destroy = false;
+
+	if (s == NULL || s == KMEM_CACHE_POS_START)
+		return;
+
+	mutex_lock(&slab_mutex);
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (s->refcount > 1)
+		s->refcount--;
+	else if (s->refcount == 1)
+		destroy = true;
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(s);
+}
+
+__bpf_kfunc_end_defs();
+
 struct bpf_iter__kmem_cache {
 	__bpf_md_ptr(struct bpf_iter_meta *, meta);
 	__bpf_md_ptr(struct kmem_cache *, s);
 };
 
+union kmem_cache_iter_priv {
+	struct bpf_iter_kmem_cache it;
+	struct bpf_iter_kmem_cache_kern kit;
+};
+
 static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	loff_t cnt = 0;
 	bool found = false;
 	struct kmem_cache *s;
+	union kmem_cache_iter_priv *p = seq->private;
 
 	mutex_lock(&slab_mutex);
 
@@ -43,8 +143,9 @@ static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
 	mutex_unlock(&slab_mutex);
 
 	if (!found)
-		return NULL;
+		s = NULL;
 
+	p->kit.pos = s;
 	return s;
 }
 
@@ -55,63 +156,24 @@ static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
 		.meta = &meta,
 		.s = v,
 	};
+	union kmem_cache_iter_priv *p = seq->private;
 	struct bpf_prog *prog;
-	bool destroy = false;
 
 	meta.seq = seq;
 	prog = bpf_iter_get_info(&meta, true);
 	if (prog && !ctx.s)
 		bpf_iter_run_prog(prog, &ctx);
 
-	if (ctx.s == NULL)
-		return;
-
-	mutex_lock(&slab_mutex);
-
-	/* Skip kmem_cache_destroy() for active entries */
-	if (ctx.s->refcount > 1)
-		ctx.s->refcount--;
-	else if (ctx.s->refcount == 1)
-		destroy = true;
-
-	mutex_unlock(&slab_mutex);
-
-	if (destroy)
-		kmem_cache_destroy(ctx.s);
+	bpf_iter_kmem_cache_destroy(&p->it);
 }
 
 static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct kmem_cache *s = v;
-	struct kmem_cache *next = NULL;
-	bool destroy = false;
+	union kmem_cache_iter_priv *p = seq->private;
 
 	++*pos;
 
-	mutex_lock(&slab_mutex);
-
-	if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
-		next = list_next_entry(s, list);
-
-		WARN_ON_ONCE(next->refcount == 0);
-
-		/* boot_caches have negative refcount, don't touch them */
-		if (next->refcount > 0)
-			next->refcount++;
-	}
-
-	/* Skip kmem_cache_destroy() for active entries */
-	if (s->refcount > 1)
-		s->refcount--;
-	else if (s->refcount == 1)
-		destroy = true;
-
-	mutex_unlock(&slab_mutex);
-
-	if (destroy)
-		kmem_cache_destroy(s);
-
-	return next;
+	return bpf_iter_kmem_cache_next(&p->it);
 }
 
 static int kmem_cache_iter_seq_show(struct seq_file *seq, void *v)
@@ -143,6 +205,7 @@ BTF_ID_LIST_GLOBAL_SINGLE(bpf_kmem_cache_btf_id, struct, kmem_cache)
 
 static const struct bpf_iter_seq_info kmem_cache_iter_seq_info = {
 	.seq_ops		= &kmem_cache_iter_seq_ops,
+	.seq_priv_size		= sizeof(union kmem_cache_iter_priv),
 };
 
 static void bpf_iter_kmem_cache_show_fdinfo(const struct bpf_iter_aux_info *aux,
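
Note for readers: the position handling in the open coded iterator (a
start sentinel, one step per next call, and no restart once NULL has
been returned) can be modeled in plain userspace C.  The sketch below is
not kernel code and makes several assumptions: struct cache,
model_iter_new(), model_iter_next() and count_caches() are hypothetical
names, a singly linked list stands in for slab_caches, and slab_mutex
locking and refcount handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache; only the list link matters. */
struct cache {
	struct cache *next;
};

/* Same idea as KMEM_CACHE_POS_START: non-NULL, but never a valid entry. */
#define POS_START ((struct cache *)1L)

struct model_iter {
	struct cache *head;	/* stand-in for the global slab_caches list */
	struct cache *pos;	/* POS_START, a valid entry, or NULL when done */
};

static void model_iter_new(struct model_iter *it, struct cache *head)
{
	it->head = head;
	it->pos = POS_START;
}

static struct cache *model_iter_next(struct model_iter *it)
{
	struct cache *prev = it->pos;
	struct cache *next;

	/* Once NULL has been stored, the walk never restarts. */
	if (!prev)
		return NULL;

	if (prev == POS_START)
		next = it->head;	/* first call: begin at the list head */
	else
		next = prev->next;	/* later calls: advance by one entry */

	/*
	 * Assumption of this model: an empty list also stores NULL here,
	 * whereas the patch returns early without updating the position.
	 */
	it->pos = next;
	return next;
}

/* Build a chain of n entries (n <= 8) and count how many get visited. */
static int count_caches(int n)
{
	struct cache nodes[8] = {0};
	struct model_iter it;
	int i, seen = 0;

	assert(n >= 0 && n <= 8);
	for (i = 0; i + 1 < n; i++)
		nodes[i].next = &nodes[i + 1];

	model_iter_new(&it, n ? &nodes[0] : NULL);
	while (model_iter_next(&it))
		seen++;

	/* A further call after the end must keep returning NULL. */
	assert(model_iter_next(&it) == NULL);
	return seen;
}
```

Calling count_caches(5) walks five entries and then checks that one more
model_iter_next() call still yields NULL, which is the "prevent restart
after the last element" behavior listed in the v2 changelog.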