From patchwork Sun May 7 04:01:04 2023
X-Patchwork-Id: 13233665
From: Hou Tao
To: bpf@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Alexei Starovoitov, Yonghong Song, Andrii Nakryiko, Viacheslav Dubeyko, Amir Goldstein, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 1/4] bpf: Introduce bpf iterator for file-system inode
Date: Sun, 7 May 2023 12:01:04 +0800
Message-Id: <20230507040107.3755166-2-houtao@huaweicloud.com>
In-Reply-To: <20230507040107.3755166-1-houtao@huaweicloud.com>
References: <20230507040107.3755166-1-houtao@huaweicloud.com>

The usual way to get information about a fs inode is statx(), but the
information it returns is limited, and some internal information (e.g.,
the dirty pages of an inode) cannot be obtained through existing
syscalls at all. So introduce a bpf iterator for fs inodes to solve
the problem.

By passing an fd of the specific inode and a bpf program to the bpf
file-system inode iterator, a bpf iterator fd is created; reading the
iterator fd outputs the content customized by the provided bpf program.
Currently only the bpf iterator for a specific inode is supported;
support for all inodes in a file-system could be added later if needed.
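For comparison, a minimal userspace sketch of what statx() does expose for an open file: inode number, link count, size, allocated blocks and mode, but nothing about how many of the file's pages are cached, dirty or under writeback. It assumes glibc 2.28 or later, which provides the statx() wrapper under _GNU_SOURCE; the helper name show_statx() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_EMPTY_PATH */
#include <stdio.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_BASIC_STATS */

/*
 * Hypothetical helper: query the basic inode attributes of an open fd
 * via statx(). Returns 0 on success and -1 on failure (errno is set).
 */
static int show_statx(int fd, struct statx *stx)
{
	/* An empty path plus AT_EMPTY_PATH makes statx() act on fd itself. */
	if (statx(fd, "", AT_EMPTY_PATH, STATX_BASIC_STATS, stx) != 0)
		return -1;

	/* stx_blocks counts allocated 512-byte blocks, not cached pages. */
	printf("ino %llu nlink %u size %llu blocks %llu mode %o\n",
	       (unsigned long long)stx->stx_ino, stx->stx_nlink,
	       (unsigned long long)stx->stx_size,
	       (unsigned long long)stx->stx_blocks,
	       (unsigned int)stx->stx_mode);
	return 0;
}
```

None of the returned fields says how much of the file is resident in, or dirty in, the page cache; that is the gap the iterator plus a bpf program is meant to fill.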

Without any inode-related bpf helper, only the content of the inode
itself and the typed pointers in the inode (e.g., i_sb) can be printed
in a bpf program, as shown below:

(struct inode){
 .i_mode = (umode_t)33188,
 .i_opflags = (short unsigned int)13,
 .i_flags = (unsigned int)4096,
 .i_op = (struct inode_operations *)0x000000004dd45285,
 .i_sb = (struct super_block *)0x0000000006c11996,
 .i_mapping = (struct address_space *)0x00000000333cf64b,
 .i_ino = (long unsigned int)30982996,
 (union){
  .i_nlink = ()1,
  .__i_nlink = (unsigned int)1,
 },
 .i_size = (loff_t)4095,
 ......

(struct super_block){
 .s_list = (struct list_head){
  .next = (struct list_head *)0x000000008af29511,
  .prev = (struct list_head *)0x000000003d8c9095,
 },
 .s_dev = (dev_t)265289730,
 .s_blocksize_bits = (unsigned char)12,
 .s_blocksize = (long unsigned int)4096,
 .s_maxbytes = (loff_t)9223372036854775807,
 ......

Signed-off-by: Hou Tao
---
 include/linux/bpf.h            |   2 +
 include/linux/btf_ids.h        |   5 +-
 include/uapi/linux/bpf.h       |   8 ++
 kernel/bpf/Makefile            |   1 +
 kernel/bpf/fs_iter.c           | 174 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |   8 ++
 6 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/fs_iter.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 456f33b9d205..3b2324269647 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2120,6 +2120,8 @@ struct bpf_iter_aux_info {
 		enum bpf_iter_task_type	type;
 		u32 pid;
 	} task;
+	/* for fs iter */
+	void *fs;
 };
 
 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index 00950cc03bff..9e036d1360e7 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -255,7 +255,10 @@ extern u32 btf_sock_ids[];
 #define BTF_TRACING_TYPE_xxx	\
 	BTF_TRACING_TYPE(BTF_TRACING_TYPE_TASK, task_struct)	\
 	BTF_TRACING_TYPE(BTF_TRACING_TYPE_FILE, file)		\
-	BTF_TRACING_TYPE(BTF_TRACING_TYPE_VMA, vm_area_struct)
+	BTF_TRACING_TYPE(BTF_TRACING_TYPE_VMA, vm_area_struct)	\
+	BTF_TRACING_TYPE(BTF_TRACING_TYPE_INODE, inode)		\
+	BTF_TRACING_TYPE(BTF_TRACING_TYPE_DENTRY, dentry)
+
 
 enum {
 #define BTF_TRACING_TYPE(name, type) name,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1bb11a6ee667..099048ba3edc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -95,6 +95,10 @@ enum bpf_cgroup_iter_order {
 	BPF_CGROUP_ITER_ANCESTORS_UP,		/* walk ancestors upward. */
 };
 
+enum bpf_fs_iter_type {
+	BPF_FS_ITER_INODE = 0,	/* a specific inode */
+};
+
 union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
@@ -116,6 +120,10 @@ union bpf_iter_link_info {
 		__u32	pid;
 		__u32	pid_fd;
 	} task;
+	struct {
+		enum bpf_fs_iter_type type;
+		__u32 fd;
+	} fs;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1d3892168d32..e945d6e23eed 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -8,6 +8,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
+obj-$(CONFIG_BPF_SYSCALL) += fs_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
diff --git a/kernel/bpf/fs_iter.c b/kernel/bpf/fs_iter.c
new file mode 100644
index 000000000000..cd7f10ea00ab
--- /dev/null
+++ b/kernel/bpf/fs_iter.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023. Huawei Technologies Co., Ltd
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+DEFINE_BPF_ITER_FUNC(fs_inode, struct bpf_iter_meta *meta, struct inode *inode, struct dentry *dentry);
+
+struct bpf_iter__fs_inode {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct inode *, inode);
+	__bpf_md_ptr(struct dentry *, dentry);
+};
+
+struct bpf_fs_iter_aux_info {
+	atomic_t count;
+	enum bpf_fs_iter_type type;
+	struct file *filp;
+};
+
+struct bpf_iter_seq_fs_info {
+	struct bpf_fs_iter_aux_info *fs;
+};
+
+static inline void bpf_fs_iter_get(struct bpf_fs_iter_aux_info *fs)
+{
+	atomic_inc(&fs->count);
+}
+
+static void bpf_fs_iter_put(struct bpf_fs_iter_aux_info *fs)
+{
+	if (!atomic_dec_and_test(&fs->count))
+		return;
+
+	fput(fs->filp);
+	kfree(fs);
+}
+
+static int bpf_iter_attach_fs(struct bpf_prog *prog, union bpf_iter_link_info *linfo,
+			      struct bpf_iter_aux_info *aux)
+{
+	struct bpf_fs_iter_aux_info *fs;
+	struct file *filp;
+
+	if (linfo->fs.type > BPF_FS_ITER_INODE)
+		return -EINVAL;
+	/* TODO: The file-system is pinned */
+	filp = fget(linfo->fs.fd);
+	if (!filp)
+		return -EINVAL;
+
+	fs = kmalloc(sizeof(*fs), GFP_KERNEL);
+	if (!fs) {
+		fput(filp);
+		return -ENOMEM;
+	}
+
+	atomic_set(&fs->count, 1);
+	fs->type = linfo->fs.type;
+	fs->filp = filp;
+	aux->fs = fs;
+
+	return 0;
+}
+
+static void bpf_iter_detach_fs(struct bpf_iter_aux_info *aux)
+{
+	bpf_fs_iter_put(aux->fs);
+}
+
+static int bpf_iter_init_seq_fs_priv(void *priv, struct bpf_iter_aux_info *aux)
+{
+	struct bpf_iter_seq_fs_info *info = priv;
+	struct bpf_fs_iter_aux_info *fs = aux->fs;
+
+	/* link fd is still alive, so it is OK to inc ref-count directly */
+	bpf_fs_iter_get(fs);
+	info->fs = fs;
+
+	return 0;
+}
+
+static void bpf_iter_fini_seq_fs_priv(void *priv)
+{
+	struct bpf_iter_seq_fs_info *info = priv;
+
+	bpf_fs_iter_put(info->fs);
+}
+
+static void *fs_iter_seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct bpf_iter_seq_fs_info *info = m->private;
+
+	if (*pos == 0)
+		++*pos;
+
+	return file_inode(info->fs->filp);
+}
+
+static int __fs_iter_seq_show(struct seq_file *m, void *v, bool stop)
+{
+	struct bpf_iter__fs_inode ctx;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+	int err;
+
+	meta.seq = m;
+	prog = bpf_iter_get_info(&meta, stop);
+	if (!prog)
+		return 0;
+
+	ctx.meta = &meta;
+	ctx.inode = v;
+	ctx.dentry = v ? d_find_alias(v) : NULL;
+	err = bpf_iter_run_prog(prog, &ctx);
+	dput(ctx.dentry);
+	return err;
+}
+
+static void fs_iter_seq_stop(struct seq_file *m, void *v)
+{
+	if (!v)
+		__fs_iter_seq_show(m, NULL, true);
+}
+
+static void *fs_iter_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	++*pos;
+	return NULL;
+}
+
+static int fs_iter_seq_show(struct seq_file *m, void *v)
+{
+	return __fs_iter_seq_show(m, v, false);
+}
+
+static const struct seq_operations fs_iter_seq_ops = {
+	.start = fs_iter_seq_start,
+	.stop = fs_iter_seq_stop,
+	.next = fs_iter_seq_next,
+	.show = fs_iter_seq_show,
+};
+
+static const struct bpf_iter_seq_info fs_iter_seq_info = {
+	.seq_ops = &fs_iter_seq_ops,
+	.init_seq_private = bpf_iter_init_seq_fs_priv,
+	.fini_seq_private = bpf_iter_fini_seq_fs_priv,
+	.seq_priv_size = sizeof(struct bpf_iter_seq_fs_info),
+};
+
+static struct bpf_iter_reg fs_inode_reg_info = {
+	.target = "fs_inode",
+	.attach_target = bpf_iter_attach_fs,
+	.detach_target = bpf_iter_detach_fs,
+	.ctx_arg_info_size = 2,
+	.ctx_arg_info = {
+		{ offsetof(struct bpf_iter__fs_inode, inode), PTR_TO_BTF_ID_OR_NULL },
+		{ offsetof(struct bpf_iter__fs_inode, dentry), PTR_TO_BTF_ID_OR_NULL },
+	},
+	.seq_info = &fs_iter_seq_info,
+};
+
+static int __init fs_iter_init(void)
+{
+	fs_inode_reg_info.ctx_arg_info[0].btf_id = btf_tracing_ids[BTF_TRACING_TYPE_INODE];
+	fs_inode_reg_info.ctx_arg_info[1].btf_id = btf_tracing_ids[BTF_TRACING_TYPE_DENTRY];
+	return bpf_iter_reg_target(&fs_inode_reg_info);
+}
+late_initcall(fs_iter_init);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1bb11a6ee667..099048ba3edc 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -95,6 +95,10 @@ enum bpf_cgroup_iter_order {
 	BPF_CGROUP_ITER_ANCESTORS_UP,		/* walk ancestors upward. */
 };
 
+enum bpf_fs_iter_type {
+	BPF_FS_ITER_INODE = 0,	/* a specific inode */
+};
+
 union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
@@ -116,6 +120,10 @@ union bpf_iter_link_info {
 		__u32	pid;
 		__u32	pid_fd;
 	} task;
+	struct {
+		enum bpf_fs_iter_type type;
+		__u32 fd;
+	} fs;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */

From patchwork Sun May 7 04:01:05 2023
X-Patchwork-Id: 13233668
From: Hou Tao
To: bpf@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Alexei Starovoitov, Yonghong Song, Andrii Nakryiko, Viacheslav Dubeyko, Amir Goldstein, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 2/4] bpf: Add three kfunc helpers for bpf fs inode iterator
Date: Sun, 7 May 2023 12:01:05 +0800
Message-Id: <20230507040107.3755166-3-houtao@huaweicloud.com>
In-Reply-To: <20230507040107.3755166-1-houtao@huaweicloud.com>
References: <20230507040107.3755166-1-houtao@huaweicloud.com>

Add kfunc helpers for the bpf fs inode iterator to inspect the details
of the inode page cache:

1) bpf_filemap_cachestat. Basically copied from the cachestat patchset
   by Nhat Pham [0].
   It returns the number of cached pages, dirty pages and writeback
   pages in the passed inode.
2) bpf_filemap_find_present & bpf_filemap_get_order. These two helpers
   are used to find the order of the present folios in the page cache.

The following is the output from the bpf selftest when showing the
cache status and folio orders of an xfs inode:

sb: bsize 4096 s_op xfs_super_operations s_type xfs_fs_type name xfs
ino: inode nlink 1 inum 131 size 10485760, name inode.test
cache: cached 2560 dirty 0 wb 0 evicted 0
orders:
  page offset 0 order 2
  page offset 4 order 2
  page offset 8 order 2
  page offset 12 order 2
  page offset 16 order 4
  page offset 32 order 4
  page offset 48 order 4
  page offset 64 order 5
  page offset 96 order 4
  page offset 112 order 4
  ......

[0]: https://lore.kernel.org/linux-mm/20230503013608.2431726-1-nphamcs@gmail.com/T/#t

Signed-off-by: Hou Tao
---
 include/linux/fs.h        |  4 ++
 include/uapi/linux/mman.h |  8 ++++
 kernel/bpf/helpers.c      | 26 +++++++++++++
 mm/filemap.c              | 77 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 115 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 67495ef79bb2..5ce17e87c4f6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -46,6 +46,7 @@
 
 #include 
 #include 
+#include 
 
 struct backing_dev_info;
 struct bdi_writeback;
@@ -3191,4 +3192,7 @@ extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len,
 extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
 			   int advice);
 
+extern void filemap_cachestat(struct address_space *mapping, pgoff_t first_index,
+		pgoff_t last_index, struct cachestat *cs);
+
 #endif /* _LINUX_FS_H */
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index f55bc680b5b0..6e9aa23aa124 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #define MREMAP_MAYMOVE	1
 #define MREMAP_FIXED	2
@@ -41,4 +42,11 @@
 #define MAP_HUGE_2GB	HUGETLB_FLAG_ENCODE_2GB
 #define MAP_HUGE_16GB	HUGETLB_FLAG_ENCODE_16GB
 
+struct cachestat {
+	__u64 nr_cache;
+	__u64 nr_dirty;
+	__u64 nr_writeback;
+	__u64 nr_evicted;
+};
+
 #endif /* _UAPI_LINUX_MMAN_H */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index bb6b4637ebf2..95174d1ef5bb 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../../lib/kstrtox.h"
 
@@ -2170,6 +2171,27 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
 	return p;
 }
 
+__bpf_kfunc void bpf_filemap_cachestat(struct inode *inode, unsigned long from,
+				       unsigned long last, struct cachestat *cs)
+{
+	filemap_cachestat(inode->i_mapping, from, last, cs);
+}
+
+__bpf_kfunc long bpf_filemap_find_present(struct inode *inode, unsigned long from,
+					  unsigned long last)
+{
+	unsigned long index = from;
+
+	if (!xa_find(&inode->i_mapping->i_pages, &index, last, XA_PRESENT))
+		return ULONG_MAX;
+	return index;
+}
+
+__bpf_kfunc long bpf_filemap_get_order(struct inode *inode, unsigned long index)
+{
+	return xa_get_order(&inode->i_mapping->i_pages, index);
+}
+
 /**
  * bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
  * @ptr: The dynptr whose data slice to retrieve
@@ -2402,6 +2424,10 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
+/* TODO: KF_TRUSTED_ARGS is missing */
+BTF_ID_FLAGS(func, bpf_filemap_cachestat);
+BTF_ID_FLAGS(func, bpf_filemap_find_present);
+BTF_ID_FLAGS(func, bpf_filemap_get_order);
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a..fc63a02a9b0d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4122,3 +4122,80 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
 	return try_to_free_buffers(folio);
 }
 EXPORT_SYMBOL(filemap_release_folio);
+
+/**
+ * filemap_cachestat() - compute the page cache statistics of a mapping
+ * @mapping:	The mapping to compute the statistics for.
+ * @first_index:	The starting page cache index.
+ * @last_index:	The final page index (inclusive).
+ * @cs:	the cachestat struct to write the result to.
+ *
+ * This will query the page cache statistics of a mapping in the
+ * page range of [first_index, last_index] (inclusive). The statistics
+ * queried include: number of dirty pages, number of pages marked for
+ * writeback, and the number of (recently) evicted pages.
+ */
+void filemap_cachestat(struct address_space *mapping, pgoff_t first_index,
+		pgoff_t last_index, struct cachestat *cs)
+{
+	XA_STATE(xas, &mapping->i_pages, first_index);
+	struct folio *folio;
+
+	rcu_read_lock();
+	xas_for_each(&xas, folio, last_index) {
+		unsigned long nr_pages;
+		pgoff_t folio_first_index, folio_last_index;
+
+		if (xas_retry(&xas, folio))
+			continue;
+
+		if (xa_is_value(folio)) {
+			/* page is evicted */
+			void *shadow = (void *)folio;
+			bool workingset; /* not used */
+			int order = xa_get_order(xas.xa, xas.xa_index);
+
+			nr_pages = 1 << order;
+			/* rounds down to the nearest multiple of 2^order */
+			folio_first_index = xas.xa_index >> order << order;
+			folio_last_index = folio_first_index + nr_pages - 1;
+
+			/* Folios might straddle the range boundaries, only count covered pages */
+			if (folio_first_index < first_index)
+				nr_pages -= first_index - folio_first_index;
+
+			if (folio_last_index > last_index)
+				nr_pages -= folio_last_index - last_index;
+
+			cs->nr_evicted += nr_pages;
+			goto resched;
+		}
+
+		nr_pages = folio_nr_pages(folio);
+		folio_first_index = folio_pgoff(folio);
+		folio_last_index = folio_first_index + nr_pages - 1;
+
+		/* Folios might straddle the range boundaries, only count covered pages */
+		if (folio_first_index < first_index)
+			nr_pages -= first_index - folio_first_index;
+
+		if (folio_last_index > last_index)
+			nr_pages -= folio_last_index - last_index;
+
+		/* page is in cache */
+		cs->nr_cache += nr_pages;
+
+		if (folio_test_dirty(folio))
+			cs->nr_dirty += nr_pages;
+
+		if (folio_test_writeback(folio))
+			cs->nr_writeback += nr_pages;
+
+resched:
+		if (need_resched()) {
+			xas_pause(&xas);
+			cond_resched_rcu();
+		}
+	}
+	rcu_read_unlock();
+}

From patchwork Sun May 7 04:01:06 2023
X-Patchwork-Id: 13233666
From: Hou Tao
To: bpf@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Alexei Starovoitov, Yonghong Song, Andrii Nakryiko, Viacheslav Dubeyko, Amir Goldstein, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 3/4] bpf: Introduce bpf iterator for file system mount
Date: Sun, 7 May 2023 12:01:06 +0800
Message-Id: <20230507040107.3755166-4-houtao@huaweicloud.com>
In-Reply-To: <20230507040107.3755166-1-houtao@huaweicloud.com>
References: <20230507040107.3755166-1-houtao@huaweicloud.com>

Currently the only way to query information about a specific mount is
to parse the content of /proc/pid/mountinfo, find the specific mount
and extract the needed information. There is no way to query a
specific mount directly, so introduce a bpf iterator for fs mounts to
support that. Given an fd passed to the bpf iterator, the bpf program
gets the mount of that file and can output the needed information
about the mount through the bpf iterator fd.
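For comparison, the mountinfo-parsing approach described above looks roughly like the following userspace sketch: read the leading "mount ID, parent ID, major:minor" fields of each /proc/<pid>/mountinfo line (field layout as documented in proc(5)) and pick the entry whose device matches the file's st_dev. The struct and helper names are hypothetical. Matching on the device number alone is ambiguous when several mounts share a device (e.g., bind mounts), so this sketch simply returns the first match.

```c
#include <stdio.h>

/* Hypothetical: leading fields of one /proc/<pid>/mountinfo line. */
struct mountinfo_ids {
	int mnt_id;
	int parent_id;
	unsigned int major;
	unsigned int minor;
};

/* Parse "mnt_id parent_id major:minor ..."; returns 0 on success. */
static int parse_mountinfo_ids(const char *line, struct mountinfo_ids *ids)
{
	if (sscanf(line, "%d %d %u:%u", &ids->mnt_id, &ids->parent_id,
		   &ids->major, &ids->minor) != 4)
		return -1;
	return 0;
}

/*
 * Scan a mountinfo stream for the first entry on device major:minor.
 * Returns 0 and fills *out on a match, -1 otherwise.
 */
static int find_mount_by_dev(FILE *fp, unsigned int major, unsigned int minor,
			     struct mountinfo_ids *out)
{
	char line[4096];

	while (fgets(line, sizeof(line), fp)) {
		struct mountinfo_ids ids;

		if (parse_mountinfo_ids(line, &ids) == 0 &&
		    ids.major == major && ids.minor == minor) {
			*out = ids;
			return 0;
		}
	}
	return -1;
}
```

In practice one would stat() the file, then pass fopen("/proc/self/mountinfo", "r") together with major(st.st_dev) and minor(st.st_dev) (from <sys/sysmacros.h>) to find_mount_by_dev().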

The following is the output from "test_progs -t bpf_iter_fs/fs_mnt",
which shows the basic information of a tmpfs mount:

dev 0:31 id 40 parent_id 24 mnt_flags 0x1003 shared:17

Signed-off-by: Hou Tao
---
 include/linux/btf_ids.h        |  3 +-
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/fs_iter.c           | 60 +++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h |  1 +
 4 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index 9e036d1360e7..48537adee9fc 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -257,7 +257,8 @@ extern u32 btf_sock_ids[];
 	BTF_TRACING_TYPE(BTF_TRACING_TYPE_FILE, file)		\
 	BTF_TRACING_TYPE(BTF_TRACING_TYPE_VMA, vm_area_struct)	\
 	BTF_TRACING_TYPE(BTF_TRACING_TYPE_INODE, inode)		\
-	BTF_TRACING_TYPE(BTF_TRACING_TYPE_DENTRY, dentry)
+	BTF_TRACING_TYPE(BTF_TRACING_TYPE_DENTRY, dentry)	\
+	BTF_TRACING_TYPE(BTF_TRACING_TYPE_MOUNT, mount)
 
 
 enum {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 099048ba3edc..62bed6e603a5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -97,6 +97,7 @@ enum bpf_cgroup_iter_order {
 
 enum bpf_fs_iter_type {
 	BPF_FS_ITER_INODE = 0,	/* a specific inode */
+	BPF_FS_ITER_MNT,	/* a specific mount */
 };
 
 union bpf_iter_link_info {
diff --git a/kernel/bpf/fs_iter.c b/kernel/bpf/fs_iter.c
index cd7f10ea00ab..19f83211ccc4 100644
--- a/kernel/bpf/fs_iter.c
+++ b/kernel/bpf/fs_iter.c
@@ -9,7 +9,11 @@
 #include 
 #include 
 
+/* TODO: move fs_iter.c to fs directory ? */
+#include "../../fs/mount.h"
+
 DEFINE_BPF_ITER_FUNC(fs_inode, struct bpf_iter_meta *meta, struct inode *inode, struct dentry *dentry);
+DEFINE_BPF_ITER_FUNC(fs_mnt, struct bpf_iter_meta *meta, struct mount *mnt);
 
 struct bpf_iter__fs_inode {
 	__bpf_md_ptr(struct bpf_iter_meta *, meta);
@@ -17,6 +21,11 @@ struct bpf_iter__fs_inode {
 	__bpf_md_ptr(struct dentry *, dentry);
 };
 
+struct bpf_iter__fs_mnt {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct mount *, mnt);
+};
+
 struct bpf_fs_iter_aux_info {
 	atomic_t count;
 	enum bpf_fs_iter_type type;
@@ -47,7 +56,7 @@ static int bpf_iter_attach_fs(struct bpf_prog *prog, union bpf_iter_link_info *l
 	struct bpf_fs_iter_aux_info *fs;
 	struct file *filp;
 
-	if (linfo->fs.type > BPF_FS_ITER_INODE)
+	if (linfo->fs.type > BPF_FS_ITER_MNT)
 		return -EINVAL;
 	/* TODO: The file-system is pinned */
 	filp = fget(linfo->fs.fd);
@@ -99,12 +108,14 @@ static void *fs_iter_seq_start(struct seq_file *m, loff_t *pos)
 	if (*pos == 0)
 		++*pos;
 
-	return file_inode(info->fs->filp);
+	if (info->fs->type == BPF_FS_ITER_INODE)
+		return file_inode(info->fs->filp);
+	return real_mount(info->fs->filp->f_path.mnt);
 }
 
 static int __fs_iter_seq_show(struct seq_file *m, void *v, bool stop)
 {
-	struct bpf_iter__fs_inode ctx;
+	struct bpf_iter_seq_fs_info *info = m->private;
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 	int err;
@@ -114,11 +125,21 @@ static int __fs_iter_seq_show(struct seq_file *m, void *v, bool stop)
 	if (!prog)
 		return 0;
 
-	ctx.meta = &meta;
-	ctx.inode = v;
-	ctx.dentry = v ? d_find_alias(v) : NULL;
-	err = bpf_iter_run_prog(prog, &ctx);
-	dput(ctx.dentry);
+	if (info->fs->type == BPF_FS_ITER_INODE) {
+		struct bpf_iter__fs_inode ino_ctx;
+
+		ino_ctx.meta = &meta;
+		ino_ctx.inode = v;
+		ino_ctx.dentry = v ? d_find_alias(v) : NULL;
+		err = bpf_iter_run_prog(prog, &ino_ctx);
+		dput(ino_ctx.dentry);
+	} else {
+		struct bpf_iter__fs_mnt mnt_ctx;
+
+		mnt_ctx.meta = &meta;
+		mnt_ctx.mnt = v;
+		err = bpf_iter_run_prog(prog, &mnt_ctx);
+	}
 	return err;
 }
 
@@ -165,10 +186,31 @@ static struct bpf_iter_reg fs_inode_reg_info = {
 	.seq_info = &fs_iter_seq_info,
 };
 
+static struct bpf_iter_reg fs_mnt_reg_info = {
+	.target = "fs_mnt",
+	.attach_target = bpf_iter_attach_fs,
+	.detach_target = bpf_iter_detach_fs,
+	.ctx_arg_info_size = 1,
+	.ctx_arg_info = {
+		{ offsetof(struct bpf_iter__fs_mnt, mnt), PTR_TO_BTF_ID_OR_NULL },
+	},
+	.seq_info = &fs_iter_seq_info,
+};
+
 static int __init fs_iter_init(void)
 {
+	int err;
+
 	fs_inode_reg_info.ctx_arg_info[0].btf_id = btf_tracing_ids[BTF_TRACING_TYPE_INODE];
 	fs_inode_reg_info.ctx_arg_info[1].btf_id = btf_tracing_ids[BTF_TRACING_TYPE_DENTRY];
-	return bpf_iter_reg_target(&fs_inode_reg_info);
+	err = bpf_iter_reg_target(&fs_inode_reg_info);
+	if (err)
+		return err;
+
+	fs_mnt_reg_info.ctx_arg_info[0].btf_id = btf_tracing_ids[BTF_TRACING_TYPE_MOUNT];
+	err = bpf_iter_reg_target(&fs_mnt_reg_info);
+	if (err)
+		bpf_iter_unreg_target(&fs_inode_reg_info);
+	return err;
 }
 late_initcall(fs_iter_init);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 099048ba3edc..62bed6e603a5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -97,6 +97,7 @@ enum bpf_cgroup_iter_order {
 
 enum bpf_fs_iter_type {
 	BPF_FS_ITER_INODE = 0,	/* a specific inode */
+	BPF_FS_ITER_MNT,	/* a specific mount */
 };
 
 union bpf_iter_link_info {

From patchwork Sun May 7 04:01:07 2023
X-Patchwork-Id: 13233667
From: Hou Tao
To: bpf@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Alexei Starovoitov, Yonghong Song, Andrii Nakryiko, Viacheslav Dubeyko, Amir Goldstein, houtao1@huawei.com
Subject: [RFC PATCH bpf-next 4/4] selftests/bpf: Add test cases for bpf file-system iterator
Date: Sun, 7 May 2023 12:01:07 +0800
Message-Id: <20230507040107.3755166-5-houtao@huaweicloud.com>
In-Reply-To: <20230507040107.3755166-1-houtao@huaweicloud.com>
References: <20230507040107.3755166-1-houtao@huaweicloud.com>

Add three test cases to demonstrate the basic functionality of the bpf
file-system iterator:

1) dump_raw_inode. Use bpf_seq_printf_btf to dump the content of the
   passed inode and its super_block.
2) dump_inode. Use bpf_filemap_{cachestat,find_present,get_order} to
   dump the details of the inode page cache.
3) dump_mnt. Dump the basic information of the passed mount.

Signed-off-by: Hou Tao
---
 .../selftests/bpf/prog_tests/bpf_iter_fs.c    | 184 ++++++++++++++++++
 .../testing/selftests/bpf/progs/bpf_iter_fs.c | 122 ++++++++++++
 2 files changed, 306 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_iter_fs.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_fs.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter_fs.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter_fs.c
new file mode 100644
index 000000000000..e26d736001b4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter_fs.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include 
+#include "bpf_iter_fs.skel.h"
+
+static void test_bpf_iter_raw_inode(void)
+{
+	const char *fpath = "/tmp/raw_inode.test";
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int ino_fd, iter_fd, err;
+	struct bpf_iter_fs *skel;
+	struct bpf_link *link;
+	char buf[8192];
+	ssize_t nr;
+
+	ino_fd = open(fpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
+	if (!ASSERT_GE(ino_fd, 0, "open file"))
+		return;
+	ftruncate(ino_fd, 4095);
+
+	skel = bpf_iter_fs__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		goto close_ino;
+
+	bpf_program__set_autoload(skel->progs.dump_raw_inode, true);
+
+	err = bpf_iter_fs__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto free_skel;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.fs.type = BPF_FS_ITER_INODE;
+	linfo.fs.fd = ino_fd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(skel->progs.dump_raw_inode, &opts);
+	if (!ASSERT_OK_PTR(link, "attach iter"))
+		goto free_skel;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create iter"))
+		goto free_link;
+
+	nr = read(iter_fd, buf, sizeof(buf));
+	if (!ASSERT_GT(nr, 0, "read iter"))
+		goto close_iter;
+
+	buf[nr - 1] = 0;
+	puts(buf);
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+free_skel:
+	bpf_iter_fs__destroy(skel);
+close_ino:
+	close(ino_fd);
+}
+
+static void test_bpf_iter_inode(void)
+{
+	const char *fpath = "/tmp/inode.test";
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int ino_fd, iter_fd, err;
+	struct bpf_iter_fs *skel;
+	struct bpf_link *link;
+	char buf[8192];
+	ssize_t nr;
+
+	/* Close fd after reading iterator completes */
+	ino_fd = open(fpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
+	if (!ASSERT_GE(ino_fd, 0, "open file"))
+		return;
+	pwrite(ino_fd, buf, sizeof(buf), 0);
+	pwrite(ino_fd, buf, sizeof(buf), sizeof(buf) * 2);
+
+	skel = bpf_iter_fs__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		goto close_ino;
+
+	bpf_program__set_autoload(skel->progs.dump_inode, true);
+
+	err = bpf_iter_fs__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto free_skel;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.fs.type = BPF_FS_ITER_INODE;
+	linfo.fs.fd = ino_fd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(skel->progs.dump_inode, &opts);
+	if (!ASSERT_OK_PTR(link, "attach iter"))
+		goto free_skel;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create iter"))
+		goto free_link;
+
+	nr = read(iter_fd, buf, sizeof(buf));
+	if (!ASSERT_GT(nr, 0, "read iter"))
+		goto close_iter;
+
+	buf[nr - 1] = 0;
+	puts(buf);
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+free_skel:
+	bpf_iter_fs__destroy(skel);
+close_ino:
+	close(ino_fd);
+}
+
+static void test_bpf_iter_mnt(void)
+{
+	const char *fpath = "/tmp/mnt.test";
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int mnt_fd, iter_fd, err;
+	struct bpf_iter_fs *skel;
+	struct bpf_link *link;
+	char buf[8192];
+	ssize_t nr;
+
+	/* Close fd after reading iterator completes */
+	mnt_fd = open(fpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
+	if (!ASSERT_GE(mnt_fd, 0, "open file"))
+		return;
+
+	skel = bpf_iter_fs__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		goto close_ino;
+
+	bpf_program__set_autoload(skel->progs.dump_mnt, true);
+
+	err = bpf_iter_fs__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto free_skel;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.fs.type = BPF_FS_ITER_MNT;
+	linfo.fs.fd = mnt_fd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(skel->progs.dump_mnt, &opts);
+	if (!ASSERT_OK_PTR(link, "attach iter"))
+		goto free_skel;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create iter"))
+		goto free_link;
+
+	nr = read(iter_fd, buf, sizeof(buf));
+	if (!ASSERT_GT(nr, 0, "read iter"))
+		goto close_iter;
+
+	buf[nr - 1] = 0;
+	puts(buf);
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+free_skel:
+	bpf_iter_fs__destroy(skel);
+close_ino:
+	close(mnt_fd);
+}
+
+void test_bpf_iter_fs(void)
+{
+	if (test__start_subtest("dump_raw_inode"))
+		test_bpf_iter_raw_inode();
+	if (test__start_subtest("dump_inode"))
+		test_bpf_iter_inode();
+	if (test__start_subtest("dump_mnt"))
+		test_bpf_iter_mnt();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_fs.c b/tools/testing/selftests/bpf/progs/bpf_iter_fs.c
new file mode 100644
index 000000000000..e238446b6ddf
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_fs.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include "bpf_iter.h"
+#include
+#include
+#include
+
+char _license[] SEC("license") = "GPL";
+
+struct dump_ctx {
+	struct seq_file *seq;
+	struct inode *inode;
+	unsigned long from;
+	unsigned long max;
+};
+
+void bpf_filemap_cachestat(struct inode *inode, unsigned long from, unsigned long last,
+			   struct cachestat *cs) __ksym;
+long bpf_filemap_find_present(struct inode *inode, unsigned long from, unsigned long last) __ksym;
+long bpf_filemap_get_order(struct inode *inode, unsigned long index) __ksym;
+
+static u64 dump_page_order(unsigned int i, void *ctx)
+{
+	struct dump_ctx *dump = ctx;
+	unsigned long index;
+	unsigned int order;
+
+	index = bpf_filemap_find_present(dump->inode, dump->from, dump->max);
+	if (index == -1UL)
+		return 1;
+	order = bpf_filemap_get_order(dump->inode, index);
+
+	BPF_SEQ_PRINTF(dump->seq, " page offset %lu order %u\n", index, order);
+	dump->from = index + (1 << order);
+	return 0;
+}
+
+SEC("?iter/fs_inode")
+int dump_raw_inode(struct bpf_iter__fs_inode *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct inode *inode = ctx->inode;
+	struct btf_ptr ptr;
+
+	if (inode == NULL)
+		return 0;
+
+	memset(&ptr, 0, sizeof(ptr));
+	ptr.type_id = bpf_core_type_id_kernel(struct inode);
+	ptr.ptr = inode;
+	bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
+
+	memset(&ptr, 0, sizeof(ptr));
+	ptr.type_id = bpf_core_type_id_kernel(struct super_block);
+	ptr.ptr = inode->i_sb;
+	bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
+
+	return 0;
+}
+
+SEC("?iter/fs_inode")
+int dump_inode(struct bpf_iter__fs_inode *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct inode *inode = ctx->inode;
+	struct cachestat cs = {};
+	struct super_block *sb;
+	struct dentry *dentry;
+	struct dump_ctx dump;
+
+	if (inode == NULL)
+		return 0;
+
+	sb = inode->i_sb;
+	BPF_SEQ_PRINTF(seq, "sb: bsize %lu s_op %ps s_type %ps name %s\n",
+		       sb->s_blocksize, sb->s_op, sb->s_type, sb->s_type->name);
+
+	BPF_SEQ_PRINTF(seq, "ino: inode nlink %d inum %lu size %llu",
+		       inode->i_nlink, inode->i_ino, inode->i_size);
+	dentry = ctx->dentry;
+	if (dentry)
+		BPF_SEQ_PRINTF(seq, ", name %s\n", dentry->d_name.name);
+	else
+		BPF_SEQ_PRINTF(seq, "\n");
+
+	bpf_filemap_cachestat(inode, 0, ~0UL, &cs);
+	BPF_SEQ_PRINTF(seq, "cache: cached %llu dirty %llu wb %llu evicted %llu\n",
+		       cs.nr_cache, cs.nr_dirty, cs.nr_writeback, cs.nr_evicted);
+
+	dump.seq = seq;
+	dump.inode = inode;
+	dump.from = 0;
+	/* TODO: handle BPF_MAX_LOOPS */
+	dump.max = ((unsigned long)inode->i_size + 4095) / 4096;
+	BPF_SEQ_PRINTF(seq, "orders:\n");
+	bpf_loop(dump.max, dump_page_order, &dump, 0);
+
+	return 0;
+}
+
+SEC("?iter/fs_mnt")
+int dump_mnt(struct bpf_iter__fs_mnt *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct mount *mnt = ctx->mnt;
+	struct super_block *sb;
+
+	if (mnt == NULL)
+		return 0;
+
+	sb = mnt->mnt.mnt_sb;
+	BPF_SEQ_PRINTF(seq, "dev %u:%u ",
+		       sb->s_dev >> 20, sb->s_dev & ((1 << 20) - 1));
+
+	BPF_SEQ_PRINTF(seq, "id %d parent_id %d mnt_flags 0x%x",
+		       mnt->mnt_id, mnt->mnt_parent->mnt_id, mnt->mnt.mnt_flags);
+	if (mnt->mnt.mnt_flags & 0x1000)
+		BPF_SEQ_PRINTF(seq, " shared:%d", mnt->mnt_group_id);
+	BPF_SEQ_PRINTF(seq, "\n");
+
+	return 0;
+}
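Each subtest above reads the iterator fd with a single read() into an 8192-byte buffer. Any output longer than that is silently cut off, and buf[nr - 1] = 0 overwrites the last byte that was read, which is normally the trailing newline. If the dumps grow (the BTF dump of struct inode plus struct super_block in dump_raw_inode is the likeliest candidate), a loop that reads until EOF may be more robust. The helper below is a minimal user-space sketch. Its name, read_iter_output, is hypothetical, and it is not part of libbpf or of this series.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this series): drain an iterator fd into
 * buf until read() reports EOF, the buffer is full, or an error occurs.
 * One byte is reserved for the terminating NUL so the result can be passed
 * to puts(). Returns the number of bytes stored, or -errno on failure.
 */
static ssize_t read_iter_output(int fd, char *buf, size_t size)
{
	size_t total = 0;

	if (size == 0)
		return -EINVAL;

	while (total < size - 1) {
		ssize_t nr = read(fd, buf + total, size - 1 - total);

		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -errno;
		}
		if (nr == 0)
			break;			/* EOF: the iterator has finished */
		total += (size_t)nr;
	}
	buf[total] = '\0';			/* terminate after the data */
	return (ssize_t)total;
}
```

Unlike the single read() in the subtests, this keeps every byte it reads, including the final newline. It stops at EOF, on a full buffer, or on the first error other than EINTR.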
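dump_mnt decodes sb->s_dev with a hard-coded shift of 20 and tests mnt_flags against the literal 0x1000. The shift matches the kernel-internal dev_t layout, where MINORBITS is 20 in include/linux/kdev_t.h. The literal 0x1000 appears to be MNT_SHARED from include/linux/mount.h. The sketch below spells both out as named helpers, which makes the printed "dev %u:%u" and "shared:%d" fields easier to check. The helper and macro names are hypothetical, and the value of SKETCH_MNT_SHARED is an assumption copied from the literal in dump_mnt. This internal encoding also differs from the st_dev value that stat(2) reports to user space, so the two should not be compared directly.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Kernel-internal dev_t layout: the low 20 bits (MINORBITS) hold the minor
 * number, the remaining high bits hold the major number. dump_mnt decodes
 * sb->s_dev this way; user-space st_dev uses a different encoding.
 */
#define KDEV_MINORBITS	20
#define KDEV_MINORMASK	((1U << KDEV_MINORBITS) - 1)

/* Assumption: mirrors MNT_SHARED, the 0x1000 literal tested in dump_mnt. */
#define SKETCH_MNT_SHARED	0x1000

/* Hypothetical equivalents of the kernel's MAJOR(), MINOR() and MKDEV(). */
static inline uint32_t kdev_major(uint32_t dev)
{
	return dev >> KDEV_MINORBITS;
}

static inline uint32_t kdev_minor(uint32_t dev)
{
	return dev & KDEV_MINORMASK;
}

static inline uint32_t kdev_mkdev(uint32_t major, uint32_t minor)
{
	return (major << KDEV_MINORBITS) | minor;
}

/* True when the mount flags carry the (assumed) shared-mount bit. */
static inline bool mnt_is_shared(int mnt_flags)
{
	return mnt_flags & SKETCH_MNT_SHARED;
}
```

With these helpers, the expression sb->s_dev >> 20 in dump_mnt reads as kdev_major(sb->s_dev), and sb->s_dev & ((1 << 20) - 1) reads as kdev_minor(sb->s_dev).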
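dump_inode walks the page cache one folio at a time. On each bpf_loop iteration, dump_page_order asks bpf_filemap_find_present for the next cached index at or after dump->from, prints that folio's order, and then skips 1 << order pages. dump.max rounds i_size up to whole 4096-byte pages and also bounds the number of bpf_loop iterations. The user-space sketch below replays that walk against a mocked page cache so the arithmetic can be checked in isolation. The mock, its half-open [from, last) search bound, and the folio orders in the example are all assumptions. The real kfunc semantics are defined in an earlier patch of this series that is not shown here, and the orders the kernel actually uses depend on the file system and on memory state.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define NO_PAGE (~0UL)		/* mirrors the -1UL check in dump_page_order */

/*
 * Hypothetical mock of a page cache: order_at[i] >= 0 means a folio of that
 * order starts at page index i, and -1 means no folio starts there.
 */
struct mock_cache {
	const int *order_at;
	unsigned long nr;
};

/* Round a file size up to whole pages, as dump_inode computes dump.max. */
static unsigned long pages_for_size(unsigned long long size)
{
	return (unsigned long)((size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Stand-in for bpf_filemap_find_present(): first folio start in [from, last). */
static unsigned long mock_find_present(const struct mock_cache *c,
				       unsigned long from, unsigned long last)
{
	for (unsigned long i = from; i < last && i < c->nr; i++)
		if (c->order_at[i] >= 0)
			return i;
	return NO_PAGE;
}

/*
 * Same walk as dump_page_order(): at most max steps (the bpf_loop count),
 * jumping over each folio by 1 << order pages. Records (index, order) pairs
 * in idx[] and ord[], which must hold at least max entries, and returns the
 * number of folios found.
 */
static size_t walk_folios(const struct mock_cache *c, unsigned long max,
			  unsigned long *idx, unsigned int *ord)
{
	unsigned long from = 0;
	size_t n = 0;

	for (unsigned long step = 0; step < max; step++) {
		unsigned long index = mock_find_present(c, from, max);
		unsigned int order;

		if (index == NO_PAGE)
			break;		/* the BPF callback returns 1 here to stop bpf_loop */
		order = (unsigned int)c->order_at[index];
		idx[n] = index;
		ord[n] = order;
		n++;
		from = index + (1UL << order);
	}
	return n;
}
```

Consider the file that test_bpf_iter_inode writes: 8192 bytes at offset 0 and 8192 bytes at offset 16384, so i_size is 24576 and dump.max is 6. If that file were cached as an order-1 folio at index 0 plus order-0 folios at indices 4 and 5, the walk would report (0, 1), (4, 0) and (5, 0), then stop.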