[bpf-next,v2,09/25] bpf: Support bpf_list_head in map values

Message ID	20221013062303.896469-10-memxor@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	Return-Path: <bpf-owner@kernel.org>
	From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
	To: bpf@vger.kernel.org
	Cc: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Dave Marchevsky <davemarchevsky@meta.com>, Delyan Kratunov <delyank@meta.com>
	Subject: [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values
	Date: Thu, 13 Oct 2022 11:52:47 +0530
	Message-Id: <20221013062303.896469-10-memxor@gmail.com>
	In-Reply-To: <20221013062303.896469-1-memxor@gmail.com>
	References: <20221013062303.896469-1-memxor@gmail.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	Local kptrs, BPF linked lists
	[bpf-next,v2,00/25] Local kptrs, BPF linked lists
	[bpf-next,v2,01/25] bpf: Document UAPI details for special BPF types
	[bpf-next,v2,02/25] bpf: Allow specifying volatile type modifier for kptrs
	[bpf-next,v2,03/25] bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID
	[bpf-next,v2,04/25] bpf: Fix slot type check in check_stack_write_var_off
	[bpf-next,v2,05/25] bpf: Drop reg_type_may_be_refcounted_or_null
	[bpf-next,v2,06/25] bpf: Refactor kptr_off_tab into fields_tab
	[bpf-next,v2,07/25] bpf: Consolidate spin_lock, timer management into fields_tab
	[bpf-next,v2,08/25] bpf: Refactor map->off_arr handling
	[bpf-next,v2,09/25] bpf: Support bpf_list_head in map values
	[bpf-next,v2,10/25] bpf: Introduce local kptrs
	[bpf-next,v2,11/25] bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs
	[bpf-next,v2,12/25] bpf: Verify ownership relationships for owning types
	[bpf-next,v2,13/25] bpf: Support locking bpf_spin_lock in local kptr
	[bpf-next,v2,14/25] bpf: Allow locking bpf_spin_lock global variables
	[bpf-next,v2,15/25] bpf: Rewrite kfunc argument handling
	[bpf-next,v2,16/25] bpf: Drop kfunc bits from btf_check_func_arg_match
	[bpf-next,v2,17/25] bpf: Support constant scalar arguments for kfuncs
	[bpf-next,v2,18/25] bpf: Teach verifier about non-size constant arguments
	[bpf-next,v2,19/25] bpf: Introduce bpf_kptr_new
	[bpf-next,v2,20/25] bpf: Introduce bpf_kptr_drop
	[bpf-next,v2,21/25] bpf: Permit NULL checking pointer with non-zero fixed offset
	[bpf-next,v2,22/25] bpf: Introduce single ownership BPF linked list API
	[bpf-next,v2,23/25] libbpf: Add support for private BSS map section
	[bpf-next,v2,24/25] selftests/bpf: Add __contains macro to bpf_experimental.h
	[bpf-next,v2,25/25] selftests/bpf: Add BPF linked list API tests

Context	Check	Description
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10	fail	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11	fail	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13	fail	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	fail	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-PR	fail	PR summary
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_progs_no_alu32 on s390x with gcc
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Series has a cover letter
netdev/patch_count	fail	Series longer than 15 patches (and no cover letter)
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	fail	Errors and warnings before: 1360 this patch: 1356
netdev/cc_maintainers	warning	11 maintainers not CCed: sdf@google.com john.fastabend@gmail.com yhs@fb.com haoluo@google.com linux-kselftest@vger.kernel.org jolsa@kernel.org kpsingh@kernel.org song@kernel.org shuah@kernel.org mykolal@fb.com martin.lau@linux.dev
netdev/build_clang	success	Errors and warnings before: 157 this patch: 157
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1350 this patch: 1350
netdev/checkpatch	warning	WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 WARNING: Prefer __aligned(8) over __attribute__((aligned(8))) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 96 exceeds 80 columns WARNING: line length of 99 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for set-matrix

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bc8e7a132664..46330d871d4e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -27,6 +27,8 @@
 #include <linux/bpfptr.h>
 #include <linux/btf.h>
 #include <linux/rcupdate_trace.h>
+/* Experimental BPF APIs header for type definitions */
+#include "../tools/testing/selftests/bpf/bpf_experimental.h"
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -175,6 +177,7 @@ enum btf_field_type {
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF = (1 << 3),
 	BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
+	BPF_LIST_HEAD = (1 << 4),
 };
 
 struct btf_field_kptr {
@@ -184,11 +187,18 @@ struct btf_field_kptr {
 	u32 btf_id;
 };
 
+struct btf_field_list_head {
+	struct btf *btf;
+	u32 value_btf_id;
+	u32 node_offset;
+};
+
 struct btf_field {
 	u32 offset;
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
+		struct btf_field_list_head list_head;
 	};
 };
 
@@ -266,6 +276,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return "kptr";
+	case BPF_LIST_HEAD:
+		return "bpf_list_head";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -282,6 +294,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return sizeof(u64);
+	case BPF_LIST_HEAD:
+		return sizeof(struct bpf_list_head);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -298,6 +312,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return __alignof__(u64);
+	case BPF_LIST_HEAD:
+		return __alignof__(struct bpf_list_head);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -401,6 +417,9 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 			   bool lock_src);
 void bpf_timer_cancel_and_free(void *timer);
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+			struct bpf_spin_lock *spin_lock);
+
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
 
 struct bpf_offload_dev;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index daadcd8641b5..066984d73a8b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3205,9 +3205,15 @@ enum {
 struct btf_field_info {
 	enum btf_field_type type;
 	u32 off;
-	struct {
-		u32 type_id;
-	} kptr;
+	union {
+		struct {
+			u32 type_id;
+		} kptr;
+		struct {
+			const char *node_name;
+			u32 value_btf_id;
+		} list_head;
+	};
 };
 
 static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
@@ -3261,6 +3267,69 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	return BTF_FIELD_FOUND;
 }
 
+static const char *btf_find_decl_tag_value(const struct btf *btf,
+					   const struct btf_type *pt,
+					   int comp_idx, const char *tag_key)
+{
+	int i;
+
+	for (i = 1; i < btf_nr_types(btf); i++) {
+		const struct btf_type *t = btf_type_by_id(btf, i);
+		int len = strlen(tag_key);
+
+		if (!btf_type_is_decl_tag(t))
+			continue;
+		/* TODO: Instead of btf_type pt, it would be much better if we had BTF
+		 * ID of the map value type. This would avoid btf_type_by_id call here.
+		 */
+		if (pt != btf_type_by_id(btf, t->type) ||
+		    btf_type_decl_tag(t)->component_idx != comp_idx)
+			continue;
+		if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len))
+			continue;
+		return __btf_name_by_offset(btf, t->name_off) + len;
+	}
+	return NULL;
+}
+
+static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
+			      const struct btf_type *t, int comp_idx,
+			      u32 off, int sz, struct btf_field_info *info)
+{
+	const char *value_type;
+	const char *list_node;
+	s32 id;
+
+	if (!__btf_type_is_struct(t))
+		return BTF_FIELD_IGNORE;
+	if (t->size != sz)
+		return BTF_FIELD_IGNORE;
+	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
+	if (!value_type)
+		return -EINVAL;
+	if (strncmp(value_type, "struct:", sizeof("struct:") - 1))
+		return -EINVAL;
+	value_type += sizeof("struct:") - 1;
+	list_node = strstr(value_type, ":");
+	if (!list_node)
+		return -EINVAL;
+	value_type = kstrndup(value_type, list_node - value_type, GFP_ATOMIC);
+	if (!value_type)
+		return -ENOMEM;
+	id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
+	kfree(value_type);
+	if (id < 0)
+		return id;
+	list_node++;
+	if (str_is_empty(list_node))
+		return -EINVAL;
+	info->type = BPF_LIST_HEAD;
+	info->off = off;
+	info->list_head.value_btf_id = id;
+	info->list_head.node_name = list_node;
+	return BTF_FIELD_FOUND;
+}
+
 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			      int *align, int *sz)
 {
@@ -3284,6 +3353,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
+	if (field_mask & BPF_LIST_HEAD) {
+		if (!strcmp(name, "bpf_list_head")) {
+			type = BPF_LIST_HEAD;
+			goto end;
+		}
+	}
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3317,6 +3392,8 @@ static int btf_find_struct_field(const struct btf *btf,
 			return field_type;
 
 		off = __btf_member_bit_offset(t, member);
+		if (i && !off)
+			return -EFAULT;
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
 			return -EINVAL;
@@ -3339,6 +3416,12 @@ static int btf_find_struct_field(const struct btf *btf,
 			if (ret < 0)
 				return ret;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_find_list_head(btf, t, member_type, i, off, sz,
+						 idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3373,6 +3456,8 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			return field_type;
 
 		off = vsi->offset;
+		if (i && !off)
+			return -EFAULT;
 		if (vsi->size != sz)
 			continue;
 		if (off % align)
@@ -3393,6 +3478,12 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			if (ret < 0)
 				return ret;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_find_list_head(btf, var, var_type, -1, off, sz,
+						 idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3491,6 +3582,44 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 	return ret;
 }
 
+static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
+			       struct btf_field_info *info)
+{
+	const struct btf_type *t, *n = NULL;
+	const struct btf_member *member;
+	u32 offset;
+	int i;
+
+	t = btf_type_by_id(btf, info->list_head.value_btf_id);
+	/* We've already checked that value_btf_id is a struct type. We
+	 * just need to figure out the offset of the list_node, and
+	 * verify its type.
+	 */
+	for_each_member(i, t, member) {
+		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
+			continue;
+		/* Invalid BTF, two members with same name */
+		if (n)
+			return -EINVAL;
+		n = btf_type_by_id(btf, member->type);
+		if (!__btf_type_is_struct(n))
+			return -EINVAL;
+		if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off)))
+			return -EINVAL;
+		offset = __btf_member_bit_offset(n, member);
+		if (offset % 8)
+			return -EINVAL;
+		offset /= 8;
+		if (offset % __alignof__(struct bpf_list_node))
+			return -EINVAL;
+
+		field->list_head.btf = (struct btf *)btf;
+		field->list_head.value_btf_id = info->list_head.value_btf_id;
+		field->list_head.node_offset = offset;
+	}
+	return 0;
+}
+
 struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 					 const struct btf_type *t,
 					 u32 field_mask,
@@ -3542,6 +3671,11 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_parse_list_head(btf, &tab->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
 		default:
 			ret = -EFAULT;
 			goto end;
@@ -3550,6 +3684,13 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 		tab->cnt++;
 	}
 	tab->cnt = cnt;
+
+	/* bpf_list_head requires bpf_spin_lock */
+	if (btf_type_fields_has_field(tab, BPF_LIST_HEAD) && tab->spin_lock_off < 0) {
+		ret = -EINVAL;
+		goto end;
+	}
+
 	return tab;
 end:
 	btf_type_fields_free(tab);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8f425596b9c6..a2f2fe43916b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1700,6 +1700,38 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+			struct bpf_spin_lock *spin_lock)
+{
+	struct list_head *head = list_head, *orig_head = head;
+	unsigned long flags;
+
+	BUILD_BUG_ON(sizeof(struct bpf_list_head) != sizeof(struct list_head));
+	BUILD_BUG_ON(__alignof__(struct bpf_list_head) != __alignof__(struct list_head));
+
+	/* __bpf_spin_lock_irqsave cannot be used here, as we may take a spin
+	 * lock again when we call bpf_obj_free_fields in the loop, and it will
+	 * overwrite the per-CPU local_irq_save state.
+	 */
+	local_irq_save(flags);
+	__bpf_spin_lock(spin_lock);
+	if (!head->next || list_empty(head))
+		goto unlock;
+	head = head->next;
+	while (head != orig_head) {
+		void *obj = head;
+
+		obj -= field->list_head.node_offset;
+		head = head->next;
+		/* TODO: Rework later */
+		kfree(obj);
+	}
+unlock:
+	INIT_LIST_HEAD(head);
+	__bpf_spin_unlock(spin_lock);
+	local_irq_restore(flags);
+}
+
 BTF_SET8_START(tracing_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3f3f9697d299..92486d777246 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -536,6 +536,9 @@ void btf_type_fields_free(struct btf_type_fields *tab)
 			module_put(tab->fields[i].kptr.module);
 			btf_put(tab->fields[i].kptr.btf);
 			break;
+		case BPF_LIST_HEAD:
+			/* Nothing to release for bpf_list_head */
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -578,6 +581,9 @@ struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab)
 				goto free;
 			}
 			break;
+		case BPF_LIST_HEAD:
+			/* Nothing to acquire for bpf_list_head */
+			break;
 		default:
 			ret = -EFAULT;
 			WARN_ON_ONCE(1);
@@ -637,6 +643,11 @@ void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)
 		case BPF_KPTR_REF:
 			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
 			break;
+		case BPF_LIST_HEAD:
+			if (WARN_ON_ONCE(tab->spin_lock_off < 0))
+				continue;
+			bpf_list_head_free(field, field_ptr, obj + tab->spin_lock_off);
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -965,7 +976,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!value_type || value_size != map->value_size)
 		return -EINVAL;
 
-	map->fields_tab = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR,
+	map->fields_tab = btf_parse_fields(btf, value_type,
+					   BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
 					   map->value_size);
 	if (!IS_ERR_OR_NULL(map->fields_tab)) {
 		int i;
@@ -1011,6 +1023,14 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 					goto free_map_tab;
 				}
 				break;
+			case BPF_LIST_HEAD:
+				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+				    map->map_type != BPF_MAP_TYPE_ARRAY) {
+					ret = -EOPNOTSUPP;
+					goto free_map_tab;
+				}
+				break;
 			default:
 				/* Fail if map_type checks are missing for a field type */
 				ret = -EOPNOTSUPP;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8660d08589c8..3c47cecda302 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12643,6 +12643,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		}
 	}
 
+	if (btf_type_fields_has_field(map->fields_tab, BPF_LIST_HEAD)) {
+		if (is_tracing_prog_type(prog_type)) {
+			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
+			return -EINVAL;
+		}
+	}
+
 	if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
 	    !bpf_offload_prog_map_match(prog, map)) {
 		verbose(env, "offload device mismatch between prog and map\n");
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
new file mode 100644
index 000000000000..4e31790e433d
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -0,0 +1,23 @@
+#ifndef __KERNEL__
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+#else
+
+struct bpf_list_head {
+	__u64 __a;
+	__u64 __b;
+} __attribute__((aligned(8)));
+
+struct bpf_list_node {
+	__u64 __a;
+	__u64 __b;
+} __attribute__((aligned(8)));
+
+#endif
+
+#ifndef __KERNEL__
+#endif
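
Illustration only, not part of the patch: btf_find_list_head() above expects the decl tag on a bpf_list_head member to read "contains:struct:<type>:<member>". The userspace sketch below mirrors that string parsing under stated assumptions. The names contains_tag and parse_contains_value are hypothetical, a fixed-size buffer stands in for kstrndup(), and strchr() stands in for strstr().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; it does not exist in the kernel. */
struct contains_tag {
	char value_type[64];	/* name of the struct stored in the list */
	const char *node_name;	/* points into the caller's tag string */
};

/*
 * Parse the text that follows "contains:" in the decl tag, expected to
 * look like "struct:<type>:<member>". Returns 0 on success, -EINVAL on
 * malformed input.
 */
static int parse_contains_value(const char *value, struct contains_tag *out)
{
	static const char prefix[] = "struct:";
	const char *sep;
	size_t len;

	if (strncmp(value, prefix, sizeof(prefix) - 1))
		return -EINVAL;
	value += sizeof(prefix) - 1;

	/* The type name runs up to the next ':' */
	sep = strchr(value, ':');
	if (!sep)
		return -EINVAL;
	len = (size_t)(sep - value);
	if (len >= sizeof(out->value_type))
		return -EINVAL;	/* sketch-only limit, the patch allocates instead */
	memcpy(out->value_type, value, len);
	out->value_type[len] = '\0';

	/* The member name is whatever follows the separator; it must be non-empty */
	if (sep[1] == '\0')
		return -EINVAL;
	out->node_name = sep + 1;
	return 0;
}
```

With the input "struct:foo:node" this yields value_type "foo" and node_name "node". In the patch the type name is then resolved with btf_find_by_name_kind(), a step this sketch leaves out.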

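Illustration only, not part of the patch: bpf_list_head_free() recovers each object by subtracting node_offset from the address of its embedded list node, and reads the next pointer before freeing. The userspace sketch below shows the same traversal under stated assumptions. The struct list_head here is a minimal stand-in for the kernel type, struct elem and the helper names are hypothetical, and locking and IRQ handling are left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical element type; only the embedded node matters here. */
struct elem {
	int data;
	struct list_head node;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * Free every object on the list, knowing only the byte offset of the
 * embedded node inside its containing object. Returns the number of
 * objects freed and leaves the head initialized and empty. A zeroed
 * head (next == NULL) is treated as an empty list, as in the patch.
 */
static size_t list_free_by_offset(struct list_head *head, size_t node_offset)
{
	struct list_head *pos = head->next;
	size_t freed = 0;

	while (pos && pos != head) {
		void *obj = (char *)pos - node_offset;

		pos = pos->next;	/* advance before the node's memory is freed */
		free(obj);
		freed++;
	}
	list_init(head);
	return freed;
}
```

A caller would pass offsetof(struct elem, node) as node_offset, which is the role field->list_head.node_offset plays in the patch.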