From patchwork Tue Oct 25 18:45:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019713 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 005C7FA3740 for ; Tue, 25 Oct 2022 18:46:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232921AbiJYSqL (ORCPT ); Tue, 25 Oct 2022 14:46:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232870AbiJYSpr (ORCPT ); Tue, 25 Oct 2022 14:45:47 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8060A5C9EF for ; Tue, 25 Oct 2022 11:45:40 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723539; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9FuhshwQrTqxMwkQTOqpwJ/CB7AkoBoJ9v0bD91EobM=; b=rrJR68WW2eRYKLMZf1fIloQaeNdiUEFHydbMNH38OslZUPdGKFBsIZkQqGO/8WqW+GqKtm TZDphra6yvRRT/XK34E2JTe7KzZkoqIKhcFXOLPo2/fhQGvH2FOOD6wPjXAA6fHb0HdXwL KUUKBdq/RsnTSJrZ/PGmjskPnpzmwr0= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 1/9] bpf: Remove prog->active check for bpf_lsm and bpf_iter Date: Tue, 25 Oct 2022 11:45:16 -0700 Message-Id: <20221025184524.3526117-2-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The commit 64696c40d03c ("bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline") removed prog->active check for struct_ops prog. The bpf_lsm and bpf_iter is also using trampoline. Like struct_ops, the bpf_lsm and bpf_iter have fixed hooks for the prog to attach. The kernel does not call the same hook in a recursive way. This patch also removes the prog->active check for bpf_lsm and bpf_iter. A later patch has a test to reproduce the recursion issue for a sleepable bpf_lsm program. This patch appends the '_recur' naming to the existing enter and exit functions that track the prog->active counter. New __bpf_prog_{enter,exit}[_sleepable] function are added to skip the prog->active tracking. The '_struct_ops' version is also removed. It also moves the decision on picking the enter and exit function to the new bpf_trampoline_{enter,exit}(). It returns the '_recur' ones for all tracing progs to use. For bpf_lsm, bpf_iter, struct_ops (no prog->active tracking after 64696c40d03c), and bpf_lsm_cgroup (no prog->active tracking after 69fd337a975c7), it will return the functions that don't track the prog->active. Signed-off-by: Martin KaFai Lau --- arch/arm64/net/bpf_jit_comp.c | 9 +--- arch/x86/net/bpf_jit_comp.c | 19 +-------- include/linux/bpf.h | 24 +++++------ include/linux/bpf_verifier.h | 15 ++++++- kernel/bpf/syscall.c | 5 ++- kernel/bpf/trampoline.c | 80 +++++++++++++++++++++++++++++------ 6 files changed, 98 insertions(+), 54 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 30f76178608b..62f805f427b7 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1649,13 +1649,8 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l, struct bpf_prog *p = l->link.prog; int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie); - if (p->aux->sleepable) { - enter_prog = (u64)__bpf_prog_enter_sleepable; - exit_prog = (u64)__bpf_prog_exit_sleepable; - } else { - enter_prog = (u64)__bpf_prog_enter; - exit_prog = (u64)__bpf_prog_exit; - } + enter_prog = (u64)bpf_trampoline_enter(p); + exit_prog = (u64)bpf_trampoline_exit(p); if (l->cookie == 0) { /* if cookie is zero, one instruction is enough to store it */ diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index d7dd8e0db8da..36ffe67ad6e5 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1894,10 +1894,6 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, struct bpf_tramp_link *l, int stack_size, int run_ctx_off, bool save_ret) { - void (*exit)(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_exit; - u64 (*enter)(struct bpf_prog *prog, - struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_enter; u8 *prog = *pprog; u8 *jmp_insn; int ctx_cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie); @@ -1916,23 +1912,12 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, */ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_1, -run_ctx_off + ctx_cookie_off); - if (p->aux->sleepable) { - enter = __bpf_prog_enter_sleepable; - exit = __bpf_prog_exit_sleepable; - } else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) { - enter = __bpf_prog_enter_struct_ops; - exit = __bpf_prog_exit_struct_ops; - } else if (p->expected_attach_type == BPF_LSM_CGROUP) { - enter = __bpf_prog_enter_lsm_cgroup; - exit = __bpf_prog_exit_lsm_cgroup; - } - /* arg1: mov rdi, progs[i] */ emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p); /* arg2: lea rsi, [rbp - ctx_cookie_off] */ EMIT4(0x48, 0x8D, 0x75, -run_ctx_off); - if (emit_call(&prog, enter, prog)) + if (emit_call(&prog, bpf_trampoline_enter(p), prog)) return -EINVAL; /* remember prog start time returned by __bpf_prog_enter */ emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0); @@ -1977,7 +1962,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6); /* arg3: lea rdx, [rbp - run_ctx_off] */ EMIT4(0x48, 0x8D, 0x55, -run_ctx_off); - if (emit_call(&prog, exit, prog)) + if (emit_call(&prog, bpf_trampoline_exit(p), prog)) return -EINVAL; *pprog = prog; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 9e7d46d16032..1279e699dc98 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -854,22 +854,18 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *i const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *orig_call); -/* these two functions are called from generated trampoline */ -u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); -void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx); -u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); -void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx); -u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, - struct bpf_tramp_run_ctx *run_ctx); -void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx); -u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, - struct bpf_tramp_run_ctx *run_ctx); -void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx); +u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); +typedef u64 (*bpf_trampoline_enter_t)(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx); +typedef void (*bpf_trampoline_exit_t)(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); +bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog); +bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog); struct bpf_ksym { unsigned long start; diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 9e1e6965f407..1a32baa78ce2 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -642,10 +642,23 @@ static inline u32 type_flag(u32 type) } /* only use after check_attach_btf_id() */ -static inline enum bpf_prog_type resolve_prog_type(struct bpf_prog *prog) +static inline enum bpf_prog_type resolve_prog_type(const struct bpf_prog *prog) { return prog->type == BPF_PROG_TYPE_EXT ? prog->aux->dst_prog->type : prog->type; } +static inline bool bpf_prog_check_recur(const struct bpf_prog *prog) +{ + switch (resolve_prog_type(prog)) { + case BPF_PROG_TYPE_TRACING: + return prog->expected_attach_type != BPF_TRACE_ITER; + case BPF_PROG_TYPE_STRUCT_OPS: + case BPF_PROG_TYPE_LSM: + return false; + default: + return true; + } +} + #endif /* _LINUX_BPF_VERIFIER_H */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 7b373a5e861f..a0b4266196a8 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5133,13 +5133,14 @@ int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) run_ctx.bpf_cookie = 0; run_ctx.saved_run_ctx = NULL; - if (!__bpf_prog_enter_sleepable(prog, &run_ctx)) { + if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) { /* recursion detected */ bpf_prog_put(prog); return -EBUSY; } attr->test.retval = bpf_prog_run(prog, (void *) (long) attr->test.ctx_in); - __bpf_prog_exit_sleepable(prog, 0 /* bpf_prog_run does runtime stats */, &run_ctx); + __bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */, + &run_ctx); bpf_prog_put(prog); return 0; #endif diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index bf0906e1e2b9..d6395215b849 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -864,7 +864,7 @@ static __always_inline u64 notrace bpf_prog_start_time(void) * [2..MAX_U64] - execute bpf prog and record execution time. * This is start time. */ -u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) +static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) __acquires(RCU) { rcu_read_lock(); @@ -901,7 +901,8 @@ static void notrace update_prog_stats(struct bpf_prog *prog, } } -void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx) +static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) __releases(RCU) { bpf_reset_run_ctx(run_ctx->saved_run_ctx); @@ -912,8 +913,8 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_ rcu_read_unlock(); } -u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, - struct bpf_tramp_run_ctx *run_ctx) +static u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx) __acquires(RCU) { /* Runtime stats are exported via actual BPF_LSM_CGROUP @@ -927,8 +928,8 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, return NO_START_TIME; } -void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx) +static void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) __releases(RCU) { bpf_reset_run_ctx(run_ctx->saved_run_ctx); @@ -937,7 +938,8 @@ void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, rcu_read_unlock(); } -u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) +u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx) { rcu_read_lock_trace(); migrate_disable(); @@ -953,8 +955,8 @@ u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_r return bpf_prog_start_time(); } -void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx) +void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) { bpf_reset_run_ctx(run_ctx->saved_run_ctx); @@ -964,8 +966,30 @@ void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, rcu_read_unlock_trace(); } -u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, - struct bpf_tramp_run_ctx *run_ctx) +static u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx) +{ + rcu_read_lock_trace(); + migrate_disable(); + might_fault(); + + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); + + return bpf_prog_start_time(); +} + +static void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) +{ + bpf_reset_run_ctx(run_ctx->saved_run_ctx); + + update_prog_stats(prog, start); + migrate_enable(); + rcu_read_unlock_trace(); +} + +static u64 notrace __bpf_prog_enter(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx) __acquires(RCU) { rcu_read_lock(); @@ -976,8 +1000,8 @@ u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, return bpf_prog_start_time(); } -void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, - struct bpf_tramp_run_ctx *run_ctx) +static void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) __releases(RCU) { bpf_reset_run_ctx(run_ctx->saved_run_ctx); @@ -997,6 +1021,36 @@ void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr) percpu_ref_put(&tr->pcref); } +bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog) +{ + bool sleepable = prog->aux->sleepable; + + if (bpf_prog_check_recur(prog)) + return sleepable ? __bpf_prog_enter_sleepable_recur : + __bpf_prog_enter_recur; + + if (resolve_prog_type(prog) == BPF_PROG_TYPE_LSM && + prog->expected_attach_type == BPF_LSM_CGROUP) + return __bpf_prog_enter_lsm_cgroup; + + return sleepable ? __bpf_prog_enter_sleepable : __bpf_prog_enter; +} + +bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog) +{ + bool sleepable = prog->aux->sleepable; + + if (bpf_prog_check_recur(prog)) + return sleepable ? __bpf_prog_exit_sleepable_recur : + __bpf_prog_exit_recur; + + if (resolve_prog_type(prog) == BPF_PROG_TYPE_LSM && + prog->expected_attach_type == BPF_LSM_CGROUP) + return __bpf_prog_exit_lsm_cgroup; + + return sleepable ? __bpf_prog_exit_sleepable : __bpf_prog_exit; +} + int __weak arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end, const struct btf_func_model *m, u32 flags, From patchwork Tue Oct 25 18:45:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019714 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D64CDFA373E for ; Tue, 25 Oct 2022 18:46:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232931AbiJYSqN (ORCPT ); Tue, 25 Oct 2022 14:46:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232895AbiJYSpw (ORCPT ); Tue, 25 Oct 2022 14:45:52 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B56C65FDF8 for ; Tue, 25 Oct 2022 11:45:42 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723541; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eQO6PkBXdrMfi6P+tengfg05Bcu+OUV7uO90cGd3Wh4=; b=D8wNDSwte48SuRjPB4ztKH4IbTfGPKawXYdyBI/IfRm5ByeOWBNwTQeqBbtd2okq9IR3NE LJ62nnNB63ExhqwBIOVU6HMX8z1D8tfKH7wiSoGrZteZ5X5Mn4GXg5vSnGZ+CWYUclvtW2 JpRBGkus3fpqFHkTATwsMH17d19bHbI= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 2/9] bpf: Append _recur naming to the bpf_task_storage helper proto Date: Tue, 25 Oct 2022 11:45:17 -0700 Message-Id: <20221025184524.3526117-3-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch adds the "_recur" naming to the bpf_task_storage_{get,delete} proto. In a latter patch, they will only be used by the tracing programs that requires a deadlock detection because a tracing prog may use bpf_task_storage_{get,delete} recursively and cause a deadlock. Another following patch will add a different helper proto for the non tracing programs because they do not need the deadlock prevention. This patch does this rename to prepare for this future proto additions. Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 4 ++-- kernel/bpf/bpf_task_storage.c | 12 ++++++------ kernel/trace/bpf_trace.c | 4 ++-- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1279e699dc98..b04fe3f4342e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2519,8 +2519,8 @@ extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto; extern const struct bpf_func_proto bpf_sock_from_file_proto; extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto; -extern const struct bpf_func_proto bpf_task_storage_get_proto; -extern const struct bpf_func_proto bpf_task_storage_delete_proto; +extern const struct bpf_func_proto bpf_task_storage_get_recur_proto; +extern const struct bpf_func_proto bpf_task_storage_delete_recur_proto; extern const struct bpf_func_proto bpf_for_each_map_elem_proto; extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto; extern const struct bpf_func_proto bpf_sk_setsockopt_proto; diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c index 6f290623347e..bce50ae03f42 100644 --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -228,7 +228,7 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key) } /* *gfp_flags* is a hidden argument provided by the verifier */ -BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *, +BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct *, task, void *, value, u64, flags, gfp_t, gfp_flags) { struct bpf_local_storage_data *sdata; @@ -260,7 +260,7 @@ BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *, (unsigned long)sdata->data; } -BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *, +BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_struct *, task) { int ret; @@ -322,8 +322,8 @@ const struct bpf_map_ops task_storage_map_ops = { .map_owner_storage_ptr = task_storage_ptr, }; -const struct bpf_func_proto bpf_task_storage_get_proto = { - .func = bpf_task_storage_get, +const struct bpf_func_proto bpf_task_storage_get_recur_proto = { + .func = bpf_task_storage_get_recur, .gpl_only = false, .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, .arg1_type = ARG_CONST_MAP_PTR, @@ -333,8 +333,8 @@ const struct bpf_func_proto bpf_task_storage_get_proto = { .arg4_type = ARG_ANYTHING, }; -const struct bpf_func_proto bpf_task_storage_delete_proto = { - .func = bpf_task_storage_delete, +const struct bpf_func_proto bpf_task_storage_delete_recur_proto = { + .func = bpf_task_storage_delete_recur, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_CONST_MAP_PTR, diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 49fb9ec8366d..591caf0eb973 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1488,9 +1488,9 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_this_cpu_ptr: return &bpf_this_cpu_ptr_proto; case BPF_FUNC_task_storage_get: - return &bpf_task_storage_get_proto; + return &bpf_task_storage_get_recur_proto; case BPF_FUNC_task_storage_delete: - return &bpf_task_storage_delete_proto; + return &bpf_task_storage_delete_recur_proto; case BPF_FUNC_for_each_map_elem: return &bpf_for_each_map_elem_proto; case BPF_FUNC_snprintf: From patchwork Tue Oct 25 18:45:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019715 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3131AFA373E for ; Tue, 25 Oct 2022 18:46:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232938AbiJYSqP (ORCPT ); Tue, 25 Oct 2022 14:46:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232898AbiJYSpw (ORCPT ); Tue, 25 Oct 2022 14:45:52 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9309A69182 for ; Tue, 25 Oct 2022 11:45:45 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723543; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U2vo90GX2ReVHnobyCn7t/Qs/5AyO2Em2JVC4c9JtA8=; b=aSOMUQYvBJ1wWwagYh/HALq22YOIcCtdw2yo9yeXvFV3Rs4EyNYSv1+BktSfZDIVa5bES7 2exgG0Jz/v87Pny2+yeMt5sPDEUAJDBYk4aF8aqrhgnrhzffWUrD9+QcIK09W81FCVRzNU bqwWKRbFzXpo6+bcMsTsMmAH6esNoto= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 3/9] bpf: Refactor the core bpf_task_storage_get logic into a new function Date: Tue, 25 Oct 2022 11:45:18 -0700 Message-Id: <20221025184524.3526117-4-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch creates a new function __bpf_task_storage_get() and moves the core logic of the existing bpf_task_storage_get() into this new function. This new function will be shared by another new helper proto in the latter patch. Signed-off-by: Martin KaFai Lau --- kernel/bpf/bpf_task_storage.c | 44 +++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c index bce50ae03f42..2726435e3eda 100644 --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -227,37 +227,45 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key) return err; } -/* *gfp_flags* is a hidden argument provided by the verifier */ -BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct *, - task, void *, value, u64, flags, gfp_t, gfp_flags) +/* Called by bpf_task_storage_get*() helpers */ +static void *__bpf_task_storage_get(struct bpf_map *map, + struct task_struct *task, void *value, + u64 flags, gfp_t gfp_flags) { struct bpf_local_storage_data *sdata; - WARN_ON_ONCE(!bpf_rcu_lock_held()); - if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE)) - return (unsigned long)NULL; - - if (!task) - return (unsigned long)NULL; - - if (!bpf_task_storage_trylock()) - return (unsigned long)NULL; - sdata = task_storage_lookup(task, map, true); if (sdata) - goto unlock; + return sdata->data; /* only allocate new storage, when the task is refcounted */ if (refcount_read(&task->usage) && - (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) + (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) { sdata = bpf_local_storage_update( task, (struct bpf_local_storage_map *)map, value, BPF_NOEXIST, gfp_flags); + return IS_ERR(sdata) ? NULL : sdata->data; + } -unlock: + return NULL; +} + +/* *gfp_flags* is a hidden argument provided by the verifier */ +BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct *, + task, void *, value, u64, flags, gfp_t, gfp_flags) +{ + void *data; + + WARN_ON_ONCE(!bpf_rcu_lock_held()); + if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task) + return (unsigned long)NULL; + + if (!bpf_task_storage_trylock()) + return (unsigned long)NULL; + data = __bpf_task_storage_get(map, task, value, flags, + gfp_flags); bpf_task_storage_unlock(); - return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL : - (unsigned long)sdata->data; + return (unsigned long)data; } BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_struct *, From patchwork Tue Oct 25 18:45:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019716 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66674FA3740 for ; Tue, 25 Oct 2022 18:46:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232860AbiJYSqS (ORCPT ); Tue, 25 Oct 2022 14:46:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232883AbiJYSqL (ORCPT ); Tue, 25 Oct 2022 14:46:11 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2C7E66D85E for ; Tue, 25 Oct 2022 11:45:47 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723545; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mYqDFUp3qEFv/Og4QdrUPNe1D2znr6G+dC04OZPmTB8=; b=Cr33Je17hB4Gm+r9Ged+Fa9Unx7UWLuhNQmnQJhKTXO8/jIZXkS61Gx2XJXlfbOlIAiBMY 7wqXPVu9oX/JwtcdSMn9DfvOp9dgUhw5P6/gomZtJjEMJ84yI4XOEfxnKNPHDtY0DW3BaJ 5GVwNj8b6R+E95NACC+C+VZOoD9RhT0= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 4/9] bpf: Avoid taking spinlock in bpf_task_storage_get if potential deadlock is detected Date: Tue, 25 Oct 2022 11:45:19 -0700 Message-Id: <20221025184524.3526117-5-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau bpf_task_storage_get() does a lookup and optionally inserts new data if BPF_LOCAL_STORAGE_GET_F_CREATE is present. During lookup, it will cache the lookup result and caching requires to acquire a spinlock. When potential deadlock is detected (by the bpf_task_storage_busy pcpu-counter added in commit bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")), the current behavior is returning NULL immediately to avoid deadlock. It is too pessimistic. This patch will go ahead to do a lookup (which is a lockless operation) but it will avoid caching it in order to avoid acquiring the spinlock. When lookup fails to find the data and BPF_LOCAL_STORAGE_GET_F_CREATE is set, an insertion is needed and this requires acquiring a spinlock. This patch will still return NULL when a potential deadlock is detected. Signed-off-by: Martin KaFai Lau --- kernel/bpf/bpf_local_storage.c | 1 + kernel/bpf/bpf_task_storage.c | 15 ++++++++------- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c index 9dc6de1cf185..781d14167140 100644 --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -242,6 +242,7 @@ void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu) __bpf_selem_unlink_storage(selem, use_trace_rcu); } +/* If cacheit_lockit is false, this lookup function is lockless */ struct bpf_local_storage_data * bpf_local_storage_lookup(struct bpf_local_storage *local_storage, struct bpf_local_storage_map *smap, diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c index 2726435e3eda..bc52bc8b59f7 100644 --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -230,17 +230,17 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key) /* Called by bpf_task_storage_get*() helpers */ static void *__bpf_task_storage_get(struct bpf_map *map, struct task_struct *task, void *value, - u64 flags, gfp_t gfp_flags) + u64 flags, gfp_t gfp_flags, bool nobusy) { struct bpf_local_storage_data *sdata; - sdata = task_storage_lookup(task, map, true); + sdata = task_storage_lookup(task, map, nobusy); if (sdata) return sdata->data; /* only allocate new storage, when the task is refcounted */ if (refcount_read(&task->usage) && - (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) { + (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy) { sdata = bpf_local_storage_update( task, (struct bpf_local_storage_map *)map, value, BPF_NOEXIST, gfp_flags); @@ -254,17 +254,18 @@ static void *__bpf_task_storage_get(struct bpf_map *map, BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct *, task, void *, value, u64, flags, gfp_t, gfp_flags) { + bool nobusy; void *data; WARN_ON_ONCE(!bpf_rcu_lock_held()); if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task) return (unsigned long)NULL; - if (!bpf_task_storage_trylock()) - return (unsigned long)NULL; + nobusy = bpf_task_storage_trylock(); data = __bpf_task_storage_get(map, task, value, flags, - gfp_flags); - bpf_task_storage_unlock(); + gfp_flags, nobusy); + if (nobusy) + bpf_task_storage_unlock(); return (unsigned long)data; } From patchwork Tue Oct 25 18:45:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019717 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 883DBFA373E for ; Tue, 25 Oct 2022 18:46:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232915AbiJYSqT (ORCPT ); Tue, 25 Oct 2022 14:46:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232919AbiJYSqL (ORCPT ); Tue, 25 Oct 2022 14:46:11 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66F4373328 for ; Tue, 25 Oct 2022 11:45:49 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723548; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w6PZ4VUfWyonNk0AZVq2qxMIyaU3cM8VoD81Jl31FMQ=; b=kfYnpeAg3s6DsHX82swSxH1kKSxYrD7gK+f+eixB4ihOsLX6iCg7qe9kupKIcBbKeBSr+Y Du2TnM/8CwP1KLNV0zfET033nNvqaVXcMG5ynb+OFrveV2tFnNmJt/byInxIg1e87MI6Rn AJNiTEh0AYHywQVBWfFtw2DP5lO/29o= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 5/9] bpf: Add new bpf_task_storage_get proto with no deadlock detection Date: Tue, 25 Oct 2022 11:45:20 -0700 Message-Id: <20221025184524.3526117-6-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The bpf_lsm and bpf_iter do not recur that will cause a deadlock. The situation is similar to the bpf_pid_task_storage_lookup_elem() which is called from the syscall map_lookup_elem. It does not need deadlock detection. Otherwise, it will cause unnecessary failure when calling the bpf_task_storage_get() helper. This patch adds bpf_task_storage_get proto that does not do deadlock detection. It will be used by bpf_lsm and bpf_iter programs. Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 1 + kernel/bpf/bpf_task_storage.c | 28 ++++++++++++++++++++++++++++ kernel/trace/bpf_trace.c | 5 ++++- 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b04fe3f4342e..ef3f98afa45d 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2520,6 +2520,7 @@ extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto; extern const struct bpf_func_proto bpf_sock_from_file_proto; extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto; extern const struct bpf_func_proto bpf_task_storage_get_recur_proto; +extern const struct bpf_func_proto bpf_task_storage_get_proto; extern const struct bpf_func_proto bpf_task_storage_delete_recur_proto; extern const struct bpf_func_proto bpf_for_each_map_elem_proto; extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto; diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c index bc52bc8b59f7..c3a841be438f 100644 --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -269,6 +269,23 @@ BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct return (unsigned long)data; } +/* *gfp_flags* is a hidden argument provided by the verifier */ +BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *, + task, void *, value, u64, flags, gfp_t, gfp_flags) +{ + void *data; + + WARN_ON_ONCE(!bpf_rcu_lock_held()); + if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task) + return (unsigned long)NULL; + + bpf_task_storage_lock(); + data = __bpf_task_storage_get(map, task, value, flags, + gfp_flags, true); + bpf_task_storage_unlock(); + return (unsigned long)data; +} + BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_struct *, task) { @@ -342,6 +359,17 @@ const struct bpf_func_proto bpf_task_storage_get_recur_proto = { .arg4_type = ARG_ANYTHING, }; +const struct bpf_func_proto bpf_task_storage_get_proto = { + .func = bpf_task_storage_get, + .gpl_only = false, + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK], + .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL, + .arg4_type = ARG_ANYTHING, +}; + const struct bpf_func_proto bpf_task_storage_delete_recur_proto = { .func = bpf_task_storage_delete_recur, .gpl_only = false, diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 591caf0eb973..0986c1f0b8fc 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -1488,7 +1489,9 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_this_cpu_ptr: return &bpf_this_cpu_ptr_proto; case BPF_FUNC_task_storage_get: - return &bpf_task_storage_get_recur_proto; + if (bpf_prog_check_recur(prog)) + return &bpf_task_storage_get_recur_proto; + return &bpf_task_storage_get_proto; case BPF_FUNC_task_storage_delete: return &bpf_task_storage_delete_recur_proto; case BPF_FUNC_for_each_map_elem: From patchwork Tue Oct 25 18:45:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019718 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F4CEC38A2D for ; Tue, 25 Oct 2022 18:46:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232846AbiJYSqW (ORCPT ); Tue, 25 Oct 2022 14:46:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232925AbiJYSqM (ORCPT ); Tue, 25 Oct 2022 14:46:12 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF91E7C183 for ; Tue, 25 Oct 2022 11:45:51 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723550; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=U65HfBA0C20uBOwI9V1N9lStQVA6wiJc2iT+c+d93ps=; b=jrPM4hwPfxJFommyFzOYiJDrlSqifQshC/JXHWyMqLha0n62qC0qVklwfohtFovkvO0Q5Y XveG8QonaII+xAjSWzyLtKzIaDrduskv9pInoLg2BOWiOOztruo94ZCQ3tBcSYPhEGjTDe usNS9X5LVFcX8BRBUfJ22UKdFUUWzmA= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 6/9] bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check Date: Tue, 25 Oct 2022 11:45:21 -0700 Message-Id: <20221025184524.3526117-7-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau Similar to the earlier change in bpf_task_storage_get_recur. This patch changes bpf_task_storage_delete_recur such that it does the lookup first. It only returns -EBUSY if it needs to take the spinlock to do the deletion when potential deadlock is detected. Signed-off-by: Martin KaFai Lau --- kernel/bpf/bpf_task_storage.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c index c3a841be438f..f3f79b618a68 100644 --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -184,7 +184,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key, return err; } -static int task_storage_delete(struct task_struct *task, struct bpf_map *map) +static int task_storage_delete(struct task_struct *task, struct bpf_map *map, + bool nobusy) { struct bpf_local_storage_data *sdata; @@ -192,6 +193,9 @@ static int task_storage_delete(struct task_struct *task, struct bpf_map *map) if (!sdata) return -ENOENT; + if (!nobusy) + return -EBUSY; + bpf_selem_unlink(SELEM(sdata), true); return 0; @@ -220,7 +224,7 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key) } bpf_task_storage_lock(); - err = task_storage_delete(task, map); + err = task_storage_delete(task, map, true); bpf_task_storage_unlock(); out: put_pid(pid); @@ -289,21 +293,21 @@ BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *, BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_struct *, task) { + bool nobusy; int ret; WARN_ON_ONCE(!bpf_rcu_lock_held()); if (!task) return -EINVAL; - if (!bpf_task_storage_trylock()) - return -EBUSY; - + nobusy = bpf_task_storage_trylock(); /* This helper must only be called from places where the lifetime of the task * is guaranteed. Either by being refcounted or by being protected * by an RCU read-side critical section. */ - ret = task_storage_delete(task, map); - bpf_task_storage_unlock(); + ret = task_storage_delete(task, map, nobusy); + if (nobusy) + bpf_task_storage_unlock(); return ret; } From patchwork Tue Oct 25 18:45:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019719 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB850FA373E for ; Tue, 25 Oct 2022 18:46:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233005AbiJYSqX (ORCPT ); Tue, 25 Oct 2022 14:46:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232935AbiJYSqN (ORCPT ); Tue, 25 Oct 2022 14:46:13 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2111180E86 for ; Tue, 25 Oct 2022 11:45:53 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723552; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dXq73w+LtHF+WdGDeIELxMLg1o/6f66BcLKsE7yjHtk=; b=ptDqyJ5Rb1dVSCYExQylZ8SHMYjUKr30CxyKnguu1rd7qmsGIB5oV0BOBoojq1464XeqQ2 3H8XZy6DNOOg/YSf7xO6REvW1CAMJEN8VyTsVjUnxgRXHH1qmnkpqF/xkd/dNB8lxH2ch/ Ph49bBi6kl60VoMzOmwhXYm1VUjsq0c= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 7/9] bpf: Add new bpf_task_storage_delete proto with no deadlock detection Date: Tue, 25 Oct 2022 11:45:22 -0700 Message-Id: <20221025184524.3526117-8-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The bpf_lsm and bpf_iter do not recur that will cause a deadlock. The situation is similar to the bpf_pid_task_storage_delete_elem() which is called from the syscall map_delete_elem. It does not need deadlock detection. Otherwise, it will cause unnecessary failure when calling the bpf_task_storage_delete() helper. This patch adds bpf_task_storage_delete proto that does not do deadlock detection. It will be used by bpf_lsm and bpf_iter program. Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 1 + kernel/bpf/bpf_task_storage.c | 28 ++++++++++++++++++++++++++++ kernel/trace/bpf_trace.c | 4 +++- 3 files changed, 32 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index ef3f98afa45d..a5dbac8f5aba 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2522,6 +2522,7 @@ extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto; extern const struct bpf_func_proto bpf_task_storage_get_recur_proto; extern const struct bpf_func_proto bpf_task_storage_get_proto; extern const struct bpf_func_proto bpf_task_storage_delete_recur_proto; +extern const struct bpf_func_proto bpf_task_storage_delete_proto; extern const struct bpf_func_proto bpf_for_each_map_elem_proto; extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto; extern const struct bpf_func_proto bpf_sk_setsockopt_proto; diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c index f3f79b618a68..ba3fe72d1fa5 100644 --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -311,6 +311,25 @@ BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_str return ret; } +BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *, + task) +{ + int ret; + + WARN_ON_ONCE(!bpf_rcu_lock_held()); + if (!task) + return -EINVAL; + + bpf_task_storage_lock(); + /* This helper must only be called from places where the lifetime of the task + * is guaranteed. Either by being refcounted or by being protected + * by an RCU read-side critical section. + */ + ret = task_storage_delete(task, map, true); + bpf_task_storage_unlock(); + return ret; +} + static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key) { return -ENOTSUPP; @@ -382,3 +401,12 @@ const struct bpf_func_proto bpf_task_storage_delete_recur_proto = { .arg2_type = ARG_PTR_TO_BTF_ID, .arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK], }; + +const struct bpf_func_proto bpf_task_storage_delete_proto = { + .func = bpf_task_storage_delete, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK], +}; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 0986c1f0b8fc..139966c74a5a 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1493,7 +1493,9 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_task_storage_get_recur_proto; return &bpf_task_storage_get_proto; case BPF_FUNC_task_storage_delete: - return &bpf_task_storage_delete_recur_proto; + if (bpf_prog_check_recur(prog)) + return &bpf_task_storage_delete_recur_proto; + return &bpf_task_storage_delete_proto; case BPF_FUNC_for_each_map_elem: return &bpf_for_each_map_elem_proto; case BPF_FUNC_snprintf: From patchwork Tue Oct 25 18:45:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019720 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 685FDC38A2D for ; Tue, 25 Oct 2022 18:46:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233015AbiJYSq0 (ORCPT ); Tue, 25 Oct 2022 14:46:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232948AbiJYSqS (ORCPT ); Tue, 25 Oct 2022 14:46:18 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 187E315712 for ; Tue, 25 Oct 2022 11:45:56 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723554; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LahaojD6bofe18UT9gbUbHE4FR4B7FwVw0cyJG0A0rY=; b=mSA75VC/LlSWtEr8kfQ1LCcgUyBYx4cyaxV3ctODtx0avBGi+OkS5eAgSMBbmoWm7Sf3Ow KP/cq270rnGt03NHX0EGoAhYCjLGMFuo1pzmykMPPjj2gQ6Jg2H4zRsLszczDm+x/XKJDC JDVxg0EaZQNIMNqLiXZcmonH5NaJgUs= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 8/9] selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection Date: Tue, 25 Oct 2022 11:45:23 -0700 Message-Id: <20221025184524.3526117-9-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch adds a test to check for deadlock failure in bpf_task_storage_{get,delete} when called by a sleepable bpf_lsm prog. It also checks if the prog_info.recursion_misses is non zero. The test starts with 32 threads and they are affinitized to one cpu. In my qemu setup, with CONFIG_PREEMPT=y, I can reproduce it within one second if it is run without the previous patches of this set. Here is the test error message before adding the no deadlock detection version of the bpf_task_storage_{get,delete}: test_nodeadlock:FAIL:bpf_task_storage_get busy unexpected bpf_task_storage_get busy: actual 2 != expected 0 test_nodeadlock:FAIL:bpf_task_storage_delete busy unexpected bpf_task_storage_delete busy: actual 2 != expected 0 Signed-off-by: Martin KaFai Lau --- .../bpf/prog_tests/task_local_storage.c | 98 +++++++++++++++++++ .../bpf/progs/task_storage_nodeadlock.c | 47 +++++++++ 2 files changed, 145 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/task_storage_nodeadlock.c diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c index 99a42a2b6e14..ae535f5de6a2 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c @@ -3,12 +3,15 @@ #define _GNU_SOURCE /* See feature_test_macros(7) */ #include +#include +#include #include /* For SYS_xxx definitions */ #include #include #include "task_local_storage.skel.h" #include "task_local_storage_exit_creds.skel.h" #include "task_ls_recursion.skel.h" +#include "task_storage_nodeadlock.skel.h" static void test_sys_enter_exit(void) { @@ -93,6 +96,99 @@ static void test_recursion(void) task_ls_recursion__destroy(skel); } +static bool stop; + +static void waitall(const pthread_t *tids, int nr) +{ + int i; + + stop = true; + for (i = 0; i < nr; i++) + pthread_join(tids[i], NULL); +} + +static void *sock_create_loop(void *arg) +{ + struct task_storage_nodeadlock *skel = arg; + int fd; + + while (!stop) { + fd = socket(AF_INET, SOCK_STREAM, 0); + close(fd); + if (skel->bss->nr_get_errs || skel->bss->nr_del_errs) + stop = true; + } + + return NULL; +} + +static void test_nodeadlock(void) +{ + struct task_storage_nodeadlock *skel; + struct bpf_prog_info info = {}; + __u32 info_len = sizeof(info); + const int nr_threads = 32; + pthread_t tids[nr_threads]; + int i, prog_fd, err; + cpu_set_t old, new; + + /* Pin all threads to one cpu to increase the chance of preemption + * in a sleepable bpf prog. + */ + CPU_ZERO(&new); + CPU_SET(0, &new); + err = sched_getaffinity(getpid(), sizeof(old), &old); + if (!ASSERT_OK(err, "getaffinity")) + return; + err = sched_setaffinity(getpid(), sizeof(new), &new); + if (!ASSERT_OK(err, "setaffinity")) + return; + + skel = task_storage_nodeadlock__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + goto done; + + /* Unnecessary recursion and deadlock detection are reproducible + * in the preemptible kernel. + */ + if (!skel->kconfig->CONFIG_PREEMPT) { + test__skip(); + goto done; + } + + err = task_storage_nodeadlock__attach(skel); + ASSERT_OK(err, "attach prog"); + + for (i = 0; i < nr_threads; i++) { + err = pthread_create(&tids[i], NULL, sock_create_loop, skel); + if (err) { + /* Only assert once here to avoid excessive + * PASS printing during test failure. + */ + ASSERT_OK(err, "pthread_create"); + waitall(tids, i); + goto done; + } + } + + /* With 32 threads, 1s is enough to reproduce the issue */ + sleep(1); + waitall(tids, nr_threads); + + info_len = sizeof(info); + prog_fd = bpf_program__fd(skel->progs.socket_post_create); + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); + ASSERT_OK(err, "get prog info"); + ASSERT_EQ(info.recursion_misses, 0, "prog recursion"); + + ASSERT_EQ(skel->bss->nr_get_errs, 0, "bpf_task_storage_get busy"); + ASSERT_EQ(skel->bss->nr_del_errs, 0, "bpf_task_storage_delete busy"); + +done: + task_storage_nodeadlock__destroy(skel); + sched_setaffinity(getpid(), sizeof(old), &old); +} + void test_task_local_storage(void) { if (test__start_subtest("sys_enter_exit")) @@ -101,4 +197,6 @@ void test_task_local_storage(void) test_exit_creds(); if (test__start_subtest("recursion")) test_recursion(); + if (test__start_subtest("nodeadlock")) + test_nodeadlock(); } diff --git a/tools/testing/selftests/bpf/progs/task_storage_nodeadlock.c b/tools/testing/selftests/bpf/progs/task_storage_nodeadlock.c new file mode 100644 index 000000000000..ea2dbb80f7b3 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/task_storage_nodeadlock.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include + +char _license[] SEC("license") = "GPL"; + +#ifndef EBUSY +#define EBUSY 16 +#endif + +extern bool CONFIG_PREEMPT __kconfig __weak; +int nr_get_errs = 0; +int nr_del_errs = 0; + +struct { + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, int); +} task_storage SEC(".maps"); + +SEC("lsm.s/socket_post_create") +int BPF_PROG(socket_post_create, struct socket *sock, int family, int type, + int protocol, int kern) +{ + struct task_struct *task; + int ret, zero = 0; + int *value; + + if (!CONFIG_PREEMPT) + return 0; + + task = bpf_get_current_task_btf(); + value = bpf_task_storage_get(&task_storage, task, &zero, + BPF_LOCAL_STORAGE_GET_F_CREATE); + if (!value) + __sync_fetch_and_add(&nr_get_errs, 1); + + ret = bpf_task_storage_delete(&task_storage, + bpf_get_current_task_btf()); + if (ret == -EBUSY) + __sync_fetch_and_add(&nr_del_errs, 1); + + return 0; +} From patchwork Tue Oct 25 18:45:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 13019721 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C4ABFA373E for ; Tue, 25 Oct 2022 18:46:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232949AbiJYSq2 (ORCPT ); Tue, 25 Oct 2022 14:46:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232952AbiJYSqS (ORCPT ); Tue, 25 Oct 2022 14:46:18 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46250895F8 for ; Tue, 25 Oct 2022 11:45:58 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1666723556; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JsqydMh/Z8t373qWG4F0IpAsNPYXlMnSRuTTTINYgE4=; b=ee8PXCyPAwW3ExwTev1Hvtad3v/hkhdbVjKFhHu5O0yE2/YTUDjMurSf0PxeymyXUeMr/j qT5UhEZP5XbAvOL6ng1HGuZYWnCfviIMA3oYqA3XtjMOZhueu5hNNiEu+Nh7A5aWVXdq+D squ/yKp2+1xntrkz/7S0LuG8BQ1/MqQ= From: Martin KaFai Lau To: bpf@vger.kernel.org Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'Song Liu ' , kernel-team@meta.com Subject: [PATCH bpf-next 9/9] selftests/bpf: Tracing prog can still do lookup under busy lock Date: Tue, 25 Oct 2022 11:45:24 -0700 Message-Id: <20221025184524.3526117-10-martin.lau@linux.dev> In-Reply-To: <20221025184524.3526117-1-martin.lau@linux.dev> References: <20221025184524.3526117-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch modifies the task_ls_recursion test to check that the first bpf_task_storage_get(&map_a, ...) in BPF_PROG(on_update) can still do the lockless lookup even it cannot acquire the percpu busy lock. If the lookup succeeds, it will increment the value by 1 and the value in the task storage map_a will become 200+1=201. After that, BPF_PROG(on_update) tries to delete from map_a and should get -EBUSY because it cannot acquire the percpu busy lock after finding the data. Signed-off-by: Martin KaFai Lau --- .../bpf/prog_tests/task_local_storage.c | 48 ++++++++++++++++++- .../selftests/bpf/progs/task_ls_recursion.c | 43 +++++++++++++++-- 2 files changed, 86 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c index ae535f5de6a2..a176bd75a748 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c @@ -8,6 +8,7 @@ #include /* For SYS_xxx definitions */ #include #include +#include "task_local_storage_helpers.h" #include "task_local_storage.skel.h" #include "task_local_storage_exit_creds.skel.h" #include "task_ls_recursion.skel.h" @@ -78,21 +79,64 @@ static void test_exit_creds(void) static void test_recursion(void) { + int err, map_fd, prog_fd, task_fd; struct task_ls_recursion *skel; - int err; + struct bpf_prog_info info; + __u32 info_len = sizeof(info); + long value; + + task_fd = sys_pidfd_open(getpid(), 0); + if (!ASSERT_NEQ(task_fd, -1, "sys_pidfd_open")) + return; skel = task_ls_recursion__open_and_load(); if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) - return; + goto out; err = task_ls_recursion__attach(skel); if (!ASSERT_OK(err, "skel_attach")) goto out; /* trigger sys_enter, make sure it does not cause deadlock */ + skel->bss->test_pid = getpid(); syscall(SYS_gettid); + skel->bss->test_pid = 0; + task_ls_recursion__detach(skel); + + /* Refer to the comment in BPF_PROG(on_update) for + * the explanation on the value 201 and 100. + */ + map_fd = bpf_map__fd(skel->maps.map_a); + err = bpf_map_lookup_elem(map_fd, &task_fd, &value); + ASSERT_OK(err, "lookup map_a"); + ASSERT_EQ(value, 201, "map_a value"); + ASSERT_EQ(skel->bss->nr_del_errs, 1, "bpf_task_storage_delete busy"); + + map_fd = bpf_map__fd(skel->maps.map_b); + err = bpf_map_lookup_elem(map_fd, &task_fd, &value); + ASSERT_OK(err, "lookup map_b"); + ASSERT_EQ(value, 100, "map_b value"); + + prog_fd = bpf_program__fd(skel->progs.on_lookup); + memset(&info, 0, sizeof(info)); + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); + ASSERT_OK(err, "get prog info"); + ASSERT_GT(info.recursion_misses, 0, "on_lookup prog recursion"); + + prog_fd = bpf_program__fd(skel->progs.on_update); + memset(&info, 0, sizeof(info)); + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); + ASSERT_OK(err, "get prog info"); + ASSERT_EQ(info.recursion_misses, 0, "on_update prog recursion"); + + prog_fd = bpf_program__fd(skel->progs.on_enter); + memset(&info, 0, sizeof(info)); + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); + ASSERT_OK(err, "get prog info"); + ASSERT_EQ(info.recursion_misses, 0, "on_enter prog recursion"); out: + close(task_fd); task_ls_recursion__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/task_ls_recursion.c b/tools/testing/selftests/bpf/progs/task_ls_recursion.c index 564583dca7c8..4542dc683b44 100644 --- a/tools/testing/selftests/bpf/progs/task_ls_recursion.c +++ b/tools/testing/selftests/bpf/progs/task_ls_recursion.c @@ -5,7 +5,13 @@ #include #include +#ifndef EBUSY +#define EBUSY 16 +#endif + char _license[] SEC("license") = "GPL"; +int nr_del_errs = 0; +int test_pid = 0; struct { __uint(type, BPF_MAP_TYPE_TASK_STORAGE); @@ -26,6 +32,13 @@ int BPF_PROG(on_lookup) { struct task_struct *task = bpf_get_current_task_btf(); + if (!test_pid || task->pid != test_pid) + return 0; + + /* The bpf_task_storage_delete will call + * bpf_local_storage_lookup. The prog->active will + * stop the recursion. + */ bpf_task_storage_delete(&map_a, task); bpf_task_storage_delete(&map_b, task); return 0; @@ -37,11 +50,32 @@ int BPF_PROG(on_update) struct task_struct *task = bpf_get_current_task_btf(); long *ptr; + if (!test_pid || task->pid != test_pid) + return 0; + ptr = bpf_task_storage_get(&map_a, task, 0, BPF_LOCAL_STORAGE_GET_F_CREATE); - if (ptr) + /* ptr will not be NULL when it is called from + * the bpf_task_storage_get(&map_b,...F_CREATE) in + * the BPF_PROG(on_enter) below. It is because + * the value can be found in map_a and the kernel + * does not need to acquire any spin_lock. + */ + if (ptr) { + int err; + *ptr += 1; + err = bpf_task_storage_delete(&map_a, task); + if (err == -EBUSY) + nr_del_errs++; + } + /* This will still fail because map_b is empty and + * this BPF_PROG(on_update) has failed to acquire + * the percpu busy lock => meaning potential + * deadlock is detected and it will fail to create + * new storage. + */ ptr = bpf_task_storage_get(&map_b, task, 0, BPF_LOCAL_STORAGE_GET_F_CREATE); if (ptr) @@ -57,14 +91,17 @@ int BPF_PROG(on_enter, struct pt_regs *regs, long id) long *ptr; task = bpf_get_current_task_btf(); + if (!test_pid || task->pid != test_pid) + return 0; + ptr = bpf_task_storage_get(&map_a, task, 0, BPF_LOCAL_STORAGE_GET_F_CREATE); - if (ptr) + if (ptr && !*ptr) *ptr = 200; ptr = bpf_task_storage_get(&map_b, task, 0, BPF_LOCAL_STORAGE_GET_F_CREATE); - if (ptr) + if (ptr && !*ptr) *ptr = 100; return 0; }