From patchwork Sun Sep 29 13:27:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Hwang X-Patchwork-Id: 13815090 X-Patchwork-Delegate: bpf@iogearbox.net Received: from out-180.mta1.migadu.com (out-180.mta1.migadu.com [95.215.58.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D2BCAD24 for ; Sun, 29 Sep 2024 13:28:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727616505; cv=none; b=LDpXJn8Ya7Ur2YFjf52uOkvqgp+MshBZhKAaRtKiGHesWJ3Vu0TlMquSGhU3apP4CaYojj0z4/uBYpOXJ/4PBAE3OH/sC4AQBCge4V4IUfCZL8FBm/uVDAEosl07Ieys0Pw+VZK58jbvcWjyyHLyvdEg1cqXkJk8abLy4axJr+k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727616505; c=relaxed/simple; bh=yMAmXYf4BZsbnae4yK6RZZ1mIIAHY6/dxdBXhShYMQQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=c7IAHWxa7k6AoK+I3bF70P8J/Od6Vb6f3Ph+VywdkBVmCaEzV2VR9VDdEckyM9FzhWc+fJZbf+rCXikJIe/qS+oVpM4X9UX9CwtQLJzPqu2OEkMo43H9p/JUXrctkzIdL2/WGHxzsLdp6Fd5wrFqAGs1LfOKhhiR3bSWshCTFxg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=GCobE5Fi; arc=none smtp.client-ip=95.215.58.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="GCobE5Fi" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1727616500; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rYm0TozD+NlcikAMF3XQ/i2UU+qh6ZhxO3+lSmttCKo=; b=GCobE5Fid7hnbw/UkF8Z9/yDALSGagP5MKynJMrgU1uKC7d2f3ny0WrazfVmNz9PMQTaSz Y/G4s+zJECWXqhmnVQH3u5jcAIInb91exwFMxKTiANHStI42wO8CIphDXYP2W+TQFnM0cn 98yAbKaHWoRQJSpwS4OdfRrjekY3zWw= From: Leon Hwang To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, toke@redhat.com, martin.lau@kernel.org, yonghong.song@linux.dev, puranjay@kernel.org, xukuohai@huaweicloud.com, eddyz87@gmail.com, iii@linux.ibm.com, leon.hwang@linux.dev, kernel-patches-bot@fb.com Subject: [PATCH bpf-next v4 1/4] bpf: Prevent updating extended prog to prog_array map Date: Sun, 29 Sep 2024 21:27:54 +0800 Message-ID: <20240929132757.79826-2-leon.hwang@linux.dev> In-Reply-To: <20240929132757.79826-1-leon.hwang@linux.dev> References: <20240929132757.79826-1-leon.hwang@linux.dev> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Patchwork-Delegate: bpf@iogearbox.net This patch partially prevents a potential infinite loop issue caused by combination of tailcal and freplace. For example: tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include \#include __noinline int subprog_tc(struct __sk_buff *skb) { return skb->len * 2; } SEC("tc") int entry_tc(struct __sk_buff *skb) { return subprog_tc(skb); } char __license[] SEC("license") = "GPL"; tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include \#include struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; SEC("freplace") int entry_freplace(struct __sk_buff *skb) { count++; bpf_tail_call_static(skb, &jmp_table, 0); return count; } char __license[] SEC("license") = "GPL"; The attach target of entry_freplace is subprog_tc, and the tail callee in entry_freplace is entry_tc. Then, the infinite loop will be entry_tc -> subprog_tc -> entry_freplace --tailcall-> entry_tc, because tail_call_cnt in entry_freplace will count from zero for every time of entry_freplace execution. Kernel will panic, like: [ 15.310490] BUG: TASK stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____)) [ 15.310490] Oops: stack guard page: 0000 [#1] PREEMPT SMP NOPTI [ 15.310490] CPU: 1 PID: 89 Comm: test_progs Tainted: G OE 6.10.0-rc6-g026dcdae8d3e-dirty #72 [ 15.310490] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Call Trace: [ 15.310490] <#DF> [ 15.310490] ? die+0x36/0x90 [ 15.310490] ? handle_stack_overflow+0x4d/0x60 [ 15.310490] ? exc_double_fault+0x117/0x1a0 [ 15.310490] ? asm_exc_double_fault+0x23/0x30 [ 15.310490] ? bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] [ 15.310490] [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] ... [ 15.310490] bpf_prog_85781a698094722f_entry+0x4c/0x64 [ 15.310490] bpf_prog_1c515f389a9059b4_entry2+0x19/0x1b [ 15.310490] bpf_test_run+0x210/0x370 [ 15.310490] ? bpf_test_run+0x128/0x370 [ 15.310490] bpf_prog_test_run_skb+0x388/0x7a0 [ 15.310490] __sys_bpf+0xdbf/0x2c40 [ 15.310490] ? clockevents_program_event+0x52/0xf0 [ 15.310490] ? lock_release+0xbf/0x290 [ 15.310490] __x64_sys_bpf+0x1e/0x30 [ 15.310490] do_syscall_64+0x68/0x140 [ 15.310490] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 15.310490] RIP: 0033:0x7f133b52725d [ 15.310490] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [ 15.310490] RSP: 002b:00007ffddbc10258 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [ 15.310490] RAX: ffffffffffffffda RBX: 00007ffddbc10828 RCX: 00007f133b52725d [ 15.310490] RDX: 0000000000000050 RSI: 00007ffddbc102a0 RDI: 000000000000000a [ 15.310490] RBP: 00007ffddbc10270 R08: 0000000000000000 R09: 00007ffddbc102a0 [ 15.310490] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000004 [ 15.310490] R13: 0000000000000000 R14: 0000558ec4c24890 R15: 00007f133b6ed000 [ 15.310490] [ 15.310490] Modules linked in: bpf_testmod(OE) [ 15.310490] ---[ end trace 0000000000000000 ]--- [ 15.310490] RIP: 0010:bpf_prog_3a140cef239a4b4f_subprog_tail+0x14/0x53 [ 15.310490] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa <50> 50 53 41 55 48 89 fb 49 bd 00 2a 46 82 98 9c ff ff 48 89 df 4c [ 15.310490] RSP: 0018:ffffb500c0aa0000 EFLAGS: 00000202 [ 15.310490] RAX: ffffb500c0aa0028 RBX: ffff9c98808b7e00 RCX: 0000000000008cb5 [ 15.310490] RDX: 0000000000000000 RSI: ffff9c9882462a00 RDI: ffff9c98808b7e00 [ 15.310490] RBP: ffffb500c0aa0000 R08: 0000000000000000 R09: 0000000000000000 [ 15.310490] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb500c01af000 [ 15.310490] R13: ffffb500c01cd000 R14: 0000000000000000 R15: 0000000000000000 [ 15.310490] FS: 00007f133b665140(0000) GS:ffff9c98bbd00000(0000) knlGS:0000000000000000 [ 15.310490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.310490] CR2: ffffb500c0a9fff8 CR3: 0000000102478000 CR4: 00000000000006f0 [ 15.310490] Kernel panic - not syncing: Fatal exception in interrupt [ 15.310490] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) This patch partially prevents this panic by preventing updating extended prog to prog_array map. If a prog or its subprog has been extended by freplace prog, the prog can not be updated to prog_array map. Alongside next patch, the panic will be prevented completely. BTW, fix a minor style issue by replacing 8-spaces with a tab. Signed-off-by: Leon Hwang --- include/linux/bpf.h | 2 ++ kernel/bpf/arraymap.c | 21 +++++++++++++++++---- kernel/bpf/core.c | 1 + kernel/bpf/syscall.c | 42 ++++++++++++++++++++++++++++++++++++------ 4 files changed, 56 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 19d8ca8ac960f..aac6d2f42830c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1483,6 +1483,8 @@ struct bpf_prog_aux { bool xdp_has_frags; bool exception_cb; bool exception_boundary; + bool is_extended; /* true if extended by freplace program */ + struct mutex ext_mutex; /* mutex for is_extended */ struct bpf_arena *arena; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ const struct btf_type *attach_func_proto; diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 79660e3fca4c1..4a4de4f014be9 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -947,16 +947,29 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map, struct file *map_file, int fd) { struct bpf_prog *prog = bpf_prog_get(fd); + bool is_extended; if (IS_ERR(prog)) return prog; - if (!bpf_prog_map_compatible(map, prog)) { - bpf_prog_put(prog); - return ERR_PTR(-EINVAL); - } + if (!bpf_prog_map_compatible(map, prog)) + goto out_put_prog; + + mutex_lock(&prog->aux->ext_mutex); + is_extended = prog->aux->is_extended; + mutex_unlock(&prog->aux->ext_mutex); + if (is_extended) + /* Extended prog can not be tail callee. It's to prevent a + * potential infinite loop like: + * tail callee prog entry -> tail callee prog subprog -> + * freplace prog entry --tailcall-> tail callee prog entry. + */ + goto out_put_prog; return prog; +out_put_prog: + bpf_prog_put(prog); + return ERR_PTR(-EINVAL); } static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 4e07cc057d6f2..ea7f59374b378 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -131,6 +131,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag INIT_LIST_HEAD_RCU(&fp->aux->ksym_prefix.lnode); #endif mutex_init(&fp->aux->used_maps_mutex); + mutex_init(&fp->aux->ext_mutex); mutex_init(&fp->aux->dst_mutex); return fp; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a8f1808a1ca54..db17c52fa35db 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3212,14 +3212,23 @@ static void bpf_tracing_link_release(struct bpf_link *link) { struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link); - - WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link, - tr_link->trampoline)); + struct bpf_prog *tgt_prog = tr_link->tgt_prog; + + if (link->prog->type == BPF_PROG_TYPE_EXT) { + mutex_lock(&tgt_prog->aux->ext_mutex); + WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link, + tr_link->trampoline)); + tgt_prog->aux->is_extended = false; + mutex_unlock(&tgt_prog->aux->ext_mutex); + } else { + WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link, + tr_link->trampoline)); + } bpf_trampoline_put(tr_link->trampoline); /* tgt_prog is NULL if target is a kernel function */ - if (tr_link->tgt_prog) + if (tgt_prog) bpf_prog_put(tr_link->tgt_prog); } @@ -3270,6 +3279,24 @@ static const struct bpf_link_ops bpf_tracing_link_lops = { .fill_link_info = bpf_tracing_link_fill_link_info, }; +static int bpf_extend_prog(struct bpf_tracing_link *link, + struct bpf_trampoline *tr, + struct bpf_prog *tgt_prog) +{ + struct bpf_prog_aux *aux = tgt_prog->aux; + int err = 0; + + mutex_lock(&aux->ext_mutex); + err = bpf_trampoline_link_prog(&link->link, tr); + if (err) + goto out_unlock; + + aux->is_extended = true; +out_unlock: + mutex_unlock(&aux->ext_mutex); + return err; +} + static int bpf_tracing_prog_attach(struct bpf_prog *prog, int tgt_prog_fd, u32 btf_id, @@ -3354,7 +3381,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog, * in prog->aux * * - if prog->aux->dst_trampoline is NULL, the program has already been - * attached to a target and its initial target was cleared (below) + * attached to a target and its initial target was cleared (below) * * - if tgt_prog != NULL, the caller specified tgt_prog_fd + * target_btf_id using the link_create API. @@ -3429,7 +3456,10 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog, if (err) goto out_unlock; - err = bpf_trampoline_link_prog(&link->link, tr); + if (prog->type == BPF_PROG_TYPE_EXT) + err = bpf_extend_prog(link, tr, tgt_prog); + else + err = bpf_trampoline_link_prog(&link->link, tr); if (err) { bpf_link_cleanup(&link_primer); link = NULL;