From patchwork Tue Mar 4 00:32:38 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999667
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, Tejun Heo, Emil Tsalapatis, Barret Rhoden, Josh Don,
    Dohyun Kim, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v2 1/2] bpf: Add verifier support for timed may_goto
Date: Mon, 3 Mar 2025 16:32:38 -0800
Message-ID: <20250304003239.2390751-2-memxor@gmail.com>
In-Reply-To: <20250304003239.2390751-1-memxor@gmail.com>
References: <20250304003239.2390751-1-memxor@gmail.com>

Implement verifier support for replacing the may_goto implementation:
instead of the purely counter-based approach, sample time on the local
CPU to allow a much bigger loop bound. We implement this by maintaining
16 bytes per stack frame: 8 bytes for the count that amortizes time
sampling, and 8 bytes for the starting timestamp. To minimize overhead,
we need to avoid spilling and filling registers around this sequence,
so we push this cost into the time sampling function
'arch_bpf_timed_may_goto'. This is a JIT-specific wrapper around
bpf_check_timed_may_goto which returns the count to store into the
stack through BPF_REG_AX. All caller-saved registers (r0-r5) are
guaranteed to remain untouched.
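As a reading aid (not part of the patch), the per-frame layout described
above can be pictured as follows; offsets are relative to the frame
pointer r10/fp of a BPF frame with depth stack_depth:

  /*
   * Illustrative sketch of the two new 8-byte stack slots:
   *
   *   fp - stack_depth - 16:  count      starts at BPF_MAX_TIMED_LOOPS
   *                                      (0xffff), decremented per pass
   *   fp - stack_depth -  8:  timestamp  0 at entry; set to the first
   *                                      time sample on the initial refresh
   */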
The loop is broken by returning a count of 0: whenever the count drops
to 0, we dispatch into the function, and the runtime chooses either to
refresh it (by returning BPF_MAX_TIMED_LOOPS) or to return 0 and abort
the loop on the next iteration. Since the check for 0 is done right
after loading the count from the stack, once it fires, all subsequent
cond_break sequences in the program, whether in the same loop or in
later loops, immediately break as well.

We pass the stack offset of the count (and thus of the timestamp, by
adding 8 to it) to the arch_bpf_timed_may_goto call, so that it can be
passed on to bpf_check_timed_may_goto as an argument after r1 is saved,
by adding the offset to r10/fp. This adjustment is arch specific; the
next patch introduces support for x86.

Note that depending on loop complexity, the time spent in the loop can
exceed the current limit (250 ms), but imposing an upper bound on total
program runtime is an orthogonal problem which will be addressed when
program cancellations are supported.

The time currently afforded by cond_break may not be enough for cases
where BPF programs want to implement locking algorithms inline and use
cond_break as a promise to the verifier that they will eventually
terminate.

Below are benchmarking numbers for the time taken per iteration of an
empty loop that counts the number of iterations until cond_break fires.
For comparison, we also measure bpf_for/bpf_repeat, which is another
way to achieve the same number of spins (BPF_MAX_LOOPS). The hardware
used for benchmarking was a Sapphire Rapids Intel server with the
performance governor enabled; mitigations were enabled.

  +-----------------------------+--------------+--------------+------------------+
  | Loop type                   | Iterations   |  Time (ms)   |  Time/iter (ns)  |
  +-----------------------------+--------------+--------------+------------------+
  | may_goto                    | 8388608      |  3           |  0.36            |
  | timed_may_goto (count=65535)| 589674932    |  250         |  0.42            |
  | bpf_for                     | 8388608      |  10          |  1.19            |
  +-----------------------------+--------------+--------------+------------------+

This gives a good approximation at low overhead while staying close to
the current implementation.
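As a reading aid (not part of the patch), the per-iteration behavior of
the lowered sequence can be modeled in C as below. The helper name
may_goto_pass and the refresh callback are illustrative only; the real
sequence operates on BPF_REG_AX and the stack slot directly, as the
verifier.c hunk below shows:

  #include <linux/types.h>

  /*
   * Illustrative model of one pass through the patched may_goto sequence.
   * 'count_slot' stands for the fp-relative count slot, 'refresh' for the
   * arch_bpf_timed_may_goto call (which really passes the stack offset in
   * and the result out through BPF_REG_AX, not through C arguments).
   */
  static bool may_goto_pass(u64 *count_slot, u64 (*refresh)(void))
  {
  	u64 cnt = *count_slot;		/* insn 0: load count from the stack */

  	if (cnt == 0)			/* insn 1: an earlier refresh returned 0 */
  		return false;		/*         take the may_goto exit path */
  	if (--cnt == 0)			/* insns 2-3: decrement and test budget */
  		cnt = refresh();	/* insns 4-5: 0xffff to continue, 0 to stop */
  	*count_slot = cnt;		/* insn 6: store the count back */
  	return true;			/* continue looping */
  }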
Signed-off-by: Kumar Kartikeya Dwivedi
---
 include/linux/bpf.h    |  1 +
 include/linux/filter.h |  8 +++++
 kernel/bpf/core.c      | 32 +++++++++++++++++++
 kernel/bpf/verifier.c  | 70 +++++++++++++++++++++++++++++++++++++-----
 4 files changed, 103 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4c4028d865ee..dae3872c301d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1987,6 +1987,7 @@ struct bpf_array {
  */
 enum {
 	BPF_MAX_LOOPS = 8 * 1024 * 1024,
+	BPF_MAX_TIMED_LOOPS = 0xffff,
 };
 
 #define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |	\
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 3ed6eb9e7c73..02dda5c53d91 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -669,6 +669,11 @@ struct bpf_prog_stats {
 	struct u64_stats_sync syncp;
 } __aligned(2 * sizeof(u64));
 
+struct bpf_timed_may_goto {
+	u64 count;
+	u64 timestamp;
+};
+
 struct sk_filter {
 	refcount_t refcnt;
 	struct rcu_head rcu;
@@ -1130,8 +1135,11 @@ bool bpf_jit_supports_ptr_xchg(void);
 bool bpf_jit_supports_arena(void);
 bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena);
 bool bpf_jit_supports_private_stack(void);
+bool bpf_jit_supports_timed_may_goto(void);
 u64 bpf_arch_uaddress_limit(void);
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
+u64 arch_bpf_timed_may_goto(void);
+u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *);
 bool bpf_helper_changes_pkt_data(enum bpf_func_id func_id);
 
 static inline bool bpf_dump_raw_ok(const struct cred *cred)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index a0200fbbace9..5fae5da55a4a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3069,6 +3069,38 @@ void __weak arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp,
 {
 }
 
+bool __weak bpf_jit_supports_timed_may_goto(void)
+{
+	return false;
+}
+
+u64 __weak arch_bpf_timed_may_goto(void)
+{
+	return 0;
+}
+
+u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
+{
+	u64 time = ktime_get_mono_fast_ns();
+
+	/*
+	 * Populate the timestamp for this stack frame, and refresh count.
+	 */
+	if (!p->timestamp) {
+		p->timestamp = time;
+		return BPF_MAX_TIMED_LOOPS;
+	}
+	/*
+	 * Check if we've exhausted our time slice, and zero count.
+	 */
+	if (time - p->timestamp >= (NSEC_PER_SEC / 4))
+		return 0;
+	/*
+	 * Refresh the count for the stack frame.
+	 */
+	return BPF_MAX_TIMED_LOOPS;
+}
+
 /* for configs without MMU or 32-bit */
 __weak const struct bpf_map_ops arena_map_ops;
 __weak u64 bpf_arena_get_user_vm_start(struct bpf_arena *arena)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 22c4edc8695c..f3e95d471fa3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21572,7 +21572,50 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto next_insn;
 		}
 
-		if (is_may_goto_insn(insn)) {
+		if (is_may_goto_insn(insn) && bpf_jit_supports_timed_may_goto()) {
+			int stack_off_cnt = -stack_depth - 16;
+
+			/*
+			 * Two 8 byte slots, depth-16 stores the count, and
+			 * depth-8 stores the start timestamp of the loop.
+			 *
+			 * The starting value of count is BPF_MAX_TIMED_LOOPS
+			 * (0xffff). Every iteration loads it and subs it by 1,
+			 * until the value becomes 0 in AX (thus, 1 in stack),
+			 * after which we call arch_bpf_timed_may_goto, which
+			 * either sets AX to 0xffff to keep looping, or to 0
+			 * upon timeout. AX is then stored into the stack. In
+			 * the next iteration, we either see 0 and break out, or
+			 * continue iterating until the next time value is 0
+			 * after subtraction, rinse and repeat.
+			 */
+			stack_depth_extra = 16;
+			insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_AX, BPF_REG_10, stack_off_cnt);
+			if (insn->off >= 0)
+				insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, insn->off + 5);
+			else
+				insn_buf[1] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, insn->off - 1);
+			insn_buf[2] = BPF_ALU64_IMM(BPF_SUB, BPF_REG_AX, 1);
+			insn_buf[3] = BPF_JMP_IMM(BPF_JNE, BPF_REG_AX, 0, 2);
+			/*
+			 * AX is used as an argument to pass in stack_off_cnt
+			 * (to add to r10/fp), and also as the return value of
+			 * the call to arch_bpf_timed_may_goto.
+			 */
+			insn_buf[4] = BPF_MOV64_IMM(BPF_REG_AX, stack_off_cnt);
+			insn_buf[5] = BPF_EMIT_CALL(arch_bpf_timed_may_goto);
+			insn_buf[6] = BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_AX, stack_off_cnt);
+			cnt = 7;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto next_insn;
+		} else if (is_may_goto_insn(insn)) {
 			int stack_off = -stack_depth - 8;
 
 			stack_depth_extra = 8;
@@ -22113,23 +22156,34 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	env->prog->aux->stack_depth = subprogs[0].stack_depth;
 
 	for (i = 0; i < env->subprog_cnt; i++) {
+		int delta = bpf_jit_supports_timed_may_goto() ? 2 : 1;
 		int subprog_start = subprogs[i].start;
 		int stack_slots = subprogs[i].stack_extra / 8;
+		int slots = delta, cnt = 0;
 
 		if (!stack_slots)
 			continue;
-		if (stack_slots > 1) {
+		/*
+		 * We need two slots in case timed may_goto is supported.
+		 */
+		if (stack_slots > slots) {
 			verbose(env, "verifier bug: stack_slots supports may_goto only\n");
 			return -EFAULT;
 		}
 
-		/* Add ST insn to subprog prologue to init extra stack */
-		insn_buf[0] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
-					 -subprogs[i].stack_depth, BPF_MAX_LOOPS);
+		if (bpf_jit_supports_timed_may_goto()) {
+			insn_buf[cnt++] = BPF_ST_MEM(BPF_DW, BPF_REG_FP, -subprogs[i].stack_depth,
+						     BPF_MAX_TIMED_LOOPS);
+			insn_buf[cnt++] = BPF_ST_MEM(BPF_DW, BPF_REG_FP, -subprogs[i].stack_depth + 8, 0);
+		} else {
+			/* Add ST insn to subprog prologue to init extra stack */
+			insn_buf[cnt++] = BPF_ST_MEM(BPF_DW, BPF_REG_FP, -subprogs[i].stack_depth,
+						     BPF_MAX_LOOPS);
+		}
 		/* Copy first actual insn to preserve it */
-		insn_buf[1] = env->prog->insnsi[subprog_start];
+		insn_buf[cnt++] = env->prog->insnsi[subprog_start];
 
-		new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, 2);
+		new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, cnt);
 		if (!new_prog)
 			return -ENOMEM;
 		env->prog = prog = new_prog;
@@ -22139,7 +22193,7 @@
 		 * to insn after BPF_ST that inits may_goto count.
 		 * Adjustment will succeed because bpf_patch_insn_data() didn't fail.
 		 */
-		WARN_ON(adjust_jmp_off(env->prog, subprog_start, 1));
+		WARN_ON(adjust_jmp_off(env->prog, subprog_start, delta));
 	}
 
 	/* Since poke tab is now finalized, publish aux to tracker. */

From patchwork Tue Mar 4 00:32:39 2025
X-Patchwork-Submitter: Kumar Kartikeya Dwivedi
X-Patchwork-Id: 13999668
X-Patchwork-Delegate: bpf@iogearbox.net
From: Kumar Kartikeya Dwivedi
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
    Eduard Zingerman, Tejun Heo, Emil Tsalapatis, Barret Rhoden, Josh Don,
    Dohyun Kim, kkd@meta.com, kernel-team@meta.com
Subject: [PATCH bpf-next v2 2/2] bpf, x86: Add x86 JIT support for timed may_goto
Date: Mon, 3 Mar 2025 16:32:39 -0800
Message-ID: <20250304003239.2390751-3-memxor@gmail.com>
In-Reply-To: <20250304003239.2390751-1-memxor@gmail.com>
References: <20250304003239.2390751-1-memxor@gmail.com>

Implement the arch_bpf_timed_may_goto function using assembly to have
control over which registers are spilled, and use our special protocol
of passing BPF_REG_AX as an argument into the function and as the
return value when going back.

Emit call depth accounting for the call made from this stub, and ensure
we don't have naked returns (when rethunk mitigations are enabled) by
falling back to the RET macro (instead of retq). After popping all
saved registers, the return address into the BPF program should be on
top of the stack.

Since the JIT support is now enabled, adjust the selftests which check
the produced may_goto sequences so they do not break. Make sure we
still test the old may_goto sequence on other architectures, while
testing the new sequence on x86_64.
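As a reading aid (not part of the patch), the stub's contract can be
summarized as below; it works because the x86-64 JIT maps BPF_REG_FP to
rbp and the auxiliary BPF_REG_AX to r10, so no BPF-visible register
needs to be spilled around the sequence:

  /*
   * Contract of arch_bpf_timed_may_goto on x86-64 (illustrative summary):
   *
   *   entry:  %r10 (BPF_REG_AX) = stack offset of the count slot; the
   *           timestamp slot lives 8 bytes above it
   *   body:   %rdi = %rbp + %r10  (&count/timestamp pair on the frame)
   *           call bpf_check_timed_may_goto
   *   exit:   %r10 = refreshed count (0xffff to continue, 0 to stop);
   *           r0-r5 (rax, rdi, rsi, rdx, rcx, r8) preserved via push/pop
   */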
Signed-off-by: Kumar Kartikeya Dwivedi
---
 arch/x86/net/Makefile                         |  2 +-
 arch/x86/net/bpf_jit_comp.c                   |  5 ++
 arch/x86/net/bpf_timed_may_goto.S             | 52 +++++++++++++++++
 .../bpf/progs/verifier_bpf_fastcall.c         | 58 +++++++++++++++----
 .../selftests/bpf/progs/verifier_may_goto_1.c | 34 ++++++++++-
 5 files changed, 138 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/net/bpf_timed_may_goto.S

diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
index 383c87300b0d..dddbefc0f439 100644
--- a/arch/x86/net/Makefile
+++ b/arch/x86/net/Makefile
@@ -6,5 +6,5 @@ ifeq ($(CONFIG_X86_32),y)
 	obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
 else
-	obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
+	obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o bpf_timed_may_goto.o
 endif
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a43fc5af973d..f3e9ef6b5329 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3791,3 +3791,8 @@ u64 bpf_arch_uaddress_limit(void)
 {
 	return 0;
 }
+
+bool bpf_jit_supports_timed_may_goto(void)
+{
+	return true;
+}
diff --git a/arch/x86/net/bpf_timed_may_goto.S b/arch/x86/net/bpf_timed_may_goto.S
new file mode 100644
index 000000000000..547140ebcd10
--- /dev/null
+++ b/arch/x86/net/bpf_timed_may_goto.S
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include
+#include
+#include
+
+	.code64
+	.section .text, "ax"
+
+SYM_FUNC_START(arch_bpf_timed_may_goto)
+	ANNOTATE_NOENDBR
+
+	/*
+	 * Save r0-r5.
+	 */
+	pushq %rax
+	pushq %rdi
+	pushq %rsi
+	pushq %rdx
+	pushq %rcx
+	pushq %r8
+
+	/*
+	 * r10 passes us stack depth, load the pointer to count and timestamp
+	 * as first argument to the call below.
+	 */
+	leaq (%rbp, %r10, 1), %rdi
+
+	/*
+	 * Emit call depth accounting for call below.
+	 */
+	CALL_DEPTH_ACCOUNT
+	call bpf_check_timed_may_goto
+
+	/*
+	 * BPF_REG_AX=r10 will be stored into count, so move return value to it.
+	 */
+	movq %rax, %r10
+
+	/*
+	 * Restore r5-r0.
+	 */
+	popq %r8
+	popq %rcx
+	popq %rdx
+	popq %rsi
+	popq %rdi
+	popq %rax
+
+	RET
+SYM_FUNC_END(arch_bpf_timed_may_goto)
diff --git a/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c b/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c
index 5094c288cfd7..a9be6ae49454 100644
--- a/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c
+++ b/tools/testing/selftests/bpf/progs/verifier_bpf_fastcall.c
@@ -620,23 +620,61 @@ __naked void helper_call_does_not_prevent_bpf_fastcall(void)
 
 SEC("raw_tp")
 __arch_x86_64
+__log_level(4) __msg("stack depth 24")
+/* may_goto counter at -24 */
+__xlated("0: *(u64 *)(r10 -24) =")
+/* may_goto timestamp at -16 */
+__xlated("1: *(u64 *)(r10 -16) =")
+__xlated("2: r1 = 1")
+__xlated("...")
+__xlated("4: r0 = &(void __percpu *)(r0)")
+__xlated("...")
+/* may_goto expansion starts */
+__xlated("6: r11 = *(u64 *)(r10 -24)")
+__xlated("7: if r11 == 0x0 goto pc+6")
+__xlated("8: r11 -= 1")
+__xlated("9: if r11 != 0x0 goto pc+2")
+__xlated("10: r11 = -24")
+__xlated("11: call unknown")
+__xlated("12: *(u64 *)(r10 -24) = r11")
+/* may_goto expansion ends */
+__xlated("13: *(u64 *)(r10 -8) = r1")
+__xlated("14: exit")
+__success
+__naked void may_goto_interaction_x86_64(void)
+{
+	asm volatile (
+	"r1 = 1;"
+	"*(u64 *)(r10 - 16) = r1;"
+	"call %[bpf_get_smp_processor_id];"
+	"r1 = *(u64 *)(r10 - 16);"
+	".8byte %[may_goto];"
+	/* just touch some stack at -8 */
+	"*(u64 *)(r10 - 8) = r1;"
+	"exit;"
+	:
+	: __imm(bpf_get_smp_processor_id),
+	  __imm_insn(may_goto, BPF_RAW_INSN(BPF_JMP | BPF_JCOND, 0, 0, +1 /* offset */, 0))
+	: __clobber_all);
+}
+
+SEC("raw_tp")
+__arch_arm64
 __log_level(4) __msg("stack depth 16")
 /* may_goto counter at -16 */
 __xlated("0: *(u64 *)(r10 -16) =")
 __xlated("1: r1 = 1")
-__xlated("...")
-__xlated("3: r0 = &(void __percpu *)(r0)")
-__xlated("...")
+__xlated("2: call bpf_get_smp_processor_id")
 /* may_goto expansion starts */
-__xlated("5: r11 = *(u64 *)(r10 -16)")
-__xlated("6: if r11 == 0x0 goto pc+3")
-__xlated("7: r11 -= 1")
-__xlated("8: *(u64 *)(r10 -16) = r11")
+__xlated("3: r11 = *(u64 *)(r10 -16)")
+__xlated("4: if r11 == 0x0 goto pc+3")
+__xlated("5: r11 -= 1")
+__xlated("6: *(u64 *)(r10 -16) = r11")
 /* may_goto expansion ends */
-__xlated("9: *(u64 *)(r10 -8) = r1")
-__xlated("10: exit")
+__xlated("7: *(u64 *)(r10 -8) = r1")
+__xlated("8: exit")
 __success
-__naked void may_goto_interaction(void)
+__naked void may_goto_interaction_arm64(void)
 {
 	asm volatile (
 	"r1 = 1;"
diff --git a/tools/testing/selftests/bpf/progs/verifier_may_goto_1.c b/tools/testing/selftests/bpf/progs/verifier_may_goto_1.c
index e81097c96fe2..3966d827f288 100644
--- a/tools/testing/selftests/bpf/progs/verifier_may_goto_1.c
+++ b/tools/testing/selftests/bpf/progs/verifier_may_goto_1.c
@@ -69,8 +69,38 @@ __naked void may_goto_batch_1(void)
 }
 
 SEC("raw_tp")
-__description("may_goto batch with offsets 2/0")
+__description("may_goto batch with offsets 2/0 - x86_64")
 __arch_x86_64
+__xlated("0: *(u64 *)(r10 -16) = 65535")
+__xlated("1: *(u64 *)(r10 -8) = 0")
+__xlated("2: r11 = *(u64 *)(r10 -16)")
+__xlated("3: if r11 == 0x0 goto pc+6")
+__xlated("4: r11 -= 1")
+__xlated("5: if r11 != 0x0 goto pc+2")
+__xlated("6: r11 = -16")
+__xlated("7: call unknown")
+__xlated("8: *(u64 *)(r10 -16) = r11")
+__xlated("9: r0 = 1")
+__xlated("10: r0 = 2")
+__xlated("11: exit")
+__success
+__naked void may_goto_batch_2_x86_64(void)
+{
+	asm volatile (
+	".8byte %[may_goto1];"
+	".8byte %[may_goto3];"
+	"r0 = 1;"
+	"r0 = 2;"
+	"exit;"
+	:
+	: __imm_insn(may_goto1, BPF_RAW_INSN(BPF_JMP | BPF_JCOND, 0, 0, 2 /* offset */, 0)),
+	  __imm_insn(may_goto3, BPF_RAW_INSN(BPF_JMP | BPF_JCOND, 0, 0, 0 /* offset */, 0))
+	: __clobber_all);
+}
+
+SEC("raw_tp")
+__description("may_goto batch with offsets 2/0 - arm64")
+__arch_arm64
 __xlated("0: *(u64 *)(r10 -8) = 8388608")
 __xlated("1: r11 = *(u64 *)(r10 -8)")
 __xlated("2: if r11 == 0x0 goto pc+3")
@@ -80,7 +110,7 @@
 __xlated("5: r0 = 1")
 __xlated("6: r0 = 2")
 __xlated("7: exit")
 __success
-__naked void may_goto_batch_2(void)
+__naked void may_goto_batch_2_arm64(void)
 {
 	asm volatile (
 	".8byte %[may_goto1];"