From patchwork Tue Dec 5 18:42:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480625 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE98210D3 for ; Tue, 5 Dec 2023 10:43:11 -0800 (PST) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5IfKTd015622 for ; Tue, 5 Dec 2023 10:43:11 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usxjtn4ds-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:10 -0800 Received: from twshared29647.38.frc1.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:03 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 49CE33CA16FA9; Tue, 5 Dec 2023 10:42:52 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman , Tao Lyu Subject: [PATCH v4 bpf-next 01/10] bpf: support non-r10 register spill/fill to/from stack in precision tracking Date: Tue, 5 Dec 2023 10:42:39 -0800 Message-ID: <20231205184248.1502704-2-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> X-FB-Internal: Safe X-Proofpoint-GUID: xSFV07NZ6oRK8-6jw-MVVTVwaLNy-ytZ X-Proofpoint-ORIG-GUID: xSFV07NZ6oRK8-6jw-MVVTVwaLNy-ytZ X-Proofpoint-UnRewURL: 0 URL was un-rewritten Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Use instruction (jump) history to record instructions that performed register spill/fill to/from stack, regardless if this was done through read-only r10 register, or any other register after copying r10 into it *and* potentially adjusting offset. To make this work reliably, we push extra per-instruction flags into instruction history, encoding stack slot index (spi) and stack frame number in extra 10 bit flags we take away from prev_idx in instruction history. We don't touch idx field for maximum performance, as it's checked most frequently during backtracking. This change removes basically the last remaining practical limitation of precision backtracking logic in BPF verifier. It fixes known deficiencies, but also opens up new opportunities to reduce number of verified states, explored in the subsequent patches. There are only three differences in selftests' BPF object files according to veristat, all in the positive direction (less states). File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) -------------------------------------- ------------- --------- --------- ------------- ---------- ---------- ------------- test_cls_redirect_dynptr.bpf.linked3.o cls_redirect 2987 2864 -123 (-4.12%) 240 231 -9 (-3.75%) xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82848 82661 -187 (-0.23%) 5107 5073 -34 (-0.67%) xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 85116 84964 -152 (-0.18%) 5162 5130 -32 (-0.62%) Note, I avoided renaming jmp_history to more generic insn_hist to minimize number of lines changed and potential merge conflicts between bpf and bpf-next trees. Notice also cur_hist_entry pointer reset to NULL at the beginning of instruction verification loop. This pointer avoids the problem of relying on last jump history entry's insn_idx to determine whether we already have entry for current instruction or not. It can happen that we added jump history entry because current instruction is_jmp_point(), but also we need to add instruction flags for stack access. In this case, we don't want to entries, so we need to reuse last added entry, if it is present. Relying on insn_idx comparison has the same ambiguity problem as the one that was fixed recently in [0], so we avoid that. [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/ Acked-by: Eduard Zingerman Reported-by: Tao Lyu Signed-off-by: Andrii Nakryiko --- include/linux/bpf_verifier.h | 31 +++- kernel/bpf/verifier.c | 175 ++++++++++-------- .../bpf/progs/verifier_subprog_precision.c | 23 ++- .../testing/selftests/bpf/verifier/precise.c | 38 ++-- 4 files changed, 169 insertions(+), 98 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 3378cc753061..bada59812e00 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -325,12 +325,34 @@ struct bpf_func_state { int allocated_stack; }; -struct bpf_idx_pair { - u32 prev_idx; +#define MAX_CALL_FRAMES 8 + +/* instruction history flags, used in bpf_jmp_history_entry.flags field */ +enum { + /* instruction references stack slot through PTR_TO_STACK register; + * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8) + * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512, + * 8 bytes per slot, so slot index (spi) is [0, 63]) + */ + INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */ + + INSN_F_SPI_MASK = 0x3f, /* 6 bits */ + INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */ + + INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */ +}; + +static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES); +static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8); + +struct bpf_jmp_history_entry { u32 idx; + /* insn idx can't be bigger than 1 million */ + u32 prev_idx : 22; + /* special flags, e.g., whether insn is doing register stack spill/load */ + u32 flags : 10; }; -#define MAX_CALL_FRAMES 8 /* Maximum number of register states that can exist at once */ #define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES) struct bpf_verifier_state { @@ -413,7 +435,7 @@ struct bpf_verifier_state { * For most states jmp_history_cnt is [0-3]. * For loops can go up to ~40. */ - struct bpf_idx_pair *jmp_history; + struct bpf_jmp_history_entry *jmp_history; u32 jmp_history_cnt; u32 dfs_depth; u32 callback_unroll_depth; @@ -656,6 +678,7 @@ struct bpf_verifier_env { int cur_stack; } cfg; struct backtrack_state bt; + struct bpf_jmp_history_entry *cur_hist_ent; u32 pass_cnt; /* number of times do_check() was called */ u32 subprog_cnt; /* number of instructions analyzed by the verifier */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1ed39665f802..9bc16dc66465 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1355,8 +1355,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, int i, err; dst_state->jmp_history = copy_array(dst_state->jmp_history, src->jmp_history, - src->jmp_history_cnt, sizeof(struct bpf_idx_pair), - GFP_USER); + src->jmp_history_cnt, sizeof(*dst_state->jmp_history), + GFP_USER); if (!dst_state->jmp_history) return -ENOMEM; dst_state->jmp_history_cnt = src->jmp_history_cnt; @@ -3221,6 +3221,21 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno, return __check_reg_arg(env, state->regs, regno, t); } +static int insn_stack_access_flags(int frameno, int spi) +{ + return INSN_F_STACK_ACCESS | (spi << INSN_F_SPI_SHIFT) | frameno; +} + +static int insn_stack_access_spi(int insn_flags) +{ + return (insn_flags >> INSN_F_SPI_SHIFT) & INSN_F_SPI_MASK; +} + +static int insn_stack_access_frameno(int insn_flags) +{ + return insn_flags & INSN_F_FRAMENO_MASK; +} + static void mark_jmp_point(struct bpf_verifier_env *env, int idx) { env->insn_aux_data[idx].jmp_point = true; @@ -3232,28 +3247,51 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx) } /* for any branch, call, exit record the history of jmps in the given state */ -static int push_jmp_history(struct bpf_verifier_env *env, - struct bpf_verifier_state *cur) +static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur, + int insn_flags) { u32 cnt = cur->jmp_history_cnt; - struct bpf_idx_pair *p; + struct bpf_jmp_history_entry *p; size_t alloc_size; - if (!is_jmp_point(env, env->insn_idx)) + /* combine instruction flags if we already recorded this instruction */ + if (env->cur_hist_ent) { + /* atomic instructions push insn_flags twice, for READ and + * WRITE sides, but they should agree on stack slot + */ + WARN_ONCE((env->cur_hist_ent->flags & insn_flags) && + (env->cur_hist_ent->flags & insn_flags) != insn_flags, + "verifier insn history bug: insn_idx %d cur flags %x new flags %x\n", + env->insn_idx, env->cur_hist_ent->flags, insn_flags); + env->cur_hist_ent->flags |= insn_flags; return 0; + } cnt++; alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p))); p = krealloc(cur->jmp_history, alloc_size, GFP_USER); if (!p) return -ENOMEM; - p[cnt - 1].idx = env->insn_idx; - p[cnt - 1].prev_idx = env->prev_insn_idx; cur->jmp_history = p; + + p = &cur->jmp_history[cnt - 1]; + p->idx = env->insn_idx; + p->prev_idx = env->prev_insn_idx; + p->flags = insn_flags; cur->jmp_history_cnt = cnt; + env->cur_hist_ent = p; + return 0; } +static struct bpf_jmp_history_entry *get_jmp_hist_entry(struct bpf_verifier_state *st, + u32 hist_end, int insn_idx) +{ + if (hist_end > 0 && st->jmp_history[hist_end - 1].idx == insn_idx) + return &st->jmp_history[hist_end - 1]; + return NULL; +} + /* Backtrack one insn at a time. If idx is not at the top of recorded * history then previous instruction came from straight line execution. * Return -ENOENT if we exhausted all instructions within given state. @@ -3415,9 +3453,14 @@ static inline bool bt_is_reg_set(struct backtrack_state *bt, u32 reg) return bt->reg_masks[bt->frame] & (1 << reg); } +static inline bool bt_is_frame_slot_set(struct backtrack_state *bt, u32 frame, u32 slot) +{ + return bt->stack_masks[frame] & (1ull << slot); +} + static inline bool bt_is_slot_set(struct backtrack_state *bt, u32 slot) { - return bt->stack_masks[bt->frame] & (1ull << slot); + return bt_is_frame_slot_set(bt, bt->frame, slot); } /* format registers bitmask, e.g., "r0,r2,r4" for 0x15 mask */ @@ -3471,7 +3514,7 @@ static bool calls_callback(struct bpf_verifier_env *env, int insn_idx); * - *was* processed previously during backtracking. */ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, - struct backtrack_state *bt) + struct bpf_jmp_history_entry *hist, struct backtrack_state *bt) { const struct bpf_insn_cbs cbs = { .cb_call = disasm_kfunc_name, @@ -3484,7 +3527,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, u8 mode = BPF_MODE(insn->code); u32 dreg = insn->dst_reg; u32 sreg = insn->src_reg; - u32 spi, i; + u32 spi, i, fr; if (insn->code == 0) return 0; @@ -3545,20 +3588,15 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, * by 'precise' mark in corresponding register of this state. * No further tracking necessary. */ - if (insn->src_reg != BPF_REG_FP) + if (!hist || !(hist->flags & INSN_F_STACK_ACCESS)) return 0; - /* dreg = *(u64 *)[fp - off] was a fill from the stack. * that [fp - off] slot contains scalar that needs to be * tracked with precision */ - spi = (-insn->off - 1) / BPF_REG_SIZE; - if (spi >= 64) { - verbose(env, "BUG spi %d\n", spi); - WARN_ONCE(1, "verifier backtracking bug"); - return -EFAULT; - } - bt_set_slot(bt, spi); + spi = insn_stack_access_spi(hist->flags); + fr = insn_stack_access_frameno(hist->flags); + bt_set_frame_slot(bt, fr, spi); } else if (class == BPF_STX || class == BPF_ST) { if (bt_is_reg_set(bt, dreg)) /* stx & st shouldn't be using _scalar_ dst_reg @@ -3567,17 +3605,13 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, */ return -ENOTSUPP; /* scalars can only be spilled into stack */ - if (insn->dst_reg != BPF_REG_FP) + if (!hist || !(hist->flags & INSN_F_STACK_ACCESS)) return 0; - spi = (-insn->off - 1) / BPF_REG_SIZE; - if (spi >= 64) { - verbose(env, "BUG spi %d\n", spi); - WARN_ONCE(1, "verifier backtracking bug"); - return -EFAULT; - } - if (!bt_is_slot_set(bt, spi)) + spi = insn_stack_access_spi(hist->flags); + fr = insn_stack_access_frameno(hist->flags); + if (!bt_is_frame_slot_set(bt, fr, spi)) return 0; - bt_clear_slot(bt, spi); + bt_clear_frame_slot(bt, fr, spi); if (class == BPF_STX) bt_set_reg(bt, sreg); } else if (class == BPF_JMP || class == BPF_JMP32) { @@ -3621,10 +3655,14 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, WARN_ONCE(1, "verifier backtracking bug"); return -EFAULT; } - /* we don't track register spills perfectly, - * so fallback to force-precise instead of failing */ - if (bt_stack_mask(bt) != 0) - return -ENOTSUPP; + /* we are now tracking register spills correctly, + * so any instance of leftover slots is a bug + */ + if (bt_stack_mask(bt) != 0) { + verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt)); + WARN_ONCE(1, "verifier backtracking bug (subprog leftover stack slots)"); + return -EFAULT; + } /* propagate r1-r5 to the caller */ for (i = BPF_REG_1; i <= BPF_REG_5; i++) { if (bt_is_reg_set(bt, i)) { @@ -3649,8 +3687,11 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, WARN_ONCE(1, "verifier backtracking bug"); return -EFAULT; } - if (bt_stack_mask(bt) != 0) - return -ENOTSUPP; + if (bt_stack_mask(bt) != 0) { + verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt)); + WARN_ONCE(1, "verifier backtracking bug (callback leftover stack slots)"); + return -EFAULT; + } /* clear r1-r5 in callback subprog's mask */ for (i = BPF_REG_1; i <= BPF_REG_5; i++) bt_clear_reg(bt, i); @@ -4087,6 +4128,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno) for (;;) { DECLARE_BITMAP(mask, 64); u32 history = st->jmp_history_cnt; + struct bpf_jmp_history_entry *hist; if (env->log.level & BPF_LOG_LEVEL2) { verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n", @@ -4150,7 +4192,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno) err = 0; skip_first = false; } else { - err = backtrack_insn(env, i, subseq_idx, bt); + hist = get_jmp_hist_entry(st, history, i); + err = backtrack_insn(env, i, subseq_idx, hist, bt); } if (err == -ENOTSUPP) { mark_all_scalars_precise(env, env->cur_state); @@ -4203,22 +4246,10 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno) bitmap_from_u64(mask, bt_frame_stack_mask(bt, fr)); for_each_set_bit(i, mask, 64) { if (i >= func->allocated_stack / BPF_REG_SIZE) { - /* the sequence of instructions: - * 2: (bf) r3 = r10 - * 3: (7b) *(u64 *)(r3 -8) = r0 - * 4: (79) r4 = *(u64 *)(r10 -8) - * doesn't contain jmps. It's backtracked - * as a single block. - * During backtracking insn 3 is not recognized as - * stack access, so at the end of backtracking - * stack slot fp-8 is still marked in stack_mask. - * However the parent state may not have accessed - * fp-8 and it's "unallocated" stack space. - * In such case fallback to conservative. - */ - mark_all_scalars_precise(env, env->cur_state); - bt_reset(bt); - return 0; + verbose(env, "BUG backtracking (stack slot %d, total slots %d)\n", + i, func->allocated_stack / BPF_REG_SIZE); + WARN_ONCE(1, "verifier backtracking bug (stack slot out of bounds)"); + return -EFAULT; } if (!is_spilled_scalar_reg(&func->stack[i])) { @@ -4391,7 +4422,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err; struct bpf_insn *insn = &env->prog->insnsi[insn_idx]; struct bpf_reg_state *reg = NULL; - u32 dst_reg = insn->dst_reg; + int insn_flags = insn_stack_access_flags(state->frameno, spi); err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE)); if (err) @@ -4432,17 +4463,6 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, mark_stack_slot_scratched(env, spi); if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && !register_is_null(reg) && env->bpf_capable) { - if (dst_reg != BPF_REG_FP) { - /* The backtracking logic can only recognize explicit - * stack slot address like [fp - 8]. Other spill of - * scalar via different register has to be conservative. - * Backtrack from here and mark all registers as precise - * that contributed into 'reg' being a constant. - */ - err = mark_chain_precision(env, value_regno); - if (err) - return err; - } save_register_state(state, spi, reg, size); /* Break the relation on a narrowing spill. */ if (fls64(reg->umax_value) > BITS_PER_BYTE * size) @@ -4454,6 +4474,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, __mark_reg_known(&fake_reg, insn->imm); fake_reg.type = SCALAR_VALUE; save_register_state(state, spi, &fake_reg, size); + insn_flags = 0; /* not a register spill */ } else if (reg && is_spillable_regtype(reg->type)) { /* register containing pointer is being spilled into stack */ if (size != BPF_REG_SIZE) { @@ -4499,9 +4520,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, /* Mark slots affected by this stack write. */ for (i = 0; i < size; i++) - state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = - type; + state->stack[spi].slot_type[(slot - i) % BPF_REG_SIZE] = type; + insn_flags = 0; /* not a register spill */ } + + if (insn_flags) + return push_jmp_history(env, env->cur_state, insn_flags); return 0; } @@ -4694,6 +4718,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, int i, slot = -off - 1, spi = slot / BPF_REG_SIZE; struct bpf_reg_state *reg; u8 *stype, type; + int insn_flags = insn_stack_access_flags(reg_state->frameno, spi); stype = reg_state->stack[spi].slot_type; reg = ®_state->stack[spi].spilled_ptr; @@ -4739,12 +4764,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, return -EACCES; } mark_reg_unknown(env, state->regs, dst_regno); + insn_flags = 0; /* not restoring original register state */ } state->regs[dst_regno].live |= REG_LIVE_WRITTEN; - return 0; - } - - if (dst_regno >= 0) { + } else if (dst_regno >= 0) { /* restore register state from stack */ copy_register_state(&state->regs[dst_regno], reg); /* mark reg as written since spilled pointer state likely @@ -4780,7 +4803,10 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64); if (dst_regno >= 0) mark_reg_stack_read(env, reg_state, off, off + size, dst_regno); + insn_flags = 0; /* we are not restoring spilled register */ } + if (insn_flags) + return push_jmp_history(env, env->cur_state, insn_flags); return 0; } @@ -6940,7 +6966,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i BPF_SIZE(insn->code), BPF_WRITE, -1, true, false); if (err) return err; - return 0; } @@ -16910,7 +16935,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) * the precision needs to be propagated back in * the current state. */ - err = err ? : push_jmp_history(env, cur); + if (is_jmp_point(env, env->insn_idx)) + err = err ? : push_jmp_history(env, cur, 0); err = err ? : propagate_precision(env, &sl->state); if (err) return err; @@ -17135,6 +17161,9 @@ static int do_check(struct bpf_verifier_env *env) u8 class; int err; + /* reset current history entry on each new instruction */ + env->cur_hist_ent = NULL; + env->prev_insn_idx = prev_insn_idx; if (env->insn_idx >= insn_cnt) { verbose(env, "invalid insn idx %d insn_cnt %d\n", @@ -17174,7 +17203,7 @@ static int do_check(struct bpf_verifier_env *env) } if (is_jmp_point(env, env->insn_idx)) { - err = push_jmp_history(env, state); + err = push_jmp_history(env, state, 0); if (err) return err; } diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c index 0dfe3f8b69ac..eba98fab2f54 100644 --- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c @@ -589,11 +589,24 @@ static __u64 subprog_spill_reg_precise(void) SEC("?raw_tp") __success __log_level(2) -/* precision backtracking can't currently handle stack access not through r10, - * so we won't be able to mark stack slot fp-8 as precise, and so will - * fallback to forcing all as precise - */ -__msg("mark_precise: frame0: falling back to forcing all scalars precise") +__msg("10: (0f) r1 += r7") +__msg("mark_precise: frame0: last_idx 10 first_idx 7 subseq_idx -1") +__msg("mark_precise: frame0: regs=r7 stack= before 9: (bf) r1 = r8") +__msg("mark_precise: frame0: regs=r7 stack= before 8: (27) r7 *= 4") +__msg("mark_precise: frame0: regs=r7 stack= before 7: (79) r7 = *(u64 *)(r10 -8)") +__msg("mark_precise: frame0: parent state regs= stack=-8: R0_w=2 R6_w=1 R8_rw=map_value(map=.data.vals,ks=4,vs=16) R10=fp0 fp-8_rw=P1") +__msg("mark_precise: frame0: last_idx 18 first_idx 0 subseq_idx 7") +__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit") +__msg("mark_precise: frame1: regs= stack= before 17: (0f) r0 += r2") +__msg("mark_precise: frame1: regs= stack= before 16: (79) r2 = *(u64 *)(r1 +0)") +__msg("mark_precise: frame1: regs= stack= before 15: (79) r0 = *(u64 *)(r10 -16)") +__msg("mark_precise: frame1: regs= stack= before 14: (7b) *(u64 *)(r10 -16) = r2") +__msg("mark_precise: frame1: regs= stack= before 13: (7b) *(u64 *)(r1 +0) = r2") +__msg("mark_precise: frame1: regs=r2 stack= before 6: (85) call pc+6") +__msg("mark_precise: frame0: regs=r2 stack= before 5: (bf) r2 = r6") +__msg("mark_precise: frame0: regs=r6 stack= before 4: (07) r1 += -8") +__msg("mark_precise: frame0: regs=r6 stack= before 3: (bf) r1 = r10") +__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1") __naked int subprog_spill_into_parent_stack_slot_precise(void) { asm volatile ( diff --git a/tools/testing/selftests/bpf/verifier/precise.c b/tools/testing/selftests/bpf/verifier/precise.c index 0d84dd1f38b6..8a2ff81d8350 100644 --- a/tools/testing/selftests/bpf/verifier/precise.c +++ b/tools/testing/selftests/bpf/verifier/precise.c @@ -140,10 +140,11 @@ .result = REJECT, }, { - "precise: ST insn causing spi > allocated_stack", + "precise: ST zero to stack insn is supported", .insns = { BPF_MOV64_REG(BPF_REG_3, BPF_REG_10), BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0), + /* not a register spill, so we stop precision propagation for R4 here */ BPF_ST_MEM(BPF_DW, BPF_REG_3, -8, 0), BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8), BPF_MOV64_IMM(BPF_REG_0, -1), @@ -157,11 +158,11 @@ mark_precise: frame0: last_idx 4 first_idx 2\ mark_precise: frame0: regs=r4 stack= before 4\ mark_precise: frame0: regs=r4 stack= before 3\ - mark_precise: frame0: regs= stack=-8 before 2\ - mark_precise: frame0: falling back to forcing all scalars precise\ - force_precise: frame0: forcing r0 to be precise\ mark_precise: frame0: last_idx 5 first_idx 5\ - mark_precise: frame0: parent state regs= stack=:", + mark_precise: frame0: parent state regs=r0 stack=:\ + mark_precise: frame0: last_idx 4 first_idx 2\ + mark_precise: frame0: regs=r0 stack= before 4\ + 5: R0=-1 R4=0", .result = VERBOSE_ACCEPT, .retval = -1, }, @@ -169,6 +170,8 @@ "precise: STX insn causing spi > allocated_stack", .insns = { BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), + /* make later reg spill more interesting by having somewhat known scalar */ + BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xff), BPF_MOV64_REG(BPF_REG_3, BPF_REG_10), BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 123, 0), BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, -8), @@ -179,18 +182,21 @@ }, .prog_type = BPF_PROG_TYPE_XDP, .flags = BPF_F_TEST_STATE_FREQ, - .errstr = "mark_precise: frame0: last_idx 6 first_idx 6\ + .errstr = "mark_precise: frame0: last_idx 7 first_idx 7\ mark_precise: frame0: parent state regs=r4 stack=:\ - mark_precise: frame0: last_idx 5 first_idx 3\ - mark_precise: frame0: regs=r4 stack= before 5\ - mark_precise: frame0: regs=r4 stack= before 4\ - mark_precise: frame0: regs= stack=-8 before 3\ - mark_precise: frame0: falling back to forcing all scalars precise\ - force_precise: frame0: forcing r0 to be precise\ - force_precise: frame0: forcing r0 to be precise\ - force_precise: frame0: forcing r0 to be precise\ - force_precise: frame0: forcing r0 to be precise\ - mark_precise: frame0: last_idx 6 first_idx 6\ + mark_precise: frame0: last_idx 6 first_idx 4\ + mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\ + mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\ + mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\ + mark_precise: frame0: parent state regs=r0 stack=:\ + mark_precise: frame0: last_idx 3 first_idx 3\ + mark_precise: frame0: regs=r0 stack= before 3: (55) if r3 != 0x7b goto pc+0\ + mark_precise: frame0: regs=r0 stack= before 2: (bf) r3 = r10\ + mark_precise: frame0: regs=r0 stack= before 1: (57) r0 &= 255\ + mark_precise: frame0: parent state regs=r0 stack=:\ + mark_precise: frame0: last_idx 0 first_idx 0\ + mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7\ + mark_precise: frame0: last_idx 7 first_idx 7\ mark_precise: frame0: parent state regs= stack=:", .result = VERBOSE_ACCEPT, .retval = -1, From patchwork Tue Dec 5 18:42:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480626 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA49910E0 for ; Tue, 5 Dec 2023 10:43:15 -0800 (PST) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5IfKTh015622 for ; Tue, 5 Dec 2023 10:43:14 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usxjtn4ds-10 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:14 -0800 Received: from twshared15991.38.frc1.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:08 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 6A5193CA16FC7; Tue, 5 Dec 2023 10:42:54 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 02/10] selftests/bpf: add stack access precision test Date: Tue, 5 Dec 2023 10:42:40 -0800 Message-ID: <20231205184248.1502704-3-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: TSYYfX3G4uL29vfZ92aH_sFj9882FfNk X-Proofpoint-ORIG-GUID: TSYYfX3G4uL29vfZ92aH_sFj9882FfNk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Add a new selftests that validates precision tracking for stack access instruction, using both r10-based and non-r10-based accesses. For non-r10 ones we also make sure to have non-zero var_off to validate that final stack offset is tracked properly in instruction history information inside verifier. Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- .../bpf/progs/verifier_subprog_precision.c | 64 +++++++++++++++++-- 1 file changed, 59 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c index eba98fab2f54..6f5d19665cf6 100644 --- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c @@ -641,14 +641,68 @@ __naked int subprog_spill_into_parent_stack_slot_precise(void) ); } -__naked __noinline __used -static __u64 subprog_with_checkpoint(void) +SEC("?raw_tp") +__success __log_level(2) +__msg("17: (0f) r1 += r0") +__msg("mark_precise: frame0: last_idx 17 first_idx 0 subseq_idx -1") +__msg("mark_precise: frame0: regs=r0 stack= before 16: (bf) r1 = r7") +__msg("mark_precise: frame0: regs=r0 stack= before 15: (27) r0 *= 4") +__msg("mark_precise: frame0: regs=r0 stack= before 14: (79) r0 = *(u64 *)(r10 -16)") +__msg("mark_precise: frame0: regs= stack=-16 before 13: (7b) *(u64 *)(r7 -8) = r0") +__msg("mark_precise: frame0: regs=r0 stack= before 12: (79) r0 = *(u64 *)(r8 +16)") +__msg("mark_precise: frame0: regs= stack=-16 before 11: (7b) *(u64 *)(r8 +16) = r0") +__msg("mark_precise: frame0: regs=r0 stack= before 10: (79) r0 = *(u64 *)(r7 -8)") +__msg("mark_precise: frame0: regs= stack=-16 before 9: (7b) *(u64 *)(r10 -16) = r0") +__msg("mark_precise: frame0: regs=r0 stack= before 8: (07) r8 += -32") +__msg("mark_precise: frame0: regs=r0 stack= before 7: (bf) r8 = r10") +__msg("mark_precise: frame0: regs=r0 stack= before 6: (07) r7 += -8") +__msg("mark_precise: frame0: regs=r0 stack= before 5: (bf) r7 = r10") +__msg("mark_precise: frame0: regs=r0 stack= before 21: (95) exit") +__msg("mark_precise: frame1: regs=r0 stack= before 20: (bf) r0 = r1") +__msg("mark_precise: frame1: regs=r1 stack= before 4: (85) call pc+15") +__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r1 = r6") +__msg("mark_precise: frame0: regs=r6 stack= before 2: (b7) r6 = 1") +__naked int stack_slot_aliases_precision(void) { asm volatile ( - "r0 = 0;" - /* guaranteed checkpoint if BPF_F_TEST_STATE_FREQ is used */ - "goto +0;" + "r6 = 1;" + /* pass r6 through r1 into subprog to get it back as r0; + * this whole chain will have to be marked as precise later + */ + "r1 = r6;" + "call identity_subprog;" + /* let's setup two registers that are aliased to r10 */ + "r7 = r10;" + "r7 += -8;" /* r7 = r10 - 8 */ + "r8 = r10;" + "r8 += -32;" /* r8 = r10 - 32 */ + /* now spill subprog's return value (a r6 -> r1 -> r0 chain) + * a few times through different stack pointer regs, making + * sure to use r10, r7, and r8 both in LDX and STX insns, and + * *importantly* also using a combination of const var_off and + * insn->off to validate that we record final stack slot + * correctly, instead of relying on just insn->off derivation, + * which is only valid for r10-based stack offset + */ + "*(u64 *)(r10 - 16) = r0;" + "r0 = *(u64 *)(r7 - 8);" /* r7 - 8 == r10 - 16 */ + "*(u64 *)(r8 + 16) = r0;" /* r8 + 16 = r10 - 16 */ + "r0 = *(u64 *)(r8 + 16);" + "*(u64 *)(r7 - 8) = r0;" + "r0 = *(u64 *)(r10 - 16);" + /* get ready to use r0 as an index into array to force precision */ + "r0 *= 4;" + "r1 = %[vals];" + /* here r0->r1->r6 chain is forced to be precise and has to be + * propagated back to the beginning, including through the + * subprog call and all the stack spills and loads + */ + "r1 += r0;" + "r0 = *(u32 *)(r1 + 0);" "exit;" + : + : __imm_ptr(vals) + : __clobber_common, "r6" ); } From patchwork Tue Dec 5 18:42:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480624 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88B5E10DB for ; Tue, 5 Dec 2023 10:43:13 -0800 (PST) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5IfMc9015699 for ; Tue, 5 Dec 2023 10:43:12 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usxjtn4eg-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:11 -0800 Received: from twshared15232.14.prn3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:09 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id DEBCA3CA16FCF; Tue, 5 Dec 2023 10:42:57 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , Subject: [PATCH v4 bpf-next 03/10] bpf: fix check for attempt to corrupt spilled pointer Date: Tue, 5 Dec 2023 10:42:41 -0800 Message-ID: <20231205184248.1502704-4-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: nvbxwQefeVeLSrkzr-rZMYtroTNubTly X-Proofpoint-ORIG-GUID: nvbxwQefeVeLSrkzr-rZMYtroTNubTly X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net When register is spilled onto a stack as a 1/2/4-byte register, we set slot_type[BPF_REG_SIZE - 1] (plus potentially few more below it, depending on actual spill size). So to check if some stack slot has spilled register we need to consult slot_type[7], not slot_type[0]. To avoid the need to remember and double-check this in the future, just use is_spilled_reg() helper. Fixes: 27113c59b6d0 ("bpf: Check the other end of slot_type for STACK_SPILL") Signed-off-by: Andrii Nakryiko --- kernel/bpf/verifier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9bc16dc66465..3edca06de9fd 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4431,7 +4431,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, * so it's aligned access and [off, off + size) are within stack limits */ if (!env->allow_ptr_leaks && - state->stack[spi].slot_type[0] == STACK_SPILL && + is_spilled_reg(&state->stack[spi]) && size != BPF_REG_SIZE) { verbose(env, "attempt to corrupt spilled pointer on stack\n"); return -EACCES; From patchwork Tue Dec 5 18:42:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480633 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DC801A2 for ; Tue, 5 Dec 2023 10:45:46 -0800 (PST) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5FexJu009691 for ; Tue, 5 Dec 2023 10:45:46 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3ut2n0ukjk-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:45:45 -0800 Received: from twshared68648.02.prn6.facebook.com (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a8:83::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:45:43 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 9E1123CA16FED; Tue, 5 Dec 2023 10:43:00 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 04/10] bpf: preserve STACK_ZERO slots on partial reg spills Date: Tue, 5 Dec 2023 10:42:42 -0800 Message-ID: <20231205184248.1502704-5-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: bW-g0UTnD7HfobVfBYbY9Ujm4iqY9HDy X-Proofpoint-GUID: bW-g0UTnD7HfobVfBYbY9Ujm4iqY9HDy X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve it in situations where this is possible. E.g., when spilling register as 1/2/4-byte subslots on the stack, all the remaining bytes in the stack slot do not automatically become unknown. If we knew they contained zeroes, we can preserve those STACK_ZERO markers. Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(), but that doesn't overwrite either STACK_INVALID nor STACK_ZERO. Note that we need to take into account possibility of being in unprivileged mode, in which case STACK_INVALID is forced to STACK_MISC for correctness, as treating STACK_INVALID as equivalent STACK_MISC is only enabled in privileged mode. Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- kernel/bpf/verifier.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3edca06de9fd..93de39a6e36e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1144,6 +1144,21 @@ static bool is_spilled_scalar_reg(const struct bpf_stack_state *stack) stack->spilled_ptr.type == SCALAR_VALUE; } +/* Mark stack slot as STACK_MISC, unless it is already STACK_INVALID, in which + * case they are equivalent, or it's STACK_ZERO, in which case we preserve + * more precise STACK_ZERO. + * Note, in uprivileged mode leaving STACK_INVALID is wrong, so we take + * env->allow_ptr_leaks into account and force STACK_MISC, if necessary. + */ +static void mark_stack_slot_misc(struct bpf_verifier_env *env, u8 *stype) +{ + if (*stype == STACK_ZERO) + return; + if (env->allow_ptr_leaks && *stype == STACK_INVALID) + return; + *stype = STACK_MISC; +} + static void scrub_spilled_slot(u8 *stype) { if (*stype != STACK_INVALID) @@ -4386,7 +4401,8 @@ static void copy_register_state(struct bpf_reg_state *dst, const struct bpf_reg_ dst->live = live; } -static void save_register_state(struct bpf_func_state *state, +static void save_register_state(struct bpf_verifier_env *env, + struct bpf_func_state *state, int spi, struct bpf_reg_state *reg, int size) { @@ -4401,7 +4417,7 @@ static void save_register_state(struct bpf_func_state *state, /* size < 8 bytes spill */ for (; i; i--) - scrub_spilled_slot(&state->stack[spi].slot_type[i - 1]); + mark_stack_slot_misc(env, &state->stack[spi].slot_type[i - 1]); } static bool is_bpf_st_mem(struct bpf_insn *insn) @@ -4463,7 +4479,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, mark_stack_slot_scratched(env, spi); if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && !register_is_null(reg) && env->bpf_capable) { - save_register_state(state, spi, reg, size); + save_register_state(env, state, spi, reg, size); /* Break the relation on a narrowing spill. */ if (fls64(reg->umax_value) > BITS_PER_BYTE * size) state->stack[spi].spilled_ptr.id = 0; @@ -4473,7 +4489,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, __mark_reg_known(&fake_reg, insn->imm); fake_reg.type = SCALAR_VALUE; - save_register_state(state, spi, &fake_reg, size); + save_register_state(env, state, spi, &fake_reg, size); insn_flags = 0; /* not a register spill */ } else if (reg && is_spillable_regtype(reg->type)) { /* register containing pointer is being spilled into stack */ @@ -4486,7 +4502,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, verbose(env, "cannot spill pointers to stack into stack frame of the caller\n"); return -EINVAL; } - save_register_state(state, spi, reg, size); + save_register_state(env, state, spi, reg, size); } else { u8 type = STACK_MISC; @@ -4757,6 +4773,8 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, continue; if (type == STACK_MISC) continue; + if (type == STACK_ZERO) + continue; if (type == STACK_INVALID && env->allow_uninit_stack) continue; verbose(env, "invalid read from stack off %d+%d size %d\n", From patchwork Tue Dec 5 18:42:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480629 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CB4D10E0 for ; Tue, 5 Dec 2023 10:43:24 -0800 (PST) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5FgtE2014591 for ; Tue, 5 Dec 2023 10:43:24 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usrmmqbgt-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:23 -0800 Received: from twshared51573.38.frc1.facebook.com (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a8:82::b) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:18 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id C2D853CA17035; Tue, 5 Dec 2023 10:43:05 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 05/10] selftests/bpf: validate STACK_ZERO is preserved on subreg spill Date: Tue, 5 Dec 2023 10:42:43 -0800 Message-ID: <20231205184248.1502704-6-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: aeB89lFmSrsTe1libjsATSuzgMb8FkDG X-Proofpoint-GUID: aeB89lFmSrsTe1libjsATSuzgMb8FkDG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Add tests validating that STACK_ZERO slots are preserved when slot is partially overwritten with subregister spill. Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- .../selftests/bpf/progs/verifier_spill_fill.c | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c index 6115520154e3..d9dabae81176 100644 --- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c +++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c @@ -4,6 +4,7 @@ #include #include #include "bpf_misc.h" +#include <../../../tools/include/linux/filter.h> struct { __uint(type, BPF_MAP_TYPE_RINGBUF); @@ -450,4 +451,43 @@ l0_%=: r1 >>= 16; \ : __clobber_all); } +SEC("raw_tp") +__log_level(2) +__success +__msg("fp-8=0m??mmmm") +__msg("fp-16=00mm??mm") +__msg("fp-24=00mm???m") +__naked void spill_subregs_preserve_stack_zero(void) +{ + asm volatile ( + "call %[bpf_get_prandom_u32];" + + /* 32-bit subreg spill with ZERO, MISC, and INVALID */ + ".8byte %[fp1_u8_st_zero];" /* ZERO, LLVM-18+: *(u8 *)(r10 -1) = 0; */ + "*(u8 *)(r10 -2) = r0;" /* MISC */ + /* fp-3 and fp-4 stay INVALID */ + "*(u32 *)(r10 -8) = r0;" + + /* 16-bit subreg spill with ZERO, MISC, and INVALID */ + ".8byte %[fp10_u16_st_zero];" /* ZERO, LLVM-18+: *(u16 *)(r10 -10) = 0; */ + "*(u16 *)(r10 -12) = r0;" /* MISC */ + /* fp-13 and fp-14 stay INVALID */ + "*(u16 *)(r10 -16) = r0;" + + /* 8-bit subreg spill with ZERO, MISC, and INVALID */ + ".8byte %[fp18_u16_st_zero];" /* ZERO, LLVM-18+: *(u16 *)(r18 -10) = 0; */ + "*(u16 *)(r10 -20) = r0;" /* MISC */ + /* fp-21, fp-22, and fp-23 stay INVALID */ + "*(u8 *)(r10 -24) = r0;" + + "r0 = 0;" + "exit;" + : + : __imm(bpf_get_prandom_u32), + __imm_insn(fp1_u8_st_zero, BPF_ST_MEM(BPF_B, BPF_REG_FP, -1, 0)), + __imm_insn(fp10_u16_st_zero, BPF_ST_MEM(BPF_H, BPF_REG_FP, -10, 0)), + __imm_insn(fp18_u16_st_zero, BPF_ST_MEM(BPF_H, BPF_REG_FP, -18, 0)) + : __clobber_all); +} + char _license[] SEC("license") = "GPL"; From patchwork Tue Dec 5 18:42:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480627 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E1D510DB for ; Tue, 5 Dec 2023 10:43:17 -0800 (PST) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5IfMcD015699 for ; Tue, 5 Dec 2023 10:43:16 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usxjtn4eg-9 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:15 -0800 Received: from twshared29647.38.frc1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:14 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id D132A3CA1703B; Tue, 5 Dec 2023 10:43:07 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 06/10] bpf: preserve constant zero when doing partial register restore Date: Tue, 5 Dec 2023 10:42:44 -0800 Message-ID: <20231205184248.1502704-7-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: TIaNrt9430cCI330q02B4nmKMNiGB9Z_ X-Proofpoint-ORIG-GUID: TIaNrt9430cCI330q02B4nmKMNiGB9Z_ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from stack from slot that has register spilled into it and that register has a constant value zero, preserve that zero and mark spilled register as precise for that. This makes spilled const zero register and STACK_ZERO cases equivalent in their behavior. Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- kernel/bpf/verifier.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 93de39a6e36e..1ebe76c98451 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4767,22 +4767,39 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, copy_register_state(&state->regs[dst_regno], reg); state->regs[dst_regno].subreg_def = subreg_def; } else { + int spill_cnt = 0, zero_cnt = 0; + for (i = 0; i < size; i++) { type = stype[(slot - i) % BPF_REG_SIZE]; - if (type == STACK_SPILL) + if (type == STACK_SPILL) { + spill_cnt++; continue; + } if (type == STACK_MISC) continue; - if (type == STACK_ZERO) + if (type == STACK_ZERO) { + zero_cnt++; continue; + } if (type == STACK_INVALID && env->allow_uninit_stack) continue; verbose(env, "invalid read from stack off %d+%d size %d\n", off, i, size); return -EACCES; } - mark_reg_unknown(env, state->regs, dst_regno); - insn_flags = 0; /* not restoring original register state */ + + if (spill_cnt == size && + tnum_is_const(reg->var_off) && reg->var_off.value == 0) { + __mark_reg_const_zero(&state->regs[dst_regno]); + /* this IS register fill, so keep insn_flags */ + } else if (zero_cnt == size) { + /* similarly to mark_reg_stack_read(), preserve zeroes */ + __mark_reg_const_zero(&state->regs[dst_regno]); + insn_flags = 0; /* not restoring original register state */ + } else { + mark_reg_unknown(env, state->regs, dst_regno); + insn_flags = 0; /* not restoring original register state */ + } } state->regs[dst_regno].live |= REG_LIVE_WRITTEN; } else if (dst_regno >= 0) { From patchwork Tue Dec 5 18:42:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480631 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E9C3310F1 for ; Tue, 5 Dec 2023 10:43:31 -0800 (PST) Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5FdX6G024767 for ; Tue, 5 Dec 2023 10:43:31 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3ustyw6eq3-19 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:31 -0800 Received: from twshared51573.38.frc1.facebook.com (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a8:83::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:18 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id E3B843CA17050; Tue, 5 Dec 2023 10:43:09 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , Subject: [PATCH v4 bpf-next 07/10] selftests/bpf: validate zero preservation for sub-slot loads Date: Tue, 5 Dec 2023 10:42:45 -0800 Message-ID: <20231205184248.1502704-8-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: jRX1gNlLjojrUROo-TGH-pUBW2ldnTB9 X-Proofpoint-ORIG-GUID: jRX1gNlLjojrUROo-TGH-pUBW2ldnTB9 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Validate that 1-, 2-, and 4-byte loads from stack slots not aligned on 8-byte boundary still preserve zero, when loading from all-STACK_ZERO sub-slots, or when stack sub-slots are covered by spilled register with known constant zero value. Signed-off-by: Andrii Nakryiko --- .../selftests/bpf/progs/verifier_spill_fill.c | 71 +++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c index d9dabae81176..41fd61299eab 100644 --- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c +++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c @@ -490,4 +490,75 @@ __naked void spill_subregs_preserve_stack_zero(void) : __clobber_all); } +char single_byte_buf[1] SEC(".data.single_byte_buf"); + +SEC("raw_tp") +__log_level(2) +__success +__naked void partial_stack_load_preserves_zeros(void) +{ + asm volatile ( + /* fp-8 is all STACK_ZERO */ + ".8byte %[fp8_st_zero];" /* LLVM-18+: *(u64 *)(r10 -8) = 0; */ + + /* fp-16 is const zero register */ + "r0 = 0;" + "*(u64 *)(r10 -16) = r0;" + + /* load single U8 from non-aligned STACK_ZERO slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u8 *)(r10 -1);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* load single U8 from non-aligned ZERO REG slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u8 *)(r10 -9);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* load single U16 from non-aligned STACK_ZERO slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u16 *)(r10 -2);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* load single U16 from non-aligned ZERO REG slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u16 *)(r10 -10);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* load single U32 from non-aligned STACK_ZERO slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u32 *)(r10 -4);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* load single U32 from non-aligned ZERO REG slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u32 *)(r10 -12);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* for completeness, load U64 from STACK_ZERO slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u64 *)(r10 -8);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + /* for completeness, load U64 from ZERO REG slot */ + "r1 = %[single_byte_buf];" + "r2 = *(u64 *)(r10 -16);" + "r1 += r2;" + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ + + "r0 = 0;" + "exit;" + : + : __imm_ptr(single_byte_buf), + __imm_insn(fp8_st_zero, BPF_ST_MEM(BPF_DW, BPF_REG_FP, -8, 0)) + : __clobber_common); +} + char _license[] SEC("license") = "GPL"; From patchwork Tue Dec 5 18:42:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480630 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 95ABC10D5 for ; Tue, 5 Dec 2023 10:43:25 -0800 (PST) Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5FgtE4014591 for ; Tue, 5 Dec 2023 10:43:25 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usrmmqbgt-8 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:25 -0800 Received: from twshared51573.38.frc1.facebook.com (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a8:82::b) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:18 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 671F63CA1705C; Tue, 5 Dec 2023 10:43:11 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 08/10] bpf: track aligned STACK_ZERO cases as imprecise spilled registers Date: Tue, 5 Dec 2023 10:42:46 -0800 Message-ID: <20231205184248.1502704-9-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 6s4ZFByRsacOL20QBHltYHkgLRCZxn6I X-Proofpoint-GUID: 6s4ZFByRsacOL20QBHltYHkgLRCZxn6I X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Now that precision backtracing is supporting register spill/fill to/from stack, there is another oportunity to be exploited here: minimizing precise STACK_ZERO cases. With a simple code change we can rely on initially imprecise register spill tracking for cases when register spilled to stack was a known zero. This is a very common case for initializing on the stack variables, including rather large structures. Often times zero has no special meaning for the subsequent BPF program logic and is often overwritten with non-zero values soon afterwards. But due to STACK_ZERO vs STACK_MISC tracking, such initial zero initialization actually causes duplication of verifier states as STACK_ZERO is clearly different than STACK_MISC or spilled SCALAR_VALUE register. The effect of this (now) trivial change is huge, as can be seen below. These are differences between BPF selftests, Cilium, and Meta-internal BPF object files relative to previous patch in this series. You can see improvements ranging from single-digit percentage improvement for instructions and states, all the way to 50-60% reduction for some of Meta-internal host agent programs, and even some Cilium programs. For Meta-internal ones I left only the differences for largest BPF object files by states/instructions, as there were too many differences in the overall output. All the differences were improvements, reducting number of states and thus instructions validated. Note, Meta-internal BPF object file names are not printed below. Many copies of balancer_ingress are actually many different configurations of Katran, so they are different BPF programs, which explains state reduction going from -16% all the way to 31%, depending on BPF program logic complexity. I also tooked a closer look at a few small-ish BPF programs to validate the behavior. Let's take bpf_iter_netrlink.bpf.o (first row below). While it's just 8 vs 5 states, verifier log is still pretty long to include it here. But the reduction in states is due to the following piece of C code: unsigned long ino; ... sk = s->sk_socket; if (!sk) { ino = 0; } else { inode = SOCK_INODE(sk); bpf_probe_read_kernel(&ino, sizeof(ino), &inode->i_ino); } BPF_SEQ_PRINTF(seq, "%-8u %-8lu\n", s->sk_drops.counter, ino); return 0; You can see that in some situations `ino` is zero-initialized, while in others it's unknown value filled out by bpf_probe_read_kernel(). Before this change code after if/else branches have to be validated twice. Once with (precise) ino == 0, due to eager STACK_ZERO logic, and then again for when ino is just STACK_MISC. But BPF_SEQ_PRINTF() doesn't care about precise value of ino, so with the change in this patch verifier is able to prune states from after one of the branches, reducing number of total states (and instructions) required for successful validation. Similar principle applies to bigger real-world applications, just at a much larger scale. SELFTESTS ========= File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) --------------------------------------- ----------------------- --------- --------- --------------- ---------- ---------- ------------- bpf_iter_netlink.bpf.linked3.o dump_netlink 148 104 -44 (-29.73%) 8 5 -3 (-37.50%) bpf_iter_unix.bpf.linked3.o dump_unix 8474 8404 -70 (-0.83%) 151 147 -4 (-2.65%) bpf_loop.bpf.linked3.o stack_check 560 324 -236 (-42.14%) 42 24 -18 (-42.86%) local_storage_bench.bpf.linked3.o get_local 120 77 -43 (-35.83%) 9 6 -3 (-33.33%) loop6.bpf.linked3.o trace_virtqueue_add_sgs 10167 9868 -299 (-2.94%) 226 206 -20 (-8.85%) pyperf600_bpf_loop.bpf.linked3.o on_event 4872 3423 -1449 (-29.74%) 322 229 -93 (-28.88%) strobemeta.bpf.linked3.o on_event 180697 176036 -4661 (-2.58%) 4780 4734 -46 (-0.96%) test_cls_redirect.bpf.linked3.o cls_redirect 65594 65401 -193 (-0.29%) 4230 4212 -18 (-0.43%) test_global_func_args.bpf.linked3.o test_cls 145 136 -9 (-6.21%) 10 9 -1 (-10.00%) test_l4lb.bpf.linked3.o balancer_ingress 4760 2612 -2148 (-45.13%) 113 102 -11 (-9.73%) test_l4lb_noinline.bpf.linked3.o balancer_ingress 4845 4877 +32 (+0.66%) 219 221 +2 (+0.91%) test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 2072 2087 +15 (+0.72%) 97 98 +1 (+1.03%) test_seg6_loop.bpf.linked3.o __add_egr_x 12440 9975 -2465 (-19.82%) 364 353 -11 (-3.02%) test_tcp_hdr_options.bpf.linked3.o estab 2558 2572 +14 (+0.55%) 179 180 +1 (+0.56%) test_xdp_dynptr.bpf.linked3.o _xdp_tx_iptunnel 645 596 -49 (-7.60%) 26 24 -2 (-7.69%) test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3520 3516 -4 (-0.11%) 216 216 +0 (+0.00%) xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82661 81241 -1420 (-1.72%) 5073 5155 +82 (+1.62%) xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 84964 82297 -2667 (-3.14%) 5130 5157 +27 (+0.53%) META-INTERNAL ============= Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) -------------------------------------- --------- --------- ----------------- ---------- ---------- --------------- balancer_ingress 27925 23608 -4317 (-15.46%) 1488 1482 -6 (-0.40%) balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%) balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%) balancer_ingress 32213 27935 -4278 (-13.28%) 1689 1683 -6 (-0.36%) balancer_ingress 31824 27546 -4278 (-13.44%) 1658 1652 -6 (-0.36%) balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%) balancer_ingress 38647 29562 -9085 (-23.51%) 2069 1835 -234 (-11.31%) balancer_ingress 40339 30792 -9547 (-23.67%) 2193 1934 -259 (-11.81%) balancer_ingress 37321 29055 -8266 (-22.15%) 1972 1795 -177 (-8.98%) balancer_ingress 38176 29753 -8423 (-22.06%) 2008 1831 -177 (-8.81%) balancer_ingress 29193 20910 -8283 (-28.37%) 1599 1422 -177 (-11.07%) balancer_ingress 30013 21452 -8561 (-28.52%) 1645 1447 -198 (-12.04%) balancer_ingress 28691 24290 -4401 (-15.34%) 1545 1531 -14 (-0.91%) balancer_ingress 34223 28965 -5258 (-15.36%) 1984 1875 -109 (-5.49%) balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%) balancer_ingress 35868 26455 -9413 (-26.24%) 2140 1827 -313 (-14.63%) balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) balancer_ingress 35481 26158 -9323 (-26.28%) 2095 1806 -289 (-13.79%) balancer_ingress 34844 29485 -5359 (-15.38%) 2036 1918 -118 (-5.80%) fbflow_egress 3256 2652 -604 (-18.55%) 218 192 -26 (-11.93%) fbflow_ingress 1026 944 -82 (-7.99%) 70 63 -7 (-10.00%) sslwall_tc_egress 8424 7360 -1064 (-12.63%) 498 458 -40 (-8.03%) syar_accept_protect 15040 9539 -5501 (-36.58%) 364 220 -144 (-39.56%) syar_connect_tcp_v6 15036 9535 -5501 (-36.59%) 360 216 -144 (-40.00%) syar_connect_udp_v4 15039 9538 -5501 (-36.58%) 361 217 -144 (-39.89%) syar_connect_connect4_protect4 24805 15833 -8972 (-36.17%) 756 480 -276 (-36.51%) syar_lsm_file_open 167772 151813 -15959 (-9.51%) 1836 1667 -169 (-9.20%) syar_namespace_create_new 14805 9304 -5501 (-37.16%) 353 209 -144 (-40.79%) syar_python3_detect 17531 12030 -5501 (-31.38%) 391 247 -144 (-36.83%) syar_ssh_post_fork 16412 10911 -5501 (-33.52%) 405 261 -144 (-35.56%) syar_enter_execve 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%) syar_enter_execveat 14728 9227 -5501 (-37.35%) 345 201 -144 (-41.74%) syar_exit_execve 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%) syar_exit_execveat 16622 11121 -5501 (-33.09%) 376 232 -144 (-38.30%) syar_syscalls_kill 15288 9787 -5501 (-35.98%) 398 254 -144 (-36.18%) syar_task_enter_pivot_root 14898 9397 -5501 (-36.92%) 357 213 -144 (-40.34%) syar_syscalls_setreuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%) syar_syscalls_setuid 16678 11177 -5501 (-32.98%) 429 285 -144 (-33.57%) syar_syscalls_process_vm_readv 14959 9458 -5501 (-36.77%) 364 220 -144 (-39.56%) syar_syscalls_process_vm_writev 15757 10256 -5501 (-34.91%) 390 246 -144 (-36.92%) do_uprobe 15519 10018 -5501 (-35.45%) 373 229 -144 (-38.61%) edgewall 179715 55783 -123932 (-68.96%) 12607 3999 -8608 (-68.28%) bictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%) cubictcp_state 7570 4131 -3439 (-45.43%) 496 269 -227 (-45.77%) tcp_rate_skb_delivered 447 272 -175 (-39.15%) 29 18 -11 (-37.93%) kprobe__bbr_set_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%) kprobe__bictcp_state 4566 2615 -1951 (-42.73%) 209 124 -85 (-40.67%) inet_sock_set_state 1501 1337 -164 (-10.93%) 93 85 -8 (-8.60%) tcp_retransmit_skb 1145 981 -164 (-14.32%) 67 59 -8 (-11.94%) tcp_retransmit_synack 1183 951 -232 (-19.61%) 67 55 -12 (-17.91%) bpf_tcptuner 1459 1187 -272 (-18.64%) 99 80 -19 (-19.19%) tw_egress 801 776 -25 (-3.12%) 69 66 -3 (-4.35%) tw_ingress 795 770 -25 (-3.14%) 69 66 -3 (-4.35%) ttls_tc_ingress 19025 19383 +358 (+1.88%) 470 465 -5 (-1.06%) ttls_nat_egress 490 299 -191 (-38.98%) 33 20 -13 (-39.39%) ttls_nat_ingress 448 285 -163 (-36.38%) 32 21 -11 (-34.38%) tw_twfw_egress 511127 212071 -299056 (-58.51%) 16733 8504 -8229 (-49.18%) tw_twfw_ingress 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%) tw_twfw_tc_eg 511113 212064 -299049 (-58.51%) 16732 8504 -8228 (-49.18%) tw_twfw_tc_in 500095 212069 -288026 (-57.59%) 16223 8504 -7719 (-47.58%) tw_twfw_egress 12632 12435 -197 (-1.56%) 276 260 -16 (-5.80%) tw_twfw_ingress 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%) tw_twfw_tc_eg 12595 12435 -160 (-1.27%) 274 259 -15 (-5.47%) tw_twfw_tc_in 12631 12454 -177 (-1.40%) 278 261 -17 (-6.12%) tw_xdp_dump 266 209 -57 (-21.43%) 9 8 -1 (-11.11%) CILIUM ========= File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------- -------------------------------- --------- --------- ---------------- ---------- ---------- -------------- bpf_host.o cil_to_netdev 6047 4578 -1469 (-24.29%) 362 249 -113 (-31.22%) bpf_host.o handle_lxc_traffic 2227 1585 -642 (-28.83%) 156 103 -53 (-33.97%) bpf_host.o tail_handle_ipv4_from_netdev 2244 1458 -786 (-35.03%) 163 106 -57 (-34.97%) bpf_host.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%) bpf_host.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%) bpf_host.o tail_ipv4_host_policy_ingress 2219 1367 -852 (-38.40%) 161 96 -65 (-40.37%) bpf_host.o tail_nodeport_nat_egress_ipv4 22460 19862 -2598 (-11.57%) 1469 1293 -176 (-11.98%) bpf_host.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%) bpf_host.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%) bpf_host.o tail_nodeport_nat_ipv6_egress 3702 3542 -160 (-4.32%) 215 205 -10 (-4.65%) bpf_lxc.o tail_handle_nat_fwd_ipv4 21022 10479 -10543 (-50.15%) 1289 670 -619 (-48.02%) bpf_lxc.o tail_handle_nat_fwd_ipv6 15433 11375 -4058 (-26.29%) 905 643 -262 (-28.95%) bpf_lxc.o tail_ipv4_ct_egress 5073 3374 -1699 (-33.49%) 262 172 -90 (-34.35%) bpf_lxc.o tail_ipv4_ct_ingress 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%) bpf_lxc.o tail_ipv4_ct_ingress_policy_only 5093 3385 -1708 (-33.54%) 262 172 -90 (-34.35%) bpf_lxc.o tail_ipv6_ct_egress 4593 3878 -715 (-15.57%) 194 151 -43 (-22.16%) bpf_lxc.o tail_ipv6_ct_ingress 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%) bpf_lxc.o tail_ipv6_ct_ingress_policy_only 4606 3891 -715 (-15.52%) 194 151 -43 (-22.16%) bpf_lxc.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%) bpf_lxc.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%) bpf_overlay.o tail_handle_nat_fwd_ipv4 20524 10114 -10410 (-50.72%) 1271 638 -633 (-49.80%) bpf_overlay.o tail_nodeport_nat_egress_ipv4 22718 19490 -3228 (-14.21%) 1475 1275 -200 (-13.56%) bpf_overlay.o tail_nodeport_nat_ingress_ipv4 5526 3534 -1992 (-36.05%) 366 243 -123 (-33.61%) bpf_overlay.o tail_nodeport_nat_ingress_ipv6 5132 4256 -876 (-17.07%) 241 219 -22 (-9.13%) bpf_overlay.o tail_nodeport_nat_ipv6_egress 3638 3548 -90 (-2.47%) 209 203 -6 (-2.87%) bpf_overlay.o tail_rev_nodeport_lb4 4368 3820 -548 (-12.55%) 248 215 -33 (-13.31%) bpf_overlay.o tail_rev_nodeport_lb6 2867 2428 -439 (-15.31%) 167 140 -27 (-16.17%) bpf_sock.o cil_sock6_connect 1718 1703 -15 (-0.87%) 100 99 -1 (-1.00%) bpf_xdp.o tail_handle_nat_fwd_ipv4 12917 12443 -474 (-3.67%) 875 849 -26 (-2.97%) bpf_xdp.o tail_handle_nat_fwd_ipv6 13515 13264 -251 (-1.86%) 715 702 -13 (-1.82%) bpf_xdp.o tail_lb_ipv4 39492 36367 -3125 (-7.91%) 2430 2251 -179 (-7.37%) bpf_xdp.o tail_lb_ipv6 80441 78058 -2383 (-2.96%) 3647 3523 -124 (-3.40%) bpf_xdp.o tail_nodeport_ipv6_dsr 1038 901 -137 (-13.20%) 61 55 -6 (-9.84%) bpf_xdp.o tail_nodeport_nat_egress_ipv4 13027 12096 -931 (-7.15%) 868 809 -59 (-6.80%) bpf_xdp.o tail_nodeport_nat_ingress_ipv4 7617 5900 -1717 (-22.54%) 522 413 -109 (-20.88%) bpf_xdp.o tail_nodeport_nat_ingress_ipv6 7575 7395 -180 (-2.38%) 383 374 -9 (-2.35%) bpf_xdp.o tail_rev_nodeport_lb4 6808 6739 -69 (-1.01%) 403 396 -7 (-1.74%) bpf_xdp.o tail_rev_nodeport_lb6 16173 15847 -326 (-2.02%) 1010 990 -20 (-1.98%) Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- kernel/bpf/verifier.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1ebe76c98451..e5ce530641ba 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4477,8 +4477,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, return err; mark_stack_slot_scratched(env, spi); - if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && - !register_is_null(reg) && env->bpf_capable) { + if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) { save_register_state(env, state, spi, reg, size); /* Break the relation on a narrowing spill. */ if (fls64(reg->umax_value) > BITS_PER_BYTE * size) @@ -4527,7 +4526,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, /* when we zero initialize stack slots mark them as such */ if ((reg && register_is_null(reg)) || (!reg && is_bpf_st_mem(insn) && insn->imm == 0)) { - /* backtracking doesn't work for STACK_ZERO yet. */ + /* STACK_ZERO case happened because register spill + * wasn't properly aligned at the stack slot boundary, + * so it's not a register spill anymore; force + * originating register to be precise to make + * STACK_ZERO correct for subsequent states + */ err = mark_chain_precision(env, value_regno); if (err) return err; From patchwork Tue Dec 5 18:42:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480628 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8741F10DD for ; Tue, 5 Dec 2023 10:43:22 -0800 (PST) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5IfMcL015699 for ; Tue, 5 Dec 2023 10:43:21 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usxjtn4eg-16 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:21 -0800 Received: from twshared15991.38.frc1.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:18 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id D9EF43CA17064; Tue, 5 Dec 2023 10:43:14 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 09/10] selftests/bpf: validate precision logic in partial_stack_load_preserves_zeros Date: Tue, 5 Dec 2023 10:42:47 -0800 Message-ID: <20231205184248.1502704-10-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: FdHx6wp6xQjwpSpWBG4E9HS1acRLPuYE X-Proofpoint-ORIG-GUID: FdHx6wp6xQjwpSpWBG4E9HS1acRLPuYE X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Enhance partial_stack_load_preserves_zeros subtest with detailed precision propagation log checks. We know expect fp-16 to be spilled, initially imprecise, zero const register, which is later marked as precise even when partial stack slot load is performed, even if it's not a register fill (!). Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- .../selftests/bpf/progs/verifier_spill_fill.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c index 41fd61299eab..df4920da3472 100644 --- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c +++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c @@ -495,6 +495,22 @@ char single_byte_buf[1] SEC(".data.single_byte_buf"); SEC("raw_tp") __log_level(2) __success +/* make sure fp-8 is all STACK_ZERO */ +__msg("2: (7a) *(u64 *)(r10 -8) = 0 ; R10=fp0 fp-8_w=00000000") +/* but fp-16 is spilled IMPRECISE zero const reg */ +__msg("4: (7b) *(u64 *)(r10 -16) = r0 ; R0_w=0 R10=fp0 fp-16_w=0") +/* and now check that precision propagation works even for such tricky case */ +__msg("10: (71) r2 = *(u8 *)(r10 -9) ; R2_w=P0 R10=fp0 fp-16_w=0") +__msg("11: (0f) r1 += r2") +__msg("mark_precise: frame0: last_idx 11 first_idx 0 subseq_idx -1") +__msg("mark_precise: frame0: regs=r2 stack= before 10: (71) r2 = *(u8 *)(r10 -9)") +__msg("mark_precise: frame0: regs= stack=-16 before 9: (bf) r1 = r6") +__msg("mark_precise: frame0: regs= stack=-16 before 8: (73) *(u8 *)(r1 +0) = r2") +__msg("mark_precise: frame0: regs= stack=-16 before 7: (0f) r1 += r2") +__msg("mark_precise: frame0: regs= stack=-16 before 6: (71) r2 = *(u8 *)(r10 -1)") +__msg("mark_precise: frame0: regs= stack=-16 before 5: (bf) r1 = r6") +__msg("mark_precise: frame0: regs= stack=-16 before 4: (7b) *(u64 *)(r10 -16) = r0") +__msg("mark_precise: frame0: regs=r0 stack= before 3: (b7) r0 = 0") __naked void partial_stack_load_preserves_zeros(void) { asm volatile ( From patchwork Tue Dec 5 18:42:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 13480632 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 934701A5 for ; Tue, 5 Dec 2023 10:43:45 -0800 (PST) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B5IfZZt017839 for ; Tue, 5 Dec 2023 10:43:44 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3usxjfd271-11 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 05 Dec 2023 10:43:44 -0800 Received: from twshared10507.42.prn1.facebook.com (2620:10d:c0a8:1c::1b) by mail.thefacebook.com (2620:10d:c0a8:83::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Tue, 5 Dec 2023 10:43:26 -0800 Received: by devbig019.vll3.facebook.com (Postfix, from userid 137359) id 2506A3CA17081; Tue, 5 Dec 2023 10:43:17 -0800 (PST) From: Andrii Nakryiko To: , , , CC: , , Eduard Zingerman Subject: [PATCH v4 bpf-next 10/10] bpf: use common instruction history across all states Date: Tue, 5 Dec 2023 10:42:48 -0800 Message-ID: <20231205184248.1502704-11-andrii@kernel.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231205184248.1502704-1-andrii@kernel.org> References: <20231205184248.1502704-1-andrii@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: WgNIThjKEaTr8-LmC77Sz9LqF1RlsQvG X-Proofpoint-ORIG-GUID: WgNIThjKEaTr8-LmC77Sz9LqF1RlsQvG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-05_14,2023-12-05_01,2023-05-22_02 X-Patchwork-Delegate: bpf@iogearbox.net Instead of allocating and copying instruction history each time we enqueue child verifier state, switch to a model where we use one common dynamically sized array of instruction history entries across all states. The key observation for proving this is correct is that instruction history is only relevant while state is active, which means it either is a current state (and thus we are actively modifying instruction history and no other state can interfere with us) or we are checkpointed state with some children still active (either enqueued or being current). In the latter case our portion of instruction history is finalized and won't change or grow, so as long as we keep it immutable until the state is finalized, we are good. Now, when state is finalized and is put into state hash for potentially future pruning lookups, instruction history is not used anymore. This is because instruction history is only used by precision marking logic, and we never modify precision markings for finalized states. So, instead of each state having its own small instruction history, we keep a global dynamically-sized instruction history, where each state in current DFS path from root to active state remembers its portion of instruction history. Current state can append to this history, but cannot modify any of its parent histories. Because the insn_hist array can be grown through realloc, states don't keep pointers, they instead maintain two indices, [start, end), into global instruction history array. End is exclusive index, so `start == end` means there is no relevant instruction history. This eliminates a lot of allocations and minimizes overall memory usage. Acked-by: Eduard Zingerman Signed-off-by: Andrii Nakryiko --- include/linux/bpf_verifier.h | 19 +++++--- kernel/bpf/verifier.c | 95 ++++++++++++++++-------------------- 2 files changed, 54 insertions(+), 60 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index bada59812e00..13a1824aafa7 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -327,7 +327,7 @@ struct bpf_func_state { #define MAX_CALL_FRAMES 8 -/* instruction history flags, used in bpf_jmp_history_entry.flags field */ +/* instruction history flags, used in bpf_insn_hist_entry.flags field */ enum { /* instruction references stack slot through PTR_TO_STACK register; * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8) @@ -345,7 +345,7 @@ enum { static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES); static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8); -struct bpf_jmp_history_entry { +struct bpf_insn_hist_entry { u32 idx; /* insn idx can't be bigger than 1 million */ u32 prev_idx : 22; @@ -430,13 +430,14 @@ struct bpf_verifier_state { * See get_loop_entry() for more information. */ struct bpf_verifier_state *loop_entry; - /* jmp history recorded from first to last. - * backtracking is using it to go from last to first. - * For most states jmp_history_cnt is [0-3]. + /* Sub-range of env->insn_hist[] corresponding to this state's + * instruction history. + * Backtracking is using it to go from last to first. + * For most states instruction history is short, 0-3 instructions. * For loops can go up to ~40. */ - struct bpf_jmp_history_entry *jmp_history; - u32 jmp_history_cnt; + u32 insn_hist_start; + u32 insn_hist_end; u32 dfs_depth; u32 callback_unroll_depth; }; @@ -678,7 +679,9 @@ struct bpf_verifier_env { int cur_stack; } cfg; struct backtrack_state bt; - struct bpf_jmp_history_entry *cur_hist_ent; + struct bpf_insn_hist_entry *insn_hist; + struct bpf_insn_hist_entry *cur_hist_ent; + u32 insn_hist_cap; u32 pass_cnt; /* number of times do_check() was called */ u32 subprog_cnt; /* number of instructions analyzed by the verifier */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e5ce530641ba..92a9aed7e112 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1327,13 +1327,6 @@ static void free_func_state(struct bpf_func_state *state) kfree(state); } -static void clear_jmp_history(struct bpf_verifier_state *state) -{ - kfree(state->jmp_history); - state->jmp_history = NULL; - state->jmp_history_cnt = 0; -} - static void free_verifier_state(struct bpf_verifier_state *state, bool free_self) { @@ -1343,7 +1336,6 @@ static void free_verifier_state(struct bpf_verifier_state *state, free_func_state(state->frame[i]); state->frame[i] = NULL; } - clear_jmp_history(state); if (free_self) kfree(state); } @@ -1369,13 +1361,6 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, struct bpf_func_state *dst; int i, err; - dst_state->jmp_history = copy_array(dst_state->jmp_history, src->jmp_history, - src->jmp_history_cnt, sizeof(*dst_state->jmp_history), - GFP_USER); - if (!dst_state->jmp_history) - return -ENOMEM; - dst_state->jmp_history_cnt = src->jmp_history_cnt; - /* if dst has more stack frames then src frame, free them, this is also * necessary in case of exceptional exits using bpf_throw. */ @@ -1392,6 +1377,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, dst_state->parent = src->parent; dst_state->first_insn_idx = src->first_insn_idx; dst_state->last_insn_idx = src->last_insn_idx; + dst_state->insn_hist_start = src->insn_hist_start; + dst_state->insn_hist_end = src->insn_hist_end; dst_state->dfs_depth = src->dfs_depth; dst_state->callback_unroll_depth = src->callback_unroll_depth; dst_state->used_as_loop_entry = src->used_as_loop_entry; @@ -3262,11 +3249,10 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx) } /* for any branch, call, exit record the history of jmps in the given state */ -static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur, - int insn_flags) +static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur, + int insn_flags) { - u32 cnt = cur->jmp_history_cnt; - struct bpf_jmp_history_entry *p; + struct bpf_insn_hist_entry *p; size_t alloc_size; /* combine instruction flags if we already recorded this instruction */ @@ -3282,28 +3268,31 @@ static int push_jmp_history(struct bpf_verifier_env *env, struct bpf_verifier_st return 0; } - cnt++; - alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p))); - p = krealloc(cur->jmp_history, alloc_size, GFP_USER); - if (!p) - return -ENOMEM; - cur->jmp_history = p; + if (cur->insn_hist_end + 1 > env->insn_hist_cap) { + alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p)); + alloc_size = kmalloc_size_roundup(alloc_size); + p = krealloc(env->insn_hist, alloc_size, GFP_USER); + if (!p) + return -ENOMEM; + env->insn_hist = p; + env->insn_hist_cap = alloc_size / sizeof(*p); + } - p = &cur->jmp_history[cnt - 1]; + p = &env->insn_hist[cur->insn_hist_end]; p->idx = env->insn_idx; p->prev_idx = env->prev_insn_idx; p->flags = insn_flags; - cur->jmp_history_cnt = cnt; + cur->insn_hist_end++; env->cur_hist_ent = p; return 0; } -static struct bpf_jmp_history_entry *get_jmp_hist_entry(struct bpf_verifier_state *st, - u32 hist_end, int insn_idx) +static struct bpf_insn_hist_entry *get_insn_hist_entry(struct bpf_verifier_env *env, + u32 hist_end, int insn_idx) { - if (hist_end > 0 && st->jmp_history[hist_end - 1].idx == insn_idx) - return &st->jmp_history[hist_end - 1]; + if (hist_end > 0 && env->insn_hist[hist_end - 1].idx == insn_idx) + return &env->insn_hist[hist_end - 1]; return NULL; } @@ -3320,25 +3309,26 @@ static struct bpf_jmp_history_entry *get_jmp_hist_entry(struct bpf_verifier_stat * history entry recording a jump from last instruction of parent state and * first instruction of given state. */ -static int get_prev_insn_idx(struct bpf_verifier_state *st, int i, - u32 *history) +static int get_prev_insn_idx(const struct bpf_verifier_env *env, + struct bpf_verifier_state *st, + int insn_idx, u32 *hist_endp) { - u32 cnt = *history; + u32 hist_end = *hist_endp; + u32 cnt = hist_end - st->insn_hist_start; - if (i == st->first_insn_idx) { + if (insn_idx == st->first_insn_idx) { if (cnt == 0) return -ENOENT; - if (cnt == 1 && st->jmp_history[0].idx == i) + if (cnt == 1 && env->insn_hist[hist_end - 1].idx == insn_idx) return -ENOENT; } - if (cnt && st->jmp_history[cnt - 1].idx == i) { - i = st->jmp_history[cnt - 1].prev_idx; - (*history)--; + if (cnt && env->insn_hist[hist_end - 1].idx == insn_idx) { + (*hist_endp)--; + return env->insn_hist[hist_end - 1].prev_idx; } else { - i--; + return insn_idx - 1; } - return i; } static const char *disasm_kfunc_name(void *data, const struct bpf_insn *insn) @@ -3529,7 +3519,7 @@ static bool calls_callback(struct bpf_verifier_env *env, int insn_idx); * - *was* processed previously during backtracking. */ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, - struct bpf_jmp_history_entry *hist, struct backtrack_state *bt) + struct bpf_insn_hist_entry *hist, struct backtrack_state *bt) { const struct bpf_insn_cbs cbs = { .cb_call = disasm_kfunc_name, @@ -4025,7 +4015,7 @@ static int mark_precise_scalar_ids(struct bpf_verifier_env *env, struct bpf_veri * SCALARS, as well as any other registers and slots that contribute to * a tracked state of given registers/stack slots, depending on specific BPF * assembly instructions (see backtrack_insns() for exact instruction handling - * logic). This backtracking relies on recorded jmp_history and is able to + * logic). This backtracking relies on recorded insn_hist and is able to * traverse entire chain of parent states. This process ends only when all the * necessary registers/slots and their transitive dependencies are marked as * precise. @@ -4142,8 +4132,8 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno) for (;;) { DECLARE_BITMAP(mask, 64); - u32 history = st->jmp_history_cnt; - struct bpf_jmp_history_entry *hist; + u32 hist_end = st->insn_hist_end; + struct bpf_insn_hist_entry *hist; if (env->log.level & BPF_LOG_LEVEL2) { verbose(env, "mark_precise: frame%d: last_idx %d first_idx %d subseq_idx %d \n", @@ -4207,7 +4197,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno) err = 0; skip_first = false; } else { - hist = get_jmp_hist_entry(st, history, i); + hist = get_insn_hist_entry(env, hist_end, i); err = backtrack_insn(env, i, subseq_idx, hist, bt); } if (err == -ENOTSUPP) { @@ -4224,7 +4214,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno) */ return 0; subseq_idx = i; - i = get_prev_insn_idx(st, i, &history); + i = get_prev_insn_idx(env, st, i, &hist_end); if (i == -ENOENT) break; if (i >= env->prog->len) { @@ -4545,7 +4535,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, } if (insn_flags) - return push_jmp_history(env, env->cur_state, insn_flags); + return push_insn_history(env, env->cur_state, insn_flags); return 0; } @@ -4845,7 +4835,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, insn_flags = 0; /* we are not restoring spilled register */ } if (insn_flags) - return push_jmp_history(env, env->cur_state, insn_flags); + return push_insn_history(env, env->cur_state, insn_flags); return 0; } @@ -16975,7 +16965,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) * the current state. */ if (is_jmp_point(env, env->insn_idx)) - err = err ? : push_jmp_history(env, cur, 0); + err = err ? : push_insn_history(env, cur, 0); err = err ? : propagate_precision(env, &sl->state); if (err) return err; @@ -17074,8 +17064,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) cur->parent = new; cur->first_insn_idx = insn_idx; + cur->insn_hist_start = cur->insn_hist_end; cur->dfs_depth = new->dfs_depth + 1; - clear_jmp_history(cur); new_sl->next = *explored_state(env, insn_idx); *explored_state(env, insn_idx) = new_sl; /* connect new state to parentage chain. Current frame needs all @@ -17242,7 +17232,7 @@ static int do_check(struct bpf_verifier_env *env) } if (is_jmp_point(env, env->insn_idx)) { - err = push_jmp_history(env, state, 0); + err = push_insn_history(env, state, 0); if (err) return err; } @@ -20804,6 +20794,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3 if (!is_priv) mutex_unlock(&bpf_verifier_lock); vfree(env->insn_aux_data); + kvfree(env->insn_hist); err_free_env: kfree(env); return ret;