Message ID | 20231110002638.4168352-4-andrii@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 62ccdb11d3c63dc697dea1fd92b3496fe43dcc1e |
Delegated to: | BPF |
Headers | show |
Series | BPF control flow graph and precision backtrack fixes | expand |
On Thu, Nov 9, 2023 at 4:26 PM Andrii Nakryiko <andrii@kernel.org> wrote: > > Add a dedicated selftests to try to set up conditions to have a state > with same first and last instruction index, but it actually is a loop > 3->4->1->2->3. This confuses mark_chain_precision() if verifier doesn't > take into account jump history. > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > --- > .../selftests/bpf/progs/verifier_precision.c | 40 +++++++++++++++++++ > 1 file changed, 40 insertions(+) > > diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c > index 193c0f8272d0..6b564d4c0986 100644 > --- a/tools/testing/selftests/bpf/progs/verifier_precision.c > +++ b/tools/testing/selftests/bpf/progs/verifier_precision.c > @@ -91,3 +91,43 @@ __naked int bpf_end_bswap(void) > } > > #endif /* v4 instruction */ > + > +SEC("?raw_tp") > +__success __log_level(2) > +/* > + * Without the bug fix there will be no history between "last_idx 3 first_idx 3" > + * and "parent state regs=" lines. "R0_w=6" parts are here to help anchor > + * expected log messages to the one specific mark_chain_precision operation. > + * > + * This is quite fragile: if verifier checkpointing heuristic changes, this > + * might need adjusting. Hmm, but that what __flag(BPF_F_TEST_STATE_FREQ) supposed to address. > + */ > +__msg("2: (07) r0 += 1 ; R0_w=6") > +__msg("3: (35) if r0 >= 0xa goto pc+1") > +__msg("mark_precise: frame0: last_idx 3 first_idx 3 subseq_idx -1") > +__msg("mark_precise: frame0: regs=r0 stack= before 2: (07) r0 += 1") > +__msg("mark_precise: frame0: regs=r0 stack= before 1: (07) r0 += 1") > +__msg("mark_precise: frame0: regs=r0 stack= before 4: (05) goto pc-4") > +__msg("mark_precise: frame0: regs=r0 stack= before 3: (35) if r0 >= 0xa goto pc+1") > +__msg("mark_precise: frame0: parent state regs= stack=: R0_rw=P4") > +__msg("3: R0_w=6") > +__naked int state_loop_first_last_equal(void) > +{ > + asm volatile ( > + "r0 = 0;" > + "l0_%=:" > + "r0 += 1;" > + "r0 += 1;" That's why you had two ++ ? Add state_freq and remove one of them?
On Thu, Nov 9, 2023 at 5:34 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 4:26 PM Andrii Nakryiko <andrii@kernel.org> wrote: > > > > Add a dedicated selftests to try to set up conditions to have a state > > with same first and last instruction index, but it actually is a loop > > 3->4->1->2->3. This confuses mark_chain_precision() if verifier doesn't > > take into account jump history. > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > > --- > > .../selftests/bpf/progs/verifier_precision.c | 40 +++++++++++++++++++ > > 1 file changed, 40 insertions(+) > > > > diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c > > index 193c0f8272d0..6b564d4c0986 100644 > > --- a/tools/testing/selftests/bpf/progs/verifier_precision.c > > +++ b/tools/testing/selftests/bpf/progs/verifier_precision.c > > @@ -91,3 +91,43 @@ __naked int bpf_end_bswap(void) > > } > > > > #endif /* v4 instruction */ > > + > > +SEC("?raw_tp") > > +__success __log_level(2) > > +/* > > + * Without the bug fix there will be no history between "last_idx 3 first_idx 3" > > + * and "parent state regs=" lines. "R0_w=6" parts are here to help anchor > > + * expected log messages to the one specific mark_chain_precision operation. > > + * > > + * This is quite fragile: if verifier checkpointing heuristic changes, this > > + * might need adjusting. > > Hmm, but that what > __flag(BPF_F_TEST_STATE_FREQ) > supposed to address. When I was analysing and crafting the test I for some reason assumed I need to have a jump inside the state that won't trigger state checkpoint. But I think that's not necessary, just doing conditional jump and jumping back an instruction or two should do. With that yes, TEST_STATE_FREQ should be a better way to do this. > > > + */ > > +__msg("2: (07) r0 += 1 ; R0_w=6") > > +__msg("3: (35) if r0 >= 0xa goto pc+1") > > +__msg("mark_precise: frame0: last_idx 3 first_idx 3 subseq_idx -1") > > +__msg("mark_precise: frame0: regs=r0 stack= before 2: (07) r0 += 1") > > +__msg("mark_precise: frame0: regs=r0 stack= before 1: (07) r0 += 1") > > +__msg("mark_precise: frame0: regs=r0 stack= before 4: (05) goto pc-4") > > +__msg("mark_precise: frame0: regs=r0 stack= before 3: (35) if r0 >= 0xa goto pc+1") > > +__msg("mark_precise: frame0: parent state regs= stack=: R0_rw=P4") > > +__msg("3: R0_w=6") > > +__naked int state_loop_first_last_equal(void) > > +{ > > + asm volatile ( > > + "r0 = 0;" > > + "l0_%=:" > > + "r0 += 1;" > > + "r0 += 1;" > > That's why you had two ++ ? > Add state_freq and remove one of them?
On Thu, Nov 9, 2023 at 7:43 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 5:34 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Nov 9, 2023 at 4:26 PM Andrii Nakryiko <andrii@kernel.org> wrote: > > > > > > Add a dedicated selftests to try to set up conditions to have a state > > > with same first and last instruction index, but it actually is a loop > > > 3->4->1->2->3. This confuses mark_chain_precision() if verifier doesn't > > > take into account jump history. > > > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > > > --- > > > .../selftests/bpf/progs/verifier_precision.c | 40 +++++++++++++++++++ > > > 1 file changed, 40 insertions(+) > > > > > > diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c > > > index 193c0f8272d0..6b564d4c0986 100644 > > > --- a/tools/testing/selftests/bpf/progs/verifier_precision.c > > > +++ b/tools/testing/selftests/bpf/progs/verifier_precision.c > > > @@ -91,3 +91,43 @@ __naked int bpf_end_bswap(void) > > > } > > > > > > #endif /* v4 instruction */ > > > + > > > +SEC("?raw_tp") > > > +__success __log_level(2) > > > +/* > > > + * Without the bug fix there will be no history between "last_idx 3 first_idx 3" > > > + * and "parent state regs=" lines. "R0_w=6" parts are here to help anchor > > > + * expected log messages to the one specific mark_chain_precision operation. > > > + * > > > + * This is quite fragile: if verifier checkpointing heuristic changes, this > > > + * might need adjusting. > > > > Hmm, but that what > > __flag(BPF_F_TEST_STATE_FREQ) > > supposed to address. > > When I was analysing and crafting the test I for some reason assumed I > need to have a jump inside the state that won't trigger state > checkpoint. But I think that's not necessary, just doing conditional > jump and jumping back an instruction or two should do. With that yes, > TEST_STATE_FREQ should be a better way to do this. Ah, ok, TEST_STATE_FREQ won't work. It triggers state checkpointing both at conditional jump instruction and on its target, because target is prune point. So I think this test has to be the way it is. > > > > > > + */ > > > +__msg("2: (07) r0 += 1 ; R0_w=6") > > > +__msg("3: (35) if r0 >= 0xa goto pc+1") > > > +__msg("mark_precise: frame0: last_idx 3 first_idx 3 subseq_idx -1") > > > +__msg("mark_precise: frame0: regs=r0 stack= before 2: (07) r0 += 1") > > > +__msg("mark_precise: frame0: regs=r0 stack= before 1: (07) r0 += 1") > > > +__msg("mark_precise: frame0: regs=r0 stack= before 4: (05) goto pc-4") > > > +__msg("mark_precise: frame0: regs=r0 stack= before 3: (35) if r0 >= 0xa goto pc+1") > > > +__msg("mark_precise: frame0: parent state regs= stack=: R0_rw=P4") > > > +__msg("3: R0_w=6") > > > +__naked int state_loop_first_last_equal(void) > > > +{ > > > + asm volatile ( > > > + "r0 = 0;" > > > + "l0_%=:" > > > + "r0 += 1;" > > > + "r0 += 1;" > > > > That's why you had two ++ ? > > Add state_freq and remove one of them?
On Thu, Nov 9, 2023 at 8:05 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > > > > When I was analysing and crafting the test I for some reason assumed I > > need to have a jump inside the state that won't trigger state > > checkpoint. But I think that's not necessary, just doing conditional > > jump and jumping back an instruction or two should do. With that yes, > > TEST_STATE_FREQ should be a better way to do this. > > Ah, ok, TEST_STATE_FREQ won't work. It triggers state checkpointing > both at conditional jump instruction and on its target, because target > is prune point. > > So I think this test has to be the way it is. I see. I was about to apply it, but then noticed: numamove_bpf-numamove_bpf.o |migrate_misplaced_page |success -> failure (!!)|-100.00 % veristat is not known for sporadic failures. Is this a real issue?
On Thu, Nov 9, 2023 at 8:14 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 8:05 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > When I was analysing and crafting the test I for some reason assumed I > > > need to have a jump inside the state that won't trigger state > > > checkpoint. But I think that's not necessary, just doing conditional > > > jump and jumping back an instruction or two should do. With that yes, > > > TEST_STATE_FREQ should be a better way to do this. > > > > Ah, ok, TEST_STATE_FREQ won't work. It triggers state checkpointing > > both at conditional jump instruction and on its target, because target > > is prune point. > > > > So I think this test has to be the way it is. > > I see. > I was about to apply it, but then noticed: > numamove_bpf-numamove_bpf.o |migrate_misplaced_page |success -> > failure (!!)|-100.00 % > > veristat is not known for sporadic failures. > Is this a real issue? No idea what this is, I don't have it in my local object files, will need to regenerate them and check.
On Thu, Nov 9, 2023 at 8:48 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 8:14 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Nov 9, 2023 at 8:05 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > When I was analysing and crafting the test I for some reason assumed I > > > > need to have a jump inside the state that won't trigger state > > > > checkpoint. But I think that's not necessary, just doing conditional > > > > jump and jumping back an instruction or two should do. With that yes, > > > > TEST_STATE_FREQ should be a better way to do this. > > > > > > Ah, ok, TEST_STATE_FREQ won't work. It triggers state checkpointing > > > both at conditional jump instruction and on its target, because target > > > is prune point. > > > > > > So I think this test has to be the way it is. > > > > I see. > > I was about to apply it, but then noticed: > > numamove_bpf-numamove_bpf.o |migrate_misplaced_page |success -> > > failure (!!)|-100.00 % > > > > veristat is not known for sporadic failures. > > Is this a real issue? > > No idea what this is, I don't have it in my local object files, will > need to regenerate them and check. libbpf: prog 'migrate_misplaced_page_exit': failed to find kernel BTF type ID of 'migrate_misplaced_page': -3 It fails also on bpf-next/master. I think CI compares with the last state before net/net-next merge, and now this tool (it's from libbpf-tools) fails to find migrate_misplaced_page kernel function, apparently. So veristat itself doesn't have sporadic failures, but our CI setup is not 100% reliable, it seems.
diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c index 193c0f8272d0..6b564d4c0986 100644 --- a/tools/testing/selftests/bpf/progs/verifier_precision.c +++ b/tools/testing/selftests/bpf/progs/verifier_precision.c @@ -91,3 +91,43 @@ __naked int bpf_end_bswap(void) } #endif /* v4 instruction */ + +SEC("?raw_tp") +__success __log_level(2) +/* + * Without the bug fix there will be no history between "last_idx 3 first_idx 3" + * and "parent state regs=" lines. "R0_w=6" parts are here to help anchor + * expected log messages to the one specific mark_chain_precision operation. + * + * This is quite fragile: if verifier checkpointing heuristic changes, this + * might need adjusting. + */ +__msg("2: (07) r0 += 1 ; R0_w=6") +__msg("3: (35) if r0 >= 0xa goto pc+1") +__msg("mark_precise: frame0: last_idx 3 first_idx 3 subseq_idx -1") +__msg("mark_precise: frame0: regs=r0 stack= before 2: (07) r0 += 1") +__msg("mark_precise: frame0: regs=r0 stack= before 1: (07) r0 += 1") +__msg("mark_precise: frame0: regs=r0 stack= before 4: (05) goto pc-4") +__msg("mark_precise: frame0: regs=r0 stack= before 3: (35) if r0 >= 0xa goto pc+1") +__msg("mark_precise: frame0: parent state regs= stack=: R0_rw=P4") +__msg("3: R0_w=6") +__naked int state_loop_first_last_equal(void) +{ + asm volatile ( + "r0 = 0;" + "l0_%=:" + "r0 += 1;" + "r0 += 1;" + /* every few iterations we'll have a checkpoint here with + * first_idx == last_idx, potentially confusing precision + * backtracking logic + */ + "if r0 >= 10 goto l1_%=;" /* checkpoint + mark_precise */ + "goto l0_%=;" + "l1_%=:" + "exit;" + ::: __clobber_common + ); +} + +char _license[] SEC("license") = "GPL";
Add a dedicated selftests to try to set up conditions to have a state with same first and last instruction index, but it actually is a loop 3->4->1->2->3. This confuses mark_chain_precision() if verifier doesn't take into account jump history. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> --- .../selftests/bpf/progs/verifier_precision.c | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+)