Message ID | 20240814185417.1171430-11-andrii@kernel.org (mailing list archive) |
---|---|
State | New |
Series | Harden and extend ELF build ID parsing logic |
On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> Add a new set of tests validating behavior of capturing stack traces
> with build ID. We extend uprobe_multi target binary with ability to
> trigger uprobe (so that we can capture stack traces from it), but also
> we allow to force build ID data to be either resident or non-resident in
> memory (see also a comment about quirks of MADV_PAGEOUT).
>
> That way we can validate that in non-sleepable context we won't get
> build ID (as expected), but with sleepable uprobes we will get that
> build ID regardless of it being physically present in memory.
>
> Also, we add a small add-on linker script which reorders
> .note.gnu.build-id section and puts it after (big) .text section,
> putting build ID data outside of the very first page of ELF file. This
> will test all the relaxations we did in build ID parsing logic in kernel
> thanks to freader abstraction.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
> index 7ffa563ffeba..c7828b13e5ff 100644
> --- a/tools/testing/selftests/bpf/uprobe_multi.c
> +++ b/tools/testing/selftests/bpf/uprobe_multi.c

[...]

> +int __attribute__((weak)) trigger_uprobe(bool build_id_resident)
> +{
> +	int page_sz = sysconf(_SC_PAGESIZE);
> +	void *addr;
> +
> +	/* page-align build ID start */
> +	addr = (void *)((uintptr_t)&build_id_start & ~(page_sz - 1));
> +
> +	/* to guarantee MADV_PAGEOUT work reliably, we need to ensure that
> +	 * memory range is mapped into current process, so we unconditionally
> +	 * do MADV_POPULATE_READ, and then MADV_PAGEOUT, if necessary
> +	 */
> +	madvise(addr, page_sz, MADV_POPULATE_READ);

Nit: check error code?

> +	if (!build_id_resident)
> +		madvise(addr, page_sz, MADV_PAGEOUT);
> +
> +	(void)uprobe();
> +
> +	return 0;
> +}
> +

[...]

Silly question, unrelated to the patch-set itself.
When I do ./test_progs -vvv -t build_id/sleepable five stack frames
are printed:

FRAME #00: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4156
FRAME #01: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 393aef
FRAME #02: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a088
FRAME #03: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a14b
FRAME #04: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4095

The ...6f368 is build-id of the uprobe_multi.
How do I check where ...568ef comes from?
Also, why are there 5 frames when nesting level for uprobe() is 3?

On Thu, Aug 22, 2024 at 3:30 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> > Add a new set of tests validating behavior of capturing stack traces
> > with build ID. We extend uprobe_multi target binary with ability to
> > trigger uprobe (so that we can capture stack traces from it), but also
> > we allow to force build ID data to be either resident or non-resident in
> > memory (see also a comment about quirks of MADV_PAGEOUT).
> >
> > That way we can validate that in non-sleepable context we won't get
> > build ID (as expected), but with sleepable uprobes we will get that
> > build ID regardless of it being physically present in memory.
> >
> > Also, we add a small add-on linker script which reorders
> > .note.gnu.build-id section and puts it after (big) .text section,
> > putting build ID data outside of the very first page of ELF file. This
> > will test all the relaxations we did in build ID parsing logic in kernel
> > thanks to freader abstraction.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
> > index 7ffa563ffeba..c7828b13e5ff 100644
> > --- a/tools/testing/selftests/bpf/uprobe_multi.c
> > +++ b/tools/testing/selftests/bpf/uprobe_multi.c
>
> [...]
>
> > +int __attribute__((weak)) trigger_uprobe(bool build_id_resident)
> > +{
> > +	int page_sz = sysconf(_SC_PAGESIZE);
> > +	void *addr;
> > +
> > +	/* page-align build ID start */
> > +	addr = (void *)((uintptr_t)&build_id_start & ~(page_sz - 1));
> > +
> > +	/* to guarantee MADV_PAGEOUT work reliably, we need to ensure that
> > +	 * memory range is mapped into current process, so we unconditionally
> > +	 * do MADV_POPULATE_READ, and then MADV_PAGEOUT, if necessary
> > +	 */
> > +	madvise(addr, page_sz, MADV_POPULATE_READ);
>
> Nit: check error code?
>

Well, even if this errors out there is no one to notice and do
anything about it, given this is in a forked process. The idea,
though, is that if this doesn't work, we'll catch it as part of the
actual selftest.

> > +	if (!build_id_resident)
> > +		madvise(addr, page_sz, MADV_PAGEOUT);
> > +
> > +	(void)uprobe();
> > +
> > +	return 0;
> > +}
> > +
>
> [...]
>
> Silly question, unrelated to the patch-set itself.
> When I do ./test_progs -vvv -t build_id/sleepable five stack frames
> are printed:
>
> FRAME #00: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4156
> FRAME #01: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 393aef
> FRAME #02: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a088
> FRAME #03: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a14b
> FRAME #04: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4095

In my QEMU I only get 3:

FRAME #00: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 2c4156
FRAME #01: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 393ac7
FRAME #02: BUILD ID = 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658 OFFSET = 27cd0

But see below, for my actual devserver there are 4 frames. My bet
would be that 568ef is libc. A bit confused why you get frame 04 from
uprobe_multi, but maybe that's how things work with musl or whatever?
Don't know. Check libc.so.

>
> The ...6f368 is build-id of the uprobe_multi.
> How do I check where ...568ef comes from?
> Also, why are there 5 frames when nesting level for uprobe() is 3?
>

Well, libc has some function calls before it gets to main. E.g., for
my local machine:

$ sudo bpftrace -e 'uprobe:./uprobe_multi:uprobe { print(ustack()); }'
Attaching 1 probe...

        uprobe+4
        trigger_uprobe+113
        main+176
        __libc_start_call_main+128

Note that you won't have trigger_uprobe in your stack trace until your
kernel has [0]

  [0] https://lore.kernel.org/linux-trace-kernel/20240729175223.23914-1-andrii@kernel.org/

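For readers following the "check error code?" exchange above, here is a
minimal sketch of the populate-then-page-out sequence with return values
propagated. It is not what the patch does (the patch deliberately ignores
the madvise() results). The helper names page_align_down() and
set_page_residency() are made up for this illustration, and the fallback
values for the MADV_* constants are assumed to match the Linux UAPI headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (values assumed from the Linux UAPI). */
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/* Round addr down to the start of its page; page_sz must be a power of two. */
uintptr_t page_align_down(uintptr_t addr, uintptr_t page_sz)
{
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	return addr & ~(page_sz - 1);
}

/*
 * Hypothetical helper: make the page containing p resident, then optionally
 * ask the kernel to reclaim it. Returns 0 on success or -errno on failure.
 */
int set_page_residency(const void *p, bool resident)
{
	long page_sz = sysconf(_SC_PAGESIZE);
	void *addr;

	if (page_sz <= 0)
		return -EINVAL;

	addr = (void *)page_align_down((uintptr_t)p, (uintptr_t)page_sz);

	/* Populate first so the range is mapped into this process. */
	if (madvise(addr, (size_t)page_sz, MADV_POPULATE_READ))
		return -errno;

	/* MADV_PAGEOUT is a request; the kernel may still keep the page. */
	if (!resident && madvise(addr, (size_t)page_sz, MADV_PAGEOUT))
		return -errno;

	return 0;
}
```

A caller could turn a nonzero result into the process exit status, which
the selftest's system() check would then surface. Whether that is worth
doing is exactly the judgment call discussed in the thread.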
On Thu, 2024-08-22 at 15:55 -0700, Andrii Nakryiko wrote:

> > > +	madvise(addr, page_sz, MADV_POPULATE_READ);
> >
> > Nit: check error code?
>
> Well, even if this errors out there is no one to notice and do
> anything about it, given this is in a forked process. The idea,
> though, is that if this doesn't work, we'll catch it as part of the
> actual selftest.

Ok.

[...]

> In my QEMU I only get 3:
>
> FRAME #00: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 2c4156
> FRAME #01: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 393ac7
> FRAME #02: BUILD ID = 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658 OFFSET = 27cd0
>
> But see below, for my actual devserver there are 4 frames. My bet
> would be that 568ef is libc. A bit confused why you get frame 04 from
> uprobe_multi, but maybe that's how things work with musl or whatever?
> Don't know. Check libc.so.

Oh, right, I had to check libc build-id inside QEMU, not outside...
Yes, this is libc signature.
This figures, thank you for explaining.

[...]

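As a footnote to the "where does ...568ef come from" question: a build ID
is stored in an ELF note named "GNU" with type NT_GNU_BUILD_ID, so one way
to identify a frame's owner is to pull that note out of each candidate
library and compare the hex strings. The sketch below scans a buffer that
is assumed to already hold the raw bytes of a note segment. The function
names find_gnu_build_id() and build_id_to_hex() are invented for this
example, and locating the PT_NOTE segment in a file is left out.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef NT_GNU_BUILD_ID
#define NT_GNU_BUILD_ID 3
#endif

/* Note name and descriptor fields are padded to 4-byte boundaries. */
#define NOTE_ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3)

/*
 * Scan raw note bytes for the GNU build ID. On success copy the ID into
 * out and return its length in bytes; return -1 if no build ID is found
 * or the buffer is truncated.
 */
int find_gnu_build_id(const unsigned char *notes, size_t len,
		      unsigned char *out, size_t out_cap)
{
	size_t off = 0;

	while (len - off >= sizeof(Elf64_Nhdr)) {
		Elf64_Nhdr nh;
		size_t name_off, desc_off, next;

		memcpy(&nh, notes + off, sizeof(nh));
		name_off = off + sizeof(nh);
		desc_off = name_off + NOTE_ALIGN4(nh.n_namesz);
		if (desc_off > len || nh.n_descsz > len - desc_off)
			return -1;

		if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
		    memcmp(notes + name_off, "GNU", 4) == 0) {
			if (nh.n_descsz == 0 || nh.n_descsz > out_cap)
				return -1;
			memcpy(out, notes + desc_off, nh.n_descsz);
			return (int)nh.n_descsz;
		}

		next = desc_off + NOTE_ALIGN4(nh.n_descsz);
		if (next > len)
			break;
		off = next;
	}
	return -1;
}

/* Format n ID bytes as lowercase hex; hex must hold 2 * n + 1 chars. */
void build_id_to_hex(const unsigned char *id, size_t n, char *hex)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < n; i++) {
		hex[2 * i] = digits[id[i] >> 4];
		hex[2 * i + 1] = digits[id[i] & 0xf];
	}
	hex[2 * n] = '\0';
}
```

For a quick manual check, readelf -n on the candidate library prints the
same value on its "Build ID:" line. As noted in the reply above, the
library has to be the one inside the environment where the test ran.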
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 7e4b107b37b4..e47d983d2694 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -787,9 +787,10 @@ $(OUTPUT)/veristat: $(OUTPUT)/veristat.o
 
 # Linking uprobe_multi can fail due to relocation overflows on mips.
 $(OUTPUT)/uprobe_multi: CFLAGS += $(if $(filter mips, $(ARCH)),-mxgot)
-$(OUTPUT)/uprobe_multi: uprobe_multi.c
+$(OUTPUT)/uprobe_multi: uprobe_multi.c uprobe_multi.ld
 	$(call msg,BINARY,,$@)
-	$(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
+	$(Q)$(CC) $(CFLAGS) -Wl,-T,uprobe_multi.ld -O0 $(LDFLAGS) 	\
+		$(filter-out %.ld,$^) $(LDLIBS) -o $@
 
 EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR)			\
 	prog_tests/tests.h map_tests/tests.h verifier/tests.h		\
diff --git a/tools/testing/selftests/bpf/prog_tests/build_id.c b/tools/testing/selftests/bpf/prog_tests/build_id.c
new file mode 100644
index 000000000000..aec9c8d6bc96
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/build_id.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+
+#include "test_build_id.skel.h"
+
+static char build_id[BPF_BUILD_ID_SIZE];
+static int build_id_sz;
+
+static void print_stack(struct bpf_stack_build_id *stack, int frame_cnt)
+{
+	int i, j;
+
+	for (i = 0; i < frame_cnt; i++) {
+		printf("FRAME #%02d: ", i);
+		switch (stack[i].status) {
+		case BPF_STACK_BUILD_ID_EMPTY:
+			printf("<EMPTY>\n");
+			break;
+		case BPF_STACK_BUILD_ID_VALID:
+			printf("BUILD ID = ");
+			for (j = 0; j < BPF_BUILD_ID_SIZE; j++)
+				printf("%02hhx", (unsigned)stack[i].build_id[j]);
+			printf(" OFFSET = %llx", (unsigned long long)stack[i].offset);
+			break;
+		case BPF_STACK_BUILD_ID_IP:
+			printf("IP = %llx", (unsigned long long)stack[i].ip);
+			break;
+		default:
+			printf("UNEXPECTED STATUS %d ", stack[i].status);
+			break;
+		}
+		printf("\n");
+	}
+}
+
+static void subtest_nofault(bool build_id_resident)
+{
+	struct test_build_id *skel;
+	struct bpf_stack_build_id *stack;
+	int frame_cnt;
+
+	skel = test_build_id__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	skel->links.uprobe_nofault = bpf_program__attach(skel->progs.uprobe_nofault);
+	if (!ASSERT_OK_PTR(skel->links.uprobe_nofault, "link"))
+		goto cleanup;
+
+	if (build_id_resident)
+		ASSERT_OK(system("./uprobe_multi uprobe-paged-in"), "trigger_uprobe");
+	else
+		ASSERT_OK(system("./uprobe_multi uprobe-paged-out"), "trigger_uprobe");
+
+	if (!ASSERT_GT(skel->bss->res_nofault, 0, "res"))
+		goto cleanup;
+
+	stack = skel->bss->stack_nofault;
+	frame_cnt = skel->bss->res_nofault / sizeof(struct bpf_stack_build_id);
+	if (env.verbosity >= VERBOSE_NORMAL)
+		print_stack(stack, frame_cnt);
+
+	if (build_id_resident) {
+		ASSERT_EQ(stack[0].status, BPF_STACK_BUILD_ID_VALID, "build_id_status");
+		ASSERT_EQ(memcmp(stack[0].build_id, build_id, build_id_sz), 0, "build_id_match");
+	} else {
+		ASSERT_EQ(stack[0].status, BPF_STACK_BUILD_ID_IP, "build_id_status");
+	}
+
+cleanup:
+	test_build_id__destroy(skel);
+}
+
+static void subtest_sleepable(void)
+{
+	struct test_build_id *skel;
+	struct bpf_stack_build_id *stack;
+	int frame_cnt;
+
+	skel = test_build_id__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	skel->links.uprobe_sleepable = bpf_program__attach(skel->progs.uprobe_sleepable);
+	if (!ASSERT_OK_PTR(skel->links.uprobe_sleepable, "link"))
+		goto cleanup;
+
+	/* force build ID to not be paged in */
+	ASSERT_OK(system("./uprobe_multi uprobe-paged-out"), "trigger_uprobe");
+
+	if (!ASSERT_GT(skel->bss->res_sleepable, 0, "res"))
+		goto cleanup;
+
+	stack = skel->bss->stack_sleepable;
+	frame_cnt = skel->bss->res_sleepable / sizeof(struct bpf_stack_build_id);
+	if (env.verbosity >= VERBOSE_NORMAL)
+		print_stack(stack, frame_cnt);
+
+	ASSERT_EQ(stack[0].status, BPF_STACK_BUILD_ID_VALID, "build_id_status");
+	ASSERT_EQ(memcmp(stack[0].build_id, build_id, build_id_sz), 0, "build_id_match");
+
+cleanup:
+	test_build_id__destroy(skel);
+}
+
+void serial_test_build_id(void)
+{
+	build_id_sz = read_build_id("uprobe_multi", build_id, sizeof(build_id));
+	ASSERT_EQ(build_id_sz, BPF_BUILD_ID_SIZE, "parse_build_id");
+
+	if (test__start_subtest("nofault-paged-out"))
+		subtest_nofault(false /* not resident */);
+	if (test__start_subtest("nofault-paged-in"))
+		subtest_nofault(true /* resident */);
+	if (test__start_subtest("sleepable"))
+		subtest_sleepable();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_build_id.c b/tools/testing/selftests/bpf/progs/test_build_id.c
new file mode 100644
index 000000000000..32ce59f9aa27
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_build_id.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+struct bpf_stack_build_id stack_sleepable[128];
+int res_sleepable;
+
+struct bpf_stack_build_id stack_nofault[128];
+int res_nofault;
+
+SEC("uprobe.multi/./uprobe_multi:uprobe")
+int uprobe_nofault(struct pt_regs *ctx)
+{
+	res_nofault = bpf_get_stack(ctx, stack_nofault, sizeof(stack_nofault),
+				    BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+
+	return 0;
+}
+
+SEC("uprobe.multi.s/./uprobe_multi:uprobe")
+int uprobe_sleepable(struct pt_regs *ctx)
+{
+	res_sleepable = bpf_get_stack(ctx, stack_sleepable, sizeof(stack_sleepable),
+				      BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
index 7ffa563ffeba..c7828b13e5ff 100644
--- a/tools/testing/selftests/bpf/uprobe_multi.c
+++ b/tools/testing/selftests/bpf/uprobe_multi.c
@@ -2,8 +2,21 @@
 
 #include <stdio.h>
 #include <string.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <sys/mman.h>
+#include <unistd.h>
 #include <sdt.h>
 
+#ifndef MADV_POPULATE_READ
+#define MADV_POPULATE_READ 22
+#endif
+
+int __attribute__((weak)) uprobe(void)
+{
+	return 0;
+}
+
 #define __PASTE(a, b) a##b
 #define PASTE(a, b) __PASTE(a, b)
 
@@ -75,6 +88,30 @@ static int usdt(void)
 	return 0;
 }
 
+extern char build_id_start[];
+extern char build_id_end[];
+
+int __attribute__((weak)) trigger_uprobe(bool build_id_resident)
+{
+	int page_sz = sysconf(_SC_PAGESIZE);
+	void *addr;
+
+	/* page-align build ID start */
+	addr = (void *)((uintptr_t)&build_id_start & ~(page_sz - 1));
+
+	/* to guarantee MADV_PAGEOUT work reliably, we need to ensure that
+	 * memory range is mapped into current process, so we unconditionally
+	 * do MADV_POPULATE_READ, and then MADV_PAGEOUT, if necessary
+	 */
+	madvise(addr, page_sz, MADV_POPULATE_READ);
+	if (!build_id_resident)
+		madvise(addr, page_sz, MADV_PAGEOUT);
+
+	(void)uprobe();
+
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	if (argc != 2)
@@ -84,6 +121,10 @@ int main(int argc, char **argv)
 		return bench();
 	if (!strcmp("usdt", argv[1]))
 		return usdt();
+	if (!strcmp("uprobe-paged-out", argv[1]))
+		return trigger_uprobe(false /* page-out build ID */);
+	if (!strcmp("uprobe-paged-in", argv[1]))
+		return trigger_uprobe(true /* page-in build ID */);
 
 error:
 	fprintf(stderr, "usage: %s <bench|usdt>\n", argv[0]);
diff --git a/tools/testing/selftests/bpf/uprobe_multi.ld b/tools/testing/selftests/bpf/uprobe_multi.ld
new file mode 100644
index 000000000000..a2e94828bc8c
--- /dev/null
+++ b/tools/testing/selftests/bpf/uprobe_multi.ld
@@ -0,0 +1,11 @@
+SECTIONS
+{
+	. = ALIGN(4096);
+	.note.gnu.build-id : { *(.note.gnu.build-id) }
+	. = ALIGN(4096);
+}
+INSERT AFTER .text;
+
+build_id_start = ADDR(.note.gnu.build-id);
+build_id_end = ADDR(.note.gnu.build-id) + SIZEOF(.note.gnu.build-id);
+

Add a new set of tests validating behavior of capturing stack traces
with build ID. We extend uprobe_multi target binary with ability to
trigger uprobe (so that we can capture stack traces from it), but also
we allow to force build ID data to be either resident or non-resident in
memory (see also a comment about quirks of MADV_PAGEOUT).

That way we can validate that in non-sleepable context we won't get
build ID (as expected), but with sleepable uprobes we will get that
build ID regardless of it being physically present in memory.

Also, we add a small add-on linker script which reorders
.note.gnu.build-id section and puts it after (big) .text section,
putting build ID data outside of the very first page of ELF file. This
will test all the relaxations we did in build ID parsing logic in kernel
thanks to freader abstraction.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |   5 +-
 .../selftests/bpf/prog_tests/build_id.c       | 118 ++++++++++++++++++
 .../selftests/bpf/progs/test_build_id.c       |  31 +++++
 tools/testing/selftests/bpf/uprobe_multi.c    |  41 ++++++
 tools/testing/selftests/bpf/uprobe_multi.ld   |  11 ++
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/build_id.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_build_id.c
 create mode 100644 tools/testing/selftests/bpf/uprobe_multi.ld
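
The expectations described in the commit message boil down to a small
truth table for the status of the top stack frame, which the three
subtests check. The sketch below only restates that table. The enum is a
local stand-in assumed to mirror the BPF_STACK_BUILD_ID_* UAPI values, and
expected_top_frame() is a name invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins, assumed to mirror the BPF_STACK_BUILD_ID_* values. */
enum frame_status {
	FRAME_EMPTY = 0,	/* BPF_STACK_BUILD_ID_EMPTY */
	FRAME_VALID = 1,	/* BPF_STACK_BUILD_ID_VALID */
	FRAME_IP = 2,		/* BPF_STACK_BUILD_ID_IP */
};

/*
 * A sleepable program may fault the build ID page in, so it always gets a
 * valid build ID. A non-sleepable one only succeeds when the page is
 * already resident, and otherwise falls back to reporting the raw IP.
 */
enum frame_status expected_top_frame(bool sleepable, bool build_id_resident)
{
	return (sleepable || build_id_resident) ? FRAME_VALID : FRAME_IP;
}
```

In terms of the subtests: nofault-paged-out corresponds to (false, false),
nofault-paged-in to (false, true), and sleepable to (true, false).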