Message ID | 20221007081327.1047552-1-sumanthk@linux.ibm.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Delegated to: | BPF |
Headers | show |
Series | bpf: fix sample_flags for bpf_perf_event_output | expand |
On Fri, Oct 07, 2022 at 10:13:27AM +0200, Sumanth Korikkar wrote: > * Raw data is also filled by bpf_perf_event_output. > * Add sample_flags to indicate raw data. > * This eliminates the segfaults as shown below: > Run ./samples/bpf/trace_output > BUG pid 9 cookie 1001000000004 sized 4 > BUG pid 9 cookie 1001000000004 sized 4 > BUG pid 9 cookie 1001000000004 sized 4 > Segmentation fault (core dumped) > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > Acked-by: Namhyung Kim <namhyung@kernel.org> > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Peter, I think this should go through your tree again? bpf-next/master does not have sample_flags merged yet thanks, jirka > --- > kernel/trace/bpf_trace.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index 49fb9ec8366d..1ed08967fb97 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -687,6 +687,7 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, > > perf_sample_data_init(sd, 0, 0); > sd->raw = &raw; > + sd->sample_flags |= PERF_SAMPLE_RAW; > > err = __bpf_perf_event_output(regs, map, flags, sd); > > @@ -745,6 +746,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, > perf_fetch_caller_regs(regs); > perf_sample_data_init(sd, 0, 0); > sd->raw = &raw; > + sd->sample_flags |= PERF_SAMPLE_RAW; > > ret = __bpf_perf_event_output(regs, map, flags, sd); > out: > -- > 2.36.1 >
On Fri, Oct 07, 2022 at 11:45:36AM +0200, Jiri Olsa wrote: > On Fri, Oct 07, 2022 at 10:13:27AM +0200, Sumanth Korikkar wrote: > > * Raw data is also filled by bpf_perf_event_output. > > * Add sample_flags to indicate raw data. > > * This eliminates the segfaults as shown below: > > Run ./samples/bpf/trace_output > > BUG pid 9 cookie 1001000000004 sized 4 > > BUG pid 9 cookie 1001000000004 sized 4 > > BUG pid 9 cookie 1001000000004 sized 4 > > Segmentation fault (core dumped) > > > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > > Acked-by: Namhyung Kim <namhyung@kernel.org> > > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> > > Acked-by: Jiri Olsa <jolsa@kernel.org> > > Peter, > I think this should go through your tree again? > bpf-next/master does not have sample_flags merged yet Yep can do. I'll line it up in perf/urgent (Ingo just send out perf/core).
Hello, The commit that this patch is fixing[1] also causes yet another segfault for 'perf-script' of tracepoint records. For example: $ sudo timeout 3 perf record -e exceptions:page_fault_user [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.228 MB perf.data (74 samples) ] $ sudo perf script Segmentation fault Reverting this patch and the original bug commit[1] fixes the issue. I haven't deep dive yet because I'm not familiar with this area. Anybody has any idea about this? [1] 838d9bb62d13 ("perf: Use sample_flags for raw_data") Thanks, SJ On Fri, 7 Oct 2022 10:13:27 +0200 Sumanth Korikkar <sumanthk@linux.ibm.com> wrote: > * Raw data is also filled by bpf_perf_event_output. > * Add sample_flags to indicate raw data. > * This eliminates the segfaults as shown below: > Run ./samples/bpf/trace_output > BUG pid 9 cookie 1001000000004 sized 4 > BUG pid 9 cookie 1001000000004 sized 4 > BUG pid 9 cookie 1001000000004 sized 4 > Segmentation fault (core dumped) > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > Acked-by: Namhyung Kim <namhyung@kernel.org> > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> > --- > kernel/trace/bpf_trace.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index 49fb9ec8366d..1ed08967fb97 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -687,6 +687,7 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, > > perf_sample_data_init(sd, 0, 0); > sd->raw = &raw; > + sd->sample_flags |= PERF_SAMPLE_RAW; > > err = __bpf_perf_event_output(regs, map, flags, sd); > > @@ -745,6 +746,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, > perf_fetch_caller_regs(regs); > perf_sample_data_init(sd, 0, 0); > sd->raw = &raw; > + sd->sample_flags |= PERF_SAMPLE_RAW; > > ret = __bpf_perf_event_output(regs, map, flags, sd); > out: > -- > 2.36.1
Hi SeongJae, On Mon, Oct 17, 2022 at 12:27 PM SeongJae Park <sj@kernel.org> wrote: > > Hello, > > > The commit that this patch is fixing[1] also causes yet another segfault for > 'perf-script' of tracepoint records. For example: > > $ sudo timeout 3 perf record -e exceptions:page_fault_user > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.228 MB perf.data (74 samples) ] > $ sudo perf script > Segmentation fault > > Reverting this patch and the original bug commit[1] fixes the issue. I haven't > deep dive yet because I'm not familiar with this area. Anybody has any idea > about this? > > [1] 838d9bb62d13 ("perf: Use sample_flags for raw_data") Sorry for the trouble. I think you also need to apply the below: https://lore.kernel.org/r/20221012143857.48198-1-james.clark@arm.com Thanks, Namhyung > > On Fri, 7 Oct 2022 10:13:27 +0200 Sumanth Korikkar <sumanthk@linux.ibm.com> wrote: > > > * Raw data is also filled by bpf_perf_event_output. > > * Add sample_flags to indicate raw data. > > * This eliminates the segfaults as shown below: > > Run ./samples/bpf/trace_output > > BUG pid 9 cookie 1001000000004 sized 4 > > BUG pid 9 cookie 1001000000004 sized 4 > > BUG pid 9 cookie 1001000000004 sized 4 > > Segmentation fault (core dumped) > > > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > > Acked-by: Namhyung Kim <namhyung@kernel.org> > > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> > > --- > > kernel/trace/bpf_trace.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index 49fb9ec8366d..1ed08967fb97 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -687,6 +687,7 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, > > > > perf_sample_data_init(sd, 0, 0); > > sd->raw = &raw; > > + sd->sample_flags |= PERF_SAMPLE_RAW; > > > > err = __bpf_perf_event_output(regs, map, flags, sd); > > > > @@ -745,6 +746,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, > > perf_fetch_caller_regs(regs); > > perf_sample_data_init(sd, 0, 0); > > sd->raw = &raw; > > + sd->sample_flags |= PERF_SAMPLE_RAW; > > > > ret = __bpf_perf_event_output(regs, map, flags, sd); > > out: > > -- > > 2.36.1
On Mon, 17 Oct 2022 15:52:15 -0700 Namhyung Kim <namhyung@kernel.org> wrote: > Hi SeongJae, > > On Mon, Oct 17, 2022 at 12:27 PM SeongJae Park <sj@kernel.org> wrote: > > > > Hello, > > > > > > The commit that this patch is fixing[1] also causes yet another segfault for > > 'perf-script' of tracepoint records. For example: > > > > $ sudo timeout 3 perf record -e exceptions:page_fault_user > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.228 MB perf.data (74 samples) ] > > $ sudo perf script > > Segmentation fault > > > > Reverting this patch and the original bug commit[1] fixes the issue. I haven't > > deep dive yet because I'm not familiar with this area. Anybody has any idea > > about this? > > > > [1] 838d9bb62d13 ("perf: Use sample_flags for raw_data") > > Sorry for the trouble. No problem. > I think you also need to apply the below: > > https://lore.kernel.org/r/20221012143857.48198-1-james.clark@arm.com Thank you for this nice answer. I confirmed that this fixes my issue. Thanks, SJ [...]
On Fri, Oct 7, 2022 at 8:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Fri, Oct 07, 2022 at 11:45:36AM +0200, Jiri Olsa wrote: > > On Fri, Oct 07, 2022 at 10:13:27AM +0200, Sumanth Korikkar wrote: > > > * Raw data is also filled by bpf_perf_event_output. > > > * Add sample_flags to indicate raw data. > > > * This eliminates the segfaults as shown below: > > > Run ./samples/bpf/trace_output > > > BUG pid 9 cookie 1001000000004 sized 4 > > > BUG pid 9 cookie 1001000000004 sized 4 > > > BUG pid 9 cookie 1001000000004 sized 4 > > > Segmentation fault (core dumped) > > > > > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > > > Acked-by: Namhyung Kim <namhyung@kernel.org> > > > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> > > > > Acked-by: Jiri Olsa <jolsa@kernel.org> > > > > Peter, > > I think this should go through your tree again? > > bpf-next/master does not have sample_flags merged yet > > Yep can do. I'll line it up in perf/urgent (Ingo just send out > perf/core). Peter, Could you please hurry up. 11 days have passed. This issue affects everyone the hard way now after merging all the trees: tip -> linus -> net-next -> bpf-next. The BPF CI is red right now with 5 tests failing because this fix is still missing. It's causing a headache to maintainers and developers.
Peter, Another 2 days have passed and bpf side is still broken due to the change that went during the merge window without corresponding fix from the bpf side. Looks like the patch is sitting in tip:perf/urgent. Please send it to Linus asap. We're not sending bpf fixes to avoid breaking bpf tree too. We've worked around the issue in bpf CI for bpf-next tree only. Developers still see failures when they run tests locally. On Tue, Oct 18, 2022 at 9:57 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Oct 7, 2022 at 8:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Fri, Oct 07, 2022 at 11:45:36AM +0200, Jiri Olsa wrote: > > > On Fri, Oct 07, 2022 at 10:13:27AM +0200, Sumanth Korikkar wrote: > > > > * Raw data is also filled by bpf_perf_event_output. > > > > * Add sample_flags to indicate raw data. > > > > * This eliminates the segfaults as shown below: > > > > Run ./samples/bpf/trace_output > > > > BUG pid 9 cookie 1001000000004 sized 4 > > > > BUG pid 9 cookie 1001000000004 sized 4 > > > > BUG pid 9 cookie 1001000000004 sized 4 > > > > Segmentation fault (core dumped) > > > > > > > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > > > > Acked-by: Namhyung Kim <namhyung@kernel.org> > > > > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> > > > > > > Acked-by: Jiri Olsa <jolsa@kernel.org> > > > > > > Peter, > > > I think this should go through your tree again? > > > bpf-next/master does not have sample_flags merged yet > > > > Yep can do. I'll line it up in perf/urgent (Ingo just send out > > perf/core). > > Peter, > > Could you please hurry up. 11 days have passed. > > This issue affects everyone the hard way now after merging > all the trees: tip -> linus -> net-next -> bpf-next. > The BPF CI is red right now with 5 tests failing because > this fix is still missing. > It's causing a headache to maintainers and developers.
Another 2 days have passed and the fix is still not in the Linus's tree. Peter, whatever your excuse is for not sending tip:perf/urgent this is not acceptable. Linus, please apply this fix directly: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=perf/urgent&id=21da7472a040420f2dc624ffec70291a72c5d6a6 or suggest the course of action. It sucked to have such a breakage in rc1 and we don't want rc2 to stay broken. Thanks On Thu, Oct 20, 2022 at 6:36 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > Peter, > > Another 2 days have passed and bpf side is still broken > due to the change that went during the merge window without > corresponding fix from the bpf side. > Looks like the patch is sitting in tip:perf/urgent. > Please send it to Linus asap. > > We're not sending bpf fixes to avoid breaking bpf tree too. > We've worked around the issue in bpf CI for bpf-next tree only. > Developers still see failures when they run tests locally. > > On Tue, Oct 18, 2022 at 9:57 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Fri, Oct 7, 2022 at 8:31 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > On Fri, Oct 07, 2022 at 11:45:36AM +0200, Jiri Olsa wrote: > > > > On Fri, Oct 07, 2022 at 10:13:27AM +0200, Sumanth Korikkar wrote: > > > > > * Raw data is also filled by bpf_perf_event_output. > > > > > * Add sample_flags to indicate raw data. > > > > > * This eliminates the segfaults as shown below: > > > > > Run ./samples/bpf/trace_output > > > > > BUG pid 9 cookie 1001000000004 sized 4 > > > > > BUG pid 9 cookie 1001000000004 sized 4 > > > > > BUG pid 9 cookie 1001000000004 sized 4 > > > > > Segmentation fault (core dumped) > > > > > > > > > > Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") > > > > > Acked-by: Namhyung Kim <namhyung@kernel.org> > > > > > Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> > > > > > > > > Acked-by: Jiri Olsa <jolsa@kernel.org> > > > > > > > > Peter, > > > > I think this should go through your tree again? > > > > bpf-next/master does not have sample_flags merged yet > > > > > > Yep can do. I'll line it up in perf/urgent (Ingo just send out > > > perf/core). > > > > Peter, > > > > Could you please hurry up. 11 days have passed. > > > > This issue affects everyone the hard way now after merging > > all the trees: tip -> linus -> net-next -> bpf-next. > > The BPF CI is red right now with 5 tests failing because > > this fix is still missing. > > It's causing a headache to maintainers and developers.
On Sat, Oct 22, 2022 at 6:16 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > Linus, > > please apply this fix directly or suggest the course of action. I have a pull request from Borislav with the fix that came in overnight, so this should be all fixed in rc2. Linus
On Sun, Oct 23, 2022 at 9:55 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I have a pull request from Borislav with the fix that came in > overnight, so this should be all fixed in rc2. .. and now it has moved from my inbox to my -git tree. Linus
On Sun, Oct 23, 2022 at 10:20 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Sun, Oct 23, 2022 at 9:55 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > I have a pull request from Borislav with the fix that came in > > overnight, so this should be all fixed in rc2. > > .. and now it has moved from my inbox to my -git tree. Great. Thank you.
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 49fb9ec8366d..1ed08967fb97 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -687,6 +687,7 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, perf_sample_data_init(sd, 0, 0); sd->raw = &raw; + sd->sample_flags |= PERF_SAMPLE_RAW; err = __bpf_perf_event_output(regs, map, flags, sd); @@ -745,6 +746,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, perf_fetch_caller_regs(regs); perf_sample_data_init(sd, 0, 0); sd->raw = &raw; + sd->sample_flags |= PERF_SAMPLE_RAW; ret = __bpf_perf_event_output(regs, map, flags, sd); out: