[-next,v1,0/3] kernel/events/uprobes: uprobe_write_opcode() rewrite

Message ID: 20250304154846.1937958-1-david@redhat.com

Message

David Hildenbrand March 4, 2025, 3:48 p.m. UTC
Based on -next, because a related fix [1] is not in mm.git. This is
the follow-up to [2] (later than I wanted to send it out), now that
Willy also stumbled over this [3].

Since the RFC, I rewrote it once again, now using a folio_walk instead of
our old pagewalk infrastructure.


Currently, uprobe_write_opcode() implements COW-breaking manually, which is
really far from ideal. Further, there is interest in supporting uprobes on
hugetlb pages [4], and leaving at least the COW-breaking to the core will
make this much easier.

Also, I think the current code doesn't really handle some things
properly (see patch #3) when replacing/zapping pages.

Let's rewrite it, to leave COW-breaking to the fault handler, and handle
registration/unregistration by temporarily unmapping the anonymous page,
modifying it, and mapping it again. We still have to implement
zapping of anonymous pages ourselves, unfortunately.

We could look into not performing the temporary unmapping if we can
perform the write atomically, which would likely also make adding hugetlb
support a lot easier. But, limited (e.g., only PMD/PUD) hugetlb support
could be added on top of this with some tweaking.

Note that we now won't have to allocate another anonymous folio when
unregistering (which will be beneficial for hugetlb as well): we can simply
modify the already-mapped one from the registration (if any). When
registering a uprobe, we'll first trigger a ptrace-like write fault to
break COW, and then modify the already-mapped page.

Briefly sanity tested with perf:
  [root@localhost ~]# perf probe -x /usr/bin/bash -a main
  ...
  [root@localhost ~]# perf record -e probe_bash:main -aR sleep 10 &
  [1] 2196
  [root@localhost ~]# bash
  [root@localhost ~]# exit
  exit
  [root@localhost ~]# bash
  [root@localhost ~]# exit
  exit
  [root@localhost ~]# [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.287 MB perf.data (8 samples) ]
  ...
  [root@localhost ~]# perf report --stdio
  # To display the perf.data header info, please use --header/--header-only optio>
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 8  of event 'probe_bash:main'
  # Event count (approx.): 8
  #
  # Overhead  Command      Shared Object  Symbol
  # ........  ...........  .............  ........
  #
      75.00%  grepconf.sh  bash           [.] main
      25.00%  bash         bash           [.] main
  ...

Are there any uprobe tests / benchmarks that are worth running?

RFC -> v1:
* Use folio_walk and simplify the logic

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Tong Tiangen <tongtiangen@huawei.com>

[1] https://lkml.kernel.org/r/20250224031149.1598949-1-tongtiangen@huawei.com
[2] https://lore.kernel.org/linux-mm/20240604122548.359952-2-david@redhat.com/T/
[3] https://lore.kernel.org/all/d7971673-19ed-448a-9e54-8ffbde5059dc@redhat.com/T/
[4] https://lkml.kernel.org/r/ZiK50qob9yl5e0Xz@bender.morinfr.org

David Hildenbrand (3):
  kernel/events/uprobes: pass VMA instead of MM to remove_breakpoint()
  kernel/events/uprobes: pass VMA to set_swbp(), set_orig_insn() and
    uprobe_write_opcode()
  kernel/events/uprobes: uprobe_write_opcode() rewrite

 arch/arm/probes/uprobes/core.c |   4 +-
 include/linux/uprobes.h        |   6 +-
 kernel/events/uprobes.c        | 363 +++++++++++++++++----------------
 3 files changed, 190 insertions(+), 183 deletions(-)


base-commit: cd3215bbcb9d4321def93fea6cfad4d5b42b9d1d

Comments

Oleg Nesterov March 5, 2025, 3:20 p.m. UTC | #1
On 03/04, David Hildenbrand wrote:
>
> Currently, uprobe_write_opcode() implements COW-breaking manually, which is
> really far from ideal.

To say the least ;)

David, thanks for doing this. I'll try to read 3/3 tomorrow, but I don't
think I can really help. Let me repeat, this code was written many years
ago, I forgot everything, and today my understanding of mm/ is very poor.
But I'll try anyway.

> Are there any uprobe tests / benchmarks that are worth running?

All I know about uprobe tests is that bpf people run a lot of tests which
use uprobes.

Andrii, Jiri, what do you advise?

Oleg.
Andrii Nakryiko March 5, 2025, 7:43 p.m. UTC | #2
On Wed, Mar 5, 2025 at 7:22 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 03/04, David Hildenbrand wrote:
> >
> > Currently, uprobe_write_opcode() implements COW-breaking manually, which is
> > really far from ideal.
>
> To say at least ;)
>
> David, thanks for doing this. I'll try to read 3/3 tomorrow, but I don't
> think I can really help. Let me repeat, this code was written many years
> ago, I forgot everything, and today my understanding of mm/ is very poor.
> But I'll try anyway.
>
> > Are there any uprobe tests / benchmarks that are worth running?
>
> All I know about uprobe tests is that bpf people run a lot of tests which
> use uprobes.
>
> Andrii, Jiri, what you advise?
>

We do have a bunch of tests within BPF selftests:

cd tools/testing/selftests/bpf && make -j$(nproc) && sudo ./test_progs -t uprobe

I also built an uprobe-stress tool to validate the uprobe optimizations I
was doing. It is the most stand-alone thing to use for testing, so please
consider checking it out. You can find it at [0]; see also [1] and [2],
where I was helping Peter build it from sources, which might be useful
for you as well if you run into problems with building. Running something
like `sudo ./uprobe-stress -a10 -t5 -m5 -f3` would hammer on this quite a
bit.

I'm just about to leave on a short vacation, so won't have time to go
over patches, but I plan to look at them when I'm back next week.

  [0] https://github.com/libbpf/libbpf-bootstrap/tree/uprobe-stress
  [1] https://lore.kernel.org/linux-trace-kernel/CAEf4BzZ+ygwfk8FKn5AS_Ny=igvGcFzdDLE2FjcvwjCKazEWMA@mail.gmail.com/
  [2] https://lore.kernel.org/linux-trace-kernel/CAEf4BzZqKCR-EQz6LTi-YvFY4RnYb_NnQXtwgZCv6aUo7gjkHg@mail.gmail.com

> Oleg.
>
David Hildenbrand March 5, 2025, 7:47 p.m. UTC | #3
On 05.03.25 20:43, Andrii Nakryiko wrote:
> On Wed, Mar 5, 2025 at 7:22 AM Oleg Nesterov <oleg@redhat.com> wrote:
>>
>> On 03/04, David Hildenbrand wrote:
>>>
>>> Currently, uprobe_write_opcode() implements COW-breaking manually, which is
>>> really far from ideal.
>>
>> To say at least ;)
>>
>> David, thanks for doing this. I'll try to read 3/3 tomorrow, but I don't
>> think I can really help. Let me repeat, this code was written many years
>> ago, I forgot everything, and today my understanding of mm/ is very poor.
>> But I'll try anyway.
>>
>>> Are there any uprobe tests / benchmarks that are worth running?
>>
>> All I know about uprobe tests is that bpf people run a lot of tests which
>> use uprobes.
>>
>> Andrii, Jiri, what you advise?
>>
> 
> We do have a bunch of tests within BPF selftests:
> 
> cd tools/testing/selftests/bpf && make -j$(nproc) && sudo ./test_progs -t uprobe

I stumbled over them, but was so far not successful in building them in 
my test VM (did not try too hard, though). Will try harder now that I 
know that it actually tests uprobe properly :)

> 
> I also built an uprobe-stress tool to validate uprobe optimizations I
> was doing, this one is the most stand-alone thing to use for testing,
> please consider checking that. You can find it at [0], and see also
> [1] and [2] where  I was helping Peter to build it from sources, so
> that might be useful for you as well, if you run into problems with
> building. Running something like `sudo ./uprobe-stress -a10 -t5 -m5
> -f3` would hammer on this quite a bit.

Thanks, I'll play with that as well.

> 
> I'm just about to leave on a short vacation, so won't have time to go
> over patches, but I plan to look at them when I'm back next week.
> 
>    [0] https://github.com/libbpf/libbpf-bootstrap/tree/uprobe-stress
>    [1] https://lore.kernel.org/linux-trace-kernel/CAEf4BzZ+ygwfk8FKn5AS_Ny=igvGcFzdDLE2FjcvwjCKazEWMA@mail.gmail.com/
>    [2] https://lore.kernel.org/linux-trace-kernel/CAEf4BzZqKCR-EQz6LTi-YvFY4RnYb_NnQXtwgZCv6aUo7gjkHg@mail.gmail.com
> 
>> Oleg.
>>
>
Andrii Nakryiko March 5, 2025, 7:58 p.m. UTC | #4
On Wed, Mar 5, 2025 at 11:47 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 05.03.25 20:43, Andrii Nakryiko wrote:
> > On Wed, Mar 5, 2025 at 7:22 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >>
> >> On 03/04, David Hildenbrand wrote:
> >>>
> >>> Currently, uprobe_write_opcode() implements COW-breaking manually, which is
> >>> really far from ideal.
> >>
> >> To say at least ;)
> >>
> >> David, thanks for doing this. I'll try to read 3/3 tomorrow, but I don't
> >> think I can really help. Let me repeat, this code was written many years
> >> ago, I forgot everything, and today my understanding of mm/ is very poor.
> >> But I'll try anyway.
> >>
> >>> Are there any uprobe tests / benchmarks that are worth running?
> >>
> >> All I know about uprobe tests is that bpf people run a lot of tests which
> >> use uprobes.
> >>
> >> Andrii, Jiri, what you advise?
> >>
> >
> > We do have a bunch of tests within BPF selftests:
> >
> > cd tools/testing/selftests/bpf && make -j$(nproc) && sudo ./test_progs -t uprobe
>
> I stumbled over them, but was so far not successful in building them in
> my test VM (did not try too hard, though). Will try harder now that I
> know that it actually tests uprobe properly :)

If you have a decently recent Clang and pahole, then just make sure you
have the kernel built before you build the selftests. So the above
instructions are more like:

1. cd <linux-repo>
2. cat tools/testing/selftests/bpf/{config,config.<your_arch>} >> .config
3. make -j$(nproc) # build kernel with that adjusted config
4. cd tools/testing/selftests/bpf
5. make -j$(nproc) # build BPF selftests
6. sudo ./test_progs -t uprobe # run selftests with "uprobe" in their name
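
Step 2 above could be wrapped in a small helper along these lines. This is
only a sketch: the function name merge_bpf_selftest_config and its two
arguments are made up for illustration, and treating the per-arch fragment
as optional is an assumption, not something the steps above require.

```shell
# Hypothetical helper: append the BPF selftest config fragments to .config.
# Usage: merge_bpf_selftest_config <linux-repo> <arch>
merge_bpf_selftest_config() {
    tree=$1
    arch=$2
    frag_dir="$tree/tools/testing/selftests/bpf"

    # The generic fragment must exist, otherwise this is not a kernel tree
    # with BPF selftests in the expected place.
    if [ ! -f "$frag_dir/config" ]; then
        echo "missing $frag_dir/config" >&2
        return 1
    fi
    cat "$frag_dir/config" >> "$tree/.config"

    # Assumption: the per-arch fragment is appended only if it is present.
    if [ -f "$frag_dir/config.$arch" ]; then
        cat "$frag_dir/config.$arch" >> "$tree/.config"
    fi
}
```

Something like `merge_bpf_selftest_config . x86_64` from the top of the
tree would then be followed by steps 3 to 6 unchanged.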

>
> >
> > I also built an uprobe-stress tool to validate uprobe optimizations I
> > was doing, this one is the most stand-alone thing to use for testing,
> > please consider checking that. You can find it at [0], and see also
> > [1] and [2] where  I was helping Peter to build it from sources, so
> > that might be useful for you as well, if you run into problems with
> > building. Running something like `sudo ./uprobe-stress -a10 -t5 -m5
> > -f3` would hammer on this quite a bit.
>
> Thanks, I'll play with that as well.
>
> >
> > I'm just about to leave on a short vacation, so won't have time to go
> > over patches, but I plan to look at them when I'm back next week.
> >
> >    [0] https://github.com/libbpf/libbpf-bootstrap/tree/uprobe-stress
> >    [1] https://lore.kernel.org/linux-trace-kernel/CAEf4BzZ+ygwfk8FKn5AS_Ny=igvGcFzdDLE2FjcvwjCKazEWMA@mail.gmail.com/
> >    [2] https://lore.kernel.org/linux-trace-kernel/CAEf4BzZqKCR-EQz6LTi-YvFY4RnYb_NnQXtwgZCv6aUo7gjkHg@mail.gmail.com
> >
> >> Oleg.
> >>
> >
>
>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand March 5, 2025, 8:53 p.m. UTC | #5
On 05.03.25 20:58, Andrii Nakryiko wrote:
> On Wed, Mar 5, 2025 at 11:47 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.03.25 20:43, Andrii Nakryiko wrote:
>>> On Wed, Mar 5, 2025 at 7:22 AM Oleg Nesterov <oleg@redhat.com> wrote:
>>>>
>>>> On 03/04, David Hildenbrand wrote:
>>>>>
>>>>> Currently, uprobe_write_opcode() implements COW-breaking manually, which is
>>>>> really far from ideal.
>>>>
>>>> To say at least ;)
>>>>
>>>> David, thanks for doing this. I'll try to read 3/3 tomorrow, but I don't
>>>> think I can really help. Let me repeat, this code was written many years
>>>> ago, I forgot everything, and today my understanding of mm/ is very poor.
>>>> But I'll try anyway.
>>>>
>>>>> Are there any uprobe tests / benchmarks that are worth running?
>>>>
>>>> All I know about uprobe tests is that bpf people run a lot of tests which
>>>> use uprobes.
>>>>
>>>> Andrii, Jiri, what you advise?
>>>>
>>>
>>> We do have a bunch of tests within BPF selftests:
>>>
>>> cd tools/testing/selftests/bpf && make -j$(nproc) && sudo ./test_progs -t uprobe
>>
>> I stumbled over them, but was so far not successful in building them in
>> my test VM (did not try too hard, though). Will try harder now that I
>> know that it actually tests uprobe properly :)
> 
> If you have decently recent Clang and pahole, then just make sure you
> have kernel built before you build selftests. So above instructions
> are more like:
> 
> 1. cd <linux-repo>
> 2. cat tools/testing/selftests/bpf/{config,config.<your_arch>} >> .config

^ that did the trick

> 3. make -j$(nproc) # build kernel with that adjusted config
> 4. cd tools/testing/selftests/bpf
> 5. make -j$(nproc) # build BPF selftests
> 6. sudo ./test_progs -t uprobe # run selftests with "uprobe" in their name

#444     uprobe:OK
#445     uprobe_autoattach:OK
#446/1   uprobe_multi_test/skel_api:OK
#446/2   uprobe_multi_test/attach_api_pattern:OK
#446/3   uprobe_multi_test/attach_api_syms:OK
#446/4   uprobe_multi_test/link_api:OK
#446/5   uprobe_multi_test/bench_uprobe:OK
#446/6   uprobe_multi_test/bench_usdt:OK
#446/7   uprobe_multi_test/attach_api_fails:OK
#446/8   uprobe_multi_test/attach_uprobe_fails:OK
#446/9   uprobe_multi_test/consumers:OK
#446/10  uprobe_multi_test/filter_fork:OK
#446/11  uprobe_multi_test/filter_clone_vm:OK
#446/12  uprobe_multi_test/session:OK
#446/13  uprobe_multi_test/session_single:OK
#446/14  uprobe_multi_test/session_cookie:OK
#446/15  uprobe_multi_test/session_cookie_recursive:OK
#446/16  uprobe_multi_test/uprobe_sesison_return_0:OK
#446/17  uprobe_multi_test/uprobe_sesison_return_1:OK
#446/18  uprobe_multi_test/uprobe_sesison_return_2:OK
#446     uprobe_multi_test:OK
#447/1   uprobe_syscall/uretprobe_regs_equal:OK
#447/2   uprobe_syscall/uretprobe_regs_change:OK
#447/3   uprobe_syscall/uretprobe_syscall_call:OK
#447/4   uprobe_syscall/uretprobe_shadow_stack:SKIP
#447     uprobe_syscall:OK (SKIP: 1/4)
Summary: 4/21 PASSED, 1 SKIPPED, 0 FAILED


Looks promising, thanks!
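
The per-test lines in a log like the one above can be cross-checked against
its Summary line with a small filter. A sketch, assuming every result line
has the form "#<id>[/<sub>] <name>:<STATUS>" as in this log; the function
name tally_test_progs is made up, and the mapping onto the Summary fields
(top-level OK count / subtest OK count) is inferred from this output rather
than taken from the test_progs sources.

```shell
# Hypothetical helper: tally OK/SKIP/FAIL lines from test_progs output on stdin.
tally_test_progs() {
    awk '
        /^#[0-9]+(\/[0-9]+)?[ \t]/ {
            # "#446/1" is a subtest, "#446" is a top-level test.
            is_sub = ($1 ~ /\//)
            # Status is whatever follows the first ":" in "name:STATUS".
            status = $2
            sub(/^[^:]*:/, "", status)
            if (status == "OK") {
                if (is_sub) sub_ok++; else top_ok++
            } else if (status == "SKIP") {
                skipped++
            } else if (status == "FAIL") {
                # Assumption: count every FAIL line, top-level or subtest.
                failed++
            }
        }
        END {
            printf "%d/%d PASSED, %d SKIPPED, %d FAILED\n",
                   top_ok, sub_ok, skipped, failed
        }
    '
}
```

With the log saved to a file (say results.log, a made-up name), running
`tally_test_progs < results.log` should print
"4/21 PASSED, 1 SKIPPED, 0 FAILED", matching the Summary line.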