[PATCHv6,bpf-next,0/9] uprobe: uretprobe speed up

Message ID 20240521104825.1060966-1-jolsa@kernel.org (mailing list archive)

Message

Jiri Olsa May 21, 2024, 10:48 a.m. UTC
hi,
as part of the effort to speed up uprobes [0], this series adds a
return uprobe optimization that uses a syscall instead of the trap
on the uretprobe trampoline.

The speed-up depends on the type of instruction the uprobe is installed
on and on the specific HW; please check patch 1 for details.

Patches 1-8 are based on bpf-next/master, but patches 2 and 3 also
apply to the linux-trace.git tree, probes/for-next branch.
Patch 9 is based on man-pages master.

v6 changes:
- separate shadow stack fix for current uretprobe in patch 1
- skip shadow stack test when uprobe is not compiled in [Masami]
- fix retprobe with the shadow stack, using iret return when
  shadow stack is detected
- I kept the acks on patch 3, because the shadow stack change is
  minimal and the original code is almost untouched
- added shadow stack bpf selftest
- rebased man page

Also available at:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  uretprobe_syscall

thanks,
jirka


Notes to check list items in Documentation/process/adding-syscalls.rst:

- System Call Alternatives
  A new syscall seems like the best way here, because we just need
  to enter the kernel quickly with no extra argument processing,
  which we'd need to do if we decided to use another syscall.

- Designing the API: Planning for Extension
  The uretprobe syscall is very specific and most likely won't be
  extended in the future.

  At the moment it does not take any arguments, and even if it does
  in the future, it may be called only from the trampoline prepared
  by the kernel, so no user would be broken.

- Designing the API: Other Considerations
  N/A because uretprobe syscall does not return reference to kernel
  object.

- Proposing the API
  Wiring up of the uretprobe system call is in a separate change;
  selftests and man page changes are part of the patchset.

- Generic System Call Implementation
  There's no CONFIG option for the new functionality because it
  keeps the same behaviour from the user POV.

- x86 System Call Implementation
  It's 64-bit syscall only.

- Compatibility System Calls (Generic)
  N/A uretprobe syscall has no arguments and is not supported
  for compat processes.

- Compatibility System Calls (x86)
  N/A uretprobe syscall is not supported for compat processes.

- System Calls Returning Elsewhere
  N/A.

- Other Details
  N/A.

- Testing
  Added new bpf selftests and ran LTP on top of this change.

- Man Page
  Attached.

- Do not call System Calls in the Kernel
  N/A.
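
The "Planning for Extension" note above says the syscall may be called
only from the trampoline prepared by the kernel, and patch 7 adds a
selftest that calls it from user space. A minimal sketch of that kind of
check follows. It is not the selftest from the series: call_in_child()
is a hypothetical helper, and the syscall number is left to the caller
because it depends on the syscall_64.tbl entry the series adds. The
sketch only reports whether the child was killed by a signal; which
outcome the kernel produces is defined by the patches, not by this code.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Call syscall number `nr` with no arguments in a forked child.
 *
 * Returns the terminating signal number if the child was killed by a
 * signal, 0 if the syscall returned to the child and it exited
 * normally, or -1 if fork() or waitpid() failed.
 *
 * `nr` is supplied by the caller; no real syscall number is
 * hard-coded here (assumption: the caller knows the right one).
 */
static int call_in_child(long nr)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		syscall(nr);
		_exit(0);	/* reached only if the kernel let the call return */
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Running the call in a child keeps the caller alive whatever the kernel
does with a call that does not come from the trampoline. For an
ordinary syscall such as SYS_getpid the helper returns 0.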


[0] https://lore.kernel.org/bpf/ZeCXHKJ--iYYbmLj@krava/
---
Jiri Olsa (8):
      x86/shstk: Make return uprobe work with shadow stack
      uprobe: Wire up uretprobe system call
      uprobe: Add uretprobe syscall to speed up return probe
      selftests/x86: Add return uprobe shadow stack test
      selftests/bpf: Add uretprobe syscall test for regs integrity
      selftests/bpf: Add uretprobe syscall test for regs changes
      selftests/bpf: Add uretprobe syscall call from user space test
      selftests/bpf: Add uretprobe shadow stack test

 arch/x86/entry/syscalls/syscall_64.tbl                      |   1 +
 arch/x86/include/asm/shstk.h                                |   4 +
 arch/x86/kernel/shstk.c                                     |  16 ++++
 arch/x86/kernel/uprobes.c                                   | 124 ++++++++++++++++++++++++++++-
 include/linux/syscalls.h                                    |   2 +
 include/linux/uprobes.h                                     |   3 +
 include/uapi/asm-generic/unistd.h                           |   5 +-
 kernel/events/uprobes.c                                     |  24 ++++--
 kernel/sys_ni.c                                             |   2 +
 tools/include/linux/compiler.h                              |   4 +
 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c       | 123 ++++++++++++++++++++++++++++-
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c     | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/uprobe_syscall.c          |  15 ++++
 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c |  17 ++++
 tools/testing/selftests/x86/test_shadow_stack.c             | 145 ++++++++++++++++++++++++++++++++++
 15 files changed, 860 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c

Jiri Olsa (1):
      man2: Add uretprobe syscall page

 man/man2/uretprobe.2 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 man/man2/uretprobe.2

Comments

Deepak Gupta May 21, 2024, 8:49 p.m. UTC | #1
On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
>hi,
>as part of the effort on speeding up the uprobes [0] coming with
>return uprobe optimization by using syscall instead of the trap
>on the uretprobe trampoline.

I understand this provides an optimization on x86. I believe the primary
reason is that syscall is a short, straight-line microcode sequence, while
trap delivery still does all the GDT / IDT and segmentation checks, which
makes delivery of the trap slow.

So doing a syscall improves that. However, it seems x86 is going to get rid
of that overhead as part of FRED [1, 2], and Linux kernel support for FRED is
already upstream [2]. So I imagine x86 hardware with FRED support already exists.

On other architectures, I believe trap delivery for a breakpoint instruction
is the same as for a syscall instruction.

Given that, x86 trap delivery is pretty much following suit here, with the
intent to make trap delivery cost similar to syscall delivery.

Sorry for being a buzzkill here but ...
Is it worth introducing this syscall, which has no use on other arches, when
x86 (and the x86 kernel) has already taken steps to bring trap delivery
latency in line with syscall latency?

Did you do any study of this on FRED enabled x86 CPUs?

[1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
[2] - https://docs.kernel.org/arch/x86/x86_64/fred.html

>
>The speed up depends on instruction type that uprobe is installed
>and depends on specific HW type, please check patch 1 for details.
>
Alexei Starovoitov May 21, 2024, 8:57 p.m. UTC | #2
On Tue, May 21, 2024 at 1:49 PM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
> >hi,
> >as part of the effort on speeding up the uprobes [0] coming with
> >return uprobe optimization by using syscall instead of the trap
> >on the uretprobe trampoline.
>
> I understand this provides an optimization on x86. I believe primary reason
> is syscall is straight-line microcode and short sequence while trap delivery
> still does all the GDT / IDT and segmentation checks and it makes delivery
> of the trap slow.
>
> So doing syscall improves that. Although it seems x86 is going to get rid of
> that as part of FRED [1, 2]. And linux kernel support for FRED is already upstream [2].
> So I am imagining x86 hardware already exists with FRED support.
>
> On other architectures, I believe trap delivery for breakpoint instruction
> is same as syscall instruction.
>
> Given that x86 trap delivery is pretty much going following the suit here and
> intend to make trap delivery cost similar to syscall delivery.
>
> Sorry for being buzzkill here but ...
> Is it worth introducing this syscall which otherwise has no use on other arches
> and x86 (and x86 kernel) has already taken steps to match trap delivery latency with
> syscall latency would have similar cost?
>
> Did you do any study of this on FRED enabled x86 CPUs?

afaik CPUs with FRED do not exist on the market and it's
not clear when they will be available.
And when they finally will be on the shelves
the overhead of FRED vs int3 would still have to be measured.
int3 with FRED might still be higher than syscall with FRED.
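
A rough sketch of how the syscall side of such a measurement could be
taken on a given machine. This is only a baseline under stated
assumptions: avg_syscall_ns() and now_ns() are hypothetical helpers,
getpid is used as a stand-in for "a cheap syscall", and the number says
nothing about the int3 path, which can only be compared fairly with a
real uretprobe installed.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average wall-clock cost, in nanoseconds, of one user -> kernel -> user
 * round trip for a cheap syscall. syscall(SYS_getpid) is used so that
 * every iteration really enters the kernel.
 */
static double avg_syscall_ns(unsigned long iters)
{
	uint64_t start = now_ns();

	for (unsigned long i = 0; i < iters; i++)
		syscall(SYS_getpid);

	return (double)(now_ns() - start) / (double)iters;
}
```

Calling avg_syscall_ns(1000000) a few times and taking the lowest value
reduces noise from scheduling and frequency scaling.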

>
> [1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
> [2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
>
> >
> >The speed up depends on instruction type that uprobe is installed
> >and depends on specific HW type, please check patch 1 for details.
> >
Jiri Olsa May 22, 2024, 8:55 a.m. UTC | #3
On Tue, May 21, 2024 at 01:57:33PM -0700, Alexei Starovoitov wrote:
> On Tue, May 21, 2024 at 1:49 PM Deepak Gupta <debug@rivosinc.com> wrote:
> >
> > On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
> > >hi,
> > >as part of the effort on speeding up the uprobes [0] coming with
> > >return uprobe optimization by using syscall instead of the trap
> > >on the uretprobe trampoline.
> >
> > I understand this provides an optimization on x86. I believe primary reason
> > is syscall is straight-line microcode and short sequence while trap delivery
> > still does all the GDT / IDT and segmentation checks and it makes delivery
> > of the trap slow.
> >
> > So doing syscall improves that. Although it seems x86 is going to get rid of
> > that as part of FRED [1, 2]. And linux kernel support for FRED is already upstream [2].
> > So I am imagining x86 hardware already exists with FRED support.
> >
> > On other architectures, I believe trap delivery for breakpoint instruction
> > is same as syscall instruction.
> >
> > Given that x86 trap delivery is pretty much going following the suit here and
> > intend to make trap delivery cost similar to syscall delivery.
> >
> > Sorry for being buzzkill here but ...
> > Is it worth introducing this syscall which otherwise has no use on other arches
> > and x86 (and x86 kernel) has already taken steps to match trap delivery latency with
> > syscall latency would have similar cost?
> >
> > Did you do any study of this on FRED enabled x86 CPUs?

nope.. interesting, will check, thanks

> 
> afaik CPUs with FRED do not exist on the market and it's
> not clear when they will be available.
> And when they finally will be on the shelves
> the overhead of FRED vs int3 would still have to be measured.
> int3 with FRED might still be higher than syscall with FRED.

+1. Also, it's not really a complicated change, and the wiring of the
new syscall to uretprobe is really simple. We could go back to int3
with a single patch if we no longer see any benefit from it,
but at the moment it provides a speed-up.

jirka

> 
> >
> > [1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
> > [2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
> >
> > >
> > >The speed up depends on instruction type that uprobe is installed
> > >and depends on specific HW type, please check patch 1 for details.
> > >