Message ID | 20240521104825.1060966-1-jolsa@kernel.org (mailing list archive) |
---|---|
Series | uprobe: uretprobe speed up |
On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
>hi,
>as part of the effort on speeding up the uprobes [0] coming with
>return uprobe optimization by using syscall instead of the trap
>on the uretprobe trampoline.

I understand this provides an optimization on x86. I believe the primary reason
is that syscall is a short, straight-line microcode sequence, while trap delivery
still does all the GDT / IDT and segmentation checks, which makes delivery
of the trap slow.

So doing a syscall improves that. However, it seems x86 is going to get rid of
that as part of FRED [1, 2], and Linux kernel support for FRED is already upstream [2].
So I imagine x86 hardware with FRED support already exists.

On other architectures, I believe trap delivery for a breakpoint instruction
is the same as for a syscall instruction.

x86 trap delivery is pretty much following suit here, with the intent
of making trap delivery cost similar to syscall delivery.

Sorry for being a buzzkill here, but ...
is it worth introducing this syscall, which otherwise has no use on other arches,
when x86 (and the x86 kernel) has already taken steps to bring trap delivery latency
in line with syscall latency?

Did you do any study of this on FRED-enabled x86 CPUs?

[1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
[2] - https://docs.kernel.org/arch/x86/x86_64/fred.html

>
>The speed up depends on instruction type that uprobe is installed
>and depends on specific HW type, please check patch 1 for details.
>
On Tue, May 21, 2024 at 1:49 PM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
> >hi,
> >as part of the effort on speeding up the uprobes [0] coming with
> >return uprobe optimization by using syscall instead of the trap
> >on the uretprobe trampoline.
>
> I understand this provides an optimization on x86. I believe the primary reason
> is that syscall is a short, straight-line microcode sequence, while trap delivery
> still does all the GDT / IDT and segmentation checks, which makes delivery
> of the trap slow.
>
> So doing a syscall improves that. However, it seems x86 is going to get rid of
> that as part of FRED [1, 2], and Linux kernel support for FRED is already upstream [2].
> So I imagine x86 hardware with FRED support already exists.
>
> On other architectures, I believe trap delivery for a breakpoint instruction
> is the same as for a syscall instruction.
>
> x86 trap delivery is pretty much following suit here, with the intent
> of making trap delivery cost similar to syscall delivery.
>
> Sorry for being a buzzkill here, but ...
> is it worth introducing this syscall, which otherwise has no use on other arches,
> when x86 (and the x86 kernel) has already taken steps to bring trap delivery latency
> in line with syscall latency?
>
> Did you do any study of this on FRED-enabled x86 CPUs?

afaik CPUs with FRED do not exist on the market and it's
not clear when they will be available.
And when they finally are on the shelves,
the overhead of FRED vs int3 would still have to be measured.
The cost of int3 with FRED might still be higher than syscall with FRED.

>
> [1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
> [2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
>
> >
> >The speed up depends on instruction type that uprobe is installed
> >and depends on specific HW type, please check patch 1 for details.
> >
On Tue, May 21, 2024 at 01:57:33PM -0700, Alexei Starovoitov wrote:
> On Tue, May 21, 2024 at 1:49 PM Deepak Gupta <debug@rivosinc.com> wrote:
> >
> > On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
> > >hi,
> > >as part of the effort on speeding up the uprobes [0] coming with
> > >return uprobe optimization by using syscall instead of the trap
> > >on the uretprobe trampoline.
> >
> > I understand this provides an optimization on x86. I believe the primary reason
> > is that syscall is a short, straight-line microcode sequence, while trap delivery
> > still does all the GDT / IDT and segmentation checks, which makes delivery
> > of the trap slow.
> >
> > So doing a syscall improves that. However, it seems x86 is going to get rid of
> > that as part of FRED [1, 2], and Linux kernel support for FRED is already upstream [2].
> > So I imagine x86 hardware with FRED support already exists.
> >
> > On other architectures, I believe trap delivery for a breakpoint instruction
> > is the same as for a syscall instruction.
> >
> > x86 trap delivery is pretty much following suit here, with the intent
> > of making trap delivery cost similar to syscall delivery.
> >
> > Sorry for being a buzzkill here, but ...
> > is it worth introducing this syscall, which otherwise has no use on other arches,
> > when x86 (and the x86 kernel) has already taken steps to bring trap delivery latency
> > in line with syscall latency?
> >
> > Did you do any study of this on FRED-enabled x86 CPUs?

nope.. interesting, will check, thanks

>
> afaik CPUs with FRED do not exist on the market and it's
> not clear when they will be available.
> And when they finally are on the shelves,
> the overhead of FRED vs int3 would still have to be measured.
> The cost of int3 with FRED might still be higher than syscall with FRED.

+1, also it's not really a complicated change: the wiring of the new
syscall to uretprobe is really simple, and we could go back to int3 with
a single patch if we no longer see any benefit from it, but at the moment
it provides a speedup

jirka

> >
> > [1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
> > [2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
> >
> > >
> > >The speed up depends on instruction type that uprobe is installed
> > >and depends on specific HW type, please check patch 1 for details.
> > >