[v2,00/11] perf/uprobe: Optimize uprobes

Message ID: 20240711110235.098009979@infradead.org (mailing list archive)

Message

Peter Zijlstra July 11, 2024, 11:02 a.m. UTC
Hi!

These patches implement the (S)RCU-based proposal to optimize uprobes.
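
The gist of the change: the breakpoint hot path no longer bounces a refcount
on every hit; the uprobe lookup and handler chain run inside an SRCU read-side
critical section instead, and uprobe freeing is deferred to an SRCU grace
period. A simplified sketch of that shape (the names only approximate the real
code in kernel/events/uprobes.c; this is not the literal patch):

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(uprobes_srcu);

/* Before: every breakpoint hit pays two atomic ops on uprobe->ref. */
static void handle_swbp_refcount(struct pt_regs *regs, unsigned long bp_vaddr)
{
	struct uprobe *uprobe = find_active_uprobe(bp_vaddr); /* grabs a ref */

	if (uprobe) {
		handler_chain(uprobe, regs);
		put_uprobe(uprobe);	/* atomic dec, possibly frees */
	}
}

/* After: lookup and handlers run under SRCU; freeing is deferred to an
 * SRCU grace period (synchronize_srcu()/call_srcu()) on the update side. */
static void handle_swbp_srcu(struct pt_regs *regs, unsigned long bp_vaddr)
{
	struct uprobe *uprobe;
	int idx;

	idx = srcu_read_lock(&uprobes_srcu);
	uprobe = find_active_uprobe(bp_vaddr);	/* no refcount taken */
	if (uprobe)
		handler_chain(uprobe, regs);
	srcu_read_unlock(&uprobes_srcu, idx);
}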

On my c^Htrusty old IVB-EP -- where each CPU (of the 40) calls 'func' in a
tight loop:

  perf probe -x ./uprobes test=func
  perf stat -ae probe_uprobe:test  -- sleep 1

  perf probe -x ./uprobes test=func%return
  perf stat -ae probe_uprobe:test__return -- sleep 1

PRE:

  4,038,804      probe_uprobe:test
  2,356,275      probe_uprobe:test__return

POST:

  7,216,579      probe_uprobe:test
  6,744,786      probe_uprobe:test__return

(copy-paste FTW, I didn't do new numbers because the fast paths didn't change --
 and a quick test run shows similar numbers)
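
(The ./uprobes test binary isn't included in this posting; for reference,
the numbers above work out to roughly 1.8x more uprobe hits and 2.9x more
uretprobe hits. A minimal hypothetical sketch of such a tight-loop program --
all details here are assumptions, not the actual source:)

#include <pthread.h>
#include <unistd.h>

/* noinline, so "perf probe -x ./uprobes test=func" has a symbol to hook */
__attribute__((noinline)) void func(void)
{
	asm volatile ("" ::: "memory");	/* keep the call from folding away */
}

static void *loop(void *arg)
{
	for (;;)
		func();
	return NULL;
}

int main(void)
{
	long i, ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	pthread_t tid;

	for (i = 1; i < ncpus; i++)
		pthread_create(&tid, NULL, loop, NULL);
	loop(NULL);	/* main thread hammers func() too */
	return 0;
}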

Patches also available here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/uprobes


Changes since last time:
 - better split with intermediate inc_not_zero()
 - fix UPROBE_HANDLER_REMOVE
 - restored the lost rcu_assign_pointer()
 - avoid lockdep for uretprobe_srcu
 - add missing put_uprobe() -> srcu_read_unlock() conversion
 - actually initialize return_instance::has_ref
 - a few comments
 - things I don't remember

Comments

Andrii Nakryiko July 12, 2024, 4:57 a.m. UTC | #1
On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> [...]

Hey Peter!

Thanks for the v2, I plan to look at it more thoroughly tomorrow. But
meanwhile I spent a good chunk of today writing an uprobes
stress-test, so we can validate that we are not regressing anything
(yes, I don't trust lockless code and people in general ;)

Anyways, if you'd like to use it, it's at [0]. All you should need to
build and run it is:

  $ cd examples/c
  $ make -j$(nproc) uprobe-stress
  $ sudo ./uprobe-stress -tN -aM -mP -fR


N, M, P, and R are the numbers of threads dedicated to each of the four
functions of the stress test: triggering user space functions (N),
attaching/detaching various random subsets of uprobes (M), mmap()ing
parts of the executable with uprobes (P), and forking the process and
triggering uprobes for a little bit (R). The idea is to test various
timings and interleavings of uprobe-related logic.
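
Structurally, it's just four kinds of pthread workers hammering the same
uprobe sites. A rough sketch -- the helper names here are illustrative,
not the actual uprobe-stress source:

#include <stdbool.h>
#include <sys/wait.h>
#include <unistd.h>

extern volatile bool exiting;
/* illustrative helpers -- the real tool drives these via libbpf */
extern void trigger_batch_of_funcs(void);
extern void attach_random_uprobe_subset(void);
extern void detach_uprobe_subset(void);
extern void remap_random_text_ranges(void);

static void *trigger_worker(void *ctx)	/* -tN: call uprobed functions */
{
	while (!exiting)
		trigger_batch_of_funcs();
	return NULL;
}

static void *attach_worker(void *ctx)	/* -aM: attach/detach random subsets */
{
	while (!exiting) {
		attach_random_uprobe_subset();
		detach_uprobe_subset();
	}
	return NULL;
}

static void *mmap_worker(void *ctx)	/* -mP: re-mmap() executable ranges */
{
	while (!exiting)
		remap_random_text_ranges();
	return NULL;
}

static void *fork_worker(void *ctx)	/* -fR: fork, trigger in child, reap */
{
	while (!exiting) {
		if (fork() == 0) {
			trigger_batch_of_funcs();
			_exit(0);
		}
		wait(NULL);
	}
	return NULL;
}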

You should only need a not-too-old Clang to build everything (Clang 12+
should work, I believe). But do let me know if you run into trouble.

I did run this stress test for a little while on current
bpf-next/master with no issues detected (yay!).

But then I also ran it on Linux built from perf/uprobes branch (these
patches), and after a few seconds I see that there is no more
attachment/detachment happening. Eventually I got splats, which you
can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
to run it inside my QEMU image.

So there is still something off, hopefully this will help to debug and
hammer out any remaining kinks. Thanks!

  [0] https://github.com/libbpf/libbpf-bootstrap/commit/2f88cef90f9728ec8c7bee7bd48fdbcf197806c3
  [1] https://gist.github.com/anakryiko/f761690addf7aa5f08caec95fda9ef1a
Peter Zijlstra July 12, 2024, 9:13 a.m. UTC | #2
On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:

> You should only need a not-too-old Clang to build everything (Clang 12+
> should work, I believe). But do let me know if you run into trouble.

A quick look at the thing shows me it's full of BPF gunk :/

Which means, I probably also need a kernel with BPF and BTF and all that
other gunk on -- which I don't have on any of my machines.

Esp. the BTF stuff is a force kill on all my builds as it completely
destroys build times.

> But then I also ran it on Linux built from perf/uprobes branch (these
> patches), and after a few seconds I see that there is no more
> attachment/detachment happening. Eventually I got splats, which you
> can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
> to run it inside my QEMU image.

OK, I'll see if I can make it work.
Peter Zijlstra July 12, 2024, 1:10 p.m. UTC | #3
On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:

> Anyways, if you'd like to use it, it's at [0]. All you should need to
> build and run it is:
> 
>   $ cd examples/c
>   $ make -j$(nproc) uprobe-stress
>   $ sudo ./uprobe-stress -tN -aM -mP -fR

>   [0] https://github.com/libbpf/libbpf-bootstrap/commit/2f88cef90f9728ec8c7bee7bd48fdbcf197806c3

So, I cannot clone that [0] URL. I had to click around github shite
for a while to figure out wtf to clone and where your commit lives;
turns out it is:

	$ git clone https://github.com/libbpf/libbpf-bootstrap.git
	$ cd libbpf-bootstrap
	$ git checkout uprobe-stress

But then I do the above and I'm greeted with:

root@ivb-ep:/usr/src/libbpf-bootstrap/examples/c# make -j40
  MKDIR    .output
  MKDIR    .output/libbpf
  MKDIR    bpftool
  LIB      libbpf.a
  BPFTOOL  bpftool/bootstrap/bpftool
make[1]: *** /usr/src/libbpf-bootstrap/libbpf/src: No such file or directory.  Stop.
make: *** [Makefile:87: /usr/src/libbpf-bootstrap/examples/c/.output/libbpf.a] Error 2
make: *** Waiting for unfinished jobs....
make[1]: *** /usr/src/libbpf-bootstrap/bpftool/src: No such file or directory.  Stop.
make: *** [Makefile:95: /usr/src/libbpf-bootstrap/examples/c/.output/bpftool/bootstrap/bpftool] Error 2


Now what ?!?

BPF is ever such unusable shite :/ It's very near as bad as qemu.
Andrii Nakryiko July 12, 2024, 3:29 p.m. UTC | #4
On Fri, Jul 12, 2024 at 6:10 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:
>
> > Anyways, if you'd like to use it, it's at [0]. All you should need to
> > build and run it is:
> >
> >   $ cd examples/c
> >   $ make -j$(nproc) uprobe-stress
> >   $ sudo ./uprobe-stress -tN -aM -mP -fR
>
> >   [0] https://github.com/libbpf/libbpf-bootstrap/commit/2f88cef90f9728ec8c7bee7bd48fdbcf197806c3
>
> So, I cannot clone that [0] URL. I had to click around github shite
> for a while to figure out wtf to clone and where your commit lives;
> turns out it is:

Sorry, my bad, it's just "normal" Github stuff, but yes, I also find
it quite confusing, so I should have been more explicit that this is
the uprobe-stress branch in the libbpf/libbpf-bootstrap Github repo.

>
>         $ git clone https://github.com/libbpf/libbpf-bootstrap.git
>         $ cd libbpf-bootstrap
>         $ git checkout uprobe-stress

Yes, sorry, as I said, I should have been more thorough in my
instructions. You did a great job figuring all of the above out; the
next step is to make sure all the git submodules are checked out. In
addition to the above, you are just missing the git submodule
initialization:

$ git submodule update --init --recursive
$ cd examples/c
$ make -j$(nproc) uprobe-stress

NOTE: Don't run just `make`, because it will build all the examples,
which have extra dependencies that we don't want to drag you through :)
Stick to `make uprobe-stress` to build just the uprobe-stress binary.

The only extra things that you might need (if you don't have them
already on your build machine) are the development versions of libelf
and zlib, as dependencies of libbpf (those would be the
elfutils-libelf-devel and zlib-devel packages in Fedora, I think).
libbpf-bootstrap tries to be as self-contained and dependency-free as
possible (which is also why it uses git submodules).

You were worried about BTF. This tool doesn't need BTF and should work
without it enabled in the kernel config (but if you do want BTF
generation to succeed, I think the only kernel build requirement for
that is an up-to-date pahole from the dwarves package).

As for the kernel config, I don't think you need anything BPF-specific beyond:

CONFIG_BPF=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_SYSCALL=y

But just in case, we maintain a list of kernel config options that
*all of the BPF selftests* require (see [0]), so worst case just append
that to your config (but really, the above three are probably all you
need, assuming you have all the non-BPF perf/tracing/uprobe configs
enabled).

  [0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/config

>
> But then I do the above and I'm greeted with:
>
> root@ivb-ep:/usr/src/libbpf-bootstrap/examples/c# make -j40
>   MKDIR    .output
>   MKDIR    .output/libbpf
>   MKDIR    bpftool
>   LIB      libbpf.a
>   BPFTOOL  bpftool/bootstrap/bpftool
> make[1]: *** /usr/src/libbpf-bootstrap/libbpf/src: No such file or directory.  Stop.
> make: *** [Makefile:87: /usr/src/libbpf-bootstrap/examples/c/.output/libbpf.a] Error 2
> make: *** Waiting for unfinished jobs....
> make[1]: *** /usr/src/libbpf-bootstrap/bpftool/src: No such file or directory.  Stop.
> make: *** [Makefile:95: /usr/src/libbpf-bootstrap/examples/c/.output/bpftool/bootstrap/bpftool] Error 2
>
>
> Now what ?!?
>
> BPF is ever such unusable shite :/ It's very near as bad as qemu.
>

Sorry, not going into discussions about this :) I do agree the initial
setup is not straightforward, and this libbpf-bootstrap repo is
actually an attempt to let users start quickly with a minimal and
more-or-less straightforward BPF setup (which is why I built
uprobe-stress on top of the libbpf-bootstrap setup instead of the BPF
selftests; that would have been an even more painful process, which I
didn't want to force you through -- see above about
selftests/bpf/config).

But just keep in mind that using BPF here isn't some sort of random
choice just because I tend to work mostly with BPF. BPF is the only
interface to multi-uprobe attachment, and it's a lightweight,
low-overhead way to know whether uprobes are triggered (I do
memory-mapped sharing of per-CPU counters between BPF/kernel and user
space, so no extra syscall overhead, blocking, or unnecessary memory
ordering is added). And in general, whether you like it or not, most
people would never care to use uprobes if not for BPF as an interface
into them.
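
(On the BPF side, the counting amounts to something like the sketch below --
hypothetical, not the actual uprobe-stress source, and using a single global
counter where the real tool uses per-CPU ones. libbpf skeletons expose such
globals through a mmap()ed data section, so user space reads them with no
syscalls:)

/* uprobe_stress.bpf.c -- hypothetical sketch, not the actual source */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

/* globals live in a mmap()able map; the libbpf skeleton exposes this
 * to user space directly, so reading it costs no syscall */
long uprobe_hits = 0;

SEC("uprobe")
int handle_uprobe(struct pt_regs *ctx)
{
	__sync_fetch_and_add(&uprobe_hits, 1);
	return 0;
}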

I'm not trying to convert you to BPF or anything like that, but this
has to use BPF for end-to-end testing.

But you are close, please don't give up!
Peter Zijlstra July 15, 2024, 2:45 p.m. UTC | #5
On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:

> But then I also ran it on Linux built from perf/uprobes branch (these
> patches), and after a few seconds I see that there is no more
> attachment/detachment happening. Eventually I got splats, which you
> can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
> to run it inside my QEMU image.

So them git voodoo incantations did work and I got it built. I'm running
that exact same line above (minus the sudo, because the test box only has
a root account, I think) on real hardware.

I'm now ~100 periods in and wondering what 'eventually' means...

Also, this is a 2-socket, 10-cores-per-socket, 2-threads-per-core
Ivy Bridge thing; are those parameters sufficient?
Andrii Nakryiko July 15, 2024, 5:10 p.m. UTC | #6
On Mon, Jul 15, 2024 at 7:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:
>
> > But then I also ran it on Linux built from perf/uprobes branch (these
> > patches), and after a few seconds I see that there is no more
> > attachment/detachment happening. Eventually I got splats, which you
> > can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
> > to run it inside my QEMU image.
>
> So them git voodoo incantations did work and I got it built. I'm running
> that exact same line above (minus the sudo, because the test box only has
> a root account, I think) on real hardware.
>
> I'm now ~100 periods in and wondering what 'eventually' means...

So I was running in a QEMU setup with 16 cores on top of the bare metal's
80-core CPU (Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz). I just tried
it again, and I can reproduce it within the first few periods:

WORKING HARD!..

PERIOD #1 STATS:
FUNC CALLS               919632
UPROBE HITS              706351
URETPROBE HITS           641679
ATTACHED LINKS              951
ATTACHED UPROBES           2421
ATTACHED URETPROBES        2343
MMAP CALLS                33533
FORKS CALLS                 241

PERIOD #2 STATS:
FUNC CALLS                11444
UPROBE HITS               14320
URETPROBE HITS             9896
ATTACHED LINKS               26
ATTACHED UPROBES             75
ATTACHED URETPROBES          61
MMAP CALLS                39093
FORKS CALLS                  14

PERIOD #3 STATS:
FUNC CALLS                  230
UPROBE HITS                 152
URETPROBE HITS              145
ATTACHED LINKS                2
ATTACHED UPROBES              2
ATTACHED URETPROBES           2
MMAP CALLS                39121
FORKS CALLS                   0

PERIOD #4 STATS:
FUNC CALLS                    0
UPROBE HITS                   0
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                39010
FORKS CALLS                   0

You can see that in the second period all the numbers drop, and by
period #4 (which is about 20 seconds in) everything but mmap()ing
stops. When I said "eventually" I meant about a minute tops -- however
long it takes to do soft lockup detection; 23 seconds this time.

So it should be very fast.

Note that I'm running with a debug kernel configuration (see [0] for
the full kernel config); here are the debug-related settings, in case
that makes a difference:

$ cat ~/linux-build/default/.config | rg -i debug | rg -v '^#'
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_PNP_DEBUG_MESSAGES=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_SCSI_MVSAS_DEBUG=y
CONFIG_DM_DEBUG=y
CONFIG_MLX4_DEBUG=y
CONFIG_USB_SERIAL_DEBUG=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y
CONFIG_CIFS_DEBUG=y
CONFIG_DLM_DEBUG=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_X86_DEBUG_FPU=y
CONFIG_FAULT_INJECTION_DEBUG_FS=y

  [0] https://gist.github.com/anakryiko/97a023a95b30fb0fe607ff743433e64b

>
> Also, this is a 2-socket, 10-cores-per-socket, 2-threads-per-core
> Ivy Bridge thing; are those parameters sufficient?

Should be, I guess? It might be VM vs bare metal differences, though.
I'll try to run this on bare metal with a more production-like kernel
configuration to see if I can still trigger this. Will let you know
the results when I get them.
Andrii Nakryiko July 15, 2024, 6:10 p.m. UTC | #7
On Mon, Jul 15, 2024 at 10:10 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>
> Should be, I guess? It might be VM vs bare metal differences, though.
> I'll try to run this on bare metal with a more production-like kernel
> configuration to see if I can still trigger this. Will let you know
> the results when I get them.

Ok, so I ran it on a bare metal host with a production config. I
didn't really bother to specify parameters (so it's just one thread
for everything, the default):

# ./uprobe-stress
WORKING HARD!..

PERIOD #1 STATS:
FUNC CALLS              2959843
UPROBE HITS             1001312
URETPROBE HITS                0
ATTACHED LINKS                6
ATTACHED UPROBES             28
ATTACHED URETPROBES           0
MMAP CALLS                 8143
FORKS CALLS                 301

PERIOD #2 STATS:
FUNC CALLS                    0
UPROBE HITS              822826
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                 8006
FORKS CALLS                 270

PERIOD #3 STATS:
FUNC CALLS                    0
UPROBE HITS              889534
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                 8004
FORKS CALLS                 288

PERIOD #4 STATS:
FUNC CALLS                    0
UPROBE HITS              886506
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                 8120
FORKS CALLS                 285

PERIOD #5 STATS:
FUNC CALLS                    0
UPROBE HITS              804556
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                 7131
FORKS CALLS                 263
^C
EXITING...

Message from syslogd@kerneltest003.10.atn6.facebook.com at Jul 15 11:06:33 ...
 kernel:[ 2194.334618] watchdog: BUG: soft lockup - CPU#71 stuck for 48s! [uprobe-stress:69900]

It was weird from the very first period (no uretprobes, a small number
of attachments). And sure enough (gmail will reformat the below into
garbage, so [0] has the splat with the original formatting).

  [0] https://gist.github.com/anakryiko/3e3ddcccc5ea3ca70ce90b5491485fdc

I also keep getting:

Message from syslogd@kerneltest003.10.atn6.facebook.com at Jul 15 11:09:41 ...
 kernel:[ 2382.334088] watchdog: BUG: soft lockup - CPU#71 stuck for 223s! [uprobe-stress:69900]

so it's not just a temporary slowdown.


[ 2166.893057] rcu: INFO: rcu_sched self-detected stall on CPU
[ 2166.904199] rcu:     71-....: (20999 ticks this GP) idle=2c84/1/0x4000000000000000 softirq=30158/30158 fqs=8110
[ 2166.923810] rcu:              hardirqs   softirqs   csw/system
[ 2166.934939] rcu:      number:        0        183            0
[ 2166.946064] rcu:     cputime:       60          0        10438 ==> 10549(ms)
[ 2166.959969] rcu:     (t=21065 jiffies g=369217 q=207850 ncpus=80)
[ 2166.971619] CPU: 71 PID: 69900 Comm: uprobe-stress Tainted: G S      E      6.10.0-rc7-00071-g9423ae8ef6ff #62
[ 2166.992275] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A24 05/13/2020
[ 2167.013804] RIP: 0010:uprobe_notify_resume+0x622/0xe20
[ 2167.024064] Code: 8d 9d c0 00 00 00 48 89 df 4c 89 e6 e8 d7 f9 ff ff 84 c0 0f 85 c6 06 00 00 48 89 5c 24 20 41 8b 6d 58 40 f6 c5 01 74 23 f3 90 <eb> f2 83 7c 24 18 00 48 8b 44 24 10 0f 8e 71 01 00 00 bf 05 00 00
[ 2167.061543] RSP: 0000:ffffc9004a49fe78 EFLAGS: 00000202
[ 2167.071973] RAX: 0000000000000000 RBX: ffff88a11d307fc0 RCX: ffff88a120752c40
[ 2167.086223] RDX: 00000000000042ec RSI: ffffc9004a49ff58 RDI: ffff88a11d307fc0
[ 2167.100472] RBP: 0000000000000003 R08: ffff88a12516e500 R09: ffff88a12516f208
[ 2167.114717] R10: 00000000004042ec R11: 000000000000000f R12: ffffc9004a49ff58
[ 2167.128967] R13: ffff88a11d307f00 R14: 00000000004042ec R15: ffff88a09042e000
[ 2167.143213] FS:  00007fd252000640(0000) GS:ffff88bfffbc0000(0000) knlGS:0000000000000000
[ 2167.159368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2167.170843] CR2: 00007fd244000b60 CR3: 000000209090b001 CR4: 00000000007706f0
[ 2167.185091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2167.199340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2167.213586] PKRU: 55555554
[ 2167.218994] Call Trace:
[ 2167.223883]  <IRQ>
[ 2167.227905]  ? rcu_dump_cpu_stacks+0x77/0xd0
[ 2167.236433]  ? print_cpu_stall+0x150/0x2a0
[ 2167.244615]  ? rcu_sched_clock_irq+0x319/0x490
[ 2167.253487]  ? update_process_times+0x71/0xa0
[ 2167.262191]  ? tick_nohz_handler+0xc0/0x100
[ 2167.270544]  ? tick_setup_sched_timer+0x170/0x170
[ 2167.279937]  ? __hrtimer_run_queues+0xe3/0x250
[ 2167.288815]  ? hrtimer_interrupt+0xf0/0x390
[ 2167.297168]  ? __sysvec_apic_timer_interrupt+0x47/0x110
[ 2167.307602]  ? sysvec_apic_timer_interrupt+0x68/0x80
[ 2167.317519]  </IRQ>
[ 2167.321710]  <TASK>
[ 2167.325905]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 2167.336517]  ? uprobe_notify_resume+0x622/0xe20
[ 2167.345565]  ? uprobe_notify_resume+0x609/0xe20
[ 2167.354612]  ? __se_sys_futex+0xf3/0x180
[ 2167.362445]  ? arch_uprobe_exception_notify+0x29/0x40
[ 2167.372533]  ? notify_die+0x51/0xb0
[ 2167.379503]  irqentry_exit_to_user_mode+0x7f/0xd0
[ 2167.388896]  asm_exc_int3+0x35/0x40
[ 2167.395862] RIP: 0033:0x4042ec
[ 2167.401966] Code: fc 8b 45 fc 89 c7 e8 6f 07 00 00 83 c0 01 c9 c3 cc 48 89 e5 48 83 ec 10 89 7d fc 8b 45 fc 89 c7 e8 55 07 00 00 83 c0 01 c9 c3 <cc> 48 89 e5 48 83 ec 10 89 7d fc 8b 45 fc 89 c7 e8 3b 07 00 00 83
[ 2167.439439] RSP: 002b:00007fd251fff8a8 EFLAGS: 00000206
[ 2167.449874] RAX: 00000000004042ec RBX: 00007fd252000640 RCX: 000000000000001c
[ 2167.464122] RDX: 0000000000000033 RSI: 0000000000000064 RDI: 0000000000000033
[ 2167.478368] RBP: 00007fd251fff8d0 R08: 00007fd2523fa234 R09: 00007fd2523fa280
[ 2167.492617] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd252000640
[ 2167.506866] R13: 0000000000000016 R14: 00007fd252289930 R15: 0000000000000000
[ 2167.521117]  </TASK>
Andrii Nakryiko July 19, 2024, 6:42 p.m. UTC | #8
On Mon, Jul 15, 2024 at 11:10 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]

Peter,

Did you manage to reproduce this?
Andrii Nakryiko July 27, 2024, 12:18 a.m. UTC | #9
On Fri, Jul 19, 2024 at 11:42 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [...]
>
> Peter,
>
> Did you manage to reproduce this?

Ping.