mbox series

[v4,tip/perf/core,0/4] uprobes,mm: speculative lockless VMA-to-uprobe lookup

Message ID 20241028010818.2487581-1-andrii@kernel.org (mailing list archive)
Headers show
Series uprobes,mm: speculative lockless VMA-to-uprobe lookup | expand

Message

Andrii Nakryiko Oct. 28, 2024, 1:08 a.m. UTC
Implement speculative (lockless) resolution of VMA to inode to uprobe,
bypassing the need to take mmap_lock for reads, if possible. First two patches
by Suren adds mm_struct helpers that help detect whether mm_struct was
changed, which is used by uprobe logic to validate that speculative results
can be trusted after all the lookup logic results in a valid uprobe instance.

Patch #3 is a simplification to uprobe VMA flag checking, suggested by Oleg.

And, finally, patch #4 is the speculative VMA-to-uprobe resolution logic
itself, and is the focal point of this patch set. It makes entry uprobes in
common case scale very well with number of CPUs, as we avoid any locking or
cache line bouncing between CPUs. See corresponding patch for details and
benchmarking results.

Note, this patch set assumes that FMODE_BACKING files were switched to have
SLAB_TYPE_SAFE_BY_RCU semantics, which was recently done by Christian Brauner
in [0]. This change can be pulled into perf/core through stable
tags/vfs-6.13.for-bpf.file tag from [1].

  [0] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.13.for-bpf.file&id=8b1bc2590af61129b82a189e9dc7c2804c34400e
  [1] git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git

v3->v4:
- rebased and dropped data_race(), given mm_struct uses real seqcount (Peter);
v2->v3:
- dropped kfree_rcu() patch (Christian);
- added data_race() annotations for fields of vma and vma->vm_file which could
  be modified during speculative lookup (Oleg);
- fixed int->long problem in stubs for mmap_lock_speculation_{start,end}(),
  caught by Kernel test robot;
v1->v2:
- adjusted vma_end_write_all() comment to point out it should never be called
  manually now, but I wasn't sure how ACQUIRE/RELEASE comments should be
  reworded (previously requested by Jann), so I'd appreciate some help there
  (Jann);
- int -> long change for mm_lock_seq, as agreed at LPC2024 (Jann, Suren, Liam);
- kfree_rcu_mightsleep() for FMODE_BACKING (Suren, Christian);
- vm_flags simplification in find_active_uprobe_rcu() and
  find_active_uprobe_speculative() (Oleg);
- guard(rcu)() simplified find_active_uprobe_speculative() implementation.

Andrii Nakryiko (2):
  uprobes: simplify find_active_uprobe_rcu() VMA checks
  uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution

Suren Baghdasaryan (2):
  mm: Convert mm_lock_seq to a proper seqcount
  mm: Introduce mmap_lock_speculation_{begin|end}

 include/linux/mm.h               | 12 ++---
 include/linux/mm_types.h         |  7 ++-
 include/linux/mmap_lock.h        | 87 ++++++++++++++++++++++++--------
 kernel/events/uprobes.c          | 47 ++++++++++++++++-
 kernel/fork.c                    |  5 +-
 mm/init-mm.c                     |  2 +-
 tools/testing/vma/vma.c          |  4 +-
 tools/testing/vma/vma_internal.h |  4 +-
 8 files changed, 129 insertions(+), 39 deletions(-)

Comments

Andrii Nakryiko Nov. 6, 2024, 2:01 a.m. UTC | #1
On Sun, Oct 27, 2024 at 6:09 PM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Implement speculative (lockless) resolution of VMA to inode to uprobe,
> bypassing the need to take mmap_lock for reads, if possible. First two patches
> by Suren adds mm_struct helpers that help detect whether mm_struct was
> changed, which is used by uprobe logic to validate that speculative results
> can be trusted after all the lookup logic results in a valid uprobe instance.
>
> Patch #3 is a simplification to uprobe VMA flag checking, suggested by Oleg.
>
> And, finally, patch #4 is the speculative VMA-to-uprobe resolution logic
> itself, and is the focal point of this patch set. It makes entry uprobes in
> common case scale very well with number of CPUs, as we avoid any locking or
> cache line bouncing between CPUs. See corresponding patch for details and
> benchmarking results.
>
> Note, this patch set assumes that FMODE_BACKING files were switched to have
> SLAB_TYPE_SAFE_BY_RCU semantics, which was recently done by Christian Brauner
> in [0]. This change can be pulled into perf/core through stable
> tags/vfs-6.13.for-bpf.file tag from [1].
>
>   [0] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.13.for-bpf.file&id=8b1bc2590af61129b82a189e9dc7c2804c34400e
>   [1] git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
>
> v3->v4:
> - rebased and dropped data_race(), given mm_struct uses real seqcount (Peter);
> v2->v3:
> - dropped kfree_rcu() patch (Christian);
> - added data_race() annotations for fields of vma and vma->vm_file which could
>   be modified during speculative lookup (Oleg);
> - fixed int->long problem in stubs for mmap_lock_speculation_{start,end}(),
>   caught by Kernel test robot;
> v1->v2:
> - adjusted vma_end_write_all() comment to point out it should never be called
>   manually now, but I wasn't sure how ACQUIRE/RELEASE comments should be
>   reworded (previously requested by Jann), so I'd appreciate some help there
>   (Jann);
> - int -> long change for mm_lock_seq, as agreed at LPC2024 (Jann, Suren, Liam);
> - kfree_rcu_mightsleep() for FMODE_BACKING (Suren, Christian);
> - vm_flags simplification in find_active_uprobe_rcu() and
>   find_active_uprobe_speculative() (Oleg);
> - guard(rcu)() simplified find_active_uprobe_speculative() implementation.
>
> Andrii Nakryiko (2):
>   uprobes: simplify find_active_uprobe_rcu() VMA checks
>   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
>
> Suren Baghdasaryan (2):
>   mm: Convert mm_lock_seq to a proper seqcount
>   mm: Introduce mmap_lock_speculation_{begin|end}
>
>  include/linux/mm.h               | 12 ++---
>  include/linux/mm_types.h         |  7 ++-
>  include/linux/mmap_lock.h        | 87 ++++++++++++++++++++++++--------
>  kernel/events/uprobes.c          | 47 ++++++++++++++++-
>  kernel/fork.c                    |  5 +-
>  mm/init-mm.c                     |  2 +-
>  tools/testing/vma/vma.c          |  4 +-
>  tools/testing/vma/vma_internal.h |  4 +-
>  8 files changed, 129 insertions(+), 39 deletions(-)
>
> --
> 2.43.5
>

Hi!

What's the status of this patch set? Are there any blockers for it to
be applied to perf/core? MM folks are OK with landing the first two
patches in perf/core, so hopefully we should be good to go?
Andrii Nakryiko Nov. 11, 2024, 5:26 p.m. UTC | #2
On Tue, Nov 5, 2024 at 6:01 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Oct 27, 2024 at 6:09 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Implement speculative (lockless) resolution of VMA to inode to uprobe,
> > bypassing the need to take mmap_lock for reads, if possible. First two patches
> > by Suren adds mm_struct helpers that help detect whether mm_struct was
> > changed, which is used by uprobe logic to validate that speculative results
> > can be trusted after all the lookup logic results in a valid uprobe instance.
> >
> > Patch #3 is a simplification to uprobe VMA flag checking, suggested by Oleg.
> >
> > And, finally, patch #4 is the speculative VMA-to-uprobe resolution logic
> > itself, and is the focal point of this patch set. It makes entry uprobes in
> > common case scale very well with number of CPUs, as we avoid any locking or
> > cache line bouncing between CPUs. See corresponding patch for details and
> > benchmarking results.
> >
> > Note, this patch set assumes that FMODE_BACKING files were switched to have
> > SLAB_TYPE_SAFE_BY_RCU semantics, which was recently done by Christian Brauner
> > in [0]. This change can be pulled into perf/core through stable
> > tags/vfs-6.13.for-bpf.file tag from [1].
> >
> >   [0] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.13.for-bpf.file&id=8b1bc2590af61129b82a189e9dc7c2804c34400e
> >   [1] git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> >
> > v3->v4:
> > - rebased and dropped data_race(), given mm_struct uses real seqcount (Peter);
> > v2->v3:
> > - dropped kfree_rcu() patch (Christian);
> > - added data_race() annotations for fields of vma and vma->vm_file which could
> >   be modified during speculative lookup (Oleg);
> > - fixed int->long problem in stubs for mmap_lock_speculation_{start,end}(),
> >   caught by Kernel test robot;
> > v1->v2:
> > - adjusted vma_end_write_all() comment to point out it should never be called
> >   manually now, but I wasn't sure how ACQUIRE/RELEASE comments should be
> >   reworded (previously requested by Jann), so I'd appreciate some help there
> >   (Jann);
> > - int -> long change for mm_lock_seq, as agreed at LPC2024 (Jann, Suren, Liam);
> > - kfree_rcu_mightsleep() for FMODE_BACKING (Suren, Christian);
> > - vm_flags simplification in find_active_uprobe_rcu() and
> >   find_active_uprobe_speculative() (Oleg);
> > - guard(rcu)() simplified find_active_uprobe_speculative() implementation.
> >
> > Andrii Nakryiko (2):
> >   uprobes: simplify find_active_uprobe_rcu() VMA checks
> >   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
> >
> > Suren Baghdasaryan (2):
> >   mm: Convert mm_lock_seq to a proper seqcount
> >   mm: Introduce mmap_lock_speculation_{begin|end}
> >
> >  include/linux/mm.h               | 12 ++---
> >  include/linux/mm_types.h         |  7 ++-
> >  include/linux/mmap_lock.h        | 87 ++++++++++++++++++++++++--------
> >  kernel/events/uprobes.c          | 47 ++++++++++++++++-
> >  kernel/fork.c                    |  5 +-
> >  mm/init-mm.c                     |  2 +-
> >  tools/testing/vma/vma.c          |  4 +-
> >  tools/testing/vma/vma_internal.h |  4 +-
> >  8 files changed, 129 insertions(+), 39 deletions(-)
> >
> > --
> > 2.43.5
> >
>
> Hi!
>
> What's the status of this patch set? Are there any blockers for it to
> be applied to perf/core? MM folks are OK with landing the first two
> patches in perf/core, so hopefully we should be good to go?

Another week, another ping. Peter, what can I do to make this land? MM
parts are clearly ok with Andrew Morton, uprobe-side logic didn't
change (modulo inconsequential data_race() back and forth) since at
least August, was approved by Oleg, and seems to be very stable in
testing. I think it's time to let me forget about this patch set and
make actual use of it in production, please.