
[0/2] uprobes,mm: speculative lockless VMA-to-uprobe lookup

Message ID 20240906051205.530219-1-andrii@kernel.org (mailing list archive)

Message

Andrii Nakryiko Sept. 6, 2024, 5:12 a.m. UTC
Implement speculative (lockless) resolution of VMA to inode to uprobe,
bypassing the need to take mmap_lock for reads, if possible. Patch #1 by Suren
adds mm_struct helpers that help detect whether mm_struct was changed, which
is used by uprobe logic to validate that speculative results can be trusted
after all the lookup logic results in a valid uprobe instance.
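
For readers unfamiliar with the pattern, here is a minimal userspace sketch of the general "read speculatively, then validate against a sequence counter" idea described above. It is not the code from either patch: `struct toy_mm`, `toy_speculation_start()`, `toy_speculation_end()` and the other `toy_*` names are hypothetical, and C11 sequentially consistent atomics stand in for the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of mm_struct that matters here. */
struct toy_mm {
	atomic_uint seq;      /* even: no writer active; odd: writer active */
	atomic_long inode_id; /* stand-in for data a writer may change */
};

/* A writer bumps the counter on both lock and unlock. */
static void toy_write_lock(struct toy_mm *mm)   { atomic_fetch_add(&mm->seq, 1); }
static void toy_write_unlock(struct toy_mm *mm) { atomic_fetch_add(&mm->seq, 1); }

/* Begin speculation: record the counter, fail if a writer is active. */
static bool toy_speculation_start(struct toy_mm *mm, unsigned int *seq)
{
	*seq = atomic_load(&mm->seq);
	return (*seq & 1) == 0;
}

/* End speculation: succeed only if the counter did not move. */
static bool toy_speculation_end(struct toy_mm *mm, unsigned int seq)
{
	return atomic_load(&mm->seq) == seq;
}

/*
 * Lockless lookup. Returns true and fills *out only if the value read
 * can be trusted; on false the caller would fall back to a locked path.
 * simulate_writer injects a writer between the read and the validation.
 */
static bool toy_lookup(struct toy_mm *mm, long *out, bool simulate_writer)
{
	unsigned int seq;
	long val;

	if (!toy_speculation_start(mm, &seq))
		return false;

	val = atomic_load(&mm->inode_id);

	if (simulate_writer) {
		toy_write_lock(mm);
		atomic_store(&mm->inode_id, 99);
		toy_write_unlock(mm);
	}

	if (!toy_speculation_end(mm, seq))
		return false;

	*out = val;
	return true;
}
```

In this sketch a lookup on a quiescent `toy_mm` succeeds, while a lookup that overlaps a writer is rejected and leaves `*out` untouched, which is the point at which a real caller would retry under the lock.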

I ran a few will-it-scale benchmarks to sanity check that patch #1 doesn't
introduce any noticeable regressions, and it doesn't seem to.

Andrii Nakryiko (1):
  uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution

Suren Baghdasaryan (1):
  mm: introduce mmap_lock_speculation_{start|end}

 include/linux/mm_types.h  |  3 +++
 include/linux/mmap_lock.h | 53 +++++++++++++++++++++++++++++++--------
 kernel/events/uprobes.c   | 51 +++++++++++++++++++++++++++++++++++++
 kernel/fork.c             |  3 ---
 4 files changed, 97 insertions(+), 13 deletions(-)

Comments

Jann Horn Sept. 10, 2024, 4:06 p.m. UTC | #1
On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> Implement speculative (lockless) resolution of VMA to inode to uprobe,
> bypassing the need to take mmap_lock for reads, if possible. Patch #1 by Suren
> adds mm_struct helpers that help detect whether mm_struct was changed, which
> is used by uprobe logic to validate that speculative results can be trusted
> after all the lookup logic results in a valid uprobe instance.

Random thought: It would be nice if you could skip the MM stuff
entirely and instead go through the GUP-fast path, but I guess going
from a uprobe-created anon page to the corresponding uprobe is hard...
but maybe if you used the anon_vma pointer as a lookup key to find the
uprobe, it could work? Though then you'd need hooks in the anon_vma
code... maybe not such a great idea.

Andrii Nakryiko Sept. 10, 2024, 5:58 p.m. UTC | #2
On Tue, Sep 10, 2024 at 9:06 AM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > Implement speculative (lockless) resolution of VMA to inode to uprobe,
> > bypassing the need to take mmap_lock for reads, if possible. Patch #1 by Suren
> > adds mm_struct helpers that help detect whether mm_struct was changed, which
> > is used by uprobe logic to validate that speculative results can be trusted
> > after all the lookup logic results in a valid uprobe instance.
>
> Random thought: It would be nice if you could skip the MM stuff
> entirely and instead go through the GUP-fast path, but I guess going
> from a uprobe-created anon page to the corresponding uprobe is hard...
> but maybe if you used the anon_vma pointer as a lookup key to find the
> uprobe, it could work? Though then you'd need hooks in the anon_vma
> code... maybe not such a great idea.

I'm not crystal clear on all the details here, so maybe you can
elaborate a bit. But keep in mind that a) there could be multiple
uprobes within a single user page, so lookup has to take at least
offset within the page into account somehow. But also b) a single uprobe
can be installed in many independent anon VMAs across many processes.
So the anon_vma itself can't be part of the key.
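
To make both constraints concrete, here is a small sketch of a lookup key built from the backing file's identity and the byte offset within that file. All names (`struct toy_vma`, `struct toy_uprobe_key`, `toy_key_for()`, `TOY_PAGE_SHIFT`) are hypothetical and a 4 KiB page size is assumed; this illustrates the arithmetic only, not the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical, simplified view of one mapping of a file. */
struct toy_vma {
	uint64_t vm_start; /* first virtual address of the mapping */
	uint64_t vm_pgoff; /* file offset of vm_start, in pages */
	uint64_t inode_id; /* stand-in for the backing inode */
};

/* Key that does not depend on which process or VMA is looking. */
struct toy_uprobe_key {
	uint64_t inode_id;
	uint64_t offset; /* byte offset within the file */
};

/* Translate a virtual address inside vma into an (inode, offset) key. */
static struct toy_uprobe_key toy_key_for(const struct toy_vma *vma,
					 uint64_t vaddr)
{
	struct toy_uprobe_key key = {
		.inode_id = vma->inode_id,
		.offset = (vma->vm_pgoff << TOY_PAGE_SHIFT) +
			  (vaddr - vma->vm_start),
	};
	return key;
}

static bool toy_key_equal(struct toy_uprobe_key a, struct toy_uprobe_key b)
{
	return a.inode_id == b.inode_id && a.offset == b.offset;
}
```

With such a key, two processes mapping the same file at different addresses arrive at the same key for the same probed instruction, while two probes inside one page still get distinct keys because the byte offset is preserved.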

Though maybe we could have left some sort of "cookie" stashed
somewhere to help with lookup. But then again, multiple uprobes per
page.

It does feel like lockless VMA to inode resolution would be a cleaner
solution, let's see if we can get there somehow.

Jann Horn Sept. 10, 2024, 6:13 p.m. UTC | #3
On Tue, Sep 10, 2024 at 7:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Tue, Sep 10, 2024 at 9:06 AM Jann Horn <jannh@google.com> wrote:
> > On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > Implement speculative (lockless) resolution of VMA to inode to uprobe,
> > > bypassing the need to take mmap_lock for reads, if possible. Patch #1 by Suren
> > > adds mm_struct helpers that help detect whether mm_struct was changed, which
> > > is used by uprobe logic to validate that speculative results can be trusted
> > > after all the lookup logic results in a valid uprobe instance.
> >
> > Random thought: It would be nice if you could skip the MM stuff
> > entirely and instead go through the GUP-fast path, but I guess going
> > from a uprobe-created anon page to the corresponding uprobe is hard...
> > but maybe if you used the anon_vma pointer as a lookup key to find the
> > uprobe, it could work? Though then you'd need hooks in the anon_vma
> > code... maybe not such a great idea.
>
> So I'm not crystal clear on all the details here, so maybe you can
> elaborate a bit. But keep in mind that a) there could be multiple
> uprobes within a single user page, so lookup has to take at least
> offset within the page into account somehow. But also b) single uprobe

I think anonymous pages have the same pgoff numbering as file pages;
so the page's mapping pointer and pgoff together should almost give
you the same amount of information as what you are currently looking
for (the file and the offset inside it), except that you'd get an
anon_vma pointer corresponding to the file instead of directly getting
the file.
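
As a rough illustration of the pgoff point, assuming anon pages in a private file mapping really are numbered like file pages, the file offset recovered from a page's index plus the in-page offset matches the one derived from the VMA. The `toy_*` names and the 4 KiB page size are assumptions made for this sketch.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */
#define TOY_PAGE_SIZE  (UINT64_C(1) << TOY_PAGE_SHIFT)

/* Hypothetical, simplified view of one mapping of a file. */
struct toy_vma {
	uint64_t vm_start; /* page-aligned start address */
	uint64_t vm_pgoff; /* file offset of vm_start, in pages */
};

/* Index a page at vaddr would carry under file-style pgoff numbering. */
static uint64_t toy_page_index(const struct toy_vma *vma, uint64_t vaddr)
{
	return vma->vm_pgoff + ((vaddr - vma->vm_start) >> TOY_PAGE_SHIFT);
}

/* File offset recovered from the page index and the in-page offset. */
static uint64_t toy_offset_from_page(uint64_t index, uint64_t vaddr)
{
	return (index << TOY_PAGE_SHIFT) + (vaddr & (TOY_PAGE_SIZE - 1));
}

/* File offset computed directly from the VMA, for comparison. */
static uint64_t toy_offset_from_vma(const struct toy_vma *vma, uint64_t vaddr)
{
	return (vma->vm_pgoff << TOY_PAGE_SHIFT) + (vaddr - vma->vm_start);
}
```

The two offsets agree whenever `vm_start` is page-aligned, so the offset half of the key is recoverable from the page alone; what remains open is getting from the page's mapping pointer back to the file.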

> can be installed in many independent anon VMAs across many processes.
> So anon vma itself can't be part of the key.

Yeah, I guess to make that work you'd have to somehow track which
anon_vmas exist for which mappings.

(An anon_vma is tied to one specific file, see anon_vma_compatible().)

> Though maybe we could have left some sort of "cookie" stashed
> somewhere to help with lookup. But then again, multiple uprobes per
> page.
>
> It does feel like lockless VMA to inode resolution would be a cleaner
> solution, let's see if we can get there somehow.

Mh, yes, I was just thinking it would be nice if we could keep this
lockless complexity out of the mmap locking code... but I guess it's
not much more straightforward than what you're doing.