diff mbox series

[v3,tip/perf/core,1/4] mm: introduce mmap_lock_speculation_{start|end}

Message ID 20241010205644.3831427-2-andrii@kernel.org (mailing list archive)
State New
Series uprobes,mm: speculative lockless VMA-to-uprobe lookup

Commit Message

Andrii Nakryiko Oct. 10, 2024, 8:56 p.m. UTC
From: Suren Baghdasaryan <surenb@google.com>

Add helper functions to speculatively perform operations without
read-locking mmap_lock, expecting that mmap_lock will not be
write-locked and mm is not modified from under us.
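
For illustration, a caller brackets its lockless reads roughly like this
(a sketch only; the error handling and the do_lockless_lookup() helper are
placeholders for this example, not part of the patch):

	int seq, err;

	if (!mmap_lock_speculation_start(mm, &seq))
		goto retry_locked;	/* mmap_lock is write-locked right now */

	err = do_lockless_lookup(mm);	/* speculative reads of mm/VMA state */

	if (!mmap_lock_speculation_end(mm, seq))
		goto retry_locked;	/* a writer raced with us, discard the result */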

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
---
 include/linux/mm_types.h  |  3 ++
 include/linux/mmap_lock.h | 72 ++++++++++++++++++++++++++++++++-------
 kernel/fork.c             |  3 --
 3 files changed, 63 insertions(+), 15 deletions(-)

Comments

Shakeel Butt Oct. 13, 2024, 7:56 a.m. UTC | #1
On Thu, Oct 10, 2024 at 01:56:41PM GMT, Andrii Nakryiko wrote:
> From: Suren Baghdasaryan <surenb@google.com>
> 
> Add helper functions to speculatively perform operations without
> read-locking mmap_lock, expecting that mmap_lock will not be
> write-locked and mm is not modified from under us.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com

Looks good to me. mmap_lock_speculation_* functions could use kerneldoc
but that can be added later.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Andrii Nakryiko Oct. 14, 2024, 8:27 p.m. UTC | #2
On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Oct 10, 2024 at 01:56:41PM GMT, Andrii Nakryiko wrote:
> > From: Suren Baghdasaryan <surenb@google.com>
> >
> > Add helper functions to speculatively perform operations without
> > read-locking mmap_lock, expecting that mmap_lock will not be
> > write-locked and mm is not modified from under us.
> >
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
>
> Looks good to me. mmap_lock_speculation_* functions could use kerneldoc
> but that can be added later.

Yep, though probably best if Suren can do that in the follow up, as he
knows all the right words to use :)

>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>

Thanks!

>
Suren Baghdasaryan Oct. 14, 2024, 8:48 p.m. UTC | #3
On Mon, Oct 14, 2024 at 1:27 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Thu, Oct 10, 2024 at 01:56:41PM GMT, Andrii Nakryiko wrote:
> > > From: Suren Baghdasaryan <surenb@google.com>
> > >
> > > Add helper functions to speculatively perform operations without
> > > read-locking mmap_lock, expecting that mmap_lock will not be
> > > write-locked and mm is not modified from under us.
> > >
> > > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
> >
> > Looks good to me. mmap_lock_speculation_* functions could use kerneldoc
> > but that can be added later.
>
> Yep, though probably best if Suren can do that in the follow up, as he
> knows all the right words to use :)

Will add to my TODO list.

>
> >
> > Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> >
>
> Thanks!
>
> >
Peter Zijlstra Oct. 23, 2024, 8:10 p.m. UTC | #4
On Thu, Oct 10, 2024 at 01:56:41PM -0700, Andrii Nakryiko wrote:
> From: Suren Baghdasaryan <surenb@google.com>
> 
> Add helper functions to speculatively perform operations without
> read-locking mmap_lock, expecting that mmap_lock will not be
> write-locked and mm is not modified from under us.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
> ---
>  include/linux/mm_types.h  |  3 ++
>  include/linux/mmap_lock.h | 72 ++++++++++++++++++++++++++++++++-------
>  kernel/fork.c             |  3 --
>  3 files changed, 63 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6e3bdf8e38bc..5d8cdebd42bc 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -887,6 +887,9 @@ struct mm_struct {
>  		 * Roughly speaking, incrementing the sequence number is
>  		 * equivalent to releasing locks on VMAs; reading the sequence
>  		 * number can be part of taking a read lock on a VMA.
> +		 * Incremented every time mmap_lock is write-locked/unlocked.
> +		 * Initialized to 0, therefore odd values indicate mmap_lock
> +		 * is write-locked and even values that it's released.
>  		 *
>  		 * Can be modified under write mmap_lock using RELEASE
>  		 * semantics.
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index de9dc20b01ba..9d23635bc701 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -71,39 +71,84 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
>  }
>  
>  #ifdef CONFIG_PER_VMA_LOCK
> +static inline void init_mm_lock_seq(struct mm_struct *mm)
> +{
> +	mm->mm_lock_seq = 0;
> +}
> +
>  /*
> - * Drop all currently-held per-VMA locks.
> - * This is called from the mmap_lock implementation directly before releasing
> - * a write-locked mmap_lock (or downgrading it to read-locked).
> - * This should normally NOT be called manually from other places.
> - * If you want to call this manually anyway, keep in mind that this will release
> - * *all* VMA write locks, including ones from further up the stack.
> + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
> + * or write-unlocked (RELEASE semantics).
>   */
> -static inline void vma_end_write_all(struct mm_struct *mm)
> +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
>  {
>  	mmap_assert_write_locked(mm);
>  	/*
>  	 * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
>  	 * mmap_lock being held.
> -	 * We need RELEASE semantics here to ensure that preceding stores into
> -	 * the VMA take effect before we unlock it with this store.
> -	 * Pairs with ACQUIRE semantics in vma_start_read().
>  	 */
> -	smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +
> +	if (acquire) {
> +		WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +		/*
> +		 * For ACQUIRE semantics we should ensure no following stores are
> +		 * reordered to appear before the mm->mm_lock_seq modification.
> +		 */
> +		smp_wmb();

Strictly speaking this isn't ACQUIRE, nor do we care about ACQUIRE here.
This really is about subsequent stores, loads are irrelevant.

> +	} else {
> +		/*
> +		 * We need RELEASE semantics here to ensure that preceding stores
> +		 * into the VMA take effect before we unlock it with this store.
> +		 * Pairs with ACQUIRE semantics in vma_start_read().
> +		 */

Again, not strictly true. We don't care about loads. Using RELEASE here
is fine and probably cheaper on a few platforms, but we don't strictly
need/care about RELEASE.
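
Spelled out, the ordering that actually matters here is the usual seqcount
write/read pairing; a sketch (reusing the names from this patch, not code
from the series):

	/* writer, already holding mmap_lock for write */
	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);	/* now odd */
	smp_wmb();		/* order the seq store before the VMA stores below */
	/* ... stores that modify VMAs ... */
	smp_wmb();		/* order the VMA stores before the seq store */
	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);	/* even again */

	/* reader, speculating without mmap_lock */
	seq = READ_ONCE(mm->mm_lock_seq);
	if (seq & 1)
		return false;	/* write-locked, don't speculate */
	smp_rmb();		/* order the seq load before the VMA loads below */
	/* ... speculative loads of mm/VMA state ... */
	smp_rmb();		/* order the VMA loads before the re-check */
	return seq == READ_ONCE(mm->mm_lock_seq);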

> +		smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +	}
> +}

Also, it might be saner to stick closer to the seqcount naming of
things and use two different functions for these two different things.

/* straight up copy of do_raw_write_seqcount_begin() */
static inline void mm_write_seqlock_begin(struct mm_struct *mm)
{
	kcsan_nestable_atomic_begin();
	mm->mm_lock_seq++;
	smp_wmb();
}

/* straight up copy of do_raw_write_seqcount_end() */
static inline void mm_write_seqcount_end(struct mm_struct *mm)
{
	smp_wmb();
	mm->mm_lock_seq++;
	kcsan_nestable_atomic_end();
}

Or better yet, just use seqcount...

> +
> +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
> +{
> +	/* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> +	*seq = smp_load_acquire(&mm->mm_lock_seq);
> +	/* Allow speculation if mmap_lock is not write-locked */
> +	return (*seq & 1) == 0;
> +}
> +
> +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> +{
> +	/* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */
> +	smp_rmb();
> +	return seq == READ_ONCE(mm->mm_lock_seq);
>  }

Because there's nothing better than well known functions with a randomly
different name and interface I suppose...


Anyway, all the actual code proposed is not wrong. I'm just a bit
annoyed it's a random NIH of seqcount.
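
FWIW, expressed with the existing seqcount primitives the same thing would
look something like the below (untested sketch, assuming mm_lock_seq were
converted to a seqcount_t; the helper names here are made up):

static inline void mm_lock_seqcount_begin(struct mm_struct *mm)
{
	/* serialized by the exclusive mmap_lock */
	raw_write_seqcount_begin(&mm->mm_lock_seq);
}

static inline void mm_lock_seqcount_end(struct mm_struct *mm)
{
	raw_write_seqcount_end(&mm->mm_lock_seq);
}

static inline bool mmap_lock_speculation_start(struct mm_struct *mm, unsigned int *seq)
{
	/* raw_read_seqcount() does not spin; an odd value means a writer is active */
	*seq = raw_read_seqcount(&mm->mm_lock_seq);
	return !(*seq & 1);
}

static inline bool mmap_lock_speculation_end(struct mm_struct *mm, unsigned int seq)
{
	/* smp_rmb() + re-check, same as the open-coded version above */
	return !read_seqcount_retry(&mm->mm_lock_seq, seq);
}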
Suren Baghdasaryan Oct. 23, 2024, 10:17 p.m. UTC | #5
On Wed, Oct 23, 2024 at 1:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 10, 2024 at 01:56:41PM -0700, Andrii Nakryiko wrote:
> > From: Suren Baghdasaryan <surenb@google.com>
> >
> > Add helper functions to speculatively perform operations without
> > read-locking mmap_lock, expecting that mmap_lock will not be
> > write-locked and mm is not modified from under us.
> >
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
> > ---
> >  include/linux/mm_types.h  |  3 ++
> >  include/linux/mmap_lock.h | 72 ++++++++++++++++++++++++++++++++-------
> >  kernel/fork.c             |  3 --
> >  3 files changed, 63 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 6e3bdf8e38bc..5d8cdebd42bc 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -887,6 +887,9 @@ struct mm_struct {
> >                * Roughly speaking, incrementing the sequence number is
> >                * equivalent to releasing locks on VMAs; reading the sequence
> >                * number can be part of taking a read lock on a VMA.
> > +              * Incremented every time mmap_lock is write-locked/unlocked.
> > +              * Initialized to 0, therefore odd values indicate mmap_lock
> > +              * is write-locked and even values that it's released.
> >                *
> >                * Can be modified under write mmap_lock using RELEASE
> >                * semantics.
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index de9dc20b01ba..9d23635bc701 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -71,39 +71,84 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
> >  }
> >
> >  #ifdef CONFIG_PER_VMA_LOCK
> > +static inline void init_mm_lock_seq(struct mm_struct *mm)
> > +{
> > +     mm->mm_lock_seq = 0;
> > +}
> > +
> >  /*
> > - * Drop all currently-held per-VMA locks.
> > - * This is called from the mmap_lock implementation directly before releasing
> > - * a write-locked mmap_lock (or downgrading it to read-locked).
> > - * This should normally NOT be called manually from other places.
> > - * If you want to call this manually anyway, keep in mind that this will release
> > - * *all* VMA write locks, including ones from further up the stack.
> > + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
> > + * or write-unlocked (RELEASE semantics).
> >   */
> > -static inline void vma_end_write_all(struct mm_struct *mm)
> > +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
> >  {
> >       mmap_assert_write_locked(mm);
> >       /*
> >        * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
> >        * mmap_lock being held.
> > -      * We need RELEASE semantics here to ensure that preceding stores into
> > -      * the VMA take effect before we unlock it with this store.
> > -      * Pairs with ACQUIRE semantics in vma_start_read().
> >        */
> > -     smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> > +
> > +     if (acquire) {
> > +             WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
> > +             /*
> > +              * For ACQUIRE semantics we should ensure no following stores are
> > +              * reordered to appear before the mm->mm_lock_seq modification.
> > +              */
> > +             smp_wmb();
>
> Strictly speaking this isn't ACQUIRE, nor do we care about ACQUIRE here.
> This really is about subsequent stores, loads are irrelevant.
>
> > +     } else {
> > +             /*
> > +              * We need RELEASE semantics here to ensure that preceding stores
> > +              * into the VMA take effect before we unlock it with this store.
> > +              * Pairs with ACQUIRE semantics in vma_start_read().
> > +              */
>
> Again, not strictly true. We don't care about loads. Using RELEASE here
> is fine and probably cheaper on a few platforms, but we don't strictly
> need/care about RELEASE.
>
> > +             smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> > +     }
> > +}
>
> Also, it might be saner to stick closer to the seqcount naming of
> things and use two different functions for these two different things.
>
> /* straight up copy of do_raw_write_seqcount_begin() */
> static inline void mm_write_seqlock_begin(struct mm_struct *mm)
> {
>         kcsan_nestable_atomic_begin();
>         mm->mm_lock_seq++;
>         smp_wmb();
> }
>
> /* straight up copy of do_raw_write_seqcount_end() */
> static inline void mm_write_seqcount_end(struct mm_struct *mm)
> {
>         smp_wmb();
>         mm->mm_lock_seq++;
>         kcsan_nestable_atomic_end();
> }
>
> Or better yet, just use seqcount...

Yeah, with these changes it does look a lot like seqcount now...
I can take another stab at rewriting this using seqcount_t but one
issue that Jann was concerned about is the counter being int vs long.
seqcount_t uses unsigned, so I'm not sure how to address that if I
were to use seqcount_t. Any suggestions how to address that before I
move forward with a rewrite?

>
> > +
> > +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
> > +{
> > +     /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> > +     *seq = smp_load_acquire(&mm->mm_lock_seq);
> > +     /* Allow speculation if mmap_lock is not write-locked */
> > +     return (*seq & 1) == 0;
> > +}
> > +
> > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> > +{
> > +     /* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */
> > +     smp_rmb();
> > +     return seq == READ_ONCE(mm->mm_lock_seq);
> >  }
>
> Because there's nothing better than well known functions with a randomly
> different name and interface I suppose...
>
>
> Anyway, all the actual code proposed is not wrong. I'm just a bit
> annoyed it's a random NIH of seqcount.

Ack. Let's decide what we do about the u32 vs u64 issue and I'll rewrite this.
Peter Zijlstra Oct. 24, 2024, 9:56 a.m. UTC | #6
On Wed, Oct 23, 2024 at 03:17:01PM -0700, Suren Baghdasaryan wrote:

> > Or better yet, just use seqcount...
> 
> Yeah, with these changes it does look a lot like seqcount now...
> I can take another stab at rewriting this using seqcount_t but one
> issue that Jann was concerned about is the counter being int vs long.
> seqcount_t uses unsigned, so I'm not sure how to address that if I
> were to use seqcount_t. Any suggestions how to address that before I
> move forward with a rewrite?

So if that issue is real, it is not specific to this case. Specifically
preemptible seqcount will be similarly affected. So we should probably
address that in the seqcount implementation.
Suren Baghdasaryan Oct. 24, 2024, 4:28 p.m. UTC | #7
On Thu, Oct 24, 2024 at 2:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 23, 2024 at 03:17:01PM -0700, Suren Baghdasaryan wrote:
>
> > > Or better yet, just use seqcount...
> >
> > Yeah, with these changes it does look a lot like seqcount now...
> > I can take another stab at rewriting this using seqcount_t but one
> > issue that Jann was concerned about is the counter being int vs long.
> > seqcount_t uses unsigned, so I'm not sure how to address that if I
> > were to use seqcount_t. Any suggestions how to address that before I
> > move forward with a rewrite?
>
> So if that issue is real, it is not specific to this case. Specifically
> preemptible seqcount will be similarly affected. So we should probably
> address that in the seqcount implementation.

Sounds good. Let me try rewriting this patch using seqcount_t and I'll
work with Jann on a separate patch to change seqcount_t.
Thanks for the feedback!

>
Suren Baghdasaryan Oct. 24, 2024, 9:04 p.m. UTC | #8
On Thu, Oct 24, 2024 at 9:28 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Oct 24, 2024 at 2:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Oct 23, 2024 at 03:17:01PM -0700, Suren Baghdasaryan wrote:
> >
> > > > Or better yet, just use seqcount...
> > >
> > > Yeah, with these changes it does look a lot like seqcount now...
> > > I can take another stab at rewriting this using seqcount_t but one
> > > issue that Jann was concerned about is the counter being int vs long.
> > > seqcount_t uses unsigned, so I'm not sure how to address that if I
> > > were to use seqcount_t. Any suggestions how to address that before I
> > > move forward with a rewrite?
> >
> > So if that issue is real, it is not specific to this case. Specifically
> > preemptible seqcount will be similarly affected. So we should probably
> > address that in the seqcount implementation.
>
> Sounds good. Let me try rewriting this patch using seqcount_t and I'll
> work with Jann on a separate patch to change seqcount_t.
> Thanks for the feedback!

I posted the patchset to convert mm_lock_seq into seqcount_t and to
add speculative functions at
https://lore.kernel.org/all/20241024205231.1944747-1-surenb@google.com/.

>
> >
Andrii Nakryiko Oct. 24, 2024, 11:20 p.m. UTC | #9
On Thu, Oct 24, 2024 at 2:04 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Oct 24, 2024 at 9:28 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 2:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Oct 23, 2024 at 03:17:01PM -0700, Suren Baghdasaryan wrote:
> > >
> > > > > Or better yet, just use seqcount...
> > > >
> > > > Yeah, with these changes it does look a lot like seqcount now...
> > > > I can take another stab at rewriting this using seqcount_t but one
> > > > issue that Jann was concerned about is the counter being int vs long.
> > > > seqcount_t uses unsigned, so I'm not sure how to address that if I
> > > > were to use seqcount_t. Any suggestions how to address that before I
> > > > move forward with a rewrite?
> > >
> > > So if that issue is real, it is not specific to this case. Specifically
> > > preemptible seqcount will be similarly affected. So we should probably
> > > address that in the seqcount implementation.
> >
> > Sounds good. Let me try rewriting this patch using seqcount_t and I'll
> > work with Jann on a separate patch to change seqcount_t.
> > Thanks for the feedback!
>
> I posted the patchset to convert mm_lock_seq into seqcount_t and to
> add speculative functions at
> https://lore.kernel.org/all/20241024205231.1944747-1-surenb@google.com/.

Thanks, Suren! Hopefully it can land soon!

>
> >
> > >
Suren Baghdasaryan Oct. 24, 2024, 11:33 p.m. UTC | #10
On Thu, Oct 24, 2024 at 4:20 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Oct 24, 2024 at 2:04 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 9:28 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Oct 24, 2024 at 2:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Oct 23, 2024 at 03:17:01PM -0700, Suren Baghdasaryan wrote:
> > > >
> > > > > > Or better yet, just use seqcount...
> > > > >
> > > > > Yeah, with these changes it does look a lot like seqcount now...
> > > > > I can take another stab at rewriting this using seqcount_t but one
> > > > > issue that Jann was concerned about is the counter being int vs long.
> > > > > seqcount_t uses unsigned, so I'm not sure how to address that if I
> > > > > were to use seqcount_t. Any suggestions how to address that before I
> > > > > move forward with a rewrite?
> > > >
> > > > So if that issue is real, it is not specific to this case. Specifically
> > > > preemptible seqcount will be similarly affected. So we should probably
> > > > address that in the seqcount implementation.
> > >
> > > Sounds good. Let me try rewriting this patch using seqcount_t and I'll
> > > work with Jann on a separate patch to change seqcount_t.
> > > Thanks for the feedback!
> >
> > I posted the patchset to convert mm_lock_seq into seqcount_t and to
> > add speculative functions at
> > https://lore.kernel.org/all/20241024205231.1944747-1-surenb@google.com/.
>
> Thanks, Suren! Hopefully it can land soon!

Would incorporating them into your patchset speed things up? If so,
feel free to include them into your series.
The only required change in your other patches is the renaming of
mmap_lock_speculation_start() to mmap_lock_speculation_begin().

>
> >
> > >
> > > >
Andrii Nakryiko Oct. 25, 2024, 5:12 a.m. UTC | #11
On Thu, Oct 24, 2024 at 4:33 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Oct 24, 2024 at 4:20 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 2:04 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Oct 24, 2024 at 9:28 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Thu, Oct 24, 2024 at 2:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > >
> > > > > On Wed, Oct 23, 2024 at 03:17:01PM -0700, Suren Baghdasaryan wrote:
> > > > >
> > > > > > > Or better yet, just use seqcount...
> > > > > >
> > > > > > Yeah, with these changes it does look a lot like seqcount now...
> > > > > > I can take another stab at rewriting this using seqcount_t but one
> > > > > > issue that Jann was concerned about is the counter being int vs long.
> > > > > > seqcount_t uses unsigned, so I'm not sure how to address that if I
> > > > > > were to use seqcount_t. Any suggestions how to address that before I
> > > > > > move forward with a rewrite?
> > > > >
> > > > > So if that issue is real, it is not specific to this case. Specifically
> > > > > preemptible seqcount will be similarly affected. So we should probably
> > > > > address that in the seqcount implementation.
> > > >
> > > > Sounds good. Let me try rewriting this patch using seqcount_t and I'll
> > > > work with Jann on a separate patch to change seqcount_t.
> > > > Thanks for the feedback!
> > >
> > > I posted the patchset to convert mm_lock_seq into seqcount_t and to
> > > add speculative functions at
> > > https://lore.kernel.org/all/20241024205231.1944747-1-surenb@google.com/.
> >
> > Thanks, Suren! Hopefully it can land soon!
>
> Would incorporating them into your patchset speed things up? If so,
> feel free to include them into your series.

I don't really think so. At this point the uprobe part is done (next
revision has a comment style fix, that's all). So I'll just wait for
your patches to be acked and applied, then I'll just do a trivial
rebase. This will be easier for everyone at this point, IMO, to not
couple them into a single patch set with two authors.

Hopefully Peter will take those patches through tip/perf/core, though,
so I don't have to wait for mm and tip trees to converge.

> The only required change in your other patches is the renaming of
> mmap_lock_speculation_start() to mmap_lock_speculation_begin().

Yep, no problem.

>
> >
> > >
> > > >
> > > > >
diff mbox series

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..5d8cdebd42bc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -887,6 +887,9 @@  struct mm_struct {
 		 * Roughly speaking, incrementing the sequence number is
 		 * equivalent to releasing locks on VMAs; reading the sequence
 		 * number can be part of taking a read lock on a VMA.
+		 * Incremented every time mmap_lock is write-locked/unlocked.
+		 * Initialized to 0, therefore odd values indicate mmap_lock
+		 * is write-locked and even values that it's released.
 		 *
 		 * Can be modified under write mmap_lock using RELEASE
 		 * semantics.
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index de9dc20b01ba..9d23635bc701 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -71,39 +71,84 @@  static inline void mmap_assert_write_locked(const struct mm_struct *mm)
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
+static inline void init_mm_lock_seq(struct mm_struct *mm)
+{
+	mm->mm_lock_seq = 0;
+}
+
 /*
- * Drop all currently-held per-VMA locks.
- * This is called from the mmap_lock implementation directly before releasing
- * a write-locked mmap_lock (or downgrading it to read-locked).
- * This should normally NOT be called manually from other places.
- * If you want to call this manually anyway, keep in mind that this will release
- * *all* VMA write locks, including ones from further up the stack.
+ * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
+ * or write-unlocked (RELEASE semantics).
  */
-static inline void vma_end_write_all(struct mm_struct *mm)
+static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
 {
 	mmap_assert_write_locked(mm);
 	/*
 	 * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
 	 * mmap_lock being held.
-	 * We need RELEASE semantics here to ensure that preceding stores into
-	 * the VMA take effect before we unlock it with this store.
-	 * Pairs with ACQUIRE semantics in vma_start_read().
 	 */
-	smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
+
+	if (acquire) {
+		WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+		/*
+		 * For ACQUIRE semantics we should ensure no following stores are
+		 * reordered to appear before the mm->mm_lock_seq modification.
+		 */
+		smp_wmb();
+	} else {
+		/*
+		 * We need RELEASE semantics here to ensure that preceding stores
+		 * into the VMA take effect before we unlock it with this store.
+		 * Pairs with ACQUIRE semantics in vma_start_read().
+		 */
+		smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
+	}
+}
+
+static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
+{
+	/* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
+	*seq = smp_load_acquire(&mm->mm_lock_seq);
+	/* Allow speculation if mmap_lock is not write-locked */
+	return (*seq & 1) == 0;
+}
+
+static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
+{
+	/* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */
+	smp_rmb();
+	return seq == READ_ONCE(mm->mm_lock_seq);
 }
+
 #else
-static inline void vma_end_write_all(struct mm_struct *mm) {}
+static inline void init_mm_lock_seq(struct mm_struct *mm) {}
+static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire) {}
+static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq) { return false; }
+static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq) { return false; }
 #endif
 
+/*
+ * Drop all currently-held per-VMA locks.
+ * This is called from the mmap_lock implementation directly before releasing
+ * a write-locked mmap_lock (or downgrading it to read-locked).
+ * This should NOT be called manually from other places.
+ */
+static inline void vma_end_write_all(struct mm_struct *mm)
+{
+	inc_mm_lock_seq(mm, false);
+}
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
+	init_mm_lock_seq(mm);
 }
 
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
 	down_write(&mm->mmap_lock);
+	inc_mm_lock_seq(mm, true);
 	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
@@ -111,6 +156,7 @@  static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
 {
 	__mmap_lock_trace_start_locking(mm, true);
 	down_write_nested(&mm->mmap_lock, subclass);
+	inc_mm_lock_seq(mm, true);
 	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
@@ -120,6 +166,8 @@  static inline int mmap_write_lock_killable(struct mm_struct *mm)
 
 	__mmap_lock_trace_start_locking(mm, true);
 	ret = down_write_killable(&mm->mmap_lock);
+	if (!ret)
+		inc_mm_lock_seq(mm, true);
 	__mmap_lock_trace_acquire_returned(mm, true, ret == 0);
 	return ret;
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 89ceb4a68af2..dd1bded0294d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1261,9 +1261,6 @@  static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	seqcount_init(&mm->write_protect_seq);
 	mmap_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
-#ifdef CONFIG_PER_VMA_LOCK
-	mm->mm_lock_seq = 0;
-#endif
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;