
[v4,18/33] mm: write-lock VMAs before removing them from VMA tree

Message ID 20230227173632.3292573-19-surenb@google.com (mailing list archive)
State New
Series Per-VMA locks

Commit Message

Suren Baghdasaryan Feb. 27, 2023, 5:36 p.m. UTC
Write-locking VMAs before isolating them ensures that page fault
handlers don't operate on isolated VMAs.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/mmap.c  | 1 +
 mm/nommu.c | 5 +++++
 2 files changed, 6 insertions(+)
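
For context, vma_start_write() is the per-VMA write-lock primitive
introduced earlier in this series. A condensed sketch of its behavior
(helper and field names follow the series and may differ from the final
merged code; this is an illustration, not the exact implementation):

	/* Write-lock a VMA; the caller must hold mmap_lock for writing. */
	static inline void vma_start_write(struct vm_area_struct *vma)
	{
		int mm_lock_seq;

		mmap_assert_write_locked(vma->vm_mm);

		mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
		/* Already write-locked during this mmap_lock hold? */
		if (vma->vm_lock_seq == mm_lock_seq)
			return;

		/* Wait until concurrent per-VMA readers drop the lock. */
		down_write(&vma->lock);
		/* Stays locked until vma_end_write_all() bumps mm_lock_seq. */
		vma->vm_lock_seq = mm_lock_seq;
		up_write(&vma->lock);
	}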

Comments

Hyeonggon Yoo March 1, 2023, 7:43 a.m. UTC | #1
On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> Write-locking VMAs before isolating them ensures that page fault
> handlers don't operate on isolated VMAs.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/mmap.c  | 1 +
>  mm/nommu.c | 5 +++++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1f42b9a52b9b..f7ed357056c4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  static inline int munmap_sidetree(struct vm_area_struct *vma,
>  				   struct ma_state *mas_detach)
>  {
> +	vma_start_write(vma);
>  	mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);

I may be missing something, but have a few questions:

	1) Why does a writer need to both write-lock a VMA and mark the VMA detached
	   when unmapping it? Isn't it enough to just write-lock the VMA?

	2) VMAs that are going to be removed are already locked in vma_prepare(),
	   so I think this hunk could be dropped?

>  	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
>  		return -ENOMEM;
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 57ba243c6a37..2ab162d773e2 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
>  		       current->pid);
>  		return -ENOMEM;
>  	}
> +	vma_start_write(vma);
>  	cleanup_vma_from_mm(vma);

	3) I think this hunk could be dropped as Per-VMA lock depends on MMU anyway.

Thanks,
Hyeonggon

>  
>  	/* remove from the MM's tree and list */
> @@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
>  	 */
>  	mmap_write_lock(mm);
>  	for_each_vma(vmi, vma) {
> +		/*
> +		 * No need to lock VMA because this is the only mm user and no
> +		 * page fault handler can race with it.
> +		 */
>  		cleanup_vma_from_mm(vma);
>  		delete_vma(mm, vma);
>  		cond_resched();
> -- 
> 2.39.2.722.g9855ee24e9-goog
> 
>
Hyeonggon Yoo March 1, 2023, 7:56 a.m. UTC | #2
On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> > Write-locking VMAs before isolating them ensures that page fault
> > handlers don't operate on isolated VMAs.
> > 
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/mmap.c  | 1 +
> >  mm/nommu.c | 5 +++++
> >  2 files changed, 6 insertions(+)
> > 
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 1f42b9a52b9b..f7ed357056c4 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  static inline int munmap_sidetree(struct vm_area_struct *vma,
> >  				   struct ma_state *mas_detach)
> >  {
> > +	vma_start_write(vma);
> >  	mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
> 
> I may be missing something, but have a few questions:
> 
> 	1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> 	   when unmapping it? Isn't it enough to just write-lock the VMA?
> 
> 	2) VMAs that are going to be removed are already locked in vma_prepare(),
> 	   so I think this hunk could be dropped?

After sending this I just realized that I did not consider the simple munmap case :)
But I still think 1) and 3) are valid questions.

> 
> >  	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
> >  		return -ENOMEM;
> > diff --git a/mm/nommu.c b/mm/nommu.c
> > index 57ba243c6a37..2ab162d773e2 100644
> > --- a/mm/nommu.c
> > +++ b/mm/nommu.c
> > @@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
> >  		       current->pid);
> >  		return -ENOMEM;
> >  	}
> > +	vma_start_write(vma);
> >  	cleanup_vma_from_mm(vma);
> 
> 	3) I think this hunk could be dropped as Per-VMA lock depends on MMU anyway.
> 
> Thanks,
> Hyeonggon
> 
> >  
> >  	/* remove from the MM's tree and list */
> > @@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
> >  	 */
> >  	mmap_write_lock(mm);
> >  	for_each_vma(vmi, vma) {
> > +		/*
> > +		 * No need to lock VMA because this is the only mm user and no
> > +		 * page fault handler can race with it.
> > +		 */
> >  		cleanup_vma_from_mm(vma);
> >  		delete_vma(mm, vma);
> >  		cond_resched();
> > -- 
> > 2.39.2.722.g9855ee24e9-goog
> > 
> > 
>
Suren Baghdasaryan March 1, 2023, 6:34 p.m. UTC | #3
On Tue, Feb 28, 2023 at 11:57 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> > On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> > > Write-locking VMAs before isolating them ensures that page fault
> > > handlers don't operate on isolated VMAs.
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > ---
> > >  mm/mmap.c  | 1 +
> > >  mm/nommu.c | 5 +++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 1f42b9a52b9b..f7ed357056c4 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > >  static inline int munmap_sidetree(struct vm_area_struct *vma,
> > >                                struct ma_state *mas_detach)
> > >  {
> > > +   vma_start_write(vma);
> > >     mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
> >
> > I may be missing something, but have a few questions:
> >
> >       1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> >          when unmapping it? Isn't it enough to just write-lock the VMA?

We need to mark the VMA detached to avoid handling a page fault in a
detached VMA. The possible scenario is:

lock_vma_under_rcu
  vma = mas_walk(&mas)
                                                        munmap_sidetree
                                                          vma_start_write(vma)
                                                          mas_store_gfp()
                                                          // remove VMA from the tree
                                                          vma_end_write_all()
  vma_start_read(vma)
  // we locked the VMA but it is not part of the tree anymore.

So, marking the VMA locked before vma_end_write_all() and checking
vma->detached after vma_start_read() helps us avoid handling faults in
the detached VMA.
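
For illustration, here is a condensed sketch of the reader side, based on
this series' lock_vma_under_rcu() (the anonymous-VMA and address-range
rechecks are elided, and the real code may retry instead of bailing out,
so treat names and flow as approximate):

	struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
						  unsigned long address)
	{
		MA_STATE(mas, &mm->mm_mt, address, address);
		struct vm_area_struct *vma;

		rcu_read_lock();
		vma = mas_walk(&mas);
		if (!vma)
			goto inval;

		/* Fails if the VMA is currently write-locked. */
		if (!vma_start_read(vma))
			goto inval;

		/*
		 * vma_end_write_all() may have run between mas_walk() and
		 * vma_start_read(), so recheck that the VMA was not removed
		 * from the tree in that window.
		 */
		if (vma->detached) {
			vma_end_read(vma);
			goto inval;
		}

		rcu_read_unlock();
		return vma;
	inval:
		rcu_read_unlock();
		return NULL;
	}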


> >
> >       2) VMAs that are going to be removed are already locked in vma_prepare(),
> >          so I think this hunk could be dropped?
>
> After sending this I just realized that I did not consider the simple munmap case :)
> But I still think 1) and 3) are valid questions.
>
> >
> > >     if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
> > >             return -ENOMEM;
> > > diff --git a/mm/nommu.c b/mm/nommu.c
> > > index 57ba243c6a37..2ab162d773e2 100644
> > > --- a/mm/nommu.c
> > > +++ b/mm/nommu.c
> > > @@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
> > >                    current->pid);
> > >             return -ENOMEM;
> > >     }
> > > +   vma_start_write(vma);
> > >     cleanup_vma_from_mm(vma);
> >
> >       3) I think this hunk could be dropped as Per-VMA lock depends on MMU anyway.

Ah, yes, you are right. We can safely remove the changes in nommu.c.
Andrew, should I post a fixup, or can you make the removal directly in
mm-unstable?
Thanks,
Suren.

> >
> > Thanks,
> > Hyeonggon
> >
> > >
> > >     /* remove from the MM's tree and list */
> > > @@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
> > >      */
> > >     mmap_write_lock(mm);
> > >     for_each_vma(vmi, vma) {
> > > +           /*
> > > +            * No need to lock VMA because this is the only mm user and no
> > > +            * page fault handler can race with it.
> > > +            */
> > >             cleanup_vma_from_mm(vma);
> > >             delete_vma(mm, vma);
> > >             cond_resched();
> > > --
> > > 2.39.2.722.g9855ee24e9-goog
> > >
> > >
> >
>
Suren Baghdasaryan March 1, 2023, 6:42 p.m. UTC | #4
On Wed, Mar 1, 2023 at 10:34 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Feb 28, 2023 at 11:57 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> > > On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> > > > Write-locking VMAs before isolating them ensures that page fault
> > > > handlers don't operate on isolated VMAs.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > ---
> > > >  mm/mmap.c  | 1 +
> > > >  mm/nommu.c | 5 +++++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 1f42b9a52b9b..f7ed357056c4 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >  static inline int munmap_sidetree(struct vm_area_struct *vma,
> > > >                                struct ma_state *mas_detach)
> > > >  {
> > > > +   vma_start_write(vma);
> > > >     mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
> > >
> > > I may be missing something, but have a few questions:
> > >
> > >       1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> > >          when unmapping it? Isn't it enough to just write-lock the VMA?
>
> We need to mark the VMA detached to avoid handling a page fault in a
> detached VMA. The possible scenario is:
>
> lock_vma_under_rcu
>   vma = mas_walk(&mas)
>                                                         munmap_sidetree
>                                                           vma_start_write(vma)
>                                                           mas_store_gfp()
>                                                           // remove VMA from the tree
>                                                           vma_end_write_all()
>   vma_start_read(vma)
>   // we locked the VMA but it is not part of the tree anymore.
>
> So, marking the VMA locked before vma_end_write_all() and checking

Sorry, I should have said "marking the VMA *detached* before
vma_end_write_all() and checking vma->detached after vma_start_read()
helps us avoid handling faults in the detached VMA."

> vma->detached after vma_start_read() helps us avoid handling faults in
> the detached VMA.
>
>
> > >
> > >       2) VMAs that are going to be removed are already locked in vma_prepare(),
> > >          so I think this hunk could be dropped?
> >
> > After sending this I just realized that I did not consider the simple munmap case :)
> > But I still think 1) and 3) are valid questions.
> >
> > >
> > > >     if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
> > > >             return -ENOMEM;
> > > > diff --git a/mm/nommu.c b/mm/nommu.c
> > > > index 57ba243c6a37..2ab162d773e2 100644
> > > > --- a/mm/nommu.c
> > > > +++ b/mm/nommu.c
> > > > @@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
> > > >                    current->pid);
> > > >             return -ENOMEM;
> > > >     }
> > > > +   vma_start_write(vma);
> > > >     cleanup_vma_from_mm(vma);
> > >
> > >       3) I think this hunk could be dropped as Per-VMA lock depends on MMU anyway.
>
> Ah, yes, you are right. We can safely remove the changes in nommu.c.
> Andrew, should I post a fixup, or can you make the removal directly in
> mm-unstable?
> Thanks,
> Suren.
>
> > >
> > > Thanks,
> > > Hyeonggon
> > >
> > > >
> > > >     /* remove from the MM's tree and list */
> > > > @@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
> > > >      */
> > > >     mmap_write_lock(mm);
> > > >     for_each_vma(vmi, vma) {
> > > > +           /*
> > > > +            * No need to lock VMA because this is the only mm user and no
> > > > +            * page fault handler can race with it.
> > > > +            */
> > > >             cleanup_vma_from_mm(vma);
> > > >             delete_vma(mm, vma);
> > > >             cond_resched();
> > > > --
> > > > 2.39.2.722.g9855ee24e9-goog
> > > >
> > > >
> > >
> >
Suren Baghdasaryan March 1, 2023, 7:07 p.m. UTC | #5
On Wed, Mar 1, 2023 at 10:34 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Feb 28, 2023 at 11:57 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> > > On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> > > > Write-locking VMAs before isolating them ensures that page fault
> > > > handlers don't operate on isolated VMAs.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > ---
> > > >  mm/mmap.c  | 1 +
> > > >  mm/nommu.c | 5 +++++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 1f42b9a52b9b..f7ed357056c4 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > >  static inline int munmap_sidetree(struct vm_area_struct *vma,
> > > >                                struct ma_state *mas_detach)
> > > >  {
> > > > +   vma_start_write(vma);
> > > >     mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
> > >
> > > I may be missing something, but have a few questions:
> > >
> > >       1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> > >          when unmapping it? Isn't it enough to just write-lock the VMA?
>
> We need to mark the VMA detached to avoid handling a page fault in a
> detached VMA. The possible scenario is:
>
> lock_vma_under_rcu
>   vma = mas_walk(&mas)
>                                                         munmap_sidetree
>                                                           vma_start_write(vma)
>                                                           mas_store_gfp()
>                                                           // remove VMA from the tree
>                                                           vma_end_write_all()
>   vma_start_read(vma)
>   // we locked the VMA but it is not part of the tree anymore.
>
> So, marking the VMA locked before vma_end_write_all() and checking
> vma->detached after vma_start_read() helps us avoid handling faults in
> the detached VMA.
>
>
> > >
> > >       2) VMAs that are going to be removed are already locked in vma_prepare(),
> > >          so I think this hunk could be dropped?
> >
> > After sending this I just realized that I did not consider the simple munmap case :)
> > But I still think 1) and 3) are valid questions.
> >
> > >
> > > >     if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
> > > >             return -ENOMEM;
> > > > diff --git a/mm/nommu.c b/mm/nommu.c
> > > > index 57ba243c6a37..2ab162d773e2 100644
> > > > --- a/mm/nommu.c
> > > > +++ b/mm/nommu.c
> > > > @@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
> > > >                    current->pid);
> > > >             return -ENOMEM;
> > > >     }
> > > > +   vma_start_write(vma);
> > > >     cleanup_vma_from_mm(vma);
> > >
> > >       3) I think this hunk could be dropped as Per-VMA lock depends on MMU anyway.
>
> Ah, yes, you are right. We can safely remove the changes in nommu.c.
> Andrew, should I post a fixup, or can you make the removal directly in
> mm-unstable?

I went ahead and posted the fixup for this at:
https://lore.kernel.org/all/20230301190457.1498985-1-surenb@google.com/

> Thanks,
> Suren.
>
> > >
> > > Thanks,
> > > Hyeonggon
> > >
> > > >
> > > >     /* remove from the MM's tree and list */
> > > > @@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
> > > >      */
> > > >     mmap_write_lock(mm);
> > > >     for_each_vma(vmi, vma) {
> > > > +           /*
> > > > +            * No need to lock VMA because this is the only mm user and no
> > > > +            * page fault handler can race with it.
> > > > +            */
> > > >             cleanup_vma_from_mm(vma);
> > > >             delete_vma(mm, vma);
> > > >             cond_resched();
> > > > --
> > > > 2.39.2.722.g9855ee24e9-goog
> > > >
> > > >
> > >
> >
Hyeonggon Yoo March 2, 2023, 12:53 a.m. UTC | #6
On Wed, Mar 01, 2023 at 10:42:48AM -0800, Suren Baghdasaryan wrote:
> On Wed, Mar 1, 2023 at 10:34 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Tue, Feb 28, 2023 at 11:57 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> > >
> > > On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> > > > On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> > > > > Write-locking VMAs before isolating them ensures that page fault
> > > > > handlers don't operate on isolated VMAs.
> > > > >
> > > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > > ---
> > > > >  mm/mmap.c  | 1 +
> > > > >  mm/nommu.c | 5 +++++
> > > > >  2 files changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > index 1f42b9a52b9b..f7ed357056c4 100644
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > >  static inline int munmap_sidetree(struct vm_area_struct *vma,
> > > > >                                struct ma_state *mas_detach)
> > > > >  {
> > > > > +   vma_start_write(vma);
> > > > >     mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
> > > >
> > > > I may be missing something, but have a few questions:
> > > >
> > > >       1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> > > >          when unmapping it? Isn't it enough to just write-lock the VMA?
> >
> > We need to mark the VMA detached to avoid handling a page fault in a
> > detached VMA. The possible scenario is:
> >
> > lock_vma_under_rcu
> >   vma = mas_walk(&mas)
> >                                                         munmap_sidetree
> >                                                           vma_start_write(vma)
> >                                                           mas_store_gfp()
> >                                                           // remove VMA from the tree
> >                                                           vma_end_write_all()
> >   vma_start_read(vma)
> >   // we locked the VMA but it is not part of the tree anymore.
> >
> > So, marking the VMA locked before vma_end_write_all() and checking
> 
> Sorry, I should have said "marking the VMA *detached* before
> vma_end_write_all() and checking vma->detached after vma_start_read()
> helps us avoid handling faults in the detached VMA."
> 
> > vma->detached after vma_start_read() helps us avoid handling faults in
> > the detached VMA.

Thank you for the explanation, that makes sense!

By the way, if there are no 32-bit users of the per-VMA lock (are there?),
the "detached" bool could be a VMA flag (i.e. making it depend on 64BIT
and selecting ARCH_USES_HIGH_VMA_FLAGS).
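
A hypothetical sketch of that idea (VM_DETACHED and its bit number are
invented here for illustration and are not part of this series;
vm_flags_set()/vm_flags_clear() are the modifier helpers the series
introduces):

	#ifdef CONFIG_PER_VMA_LOCK
	/* High VMA flag bits need CONFIG_ARCH_USES_HIGH_VMA_FLAGS (64BIT). */
	#define VM_DETACHED_BIT	37
	#define VM_DETACHED	BIT(VM_DETACHED_BIT)

	static inline void vma_mark_detached(struct vm_area_struct *vma,
					     bool detached)
	{
		if (detached)
			vm_flags_set(vma, VM_DETACHED);
		else
			vm_flags_clear(vma, VM_DETACHED);
	}

	static inline bool is_vma_detached(struct vm_area_struct *vma)
	{
		return !!(vma->vm_flags & VM_DETACHED);
	}
	#endif /* CONFIG_PER_VMA_LOCK */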

Thanks,
Hyeonggon
Suren Baghdasaryan March 2, 2023, 2:21 a.m. UTC | #7
On Wed, Mar 1, 2023 at 4:54 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Wed, Mar 01, 2023 at 10:42:48AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Mar 1, 2023 at 10:34 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Tue, Feb 28, 2023 at 11:57 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> > > >
> > > > On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> > > > > On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> > > > > > Write-locking VMAs before isolating them ensures that page fault
> > > > > > handlers don't operate on isolated VMAs.
> > > > > >
> > > > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > > > ---
> > > > > >  mm/mmap.c  | 1 +
> > > > > >  mm/nommu.c | 5 +++++
> > > > > >  2 files changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > > index 1f42b9a52b9b..f7ed357056c4 100644
> > > > > > --- a/mm/mmap.c
> > > > > > +++ b/mm/mmap.c
> > > > > > @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > > >  static inline int munmap_sidetree(struct vm_area_struct *vma,
> > > > > >                                struct ma_state *mas_detach)
> > > > > >  {
> > > > > > +   vma_start_write(vma);
> > > > > >     mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
> > > > >
> > > > > I may be missing something, but have a few questions:
> > > > >
> > > > >       1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> > > > >          when unmapping it? Isn't it enough to just write-lock the VMA?
> > >
> > > We need to mark the VMA detached to avoid handling a page fault in a
> > > detached VMA. The possible scenario is:
> > >
> > > lock_vma_under_rcu
> > >   vma = mas_walk(&mas)
> > >                                                         munmap_sidetree
> > >                                                           vma_start_write(vma)
> > >                                                           mas_store_gfp()
> > >                                                           // remove VMA from the tree
> > >                                                           vma_end_write_all()
> > >   vma_start_read(vma)
> > >   // we locked the VMA but it is not part of the tree anymore.
> > >
> > > So, marking the VMA locked before vma_end_write_all() and checking
> >
> > Sorry, I should have said "marking the VMA *detached* before
> > vma_end_write_all() and checking vma->detached after vma_start_read()
> > helps us avoid handling faults in the detached VMA."
> >
> > > vma->detached after vma_start_read() helps us avoid handling faults in
> > > the detached VMA.
>
> Thank you for the explanation, that makes sense!
>
> By the way, if there are no 32-bit users of the per-VMA lock (are there?),
> the "detached" bool could be a VMA flag (i.e. making it depend on 64BIT
> and selecting ARCH_USES_HIGH_VMA_FLAGS).

Yeah, I thought about it but didn't want to make assumptions about
potential users just yet. Besides, I heard there are attempts to make
vm_flags always 64-bit (I think Matthew mentioned that to me once). If
that happens, we won't need any dependencies here. Either
way, this conversion into a flag can be done as an additional
optimization later on. I prefer to keep the main patchset as simple as
possible for now.
Thanks,
Suren.

>
> Thanks,
> Hyeonggon
>

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 1f42b9a52b9b..f7ed357056c4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 static inline int munmap_sidetree(struct vm_area_struct *vma,
 				   struct ma_state *mas_detach)
 {
+	vma_start_write(vma);
 	mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
 	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
 		return -ENOMEM;
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..2ab162d773e2 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
 		       current->pid);
 		return -ENOMEM;
 	}
+	vma_start_write(vma);
 	cleanup_vma_from_mm(vma);
 
 	/* remove from the MM's tree and list */
@@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	mmap_write_lock(mm);
 	for_each_vma(vmi, vma) {
+		/*
+		 * No need to lock VMA because this is the only mm user and no
+		 * page fault handler can race with it.
+		 */
 		cleanup_vma_from_mm(vma);
 		delete_vma(mm, vma);
 		cond_resched();