
[v3,5/6] mm: always lock new vma before inserting into vma tree

Message ID 20230803172652.2849981-6-surenb@google.com
State New
Series make vma locking more obvious

Commit Message

Suren Baghdasaryan Aug. 3, 2023, 5:26 p.m. UTC
While it's not strictly necessary to lock a newly created vma before
adding it into the vma tree (as long as no further changes are performed
to it), it seems like a good policy to lock it and prevent accidental
changes after it becomes visible to the page faults. Lock the vma before
adding it into the vma tree.

Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/mmap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
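The argument in the commit message can be made concrete with a small single-threaded model of the sequence-number scheme behind per-VMA locks. This is a sketch under simplifying assumptions, not kernel code. The `_model` structs and functions are hypothetical stand-ins. The per-VMA rwsem is omitted. A vma counts as write-locked when its `vm_lock_seq` equals the mm's `mm_lock_seq`, and that counter is bumped when the mmap write lock is released. The model shows that a vma inserted without `vma_start_write()` is immediately usable by a lockless page-fault reader, while a locked one is not usable until the writer finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified stand-ins for mm_struct / vm_area_struct. */
struct mm_model {
	bool mmap_write_locked;	/* models holding mmap_lock for write */
	int mm_lock_seq;	/* bumped when the mmap write lock is released */
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;	/* -1: never write-locked (assumed initial value) */
};

static void mmap_write_lock_model(struct mm_model *mm)
{
	mm->mmap_write_locked = true;
}

static void mmap_write_unlock_model(struct mm_model *mm)
{
	/* Bumping the sequence implicitly unlocks every vma locked this cycle. */
	mm->mm_lock_seq++;
	mm->mmap_write_locked = false;
}

/* Models vma_start_write(): only legal with the mmap write lock held. */
static void vma_start_write_model(struct vma_model *vma)
{
	assert(vma->mm->mmap_write_locked);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* A lockless page-fault reader may use the vma only if it is not write-locked. */
static bool vma_readable_model(const struct vma_model *vma)
{
	return vma->vm_lock_seq != vma->mm->mm_lock_seq;
}
```

In this model the reader check is only the sequence comparison. The real read path also takes the per-VMA lock and rechecks the sequence, so treat this as an illustration of the ordering rule rather than of the full protocol.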

Comments

Linus Torvalds Aug. 3, 2023, 6:01 p.m. UTC | #1
On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan <surenb@google.com> wrote:
>
> While it's not strictly necessary to lock a newly created vma before
> adding it into the vma tree (as long as no further changes are performed
> to it), it seems like a good policy to lock it and prevent accidental
> changes after it becomes visible to the page faults. Lock the vma before
> adding it into the vma tree.

So my main reaction here is that I started to wonder about the vma allocation.

Why doesn't vma_init() do something like

        mmap_assert_write_locked(mm);
        vma->vm_lock_seq = mm->mm_lock_seq;

and instead we seem to expect vma_lock_alloc() to do this (and do it
very badly indeed).

Strange.

Anyway, this observation was just a reaction to that "not strictly
necessary to lock a newly created vma" part of the commentary. I feel
like we could/should just make sure that all newly created vmas are
always simply created write-locked.

                Linus
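The alternative suggested here can be sketched with the same kind of hypothetical model. `vma_init_locked_model()` asserts that the mmap lock is write-held and stamps the current sequence, so the vma starts out write-locked. A later `vma_start_write_model()` then has nothing to do, mirroring the early return in `vma_start_write()` when the vma is already locked in the current cycle. The `bool` return value is an addition for illustration only; the kernel function returns nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model; not the kernel's vma_init(). */
struct mm_model {
	bool mmap_write_locked;
	int mm_lock_seq;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;
};

static bool vma_is_write_locked_model(const struct vma_model *vma)
{
	return vma->vm_lock_seq == vma->mm->mm_lock_seq;
}

/* Require the mmap write lock at init time and stamp the current
 * sequence, so every new vma starts out write-locked. */
static void vma_init_locked_model(struct vma_model *vma, struct mm_model *mm)
{
	assert(mm->mmap_write_locked);	/* stands in for mmap_assert_write_locked(mm) */
	vma->mm = mm;
	vma->vm_lock_seq = mm->mm_lock_seq;
}

/* No-op if already locked in this cycle; returns true only if it locked. */
static bool vma_start_write_model(struct vma_model *vma)
{
	if (vma_is_write_locked_model(vma))
		return false;
	assert(vma->mm->mmap_write_locked);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
	return true;
}
```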
Liam R. Howlett Aug. 3, 2023, 6:15 p.m. UTC | #2
* Linus Torvalds <torvalds@linux-foundation.org> [230803 14:02]:
> On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > While it's not strictly necessary to lock a newly created vma before
> > adding it into the vma tree (as long as no further changes are performed
> > to it), it seems like a good policy to lock it and prevent accidental
> > changes after it becomes visible to the page faults. Lock the vma before
> > adding it into the vma tree.
> 
> So my main reaction here is that I started to wonder about the vma allocation.
> 
> Why doesn't vma_init() do something like
> 
>         mmap_assert_write_locked(mm);
>         vma->vm_lock_seq = mm->mm_lock_seq;
> 
> and instead we seem to expect vma_lock_alloc() to do this (and do it
> very badly indeed).
> 
> Strange.
> 
> Anyway, this observation was just a reaction to that "not strictly
> necessary to lock a newly created vma" part of the commentary. I feel
> like we could/should just make sure that all newly created vmas are
> always simply created write-locked.
> 

I thought the same thing initially, but Suren pointed out that it's not
necessary to hold the vma lock to allocate a vma object.  And it seems
there is at least one user (arch/ia64/mm/init.c) which does allocate
outside the lock during ia64_init_addr_space(), which is fine but I'm
not sure it gains much to do it this way - the insert needs to take the
lock anyways and it is hardly going to be contended.

Anywhere else besides an address space setup would probably introduce a
race.

Thanks,
Liam
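The point about allocation versus insertion can be sketched with the same hypothetical model. The vma is initialised with no lock held, because nothing else can see it yet. The insert path must hold the mmap write lock anyway, and it write-locks the vma just before publishing it. The fixed-size array is an illustrative stand-in for the maple tree, and all `_model` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_model {
	bool mmap_write_locked;
	int mm_lock_seq;
	struct vma_model *tree[8];	/* illustrative stand-in for the maple tree */
	size_t nr_vmas;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;
};

/* Allocation/initialisation: no mmap_lock needed, the vma is not yet visible. */
static void vma_alloc_init_model(struct vma_model *vma, struct mm_model *mm)
{
	vma->mm = mm;
	vma->vm_lock_seq = -1;
}

/* Insertion: requires the mmap write lock; lock the vma before publishing. */
static int vma_link_model(struct mm_model *mm, struct vma_model *vma)
{
	assert(mm->mmap_write_locked);
	if (mm->nr_vmas == sizeof(mm->tree) / sizeof(mm->tree[0]))
		return -1;			/* model's equivalent of -ENOMEM */
	vma->vm_lock_seq = mm->mm_lock_seq;	/* models vma_start_write() */
	mm->tree[mm->nr_vmas++] = vma;		/* now visible to page faults */
	return 0;
}
```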
Suren Baghdasaryan Aug. 3, 2023, 6:26 p.m. UTC | #3
On Thu, Aug 3, 2023 at 11:15 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> I thought the same thing initially, but Suren pointed out that it's not
> necessary to hold the vma lock to allocate a vma object.  And it seems
> there is at least one user (arch/ia64/mm/init.c) which does allocate
> outside the lock during ia64_init_addr_space(), which is fine but I'm
> not sure it gains much to do it this way - the insert needs to take the
> lock anyways and it is hardly going to be contended.

Yeah, I remember discussing that. At the time of VMA creation the
mmap_lock might not be write-locked, so mmap_assert_write_locked()
would trigger and mm->mm_lock_seq is not stable. Maybe we can
necessitate holding mmap_lock at the time of VMA creation but that
sounds like an unnecessary restriction. IIRC some drivers also create
vm_area_structs without holding mmap_lock... I'll double-check.

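The objection that `mm->mm_lock_seq` is not stable without `mmap_lock` can be shown with the same kind of model. The names are hypothetical and there is no real locking. If a vma is stamped with the sequence while the mmap lock is not held, an unrelated writer can complete a lock/unlock cycle before the vma is inserted. The stamp is then stale, so the vma is not write-locked at the point where it becomes visible. Calling the lock helper on the insert path, with the mmap lock held, fixes this.

```c
#include <assert.h>
#include <stdbool.h>

struct mm_model {
	bool mmap_write_locked;
	int mm_lock_seq;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;
};

static void mmap_write_lock_model(struct mm_model *mm) { mm->mmap_write_locked = true; }

static void mmap_write_unlock_model(struct mm_model *mm)
{
	mm->mm_lock_seq++;		/* ends this write cycle */
	mm->mmap_write_locked = false;
}

static bool vma_is_write_locked_model(const struct vma_model *vma)
{
	return vma->vm_lock_seq == vma->mm->mm_lock_seq;
}

/* Hypothetical: stamp the sequence at init time WITHOUT holding mmap_lock. */
static void vma_init_unlocked_stamp_model(struct vma_model *vma, struct mm_model *mm)
{
	vma->mm = mm;
	vma->vm_lock_seq = mm->mm_lock_seq;	/* may be stale by insertion time */
}

/* Models vma_start_write() on the insert path, with the mmap write lock held. */
static void vma_start_write_model(struct vma_model *vma)
{
	assert(vma->mm->mmap_write_locked);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}
```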
Suren Baghdasaryan Aug. 3, 2023, 6:34 p.m. UTC | #4
On Thu, Aug 3, 2023 at 11:26 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Yeah, I remember discussing that. At the time of VMA creation the
> mmap_lock might not be write-locked, so mmap_assert_write_locked()
> would trigger and mm->mm_lock_seq is not stable. Maybe we can
> necessitate holding mmap_lock at the time of VMA creation but that
> sounds like an unnecessary restriction. IIRC some drivers also create
> vm_area_structs without holding mmap_lock... I'll double-check.

Yeah, there are places like an initcall gate_vma_init() which call
vma_init(). I don't think these are called with a locked mmap_lock.


Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 3937479d0e07..850a39dee075 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -412,6 +412,8 @@  static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
 	if (vma_iter_prealloc(&vmi))
 		return -ENOMEM;
 
+	vma_start_write(vma);
+
 	if (vma->vm_file) {
 		mapping = vma->vm_file->f_mapping;
 		i_mmap_lock_write(mapping);
@@ -477,7 +479,8 @@  static inline void vma_prepare(struct vma_prepare *vp)
 	vma_start_write(vp->vma);
 	if (vp->adj_next)
 		vma_start_write(vp->adj_next);
-	/* vp->insert is always a newly created VMA, no need for locking */
+	if (vp->insert)
+		vma_start_write(vp->insert);
 	if (vp->remove)
 		vma_start_write(vp->remove);
 	if (vp->remove2)
@@ -3098,6 +3101,7 @@  static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	vma->vm_pgoff = addr >> PAGE_SHIFT;
 	vm_flags_init(vma, flags);
 	vma->vm_page_prot = vm_get_page_prot(flags);
+	vma_start_write(vma);
 	if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
 		goto mas_store_fail;
 
@@ -3345,7 +3349,6 @@  struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 			get_file(new_vma->vm_file);
 		if (new_vma->vm_ops && new_vma->vm_ops->open)
 			new_vma->vm_ops->open(new_vma);
-		vma_start_write(new_vma);
 		if (vma_link(mm, new_vma))
 			goto out_vma_link;
 		*need_rmap_locks = false;
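The patch moves the locking into `vma_link()` and drops the explicit call in `copy_vma()`. That removal is safe because `vma_start_write()` returns early when the vma is already locked in the current write cycle. Keeping the old call would only have produced a redundant no-op. Below is a hypothetical model of the two possible `copy_vma()` paths on top of the patched `vma_link()`. It adds a counter, which does not exist in the kernel, to show that the vma is actually locked exactly once either way.

```c
#include <assert.h>
#include <stdbool.h>

struct mm_model {
	bool mmap_write_locked;
	int mm_lock_seq;
};

struct vma_model {
	struct mm_model *mm;
	int vm_lock_seq;
	int lock_acquisitions;	/* hypothetical: counts non-no-op lockings */
};

/* Models vma_start_write(): returns early if already locked this cycle. */
static void vma_start_write_model(struct vma_model *vma)
{
	if (vma->vm_lock_seq == vma->mm->mm_lock_seq)
		return;
	assert(vma->mm->mmap_write_locked);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
	vma->lock_acquisitions++;
}

/* Models the patched vma_link(): lock the vma before inserting it. */
static int vma_link_model(struct vma_model *vma)
{
	vma_start_write_model(vma);
	/* tree insertion elided in this model */
	return 0;
}

/* copy_vma() path that still locks explicitly before vma_link(). */
static int copy_vma_keep_call_model(struct vma_model *new_vma)
{
	vma_start_write_model(new_vma);
	return vma_link_model(new_vma);
}

/* Patched copy_vma() path: vma_link() does the locking. */
static int copy_vma_patched_model(struct vma_model *new_vma)
{
	return vma_link_model(new_vma);
}
```

The same reasoning covers the `vma_prepare()` hunk: locking `vp->insert` there costs nothing extra if some caller already locked it, and it removes the reliance on the "newly created, no need for locking" assumption.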