diff mbox series

[v2] mm/mmap: Fix race in mmap_region() with ftrucate()

Message ID 20241016013455.2241533-1-Liam.Howlett@oracle.com (mailing list archive)
State New

Commit Message

Liam R. Howlett Oct. 16, 2024, 1:34 a.m. UTC
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Avoiding the zeroing of the vma tree in mmap_region() introduced a race
with truncate in the page table walk.  To avoid any races, create a hole
in the rmap during the operation by clearing the pagetable entries
earlier under the mmap write lock and (critically) before the new vma is
installed into the vma tree.  The result is that the old vma(s) are left
in the vma tree, but free_pgtables() removes them from the rmap and
clears the ptes while holding the necessary locks.

This change extends the fix required for hugetlbfs and the call_mmap()
function by moving the cleanup higher in the function and running it
unconditionally.

Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

v1: https://lore.kernel.org/all/20241015161135.2133951-1-Liam.Howlett@oracle.com/

Changes since v1:
  Updated commit message - Thanks Lorenzo

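For readers unfamiliar with the path being patched, the following is a minimal userspace sketch of the kind of call sequence that reaches mmap_region() with an existing mapping in the way: a file-backed page is faulted in, then replaced in place with MAP_FIXED. It is not the reporter's reproducer and does not race with ftruncate(); it only shows the replace-in-place step during which the kernel has to clear the old PTEs. The helper name remap_over_file_mapping() and the temporary file path are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): map one page of a
 * temporary file, dirty it so a PTE exists, then replace the mapping
 * in place with MAP_FIXED.  Returns 0 if the replacement mapping sits
 * at the same address and reads back as zeroes, -1 otherwise.
 */
int remap_over_file_mapping(void)
{
	char path[] = "/tmp/mmap-fixed-XXXXXX";	/* assumed scratch location */
	long page = sysconf(_SC_PAGESIZE);
	char *old, *fresh;
	int ret = -1;
	int fd;

	assert(page > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file lives on until fd is closed */

	if (ftruncate(fd, page) != 0)
		goto out_close;

	old = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (old == MAP_FAILED)
		goto out_close;
	memset(old, 0xaa, page);	/* fault the page in, creating a PTE */

	/*
	 * MAP_FIXED over the populated range: the old file-backed vma
	 * must be torn down and its PTEs cleared before the new
	 * anonymous mapping becomes visible.
	 */
	fresh = mmap(old, page, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (fresh == MAP_FAILED) {
		munmap(old, page);
		goto out_close;
	}

	/* A new anonymous mapping is zero-filled; no 0xaa may leak through. */
	if (fresh == old && fresh[0] == 0 && fresh[page - 1] == 0)
		ret = 0;

	munmap(fresh, page);
out_close:
	close(fd);
	return ret;
}
```

Calling remap_over_file_mapping() from a small main() and checking for a 0 return is enough to exercise the sequence. Turning it into a test for the race itself would additionally need a second thread or process calling ftruncate() on the same file while the MAP_FIXED call is in flight, which this sketch deliberately leaves out.
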
Comments

Vlastimil Babka Oct. 16, 2024, 9:45 a.m. UTC | #1
Subject has a typo, should say ftruncate()

Also let's explicitly note it's a 6.12 hotfix.

On 10/16/24 03:34, Liam R. Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> Avoiding the zeroing of the vma tree in mmap_region() introduced a race
> with truncate in the page table walk.  To avoid any races, create a hole
> in the rmap during the operation by clearing the pagetable entries
> earlier under the mmap write lock and (critically) before the new vma is
> installed into the vma tree.  The result is that the old vma(s) are left
> in the vma tree, but free_pgtables() removes them from the rmap and
> clears the ptes while holding the necessary locks.

And no parallel page faults can reinstate any PTEs, as the vmas are marked
as detached, right?

> This change extends the fix required for hugetlbfs and the call_mmap()
> function by moving the cleanup higher in the function and running it
> unconditionally.
> 
> Cc: Jann Horn <jannh@google.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
> Reported-by: Jann Horn <jannh@google.com>
> Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Reviewed-by: Jann Horn <jannh@google.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/mmap.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> v1: https://lore.kernel.org/all/20241015161135.2133951-1-Liam.Howlett@oracle.com/
> 
> Changes since v1:
>   Updated commit message - Thanks Lorenzo
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..a20998fb633c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1413,6 +1413,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  		vmg.flags = vm_flags;
>  	}
>  
> +	/*
> +	 * clear PTEs while the vma is still in the tree so that rmap
> +	 * cannot race with the freeing later in the truncate scenario.
> +	 * This is also needed for call_mmap(), which is why vm_ops
> +	 * close function is called.
> +	 */
> +	vms_clean_up_area(&vms, &mas_detach);
>  	vma = vma_merge_new_range(&vmg);
>  	if (vma)
>  		goto expanded;
> @@ -1432,11 +1439,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  
>  	if (file) {
>  		vma->vm_file = get_file(file);
> -		/*
> -		 * call_mmap() may map PTE, so ensure there are no existing PTEs
> -		 * and call the vm_ops close function if one exists.
> -		 */
> -		vms_clean_up_area(&vms, &mas_detach);
>  		error = call_mmap(file, vma);
>  		if (error)
>  			goto unmap_and_free_vma;

Liam R. Howlett Oct. 16, 2024, 2:17 p.m. UTC | #2
* Vlastimil Babka <vbabka@suse.cz> [241016 05:45]:
> Subject has a typo, should say ftruncate()
> 
> Also let's explicitly note it's a 6.12 hotfix.
> 
> On 10/16/24 03:34, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> > 
> > Avoiding the zeroing of the vma tree in mmap_region() introduced a race
> > with truncate in the page table walk.  To avoid any races, create a hole
> > in the rmap during the operation by clearing the pagetable entries
> > earlier under the mmap write lock and (critically) before the new vma is
> > installed into the vma tree.  The result is that the old vma(s) are left
> > in the vma tree, but free_pgtables() removes them from the rmap and
> > clears the ptes while holding the necessary locks.
> 
> And no parallel page faults can reinstate any PTEs, as the vmas are marked
> as detached, right?

Right, it is detached and waiting to be freed, so per-vma page faulting
will fall through and wait for the mmap read lock before continuing,
which it cannot get until the MAP_FIXED call drops the mmap write lock.

> 
> > This change extends the fix required for hugetlbfs and the call_mmap()
> > function by moving the cleanup higher in the function and running it
> > unconditionally.
> > 
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: David Hildenbrand <david@redhat.com>
> > Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
> > Reported-by: Jann Horn <jannh@google.com>
> > Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> > Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > Reviewed-by: Jann Horn <jannh@google.com>
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

> 
> > ---
> >  mm/mmap.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > v1: https://lore.kernel.org/all/20241015161135.2133951-1-Liam.Howlett@oracle.com/
> > 
> > Changes since v1:
> >   Updated commit message - Thanks Lorenzo
> > 
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..a20998fb633c 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1413,6 +1413,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  		vmg.flags = vm_flags;
> >  	}
> >  
> > +	/*
> > +	 * clear PTEs while the vma is still in the tree so that rmap
> > +	 * cannot race with the freeing later in the truncate scenario.
> > +	 * This is also needed for call_mmap(), which is why vm_ops
> > +	 * close function is called.
> > +	 */
> > +	vms_clean_up_area(&vms, &mas_detach);
> >  	vma = vma_merge_new_range(&vmg);
> >  	if (vma)
> >  		goto expanded;
> > @@ -1432,11 +1439,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  
> >  	if (file) {
> >  		vma->vm_file = get_file(file);
> > -		/*
> > -		 * call_mmap() may map PTE, so ensure there are no existing PTEs
> > -		 * and call the vm_ops close function if one exists.
> > -		 */
> > -		vms_clean_up_area(&vms, &mas_detach);
> >  		error = call_mmap(file, vma);
> >  		if (error)
> >  			goto unmap_and_free_vma;
>

Liam R. Howlett Oct. 16, 2024, 4:56 p.m. UTC | #3
This should have been marked as a hotfix, as it needs to go into v6.12.

* Liam R. Howlett <Liam.Howlett@oracle.com> [241015 21:35]:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> Avoiding the zeroing of the vma tree in mmap_region() introduced a race
> with truncate in the page table walk.  To avoid any races, create a hole
> in the rmap during the operation by clearing the pagetable entries
> earlier under the mmap write lock and (critically) before the new vma is
> installed into the vma tree.  The result is that the old vma(s) are left
> in the vma tree, but free_pgtables() removes them from the rmap and
> clears the ptes while holding the necessary locks.
> 
> This change extends the fix required for hugetlbfs and the call_mmap()
> function by moving the cleanup higher in the function and running it
> unconditionally.
> 
> Cc: Jann Horn <jannh@google.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
> Reported-by: Jann Horn <jannh@google.com>
> Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Reviewed-by: Jann Horn <jannh@google.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  mm/mmap.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> v1: https://lore.kernel.org/all/20241015161135.2133951-1-Liam.Howlett@oracle.com/
> 
> Changes since v1:
>   Updated commit message - Thanks Lorenzo
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..a20998fb633c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1413,6 +1413,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  		vmg.flags = vm_flags;
>  	}
>  
> +	/*
> +	 * clear PTEs while the vma is still in the tree so that rmap
> +	 * cannot race with the freeing later in the truncate scenario.
> +	 * This is also needed for call_mmap(), which is why vm_ops
> +	 * close function is called.
> +	 */
> +	vms_clean_up_area(&vms, &mas_detach);
>  	vma = vma_merge_new_range(&vmg);
>  	if (vma)
>  		goto expanded;
> @@ -1432,11 +1439,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  
>  	if (file) {
>  		vma->vm_file = get_file(file);
> -		/*
> -		 * call_mmap() may map PTE, so ensure there are no existing PTEs
> -		 * and call the vm_ops close function if one exists.
> -		 */
> -		vms_clean_up_area(&vms, &mas_detach);
>  		error = call_mmap(file, vma);
>  		if (error)
>  			goto unmap_and_free_vma;
> -- 
> 2.43.0
>

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..a20998fb633c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1413,6 +1413,13 @@  unsigned long mmap_region(struct file *file, unsigned long addr,
 		vmg.flags = vm_flags;
 	}
 
+	/*
+	 * clear PTEs while the vma is still in the tree so that rmap
+	 * cannot race with the freeing later in the truncate scenario.
+	 * This is also needed for call_mmap(), which is why vm_ops
+	 * close function is called.
+	 */
+	vms_clean_up_area(&vms, &mas_detach);
 	vma = vma_merge_new_range(&vmg);
 	if (vma)
 		goto expanded;
@@ -1432,11 +1439,6 @@  unsigned long mmap_region(struct file *file, unsigned long addr,
 
 	if (file) {
 		vma->vm_file = get_file(file);
-		/*
-		 * call_mmap() may map PTE, so ensure there are no existing PTEs
-		 * and call the vm_ops close function if one exists.
-		 */
-		vms_clean_up_area(&vms, &mas_detach);
 		error = call_mmap(file, vma);
 		if (error)
 			goto unmap_and_free_vma;