[1/6] mm/hugetlb: Fix uffd-wp during fork()

Message ID 20230413231120.544685-2-peterx@redhat.com (mailing list archive)
State New
Series mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins

Commit Message

Peter Xu April 13, 2023, 11:11 p.m. UTC
Several things were wrong:

  - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
    rather than huge_pte_uffd_wp().

  - When copying over a pte, we should drop uffd-wp bit when
    !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).

  - When doing early CoW for private hugetlb (e.g. when the parent page was
    pinned), uffd-wp bit should be properly carried over if necessary.

No bug has been reported, probably because few users exercise these corner
cases, but they are still bugs and can be exposed by the recently introduced
unit tests, so fix all of them in one shot.

Cc: linux-stable <stable@vger.kernel.org>
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
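
For readers following along, the three rules in the commit message can be
modeled in user space.  This is only a sketch under stated assumptions: the
model_* names and the bit positions are hypothetical and are not kernel
APIs, and dst_vma_wp stands in for userfaultfd_wp(dst_vma).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this model only; real layouts are per-arch. */
#define MODEL_PRESENT     (1u << 0)
#define MODEL_UFFD_WP     (1u << 1) /* uffd-wp bit of a present entry */
#define MODEL_SWP_UFFD_WP (1u << 2) /* uffd-wp bit of a swap-style entry */

typedef uint32_t model_pte_t;

/* Rule 1: read uffd-wp with the accessor that matches the entry type. */
bool model_uffd_wp(model_pte_t e)
{
	if (e & MODEL_PRESENT)
		return (e & MODEL_UFFD_WP) != 0;
	return (e & MODEL_SWP_UFFD_WP) != 0;
}

/* Rule 2: a plain copy drops uffd-wp unless the child vma keeps uffd-wp. */
model_pte_t model_copy_entry(model_pte_t src, bool dst_vma_wp)
{
	if (!dst_vma_wp) {
		if (src & MODEL_PRESENT)
			src &= ~MODEL_UFFD_WP;
		else
			src &= ~MODEL_SWP_UFFD_WP;
	}
	return src;
}

/* Rule 3: early CoW builds a fresh present entry and carries the bit over. */
model_pte_t model_early_cow(model_pte_t old, bool dst_vma_wp)
{
	model_pte_t newpte = MODEL_PRESENT;

	if (dst_vma_wp && model_uffd_wp(old))
		newpte |= MODEL_UFFD_WP;
	return newpte;
}
```

In this model, a present entry with the uffd-wp bit set keeps the bit across
model_copy_entry() and model_early_cow() only when dst_vma_wp is true.  A
non-present entry is never inspected through the present-entry bit.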

Comments

David Hildenbrand April 14, 2023, 9:37 a.m. UTC | #1
On 14.04.23 01:11, Peter Xu wrote:
> There're a bunch of things that were wrong:
> 
>    - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
>      rather than huge_pte_uffd_wp().
> 
>    - When copying over a pte, we should drop uffd-wp bit when
>      !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
> 
>    - When doing early CoW for private hugetlb (e.g. when the parent page was
>      pinned), uffd-wp bit should be properly carried over if necessary.
> 
> No bug reported probably because most people do not even care about these
> corner cases, but they are still bugs and can be exposed by the recent unit
> tests introduced, so fix all of them in one shot.
> 
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   mm/hugetlb.c | 26 ++++++++++++++++----------
>   1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..7320e64aacc6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
>   
>   static void
>   hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
> -		     struct folio *new_folio)
> +		      struct folio *new_folio, pte_t old)
>   {

Nit: the function now expects old to be a non-swap PTE, which works
perfectly fine with the existing code; unfortunately, the function name
is a bit generic and misleading. IMHO, instead of factoring that
functionality out in a desperate attempt to keep
copy_hugetlb_page_range() somewhat readable, we should have factored the
complete copy+replace out into a copy_hugetlb_page() function, similar
to the ordinary page handling, which would have made
copy_hugetlb_page_range() more readable in the end.

Anyhow, unrelated.

> +	pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
> +
>   	__folio_mark_uptodate(new_folio);
>   	hugepage_add_new_anon_rmap(new_folio, vma, addr);
> -	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
> +	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
> +		newpte = huge_pte_mkuffd_wp(newpte);
> +	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
>   	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
>   	folio_set_hugetlb_migratable(new_folio);
>   }
> @@ -5032,14 +5036,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   			 */
>   			;
>   		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
> -			bool uffd_wp = huge_pte_uffd_wp(entry);
> -
> -			if (!userfaultfd_wp(dst_vma) && uffd_wp)
> +			if (!userfaultfd_wp(dst_vma))
>   				entry = huge_pte_clear_uffd_wp(entry);
>   			set_huge_pte_at(dst, addr, dst_pte, entry);
>   		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
>   			swp_entry_t swp_entry = pte_to_swp_entry(entry);
> -			bool uffd_wp = huge_pte_uffd_wp(entry);
>   
>   			if (!is_readable_migration_entry(swp_entry) && cow) {
>   				/*
> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   				swp_entry = make_readable_migration_entry(
>   							swp_offset(swp_entry));
>   				entry = swp_entry_to_pte(swp_entry);
> -				if (userfaultfd_wp(src_vma) && uffd_wp)
> -					entry = huge_pte_mkuffd_wp(entry);
> +				if (userfaultfd_wp(src_vma) &&
> +				    pte_swp_uffd_wp(entry))
> +					entry = pte_swp_mkuffd_wp(entry);
>   				set_huge_pte_at(src, addr, src_pte, entry);
>   			}
> -			if (!userfaultfd_wp(dst_vma) && uffd_wp)
> +			if (!userfaultfd_wp(dst_vma))
>   				entry = huge_pte_clear_uffd_wp(entry);
>   			set_huge_pte_at(dst, addr, dst_pte, entry);
>   		} else if (unlikely(is_pte_marker(entry))) {
> @@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   					/* huge_ptep of dst_pte won't change as in child */
>   					goto again;
>   				}
> -				hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
> +				hugetlb_install_folio(dst_vma, dst_pte, addr,
> +						      new_folio, src_pte_old);
>   				spin_unlock(src_ptl);
>   				spin_unlock(dst_ptl);
>   				continue;
> @@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   				entry = huge_pte_wrprotect(entry);
>   			}
>   
> +			if (!userfaultfd_wp(dst_vma))
> +				entry = huge_pte_clear_uffd_wp(entry);
> +
>   			set_huge_pte_at(dst, addr, dst_pte, entry);
>   			hugetlb_count_add(npages, dst);
>   		}

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>
Mika Penttilä April 14, 2023, 9:45 a.m. UTC | #2
On 14.4.2023 2.11, Peter Xu wrote:
> There're a bunch of things that were wrong:
> 
>    - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
>      rather than huge_pte_uffd_wp().
> 
>    - When copying over a pte, we should drop uffd-wp bit when
>      !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
> 
>    - When doing early CoW for private hugetlb (e.g. when the parent page was
>      pinned), uffd-wp bit should be properly carried over if necessary.
> 
> No bug reported probably because most people do not even care about these
> corner cases, but they are still bugs and can be exposed by the recent unit
> tests introduced, so fix all of them in one shot.
> 
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   mm/hugetlb.c | 26 ++++++++++++++++----------
>   1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..7320e64aacc6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
>   
>   static void
>   hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
> -		     struct folio *new_folio)
> +		      struct folio *new_folio, pte_t old)
>   {
> +	pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
> +
>   	__folio_mark_uptodate(new_folio);
>   	hugepage_add_new_anon_rmap(new_folio, vma, addr);
> -	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
> +	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
> +		newpte = huge_pte_mkuffd_wp(newpte);
> +	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
>   	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
>   	folio_set_hugetlb_migratable(new_folio);
>   }
> @@ -5032,14 +5036,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   			 */
>   			;
>   		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
> -			bool uffd_wp = huge_pte_uffd_wp(entry);
> -
> -			if (!userfaultfd_wp(dst_vma) && uffd_wp)
> +			if (!userfaultfd_wp(dst_vma))
>   				entry = huge_pte_clear_uffd_wp(entry);
>   			set_huge_pte_at(dst, addr, dst_pte, entry);
>   		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
>   			swp_entry_t swp_entry = pte_to_swp_entry(entry);
> -			bool uffd_wp = huge_pte_uffd_wp(entry);
>   
>   			if (!is_readable_migration_entry(swp_entry) && cow) {
>   				/*
> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   				swp_entry = make_readable_migration_entry(
>   							swp_offset(swp_entry));
>   				entry = swp_entry_to_pte(swp_entry);
> -				if (userfaultfd_wp(src_vma) && uffd_wp)
> -					entry = huge_pte_mkuffd_wp(entry);
> +				if (userfaultfd_wp(src_vma) &&
> +				    pte_swp_uffd_wp(entry))
> +					entry = pte_swp_mkuffd_wp(entry);


This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp ?


>   				set_huge_pte_at(src, addr, src_pte, entry);
>   			}
> -			if (!userfaultfd_wp(dst_vma) && uffd_wp)
> +			if (!userfaultfd_wp(dst_vma))
>   				entry = huge_pte_clear_uffd_wp(entry);
>   			set_huge_pte_at(dst, addr, dst_pte, entry);
>   		} else if (unlikely(is_pte_marker(entry))) {
> @@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   					/* huge_ptep of dst_pte won't change as in child */
>   					goto again;
>   				}
> -				hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
> +				hugetlb_install_folio(dst_vma, dst_pte, addr,
> +						      new_folio, src_pte_old);
>   				spin_unlock(src_ptl);
>   				spin_unlock(dst_ptl);
>   				continue;
> @@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>   				entry = huge_pte_wrprotect(entry);
>   			}
>   
> +			if (!userfaultfd_wp(dst_vma))
> +				entry = huge_pte_clear_uffd_wp(entry);
> +
>   			set_huge_pte_at(dst, addr, dst_pte, entry);
>   			hugetlb_count_add(npages, dst);
>   		}


--Mika
Peter Xu April 14, 2023, 2:09 p.m. UTC | #3
On Fri, Apr 14, 2023 at 12:45:29PM +0300, Mika Penttilä wrote:
> >   		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
> >   			swp_entry_t swp_entry = pte_to_swp_entry(entry);
> > -			bool uffd_wp = huge_pte_uffd_wp(entry);

[1]

> >   			if (!is_readable_migration_entry(swp_entry) && cow) {
> >   				/*
> > @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> >   				swp_entry = make_readable_migration_entry(
> >   							swp_offset(swp_entry));
> >   				entry = swp_entry_to_pte(swp_entry);

[2]

> > -				if (userfaultfd_wp(src_vma) && uffd_wp)
> > -					entry = huge_pte_mkuffd_wp(entry);
> > +				if (userfaultfd_wp(src_vma) &&
> > +				    pte_swp_uffd_wp(entry))
> > +					entry = pte_swp_mkuffd_wp(entry);
> 
> 
> This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp ?

Could you explain what you mean?

I think these helpers are the right ones to use, as afaict hugetlb
migration entries should follow the same pte format as !hugetlb.  However,
I noticed I did it wrong when dropping the temp var: at [1], "entry" still
holds the src entry, but at [2] it already holds the newly created one.
So I think I can't drop the var; a fixup should look like:

===8<===
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 083aae35bff8..cd3a9d8f4b70 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5041,6 +5041,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
                        set_huge_pte_at(dst, addr, dst_pte, entry);
                } else if (unlikely(is_hugetlb_entry_migration(entry))) {
                        swp_entry_t swp_entry = pte_to_swp_entry(entry);
+                       bool uffd_wp = pte_swp_uffd_wp(entry);

                        if (!is_readable_migration_entry(swp_entry) && cow) {
                                /*
@@ -5050,8 +5051,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
                                swp_entry = make_readable_migration_entry(
                                                        swp_offset(swp_entry));
                                entry = swp_entry_to_pte(swp_entry);
-                               if (userfaultfd_wp(src_vma) &&
-                                   pte_swp_uffd_wp(entry))
+                               if (userfaultfd_wp(src_vma) && uffd_wp)
                                        entry = pte_swp_mkuffd_wp(entry);
                                set_huge_pte_at(src, addr, src_pte, entry);
===8<===

Besides that, did I miss anything else?

Thanks,
Mika Penttilä April 14, 2023, 2:23 p.m. UTC | #4
On 14.4.2023 17.09, Peter Xu wrote:
> On Fri, Apr 14, 2023 at 12:45:29PM +0300, Mika Penttilä wrote:
>>>    		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
>>>    			swp_entry_t swp_entry = pte_to_swp_entry(entry);
>>> -			bool uffd_wp = huge_pte_uffd_wp(entry);
> 
> [1]
> 
>>>    			if (!is_readable_migration_entry(swp_entry) && cow) {
>>>    				/*
>>> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>>>    				swp_entry = make_readable_migration_entry(
>>>    							swp_offset(swp_entry));
>>>    				entry = swp_entry_to_pte(swp_entry);
> 
> [2]
> 
>>> -				if (userfaultfd_wp(src_vma) && uffd_wp)
>>> -					entry = huge_pte_mkuffd_wp(entry);
>>> +				if (userfaultfd_wp(src_vma) &&
>>> +				    pte_swp_uffd_wp(entry))
>>> +					entry = pte_swp_mkuffd_wp(entry);
>>
>>
>> This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp ?
> 
> Could you explain what do you mean?
> 

Yes, as you also noticed, you called pte_swp_mkuffd_wp(entry) iff
pte_swp_uffd_wp(entry) was already true, which is of course a nop.

But the fixup that keeps the temp var should work.
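
To make the ordering problem concrete, here is a small user-space sketch.
It is an illustration under assumptions, not the kernel code: the sk_*
names, the flag bit and the offset mask are hypothetical, and
sk_make_readable() only stands in for rebuilding the entry via
make_readable_migration_entry() and swp_entry_to_pte(), which starts from
the offset and so does not carry the old flag bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this sketch only. */
#define SK_SWP_UFFD_WP (1u << 2)     /* uffd-wp flag of a swap-style entry */
#define SK_WRITABLE    (1u << 3)     /* "writable migration entry" marker */
#define SK_OFFSET_MASK 0xffffff00u   /* bits holding the swap offset */

typedef uint32_t sk_entry_t;

/* Rebuild the entry from its offset alone; all flag bits are lost. */
sk_entry_t sk_make_readable(sk_entry_t e)
{
	return e & SK_OFFSET_MASK;
}

/* Ordering from the posted patch: the flag is tested after the rebuild. */
sk_entry_t sk_downgrade_buggy(sk_entry_t entry, bool vma_wp)
{
	entry = sk_make_readable(entry);
	if (vma_wp && (entry & SK_SWP_UFFD_WP)) /* never true at this point */
		entry |= SK_SWP_UFFD_WP;
	return entry;
}

/* Ordering from the fixup: the flag is sampled before the rebuild. */
sk_entry_t sk_downgrade_fixed(sk_entry_t entry, bool vma_wp)
{
	bool uffd_wp = (entry & SK_SWP_UFFD_WP) != 0;

	entry = sk_make_readable(entry);
	if (vma_wp && uffd_wp)
		entry |= SK_SWP_UFFD_WP;
	return entry;
}
```

With a writable entry that has the flag set, sk_downgrade_buggy() returns an
entry without the flag, while sk_downgrade_fixed() preserves it when vma_wp
is true.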

> I think these helpers are the right ones to use, as afaict hugetlb
> migration should follow the same pte format with !hugetlb.  However, I
> noticed I did it wrong when dropping the temp var - when at [1], "entry"
> still points to the src entry, but at [2] it's already pointing to the
> newly created one..  so I think I can't drop the var, a fixup should like:
> 
> ===8<===
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 083aae35bff8..cd3a9d8f4b70 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5041,6 +5041,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>                          set_huge_pte_at(dst, addr, dst_pte, entry);
>                  } else if (unlikely(is_hugetlb_entry_migration(entry))) {
>                          swp_entry_t swp_entry = pte_to_swp_entry(entry);
> +                       bool uffd_wp = pte_swp_uffd_wp(entry);
> 
>                          if (!is_readable_migration_entry(swp_entry) && cow) {
>                                  /*
> @@ -5050,8 +5051,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>                                  swp_entry = make_readable_migration_entry(
>                                                          swp_offset(swp_entry));
>                                  entry = swp_entry_to_pte(swp_entry);
> -                               if (userfaultfd_wp(src_vma) &&
> -                                   pte_swp_uffd_wp(entry))
> +                               if (userfaultfd_wp(src_vma) && uffd_wp)
>                                          entry = pte_swp_mkuffd_wp(entry);
>                                  set_huge_pte_at(src, addr, src_pte, entry);
> ===8<===
> 
> Besides, did I miss something else?
> 
> Thanks,
> 

--Mika
Peter Xu April 14, 2023, 3:21 p.m. UTC | #5
On Fri, Apr 14, 2023 at 05:23:12PM +0300, Mika Penttilä wrote:
> But the fixup not dropping the temp var should work.

Ok, I see.  I'll wait a few more days before sending a respin.  Thanks,
Mike Kravetz April 14, 2023, 10:17 p.m. UTC | #6
On 04/13/23 19:11, Peter Xu wrote:
> There're a bunch of things that were wrong:
> 
>   - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
>     rather than huge_pte_uffd_wp().

That was/is quite confusing to me at least.

> 
>   - When copying over a pte, we should drop uffd-wp bit when
>     !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
> 
>   - When doing early CoW for private hugetlb (e.g. when the parent page was
>     pinned), uffd-wp bit should be properly carried over if necessary.
> 
> No bug reported probably because most people do not even care about these
> corner cases, but they are still bugs and can be exposed by the recent unit
> tests introduced, so fix all of them in one shot.
> 
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/hugetlb.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)

No issues other than the loss of information in the pte entry, as pointed
out by Mika.

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f16b25b1a6b9..7320e64aacc6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4953,11 +4953,15 @@  static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 
 static void
 hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
-		     struct folio *new_folio)
+		      struct folio *new_folio, pte_t old)
 {
+	pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
 	__folio_mark_uptodate(new_folio);
 	hugepage_add_new_anon_rmap(new_folio, vma, addr);
-	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+		newpte = huge_pte_mkuffd_wp(newpte);
+	set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
 	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
 	folio_set_hugetlb_migratable(new_folio);
 }
@@ -5032,14 +5036,11 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
-			bool uffd_wp = huge_pte_uffd_wp(entry);
-
-			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+			if (!userfaultfd_wp(dst_vma))
 				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
 			swp_entry_t swp_entry = pte_to_swp_entry(entry);
-			bool uffd_wp = huge_pte_uffd_wp(entry);
 
 			if (!is_readable_migration_entry(swp_entry) && cow) {
 				/*
@@ -5049,11 +5050,12 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				swp_entry = make_readable_migration_entry(
 							swp_offset(swp_entry));
 				entry = swp_entry_to_pte(swp_entry);
-				if (userfaultfd_wp(src_vma) && uffd_wp)
-					entry = huge_pte_mkuffd_wp(entry);
+				if (userfaultfd_wp(src_vma) &&
+				    pte_swp_uffd_wp(entry))
+					entry = pte_swp_mkuffd_wp(entry);
 				set_huge_pte_at(src, addr, src_pte, entry);
 			}
-			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+			if (!userfaultfd_wp(dst_vma))
 				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else if (unlikely(is_pte_marker(entry))) {
@@ -5114,7 +5116,8 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 					/* huge_ptep of dst_pte won't change as in child */
 					goto again;
 				}
-				hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+				hugetlb_install_folio(dst_vma, dst_pte, addr,
+						      new_folio, src_pte_old);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				continue;
@@ -5132,6 +5135,9 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				entry = huge_pte_wrprotect(entry);
 			}
 
+			if (!userfaultfd_wp(dst_vma))
+				entry = huge_pte_clear_uffd_wp(entry);
+
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 			hugetlb_count_add(npages, dst);
 		}