Message ID | 20230413231120.544685-2-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Series | mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins |
On 14.04.23 01:11, Peter Xu wrote:
> There're a bunch of things that were wrong:
>
> - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
> rather than huge_pte_uffd_wp().
>
> - When copying over a pte, we should drop uffd-wp bit when
> !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
>
> - When doing early CoW for private hugetlb (e.g. when the parent page was
> pinned), uffd-wp bit should be properly carried over if necessary.
>
> No bug reported probably because most people do not even care about these
> corner cases, but they are still bugs and can be exposed by the recent unit
> tests introduced, so fix all of them in one shot.
>
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> mm/hugetlb.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..7320e64aacc6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
>
> static void
> hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
> - struct folio *new_folio)
> + struct folio *new_folio, pte_t old)
> {

Nit: The function now expects old to be !swap_pte, which works perfectly fine with the existing code, but the function name is a bit generic and misleading, unfortunately.

IMHO, instead of factoring that functionality out in a desperate attempt to keep copy_hugetlb_page_range() somewhat readable, we should have factored out the complete copy+replace into a copy_hugetlb_page() function (similar to the ordinary page handling), which would eventually have made copy_hugetlb_page_range() more readable.

Anyhow, unrelated.

> + pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
> +
> __folio_mark_uptodate(new_folio);
> hugepage_add_new_anon_rmap(new_folio, vma, addr);
> - set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
> + if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
> + newpte = huge_pte_mkuffd_wp(newpte);
> + set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
> hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
> folio_set_hugetlb_migratable(new_folio);
> }
> @@ -5032,14 +5036,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> */
> ;
> } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
> - bool uffd_wp = huge_pte_uffd_wp(entry);
> -
> - if (!userfaultfd_wp(dst_vma) && uffd_wp)
> + if (!userfaultfd_wp(dst_vma))
> entry = huge_pte_clear_uffd_wp(entry);
> set_huge_pte_at(dst, addr, dst_pte, entry);
> } else if (unlikely(is_hugetlb_entry_migration(entry))) {
> swp_entry_t swp_entry = pte_to_swp_entry(entry);
> - bool uffd_wp = huge_pte_uffd_wp(entry);
>
> if (!is_readable_migration_entry(swp_entry) && cow) {
> /*
> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> swp_entry = make_readable_migration_entry(
> swp_offset(swp_entry));
> entry = swp_entry_to_pte(swp_entry);
> - if (userfaultfd_wp(src_vma) && uffd_wp)
> - entry = huge_pte_mkuffd_wp(entry);
> + if (userfaultfd_wp(src_vma) &&
> + pte_swp_uffd_wp(entry))
> + entry = pte_swp_mkuffd_wp(entry);
> set_huge_pte_at(src, addr, src_pte, entry);
> }
> - if (!userfaultfd_wp(dst_vma) && uffd_wp)
> + if (!userfaultfd_wp(dst_vma))
> entry = huge_pte_clear_uffd_wp(entry);
> set_huge_pte_at(dst, addr, dst_pte, entry);
> } else if (unlikely(is_pte_marker(entry))) {
> @@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> /* huge_ptep of dst_pte won't change as in child */
> goto again;
> }
> - hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
> + hugetlb_install_folio(dst_vma, dst_pte, addr,
> + new_folio, src_pte_old);
> spin_unlock(src_ptl);
> spin_unlock(dst_ptl);
> continue;
> @@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> entry = huge_pte_wrprotect(entry);
> }
>
> + if (!userfaultfd_wp(dst_vma))
> + entry = huge_pte_clear_uffd_wp(entry);
> +
> set_huge_pte_at(dst, addr, dst_pte, entry);
> hugetlb_count_add(npages, dst);
> }

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>
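The early-CoW fix and David's nit (hugetlb_install_folio() now assumes `old` is a present, non-swap pte) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_pte_t`, the `M_*` bit positions and `model_install_pte()` are hypothetical stand-ins for the kernel's pte_t, make_huge_pte() and huge_pte_mkuffd_wp(), and the assert() merely makes the present-pte precondition explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pte layout for illustration; real layouts are per-arch. */
typedef uint64_t model_pte_t;
#define M_PRESENT (1ULL << 0)  /* entry maps a page */
#define M_UFFD_WP (1ULL << 9)  /* uffd-wp bit of a *present* pte */

/*
 * Model of the added hugetlb_install_folio() logic: build a fresh pte for
 * the child's copy, and carry uffd-wp over from the parent's (old) pte only
 * if the child VMA is registered for uffd-wp (dst_uffd_wp models
 * userfaultfd_wp(vma)).
 */
static model_pte_t model_install_pte(bool dst_uffd_wp, model_pte_t old)
{
    model_pte_t newpte = M_PRESENT;  /* stand-in for make_huge_pte() */

    /* David's nit: "old" must be a present pte, never a swap entry. */
    assert(old & M_PRESENT);

    if (dst_uffd_wp && (old & M_UFFD_WP))
        newpte |= M_UFFD_WP;         /* stand-in for huge_pte_mkuffd_wp() */
    return newpte;
}
```

Before the patch, the equivalent of `newpte` never received the uffd-wp bit, so the child's freshly copied page lost its write protection even when both conditions held.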
On 14.4.2023 2.11, Peter Xu wrote:
> There're a bunch of things that were wrong:
>
> - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
> rather than huge_pte_uffd_wp().
>
> - When copying over a pte, we should drop uffd-wp bit when
> !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
>
> - When doing early CoW for private hugetlb (e.g. when the parent page was
> pinned), uffd-wp bit should be properly carried over if necessary.
>
> No bug reported probably because most people do not even care about these
> corner cases, but they are still bugs and can be exposed by the recent unit
> tests introduced, so fix all of them in one shot.
>
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> mm/hugetlb.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..7320e64aacc6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
>
> static void
> hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
> - struct folio *new_folio)
> + struct folio *new_folio, pte_t old)
> {
> + pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
> +
> __folio_mark_uptodate(new_folio);
> hugepage_add_new_anon_rmap(new_folio, vma, addr);
> - set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
> + if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
> + newpte = huge_pte_mkuffd_wp(newpte);
> + set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
> hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
> folio_set_hugetlb_migratable(new_folio);
> }
> @@ -5032,14 +5036,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> */
> ;
> } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
> - bool uffd_wp = huge_pte_uffd_wp(entry);
> -
> - if (!userfaultfd_wp(dst_vma) && uffd_wp)
> + if (!userfaultfd_wp(dst_vma))
> entry = huge_pte_clear_uffd_wp(entry);
> set_huge_pte_at(dst, addr, dst_pte, entry);
> } else if (unlikely(is_hugetlb_entry_migration(entry))) {
> swp_entry_t swp_entry = pte_to_swp_entry(entry);
> - bool uffd_wp = huge_pte_uffd_wp(entry);
>
> if (!is_readable_migration_entry(swp_entry) && cow) {
> /*
> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> swp_entry = make_readable_migration_entry(
> swp_offset(swp_entry));
> entry = swp_entry_to_pte(swp_entry);
> - if (userfaultfd_wp(src_vma) && uffd_wp)
> - entry = huge_pte_mkuffd_wp(entry);
> + if (userfaultfd_wp(src_vma) &&
> + pte_swp_uffd_wp(entry))
> + entry = pte_swp_mkuffd_wp(entry);

This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp ?

> set_huge_pte_at(src, addr, src_pte, entry);
> }
> - if (!userfaultfd_wp(dst_vma) && uffd_wp)
> + if (!userfaultfd_wp(dst_vma))
> entry = huge_pte_clear_uffd_wp(entry);
> set_huge_pte_at(dst, addr, dst_pte, entry);
> } else if (unlikely(is_pte_marker(entry))) {
> @@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> /* huge_ptep of dst_pte won't change as in child */
> goto again;
> }
> - hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
> + hugetlb_install_folio(dst_vma, dst_pte, addr,
> + new_folio, src_pte_old);
> spin_unlock(src_ptl);
> spin_unlock(dst_ptl);
> continue;
> @@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> entry = huge_pte_wrprotect(entry);
> }
>
> + if (!userfaultfd_wp(dst_vma))
> + entry = huge_pte_clear_uffd_wp(entry);
> +
> set_huge_pte_at(dst, addr, dst_pte, entry);
> hugetlb_count_add(npages, dst);
> }

--Mika
On Fri, Apr 14, 2023 at 12:45:29PM +0300, Mika Penttilä wrote:
> > } else if (unlikely(is_hugetlb_entry_migration(entry))) {
> > swp_entry_t swp_entry = pte_to_swp_entry(entry);
> > - bool uffd_wp = huge_pte_uffd_wp(entry);

[1]

> > if (!is_readable_migration_entry(swp_entry) && cow) {
> > /*
> > @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> > swp_entry = make_readable_migration_entry(
> > swp_offset(swp_entry));
> > entry = swp_entry_to_pte(swp_entry);

[2]

> > - if (userfaultfd_wp(src_vma) && uffd_wp)
> > - entry = huge_pte_mkuffd_wp(entry);
> > + if (userfaultfd_wp(src_vma) &&
> > + pte_swp_uffd_wp(entry))
> > + entry = pte_swp_mkuffd_wp(entry);
>
>
> This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp ?

Could you explain what you mean?

I think these helpers are the right ones to use, as afaict hugetlb migration entries should follow the same pte format as !hugetlb ones. However, I noticed I got it wrong when dropping the temp var: at [1], "entry" still points to the src entry, but at [2] it already points to the newly created one. So I think I can't drop the var; a fixup should look like:

===8<===
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 083aae35bff8..cd3a9d8f4b70 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5041,6 +5041,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);

if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,8 +5051,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
swp_entry = make_readable_migration_entry(
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
- if (userfaultfd_wp(src_vma) &&
- pte_swp_uffd_wp(entry))
+ if (userfaultfd_wp(src_vma) && uffd_wp)
entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
===8<===

Besides, did I miss something else?

Thanks,
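The bug at [1]/[2] is a read-after-overwrite: once `entry` has been rebuilt from the swap offset, its uffd-wp bit is gone, so testing the bit and then setting it again does nothing. A minimal userspace model follows, assuming a hypothetical layout in which a swap entry stores its offset from bit 8 upward and its uffd-wp bit at bit 2. `m_make_readable()` is a made-up stand-in for `swp_entry_to_pte(make_readable_migration_entry(...))`, and `downgrade_posted()`/`downgrade_fixed()` are hypothetical names for the posted code and the fixup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical swap-entry layout, for illustration only. */
typedef uint64_t m_pte_t;
#define M_SWP_UFFD_WP  (1ULL << 2)  /* uffd-wp bit of a swap entry */
#define M_MIG_WRITE    (1ULL << 3)  /* "writable migration entry" flag */
#define M_OFFSET_SHIFT 8            /* swap offset lives in bits 8 and up */

static uint64_t m_swp_offset(m_pte_t e) { return e >> M_OFFSET_SHIFT; }

/* Rebuilds a readable migration entry from the offset alone, so every
 * flag bit of the old entry, uffd-wp included, is dropped. */
static m_pte_t m_make_readable(uint64_t offset)
{
    return offset << M_OFFSET_SHIFT;
}

/* As posted: tests the bit on the rebuilt entry, which never has it. */
static m_pte_t downgrade_posted(m_pte_t entry, bool src_uffd_wp)
{
    entry = m_make_readable(m_swp_offset(entry));
    if (src_uffd_wp && (entry & M_SWP_UFFD_WP))  /* always false here */
        entry |= M_SWP_UFFD_WP;
    return entry;
}

/* Fixup: sample the bit from the original entry before overwriting it. */
static m_pte_t downgrade_fixed(m_pte_t entry, bool src_uffd_wp)
{
    bool uffd_wp = entry & M_SWP_UFFD_WP;

    entry = m_make_readable(m_swp_offset(entry));
    if (src_uffd_wp && uffd_wp)
        entry |= M_SWP_UFFD_WP;
    return entry;
}
```

For a writable migration entry with uffd-wp set, `downgrade_posted()` returns an entry without the bit, while `downgrade_fixed()` preserves it and still clears the write flag.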
On 14.4.2023 17.09, Peter Xu wrote:
> On Fri, Apr 14, 2023 at 12:45:29PM +0300, Mika Penttilä wrote:
>>> } else if (unlikely(is_hugetlb_entry_migration(entry))) {
>>> swp_entry_t swp_entry = pte_to_swp_entry(entry);
>>> - bool uffd_wp = huge_pte_uffd_wp(entry);
>
> [1]
>
>>> if (!is_readable_migration_entry(swp_entry) && cow) {
>>> /*
>>> @@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>>> swp_entry = make_readable_migration_entry(
>>> swp_offset(swp_entry));
>>> entry = swp_entry_to_pte(swp_entry);
>
> [2]
>
>>> - if (userfaultfd_wp(src_vma) && uffd_wp)
>>> - entry = huge_pte_mkuffd_wp(entry);
>>> + if (userfaultfd_wp(src_vma) &&
>>> + pte_swp_uffd_wp(entry))
>>> + entry = pte_swp_mkuffd_wp(entry);
>>
>>
>> This looks interesting with pte_swp_uffd_wp and pte_swp_mkuffd_wp ?
>
> Could you explain what do you mean?
>

Yes, as you also noticed: you called pte_swp_mkuffd_wp(entry) only when pte_swp_uffd_wp(entry) was already true, which is of course a no-op. But the fixup that keeps the temp var should work.

> I think these helpers are the right ones to use, as afaict hugetlb
> migration should follow the same pte format with !hugetlb. However, I
> noticed I did it wrong when dropping the temp var - when at [1], "entry"
> still points to the src entry, but at [2] it's already pointing to the
> newly created one.. so I think I can't drop the var, a fixup should like:
>
> ===8<===
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 083aae35bff8..cd3a9d8f4b70 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5041,6 +5041,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> set_huge_pte_at(dst, addr, dst_pte, entry);
> } else if (unlikely(is_hugetlb_entry_migration(entry))) {
> swp_entry_t swp_entry = pte_to_swp_entry(entry);
> + bool uffd_wp = pte_swp_uffd_wp(entry);
>
> if (!is_readable_migration_entry(swp_entry) && cow) {
> /*
> @@ -5050,8 +5051,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> swp_entry = make_readable_migration_entry(
> swp_offset(swp_entry));
> entry = swp_entry_to_pte(swp_entry);
> - if (userfaultfd_wp(src_vma) &&
> - pte_swp_uffd_wp(entry))
> + if (userfaultfd_wp(src_vma) && uffd_wp)
> entry = pte_swp_mkuffd_wp(entry);
> set_huge_pte_at(src, addr, src_pte, entry);
> ===8<===
>
> Besides, did I miss something else?
>
> Thanks,
>

--Mika
On Fri, Apr 14, 2023 at 05:23:12PM +0300, Mika Penttilä wrote:
> But the fixup not dropping the temp var should work.
OK, I see. I'll wait a few more days before sending a respin.

Thanks,
On 04/13/23 19:11, Peter Xu wrote:
> There're a bunch of things that were wrong:
>
> - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
> rather than huge_pte_uffd_wp().

That was/is quite confusing to me, at least.

>
> - When copying over a pte, we should drop uffd-wp bit when
> !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
>
> - When doing early CoW for private hugetlb (e.g. when the parent page was
> pinned), uffd-wp bit should be properly carried over if necessary.
>
> No bug reported probably because most people do not even care about these
> corner cases, but they are still bugs and can be exposed by the recent unit
> tests introduced, so fix all of them in one shot.
>
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> mm/hugetlb.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)

No issues, except for losing the uffd-wp information in the pte entry, as pointed out by Mika.
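Why the accessor matters: a present pte and a swap pte can keep the uffd-wp bit in different positions, so reading one format with the other format's helper may test a bit that means something else entirely. The sketch below is a hypothetical model, not any architecture's real layout: `m_pte_uffd_wp()` and `m_pte_swp_uffd_wp()` play the roles of huge_pte_uffd_wp() and pte_swp_uffd_wp(), and the present-pte bit deliberately overlaps the swap offset to show both a missed bit and a false positive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the two uffd-wp bits live in different places. */
typedef uint64_t m_pte_t;
#define M_PRESENT      (1ULL << 0)
#define M_SWP_UFFD_WP  (1ULL << 2)  /* uffd-wp in a swap entry */
#define M_UFFD_WP      (1ULL << 9)  /* uffd-wp in a present pte */
#define M_OFFSET_SHIFT 8            /* swap offset occupies bits 8 and up */

static bool m_pte_present(m_pte_t e)     { return e & M_PRESENT; }
static bool m_pte_uffd_wp(m_pte_t e)     { return e & M_UFFD_WP; }
static bool m_pte_swp_uffd_wp(m_pte_t e) { return e & M_SWP_UFFD_WP; }

/* Builds a (non-present) swap entry in the hypothetical layout. */
static m_pte_t m_swp_entry(uint64_t offset, bool uffd_wp)
{
    return (offset << M_OFFSET_SHIFT) | (uffd_wp ? M_SWP_UFFD_WP : 0);
}

/* Picks the accessor that matches the entry format. */
static bool m_entry_uffd_wp(m_pte_t e)
{
    return m_pte_present(e) ? m_pte_uffd_wp(e) : m_pte_swp_uffd_wp(e);
}
```

In this model, `m_pte_uffd_wp(m_swp_entry(1, true))` misses a bit that is set, while `m_pte_uffd_wp(m_swp_entry(2, false))` reports a bit that is really part of the offset; dispatching on the entry type avoids both.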
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f16b25b1a6b9..7320e64aacc6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)

static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);

if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5049,11 +5050,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
swp_entry = make_readable_migration_entry(
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
- if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ if (userfaultfd_wp(src_vma) &&
+ pte_swp_uffd_wp(entry))
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5114,7 +5116,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5132,6 +5135,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = huge_pte_wrprotect(entry);
}

+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
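For a present pte (the final hunk above), the fork()-time rule from the commit message reduces to: the child keeps uffd-wp only if userfaultfd_wp(dst_vma) is true, i.e. only with EVENT_FORK. Dropping the `&& uffd_wp` term elsewhere is safe because clearing a bit that is already clear changes nothing. A sketch under a hypothetical layout; `m_fork_copy()` and `m_clear_uffd_wp()` are made-up stand-ins for the copy path and huge_pte_clear_uffd_wp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical present-pte layout, for illustration only. */
typedef uint64_t m_pte_t;
#define M_PRESENT (1ULL << 0)
#define M_UFFD_WP (1ULL << 9)  /* hypothetical uffd-wp bit position */

/* Stand-in for huge_pte_clear_uffd_wp(): idempotent by construction. */
static m_pte_t m_clear_uffd_wp(m_pte_t e) { return e & ~M_UFFD_WP; }

/*
 * What the child receives: the parent's entry, minus uffd-wp unless the
 * child VMA is registered for uffd-wp (dst_uffd_wp models
 * userfaultfd_wp(dst_vma)).
 */
static m_pte_t m_fork_copy(m_pte_t src, bool dst_uffd_wp)
{
    if (!dst_uffd_wp)
        src = m_clear_uffd_wp(src);
    return src;
}
```

The unconditional clear keeps the rule in one place instead of first sampling the bit, which is exactly the sampling step that went wrong in the migration branch.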
There are a bunch of things that were wrong:

- Reading the uffd-wp bit from a swap entry should use pte_swp_uffd_wp() rather than huge_pte_uffd_wp().

- When copying over a pte, we should drop the uffd-wp bit when !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).

- When doing early CoW for private hugetlb (e.g. when the parent page was pinned), the uffd-wp bit should be properly carried over if necessary.

No bug has been reported, probably because most people do not care about these corner cases, but they are still bugs and can be exposed by the recently introduced unit tests, so fix all of them in one shot.

Cc: linux-stable <stable@vger.kernel.org>
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/hugetlb.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)