| Message ID | 20230405142535.493854-2-david@redhat.com (mailing list archive) |
|---|---|
| State | New |
| Series | mm/userfaultfd: fix and cleanup for migration entries with uffd-wp |
On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
> similarly applies to THP.
>
> Setting/clearing uffd-wp on THP migration entries is not implemented
> properly. Further, while removing migration PMDs considers the uffd-wp
> bit, inserting migration PMDs does not consider the uffd-wp bit.
>
> We have to set/clear independently of the migration entry type in
> change_huge_pmd() and properly copy the uffd-wp bit in
> set_pmd_migration_entry().
>
> Verified using a simple reproducer that triggers migration of a THP, that
> the set_pmd_migration_entry() no longer loses the uffd-wp bit.
>
> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks, one trivial nitpick:

> ---
>  mm/huge_memory.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..bdda4f426d58 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  	if (is_swap_pmd(*pmd)) {
>  		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>  		struct page *page = pfn_swap_entry_to_page(entry);
> +		pmd_t newpmd;
>
>  		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>  		if (is_writable_migration_entry(entry)) {
> -			pmd_t newpmd;
>  			/*
>  			 * A protection check is difficult so
>  			 * just be safe and disable write
> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  				newpmd = pmd_swp_mksoft_dirty(newpmd);
>  			if (pmd_swp_uffd_wp(*pmd))
>  				newpmd = pmd_swp_mkuffd_wp(newpmd);
> -			set_pmd_at(mm, addr, pmd, newpmd);
> +		} else {
> +			newpmd = *pmd;
>  		}
> +
> +		if (uffd_wp)
> +			newpmd = pmd_swp_mkuffd_wp(newpmd);
> +		else if (uffd_wp_resolve)
> +			newpmd = pmd_swp_clear_uffd_wp(newpmd);
> +		if (!pmd_same(*pmd, newpmd))
> +			set_pmd_at(mm, addr, pmd, newpmd);
>  		goto unlock;
>  	}
>  #endif
> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>  	pmdswp = swp_entry_to_pmd(entry);
>  	if (pmd_soft_dirty(pmdval))
>  		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> +	if (pmd_swp_uffd_wp(*pvmw->pmd))
> +		pmdswp = pmd_swp_mkuffd_wp(pmdswp);

I think it's fine to use *pmd, but maybe still better to use pmdval? I
worry pmdp_invalidate() can be something else in the future that may
affect the bit.

>  	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>  	page_remove_rmap(page, vma, true);
>  	put_page(page);
> --
> 2.39.2
>
On 05.04.23 17:12, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
>> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
>> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
>> similarly applies to THP.
[...]
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Thanks, one trivial nitpick:
>
[...]
>> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>  	pmdswp = swp_entry_to_pmd(entry);
>>  	if (pmd_soft_dirty(pmdval))
>>  		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> +	if (pmd_swp_uffd_wp(*pvmw->pmd))
>> +		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>
> I think it's fine to use *pmd, but maybe still better to use pmdval? I
> worry pmdp_invalidate() can be something else in the future that may
> affect the bit.

Wondering how I ended up with that, I realized that it's actually
wrong and might have worked by chance for my reproducer on x86.

That should make it work:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f977c965fdad..fffc953fa6ea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	pmdswp = swp_entry_to_pmd(entry);
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
-	if (pmd_swp_uffd_wp(*pvmw->pmd))
+	if (pmd_uffd_wp(pmdval))
 		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
 	page_remove_rmap(page, vma, true);
On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
> On 05.04.23 17:12, Peter Xu wrote:
> > On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
[...]
> > I think it's fine to use *pmd, but maybe still better to use pmdval? I
> > worry pmdp_invalidate() can be something else in the future that may
> > affect the bit.
>
> Wondering how I ended up with that, I realized that it's actually
> wrong and might have worked by chance for my reproducer on x86.
>
> That should make it work:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f977c965fdad..fffc953fa6ea 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>  	pmdswp = swp_entry_to_pmd(entry);
>  	if (pmd_soft_dirty(pmdval))
>  		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> -	if (pmd_swp_uffd_wp(*pvmw->pmd))
> +	if (pmd_uffd_wp(pmdval))
>  		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>  	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>  	page_remove_rmap(page, vma, true);

I guess pmd_swp_uffd_wp() just reads the _USER bit 2 which is also set for
a present pte, but then it sets swp uffd-wp always even if it was not set.

Yes the change must be squashed in to be correct, with that, my R-b keeps.

Thanks,
On 05.04.23 17:43, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
>> On 05.04.23 17:12, Peter Xu wrote:
[...]
>> That should make it work:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f977c965fdad..fffc953fa6ea 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>  	pmdswp = swp_entry_to_pmd(entry);
>>  	if (pmd_soft_dirty(pmdval))
>>  		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> -	if (pmd_swp_uffd_wp(*pvmw->pmd))
>> +	if (pmd_uffd_wp(pmdval))
>>  		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>>  	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>>  	page_remove_rmap(page, vma, true);
>
> I guess pmd_swp_uffd_wp() just reads the _USER bit 2 which is also set for
> a present pte, but then it sets swp uffd-wp always even if it was not set.

Yes. I modified the reproducer to migrate without uffd-wp first and we
suddenly gain a uffd-wp bit.

> Yes the change must be squashed in to be correct, with that, my R-b keeps.

Thanks, I will resend later.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..bdda4f426d58 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 		struct page *page = pfn_swap_entry_to_page(entry);
+		pmd_t newpmd;
 
 		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
 		if (is_writable_migration_entry(entry)) {
-			pmd_t newpmd;
 			/*
 			 * A protection check is difficult so
 			 * just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				newpmd = pmd_swp_mksoft_dirty(newpmd);
 			if (pmd_swp_uffd_wp(*pmd))
 				newpmd = pmd_swp_mkuffd_wp(newpmd);
-			set_pmd_at(mm, addr, pmd, newpmd);
+		} else {
+			newpmd = *pmd;
 		}
+
+		if (uffd_wp)
+			newpmd = pmd_swp_mkuffd_wp(newpmd);
+		else if (uffd_wp_resolve)
+			newpmd = pmd_swp_clear_uffd_wp(newpmd);
+		if (!pmd_same(*pmd, newpmd))
+			set_pmd_at(mm, addr, pmd, newpmd);
 		goto unlock;
 	}
 #endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	pmdswp = swp_entry_to_pmd(entry);
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+	if (pmd_swp_uffd_wp(*pvmw->pmd))
+		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
 	page_remove_rmap(page, vma, true);
 	put_page(page);
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in hugetlb_change_protection()")
similarly applies to THP.

Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.

We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().

Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.

Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)