
[v1,1/2] mm/userfaultfd: fix uffd-wp handling for THP migration entries

Message ID 20230405142535.493854-2-david@redhat.com (mailing list archive)
State New
Series mm/userfaultfd: fix and cleanup for migration entries with uffd-wp

Commit Message

David Hildenbrand April 5, 2023, 2:25 p.m. UTC
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in hugetlb_change_protection()")
similarly applies to THP.

Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not.

We have to set/clear the uffd-wp bit independently of the migration
entry type in change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().

Using a simple reproducer that triggers migration of a THP, I verified
that set_pmd_migration_entry() no longer loses the uffd-wp bit.

Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/huge_memory.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
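
For illustration, a reproducer of that shape could look as follows. This is
only a sketch, not the exact reproducer referenced above: it assumes x86-64
with uffd-wp support, relies on the 2 MiB mapping actually being backed by a
PMD-aligned THP, and needs a second NUMA node (plus sufficient privileges
for userfaultfd and move_pages()) so that migration really happens. The file
name and PASS/FAIL reporting are made up for this sketch.

/*
 * thp-uffd-wp-migrate.c (hypothetical): write-protect a THP with uffd-wp,
 * migrate it, then check that a write still raises a uffd-wp event.
 * Build: gcc -O2 -pthread thp-uffd-wp-migrate.c -lnuma
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <numaif.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define THP_SIZE (2UL << 20)	/* one PMD on x86-64 */

static char *area;

static void *writer(void *arg)
{
	area[0] = 2;	/* should fault if the uffd-wp bit survived */
	return NULL;
}

int main(void)
{
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = { 0 };
	struct uffdio_writeprotect wp = { 0 };
	struct uffd_msg msg;
	struct pollfd pfd;
	void *pages[1];
	int node = 1, status = -1;
	pthread_t t;
	int uffd;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api))
		return 1;

	/* Populate a (hopefully) PMD-aligned, THP-backed mapping. */
	area = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	madvise(area, THP_SIZE, MADV_HUGEPAGE);
	memset(area, 1, THP_SIZE);

	reg.range.start = (unsigned long)area;
	reg.range.len = THP_SIZE;
	reg.mode = UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg))
		return 1;

	wp.range = reg.range;
	wp.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
		return 1;

	/* Trigger THP migration: installs and later removes a migration PMD. */
	pages[0] = area;
	if (move_pages(0, 1, pages, &node, &status, MPOL_MF_MOVE))
		perror("move_pages");	/* e.g., no second NUMA node */

	pthread_create(&t, NULL, writer, NULL);

	/* Before the fix, the write would succeed silently and poll times out. */
	pfd.fd = uffd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, 2000) == 1 &&
	    read(uffd, &msg, sizeof(msg)) == sizeof(msg) &&
	    msg.event == UFFD_EVENT_PAGEFAULT &&
	    (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
		printf("PASS: uffd-wp bit survived THP migration\n");
		return 0;
	}
	printf("FAIL: write did not raise a uffd-wp event\n");
	return 1;
}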

Comments

Peter Xu April 5, 2023, 3:12 p.m. UTC | #1
On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
> Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
> fix uffd-wp handling for migration entries in hugetlb_change_protection()")
> similarly applies to THP.
> 
> Setting/clearing uffd-wp on THP migration entries is not implemented
> properly. Further, while removing migration PMDs considers the uffd-wp
> bit, inserting migration PMDs does not.
> 
> We have to set/clear the uffd-wp bit independently of the migration
> entry type in change_huge_pmd() and properly copy the uffd-wp bit in
> set_pmd_migration_entry().
> 
> Using a simple reproducer that triggers migration of a THP, I verified
> that set_pmd_migration_entry() no longer loses the uffd-wp bit.
> 
> Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks, one trivial nitpick:

> ---
>  mm/huge_memory.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..bdda4f426d58 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  	if (is_swap_pmd(*pmd)) {
>  		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>  		struct page *page = pfn_swap_entry_to_page(entry);
> +		pmd_t newpmd;
>  
>  		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>  		if (is_writable_migration_entry(entry)) {
> -			pmd_t newpmd;
>  			/*
>  			 * A protection check is difficult so
>  			 * just be safe and disable write
> @@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  				newpmd = pmd_swp_mksoft_dirty(newpmd);
>  			if (pmd_swp_uffd_wp(*pmd))
>  				newpmd = pmd_swp_mkuffd_wp(newpmd);
> -			set_pmd_at(mm, addr, pmd, newpmd);
> +		} else {
> +			newpmd = *pmd;
>  		}
> +
> +		if (uffd_wp)
> +			newpmd = pmd_swp_mkuffd_wp(newpmd);
> +		else if (uffd_wp_resolve)
> +			newpmd = pmd_swp_clear_uffd_wp(newpmd);
> +		if (!pmd_same(*pmd, newpmd))
> +			set_pmd_at(mm, addr, pmd, newpmd);
>  		goto unlock;
>  	}
>  #endif
> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>  	pmdswp = swp_entry_to_pmd(entry);
>  	if (pmd_soft_dirty(pmdval))
>  		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> +	if (pmd_swp_uffd_wp(*pvmw->pmd))
> +		pmdswp = pmd_swp_mkuffd_wp(pmdswp);

I think it's fine to use *pmd, but maybe still better to use pmdval?  I
worry pmdp_invalidate() may do something else in the future that
affects the bit.

>  	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>  	page_remove_rmap(page, vma, true);
>  	put_page(page);
> -- 
> 2.39.2
>
David Hildenbrand April 5, 2023, 3:17 p.m. UTC | #2
On 05.04.23 17:12, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 04:25:34PM +0200, David Hildenbrand wrote:
>> [...]
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> Thanks, one trivial nitpick:
> 
>> [...]
>> @@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>   	pmdswp = swp_entry_to_pmd(entry);
>>   	if (pmd_soft_dirty(pmdval))
>>   		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> +	if (pmd_swp_uffd_wp(*pvmw->pmd))
>> +		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
> 
> I think it's fine to use *pmd, but maybe still better to use pmdval?  I
> worry pmdp_invalidate() may do something else in the future that
> affects the bit.

Wondering how I ended up with that, I realized that it's actually
wrong and might have worked by chance for my reproducer on x86.

That should make it work:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f977c965fdad..fffc953fa6ea 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
         pmdswp = swp_entry_to_pmd(entry);
         if (pmd_soft_dirty(pmdval))
                 pmdswp = pmd_swp_mksoft_dirty(pmdswp);
-       if (pmd_swp_uffd_wp(*pvmw->pmd))
+       if (pmd_uffd_wp(pmdval))
                 pmdswp = pmd_swp_mkuffd_wp(pmdswp);
         set_pmd_at(mm, address, pvmw->pmd, pmdswp);
         page_remove_rmap(page, vma, true);
Peter Xu April 5, 2023, 3:43 p.m. UTC | #3
On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
> On 05.04.23 17:12, Peter Xu wrote:
> > [...]
> > I think it's fine to use *pmd, but maybe still better to use pmdval?  I
> > worry pmdp_invalidate() may do something else in the future that
> > affects the bit.
> 
> Wondering how I ended up with that, I realized that it's actually
> wrong and might have worked by chance for my reproducer on x86.
> 
> That should make it work:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f977c965fdad..fffc953fa6ea 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>         pmdswp = swp_entry_to_pmd(entry);
>         if (pmd_soft_dirty(pmdval))
>                 pmdswp = pmd_swp_mksoft_dirty(pmdswp);
> -       if (pmd_swp_uffd_wp(*pvmw->pmd))
> +       if (pmd_uffd_wp(pmdval))
>                 pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>         set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>         page_remove_rmap(page, vma, true);

I guess pmd_swp_uffd_wp() just reads the _PAGE_USER bit (bit 2), which is
also set for a present entry, so the old check would set swp uffd-wp always
even if it was not set.

Yes, the change must be squashed in to be correct; with that, my R-b stands.

Thanks,
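
(As an aside, the x86-64 definitions behind this are roughly the following,
abridged from arch/x86/include/asm/pgtable_types.h and quoted from memory,
so double-check against your tree:

	#define _PAGE_BIT_USER		2	/* userspace addressable */
	#define _PAGE_BIT_SOFTW2	10	/* available for software */
	#define _PAGE_BIT_UFFD_WP	_PAGE_BIT_SOFTW2 /* present format */
	/* the swap/migration (non-present) format reuses bit 2 instead: */
	#define _PAGE_SWP_UFFD_WP	_PAGE_USER

Since _PAGE_USER is set on practically every user PMD, reading the
swap-format bit from the invalidated-but-still-present entry returns true
unconditionally, matching the behavior described above.)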
David Hildenbrand April 5, 2023, 3:51 p.m. UTC | #4
On 05.04.23 17:43, Peter Xu wrote:
> On Wed, Apr 05, 2023 at 05:17:31PM +0200, David Hildenbrand wrote:
>> On 05.04.23 17:12, Peter Xu wrote:
>>> [...]
>>
>> Wondering how I ended up with that, I realized that it's actually
>> wrong and might have worked by chance for my reproducer on x86.
>>
>> That should make it work:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f977c965fdad..fffc953fa6ea 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3257,7 +3257,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>          pmdswp = swp_entry_to_pmd(entry);
>>          if (pmd_soft_dirty(pmdval))
>>                  pmdswp = pmd_swp_mksoft_dirty(pmdswp);
>> -       if (pmd_swp_uffd_wp(*pvmw->pmd))
>> +       if (pmd_uffd_wp(pmdval))
>>                  pmdswp = pmd_swp_mkuffd_wp(pmdswp);
>>          set_pmd_at(mm, address, pvmw->pmd, pmdswp);
>>          page_remove_rmap(page, vma, true);
> 
> I guess pmd_swp_uffd_wp() just reads the _PAGE_USER bit (bit 2), which is
> also set for a present entry, so the old check would set swp uffd-wp always
> even if it was not set.
> 

Yes. I modified the reproducer to first migrate without uffd-wp, and we
suddenly gain a uffd-wp bit.

> Yes, the change must be squashed in to be correct; with that, my R-b stands.

Thanks, I will resend later.

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..bdda4f426d58 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 		struct page *page = pfn_swap_entry_to_page(entry);
+		pmd_t newpmd;
 
 		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
 		if (is_writable_migration_entry(entry)) {
-			pmd_t newpmd;
 			/*
 			 * A protection check is difficult so
 			 * just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				newpmd = pmd_swp_mksoft_dirty(newpmd);
 			if (pmd_swp_uffd_wp(*pmd))
 				newpmd = pmd_swp_mkuffd_wp(newpmd);
-			set_pmd_at(mm, addr, pmd, newpmd);
+		} else {
+			newpmd = *pmd;
 		}
+
+		if (uffd_wp)
+			newpmd = pmd_swp_mkuffd_wp(newpmd);
+		else if (uffd_wp_resolve)
+			newpmd = pmd_swp_clear_uffd_wp(newpmd);
+		if (!pmd_same(*pmd, newpmd))
+			set_pmd_at(mm, addr, pmd, newpmd);
 		goto unlock;
 	}
 #endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	pmdswp = swp_entry_to_pmd(entry);
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+	if (pmd_swp_uffd_wp(*pvmw->pmd))
+		pmdswp = pmd_swp_mkuffd_wp(pmdswp);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
 	page_remove_rmap(page, vma, true);
 	put_page(page);