
[v1] arm64: mm: Permit PTE SW bits to change in live mappings

Message ID 20240619121859.4153966-1-ryan.roberts@arm.com (mailing list archive)
State New
Series: [v1] arm64: mm: Permit PTE SW bits to change in live mappings

Commit Message

Ryan Roberts June 19, 2024, 12:18 p.m. UTC
Previously pgattr_change_is_safe() was overly-strict and complained
(e.g. "[  116.262743] __check_safe_pte_update: unsafe attribute change:
0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
in a live PTE. There is no such restriction on SW bits in the Arm ARM.
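(For reference, the two values in the example warning differ only in bit 58 — one of the SW bits — which a quick check confirms:)

```c
#include <assert.h>
#include <stdint.h>

/* The two PTE values from the example warning above. */
static const uint64_t pte_old = 0x0560000043768fc3ULL;
static const uint64_t pte_new = 0x0160000043768fc3ULL;

/* XOR isolates the bits that differ between the two values. */
static inline uint64_t pte_changed_bits(uint64_t a, uint64_t b)
{
	return a ^ b;
}
```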

Until now, no SW bits have been updated in live mappings via the
set_ptes() route. PTE_DIRTY would be updated live, but this is handled
by ptep_set_access_flags() which does not call pgattr_change_is_safe().
However, with the introduction of uffd-wp for arm64, there is core-mm
code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
triggers this false warning.

Silence this warning by masking out the SW bits during checks.

The bug isn't technically in the highlighted commit below, but that's
where bisecting would likely lead, as it's what made the bug user-visible.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 5b32510af77b ("arm64/mm: Add uffd write-protect support")
---

Hi All,

This applies on top of v6.10-rc4 and it would be good to land this as a hotfix
for v6.10 since it's effectively fixing a bug in 5b32510af77b which was merged
for v6.10.

I've only been able to trigger this occasionally by running the mm uffd
selftests, when swap is configured to use a small (64M) zRam disk. With this fix
applied I can no longer trigger it.

Thanks,
Ryan

 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 arch/arm64/mm/mmu.c                    | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

--
2.43.0

Comments

Peter Xu June 19, 2024, 2:54 p.m. UTC | #1
Hi, Ryan,

On Wed, Jun 19, 2024 at 01:18:56PM +0100, Ryan Roberts wrote:
> Previously pgattr_change_is_safe() was overly-strict and complained
> (e.g. "[  116.262743] __check_safe_pte_update: unsafe attribute change:
> 0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
> in a live PTE. There is no such restriction on SW bits in the Arm ARM.
> 
> Until now, no SW bits have been updated in live mappings via the
> set_ptes() route. PTE_DIRTY would be updated live, but this is handled
> by ptep_set_access_flags() which does not call pgattr_change_is_safe().
> However, with the introduction of uffd-wp for arm64, there is core-mm
> code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
> triggers this false warning.
> 
> Silence this warning by masking out the SW bits during checks.
> 
> The bug isn't technically in the highlighted commit below, but that's
> where bisecting would likely lead, as it's what made the bug user-visible.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Fixes: 5b32510af77b ("arm64/mm: Add uffd write-protect support")
> ---
> 
> Hi All,
> 
> This applies on top of v6.10-rc4 and it would be good to land this as a hotfix
> for v6.10 since it's effectively fixing a bug in 5b32510af77b which was merged
> for v6.10.
> 
> I've only been able to trigger this occasionally by running the mm uffd
> selftests, when swap is configured to use a small (64M) zRam disk. With this fix
> applied I can no longer trigger it.

Totally not familiar with the arm64 pgtable checker here, but I'm just
wondering how the swap affected this, as I see there's:

	/* creating or taking down mappings is always safe */
	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
		return true;

Should pte_valid() always report false on swap entries? Does it mean that
it'll always report PASS for anything switch from/to a swap entry for the
checker?

I assume that's also why you didn't cover bit 3 (uffd-wp swap bit on arm64,
per my read in your previous series), but I don't think I'm confident on my
understanding yet.  It might be nice to mention how that was triggered in
the commit message from that regard.

> 
> Thanks,
> Ryan
> 
>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>  arch/arm64/mm/mmu.c                    | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 9943ff0af4c9..1f60aa1bc750 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -170,6 +170,7 @@
>  #define PTE_CONT		(_AT(pteval_t, 1) << 52)	/* Contiguous range */
>  #define PTE_PXN			(_AT(pteval_t, 1) << 53)	/* Privileged XN */
>  #define PTE_UXN			(_AT(pteval_t, 1) << 54)	/* User XN */
> +#define PTE_SWBITS_MASK		_AT(pteval_t, (BIT(63) | GENMASK(58, 55)))
> 
>  #define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
>  #ifdef CONFIG_ARM64_PA_BITS_52
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c927e9312f10..353ea5dc32b8 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -124,7 +124,8 @@ bool pgattr_change_is_safe(u64 old, u64 new)
>  	 * The following mapping attributes may be updated in live
>  	 * kernel mappings without the need for break-before-make.
>  	 */
> -	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
> +	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
> +			PTE_SWBITS_MASK;

When applying the uffd-wp bit, normally we shouldn't need this as we'll
need to do BBM-alike ops to avoid concurrent HW A/D updates.  E.g.
change_pte_range() uses the ptep_modify_prot_* APIs.

But indeed at least unprotect / clear-uffd-bit doesn't logically need that,
we already do that in e.g. do_wp_page().  From that POV it makes sense to
me, as I also don't see why soft-bits are forbidden to be updated on ptes
if HWs ignore them as a pretty generic concept.  Just want to double check
with you.

> 
>  	/* creating or taking down mappings is always safe */
>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
> --
> 2.43.0
> 

When looking at this function, this check caught my attention too:

	/* live contiguous mappings may not be manipulated at all */
	if ((old | new) & PTE_CONT)
		return false;

I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
either hugetlb or mTHP pov.  This check may be relevant here as a start.

The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
consider that, and there may be things overlooked at least from my side.
E.g., consider wr-protect one cont-pte huge pages on hugetlb:

static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
{
	return huge_pte_wrprotect(pte_mkuffd_wp(pte));
}

I think it means so far it won't touch the rest cont-ptes but the 1st.  Not
sure whether it'll work if write happens on the rest.

For mTHPs, they should still be done in change_pte_range() which doesn't
understand mTHPs yet, so it should loop over all ptes and looks good so
far, but I didn't further check other than that.

Thanks,
Will Deacon June 19, 2024, 3:13 p.m. UTC | #2
On Wed, 19 Jun 2024 13:18:56 +0100, Ryan Roberts wrote:
> Previously pgattr_change_is_safe() was overly-strict and complained
> (e.g. "[  116.262743] __check_safe_pte_update: unsafe attribute change:
> 0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
> in a live PTE. There is no such restriction on SW bits in the Arm ARM.
> 
> Until now, no SW bits have been updated in live mappings via the
> set_ptes() route. PTE_DIRTY would be updated live, but this is handled
> by ptep_set_access_flags() which does not call pgattr_change_is_safe().
> However, with the introduction of uffd-wp for arm64, there is core-mm
> code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
> triggers this false warning.
> 
> [...]

This looks good to me, so I've picked it up. However, I see that Peter
just replied with some comments, so I'll keep an eye on that discussion
in case we need to respin.

But, for now:

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: mm: Permit PTE SW bits to change in live mappings
      https://git.kernel.org/arm64/c/895a37028a48

Cheers,
Ryan Roberts June 19, 2024, 3:58 p.m. UTC | #3
On 19/06/2024 15:54, Peter Xu wrote:
> Hi, Ryan,
> 
> On Wed, Jun 19, 2024 at 01:18:56PM +0100, Ryan Roberts wrote:
>> Previously pgattr_change_is_safe() was overly-strict and complained
>> (e.g. "[  116.262743] __check_safe_pte_update: unsafe attribute change:
>> 0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
>> in a live PTE. There is no such restriction on SW bits in the Arm ARM.
>>
>> Until now, no SW bits have been updated in live mappings via the
>> set_ptes() route. PTE_DIRTY would be updated live, but this is handled
>> by ptep_set_access_flags() which does not call pgattr_change_is_safe().
>> However, with the introduction of uffd-wp for arm64, there is core-mm
>> code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
>> triggers this false warning.
>>
>> Silence this warning by masking out the SW bits during checks.
>>
>> The bug isn't technically in the highlighted commit below, but that's
>> where bisecting would likely lead, as it's what made the bug user-visible.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Fixes: 5b32510af77b ("arm64/mm: Add uffd write-protect support")
>> ---
>>
>> Hi All,
>>
>> This applies on top of v6.10-rc4 and it would be good to land this as a hotfix
>> for v6.10 since it's effectively fixing a bug in 5b32510af77b which was merged
>> for v6.10.
>>
>> I've only been able to trigger this occasionally by running the mm uffd
>> selftests, when swap is configured to use a small (64M) zRam disk. With this fix
>> applied I can no longer trigger it.
> 
> Totally not familiar with the arm64 pgtable checker here, but I'm just
> wondering how the swap affected this, as I see there's:
> 
> 	/* creating or taking down mappings is always safe */
> 	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
> 		return true;
> 
> Should pte_valid() always report false on swap entries? Does it mean that
> it'll always report PASS for anything switch from/to a swap entry for the
> checker?

Yes that's correct; swap ptes are invalid from the HW's pov so you can always
safely change their values from anything to anything (as long as the valid bit
remains 0).
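Roughly, as a userspace sketch of the check (bit positions per arm64's pgtable-hwdef.h; the real function has additional cases, this only sketches the ones quoted in this thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as in arm64's pgtable-hwdef.h (sketch, not kernel code). */
#define PTE_VALID	(1ULL << 0)
#define PTE_RDONLY	(1ULL << 7)
#define PTE_NG		(1ULL << 11)
#define PTE_WRITE	(1ULL << 51)	/* aka PTE_DBM */
#define PTE_CONT	(1ULL << 52)
#define PTE_PXN		(1ULL << 53)
#define PTE_SWBITS_MASK	((1ULL << 63) | (0xFULL << 55))	/* bits 63, 58:55 */

static bool pgattr_change_is_safe_sketch(uint64_t old, uint64_t new)
{
	uint64_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
			PTE_SWBITS_MASK;

	/* creating or taking down mappings is always safe */
	if (!(old & PTE_VALID) || !(new & PTE_VALID))
		return true;

	/* live contiguous mappings may not be manipulated at all */
	if ((old | new) & PTE_CONT)
		return false;

	return ((old ^ new) & ~mask) == 0;
}
```

So any transition from or to a swap (invalid) entry passes unconditionally; only live-to-live changes get the mask comparison.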

> 
> I assume that's also why you didn't cover bit 3 (uffd-wp swap bit on arm64,
> per my read in your previous series), but I don't think I'm confident on my
> understanding yet.  It might be nice to mention how that was triggered in
> the commit message from that regard.

Bit 3 is the uffd-wp bit in swap ptes. Bit 58 is the uffd-wp bit for valid ptes.
Here we are only concerned with valid ptes. Yes, it's a mess ;-)
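Concretely (positions per the arm64 uffd-wp series; a sketch, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* uffd-wp bit in valid ptes: bit 58, inside the SW-bits range. */
#define PTE_UFFD_WP	(1ULL << 58)
/* uffd-wp bit in swap (invalid) ptes: bit 3, outside that range. */
#define PTE_SWP_UFFD_WP	(1ULL << 3)
/* SW bits masked out by the patch: bits 63 and 58:55. */
#define PTE_SWBITS_MASK	((1ULL << 63) | (0xFULL << 55))
```

Bit 3 needn't be in the mask because swap ptes are invalid and the checker exits early for them.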

> 
>>
>> Thanks,
>> Ryan
>>
>>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>  arch/arm64/mm/mmu.c                    | 3 ++-
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 9943ff0af4c9..1f60aa1bc750 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -170,6 +170,7 @@
>>  #define PTE_CONT		(_AT(pteval_t, 1) << 52)	/* Contiguous range */
>>  #define PTE_PXN			(_AT(pteval_t, 1) << 53)	/* Privileged XN */
>>  #define PTE_UXN			(_AT(pteval_t, 1) << 54)	/* User XN */
>> +#define PTE_SWBITS_MASK		_AT(pteval_t, (BIT(63) | GENMASK(58, 55)))
>>
>>  #define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
>>  #ifdef CONFIG_ARM64_PA_BITS_52
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index c927e9312f10..353ea5dc32b8 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -124,7 +124,8 @@ bool pgattr_change_is_safe(u64 old, u64 new)
>>  	 * The following mapping attributes may be updated in live
>>  	 * kernel mappings without the need for break-before-make.
>>  	 */
>> -	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
>> +	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
>> +			PTE_SWBITS_MASK;
> 
> When applying the uffd-wp bit, normally we shouldn't need this as we'll
> need to do BBM-alike ops to avoid concurrent HW A/D updates.  E.g.
> change_pte_range() uses the ptep_modify_prot_* APIs.
> 
> But indeed at least unprotect / clear-uffd-bit doesn't logically need that,
> we already do that in e.g. do_wp_page().  From that POV it makes sense to
> me, as I also don't see why soft-bits are forbidden to be updated on ptes
> if HWs ignore them as a pretty generic concept.  Just want to double check
> with you.

This bug was indeed triggering from do_wp_page() as you say, and I was
considering sending out a separate patch to change that code to use the
ptep_modify_prot_start()/ptep_modify_prot_commit() pattern which transitions the
pte through 0 so that we guarrantee not to lose any A/D updates. In the end I
convinced myself that while ptep_get(); pte_clear_uffd_wp(); set_ptes(); is a
troubling pattern, it is safe in this instance because the page is
write-protected so the HW can't race to set the dirty bit.

The code in question is:

	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
		if (!userfaultfd_wp_async(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_WP);
		}

		/*
		 * Nothing needed (cache flush, TLB invalidations,
		 * etc.) because we're only removing the uffd-wp bit,
		 * which is completely invisible to the user.
		 */
		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));

		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
		/*
		 * Update this to be prepared for following up CoW
		 * handling
		 */
		vmf->orig_pte = pte;
	}

Perhaps we should consider a change to the following style as a cleanup?

	old_pte = ptep_modify_prot_start(vma, addr, pte);
	ptent = pte_clear_uffd_wp(old_pte);
	ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);

Regardless, this patch is still a correct and valuable change; arm64 arch
doesn't care if SW bits are modified in valid mappings so we shouldn't be
checking for it.

> 
>>
>>  	/* creating or taking down mappings is always safe */
>>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>> --
>> 2.43.0
>>
> 
> When looking at this function, this check caught my attention too:
> 
> 	/* live contiguous mappings may not be manipulated at all */
> 	if ((old | new) & PTE_CONT)
> 		return false;
> 
> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
> either hugetlb or mTHP pov.  This check may be relevant here as a start.

When transitioning a block of ptes between cont and non-cont, we transition the
block through invalid with tlb invalidation. See contpte_convert().
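A rough model of that sequence (simplified; the real contpte_convert() operates on the pgtable entries in place and issues the TLBI by address range):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID	(1ULL << 0)
#define PTE_CONT	(1ULL << 52)

/* Model of repainting PTE_CONT across a block: every entry goes
 * through the invalid state rather than having PTE_CONT flipped live. */
static void contpte_convert_sketch(uint64_t *ptes, int nr, int set_cont)
{
	uint64_t vals[16];
	int i;

	assert(nr <= 16);
	for (i = 0; i < nr; i++) {
		vals[i] = set_cont ? (ptes[i] | PTE_CONT)
				   : (ptes[i] & ~PTE_CONT);
		ptes[i] = 0;		/* 1. take the block down (invalid) */
	}
	/* 2. a TLB invalidation for the whole range happens here */
	for (i = 0; i < nr; i++)
		ptes[i] = vals[i];	/* 3. re-establish with new CONT state */
}
```

Each individual write is a valid-to-invalid or invalid-to-valid transition, so none of them trips the checker.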

> 
> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
> consider that, and there may be things overlooked at least from my side.
> E.g., consider wr-protect one cont-pte huge pages on hugetlb:
> 
> static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
> {
> 	return huge_pte_wrprotect(pte_mkuffd_wp(pte));
> }
> 
> I think it means so far it won't touch the rest cont-ptes but the 1st.  Not
> sure whether it'll work if write happens on the rest.

I'm not completely sure I follow your point. I think this should work correctly.
The arm64 huge_pte code knows what size (and level) the huge pte is and spreads
the passed in pte across all the HW ptes.

> 
> For mTHPs, they should still be done in change_pte_range() which doesn't
> understand mTHPs yet, so it should loop over all ptes and looks good so
> far, but I didn't further check other than that.

For mTHP, it will JustWork (TM). PTEs are exposed to core-mm with the same
semantics they had before; they all appear independent. The code determines
when it needs to apply or remove the PTE_CONT bit, and in that case, the block is
transitioned through invalid state + tlbi. See contpte_try_fold() and
contpte_try_unfold().

Hope that helps!

Thanks,
Ryan


> 
> Thanks,
>
Peter Xu June 19, 2024, 7:04 p.m. UTC | #4
On Wed, Jun 19, 2024 at 04:58:32PM +0100, Ryan Roberts wrote:
> The code in question is:
> 
> 	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
> 		if (!userfaultfd_wp_async(vma)) {
> 			pte_unmap_unlock(vmf->pte, vmf->ptl);
> 			return handle_userfault(vmf, VM_UFFD_WP);
> 		}
> 
> 		/*
> 		 * Nothing needed (cache flush, TLB invalidations,
> 		 * etc.) because we're only removing the uffd-wp bit,
> 		 * which is completely invisible to the user.
> 		 */
> 		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));
> 
> 		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
> 		/*
> 		 * Update this to be prepared for following up CoW
> 		 * handling
> 		 */
> 		vmf->orig_pte = pte;
> 	}
> 
> Perhaps we should consider a change to the following style as a cleanup?
> 
> 	old_pte = ptep_modify_prot_start(vma, addr, pte);
> 	ptent = pte_clear_uffd_wp(old_pte);
> 	ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);

You're probably right that at least the access bit seems racy to be set
here, so we may have risk of losing that when a race happened against HW.
Dirty bit shouldn't be a concern in this case due to missing W bit, iiuc.

IMO it's a matter of whether we'd like to "make access bit 100% accurate"
when the race happened, while paying that off with an always slower generic
path.  Looks cleaner indeed but maybe not very beneficial in reality.

> 
> Regardless, this patch is still a correct and valuable change; arm64 arch
> doesn't care if SW bits are modified in valid mappings so we shouldn't be
> checking for it.

Agreed.  Let's keep this discussion separate from the original patch if
that already fixes stuff.

> 
> > 
> >>
> >>  	/* creating or taking down mappings is always safe */
> >>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
> >> --
> >> 2.43.0
> >>
> > 
> > When looking at this function, this check caught my attention too:
> > 
> > 	/* live contiguous mappings may not be manipulated at all */
> > 	if ((old | new) & PTE_CONT)
> > 		return false;
> > 
> > I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
> > either hugetlb or mTHP pov.  This check may be relevant here as a start.
> 
> When transitioning a block of ptes between cont and non-cont, we transition the
> block through invalid with tlb invalidation. See contpte_convert().
> 
> > 
> > The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
> > consider that, and there may be things overlooked at least from my side.
> > E.g., consider wr-protect one cont-pte huge pages on hugetlb:
> > 
> > static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
> > {
> > 	return huge_pte_wrprotect(pte_mkuffd_wp(pte));
> > }
> > 
> > I think it means so far it won't touch the rest cont-ptes but the 1st.  Not
> > sure whether it'll work if write happens on the rest.
> 
> I'm not completely sure I follow your point. I think this should work correctly.
> The arm64 huge_pte code knows what size (and level) the huge pte is and spreads
> the passed in pte across all the HW ptes.

What I was considering is about wr-protect a 64K cont-pte entry in arm64:

  UFFDIO_WRITEPROTECT -> hugetlb_change_protection() -> huge_pte_mkuffd_wp()

What I'm expecting is huge_pte_mkuffd_wp() would wr-protect all ptes, but
looks not right now.  I'm not sure if the HW is able to identify "the whole
64K is wr-protected" in this case, rather than "only the 1st pte is
wr-protected", as IIUC current "pte" points to only the 1st pte entry.

Thanks,
Ryan Roberts June 20, 2024, 10:26 a.m. UTC | #5
On 19/06/2024 20:04, Peter Xu wrote:
> On Wed, Jun 19, 2024 at 04:58:32PM +0100, Ryan Roberts wrote:
>> The code in question is:
>>
>> 	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
>> 		if (!userfaultfd_wp_async(vma)) {
>> 			pte_unmap_unlock(vmf->pte, vmf->ptl);
>> 			return handle_userfault(vmf, VM_UFFD_WP);
>> 		}
>>
>> 		/*
>> 		 * Nothing needed (cache flush, TLB invalidations,
>> 		 * etc.) because we're only removing the uffd-wp bit,
>> 		 * which is completely invisible to the user.
>> 		 */
>> 		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));
>>
>> 		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>> 		/*
>> 		 * Update this to be prepared for following up CoW
>> 		 * handling
>> 		 */
>> 		vmf->orig_pte = pte;
>> 	}
>>
>> Perhaps we should consider a change to the following style as a cleanup?
>>
>> 	old_pte = ptep_modify_prot_start(vma, addr, pte);
>> 	ptent = pte_clear_uffd_wp(old_pte);
>> 	ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
> 
> You're probably right that at least the access bit seems racy to be set
> here, so we may have risk of losing that when a race happened against HW.
> Dirty bit shouldn't be a concern in this case due to missing W bit, iiuc.
> 
> IMO it's a matter of whether we'd like to "make access bit 100% accurate"
> when the race happened, while paying that off with an always slower generic
> path.  Looks cleaner indeed but maybe not very beneficial in reality.
> 
>>
>> Regardless, this patch is still a correct and valuable change; arm64 arch
>> doesn't care if SW bits are modified in valid mappings so we shouldn't be
>> checking for it.
> 
> Agreed.  Let's keep this discussion separate from the original patch if
> that already fixes stuff.
> 
>>
>>>
>>>>
>>>>  	/* creating or taking down mappings is always safe */
>>>>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>>>> --
>>>> 2.43.0
>>>>
>>>
>>> When looking at this function, this check caught my attention too:
>>>
>>> 	/* live contiguous mappings may not be manipulated at all */
>>> 	if ((old | new) & PTE_CONT)
>>> 		return false;
>>>
>>> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
>>> either hugetlb or mTHP pov.  This check may be relevant here as a start.
>>
>> When transitioning a block of ptes between cont and non-cont, we transition the
>> block through invalid with tlb invalidation. See contpte_convert().
>>
>>>
>>> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp didn't
>>> consider that, and there may be things overlooked at least from my side.
>>> E.g., consider wr-protect one cont-pte huge pages on hugetlb:
>>>
>>> static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
>>> {
>>> 	return huge_pte_wrprotect(pte_mkuffd_wp(pte));
>>> }
>>>
>>> I think it means so far it won't touch the rest cont-ptes but the 1st.  Not
>>> sure whether it'll work if write happens on the rest.
>>
>> I'm not completely sure I follow your point. I think this should work correctly.
>> The arm64 huge_pte code knows what size (and level) the huge pte is and spreads
>> the passed in pte across all the HW ptes.
> 
> What I was considering is about wr-protect a 64K cont-pte entry in arm64:
> 
>   UFFDIO_WRITEPROTECT -> hugetlb_change_protection() -> huge_pte_mkuffd_wp()
> 
> What I'm expecting is huge_pte_mkuffd_wp() would wr-protect all ptes, 

Yes, I think this works as expected. huge_pte_mkuffd_wp() is just modifying the
bits in a pte passed on the stack and returns the modified pte. The pgtable is
not touched at this point. The magic happens in set_huge_pte_at() (called by
huge_ptep_modify_prot_commit() in your case), which knows how the abstract "huge
pte" maps to real PMDs or PTEs in the pgtable and applies the passed in pte
value appropriately to all real pmds/ptes (adjusting the pfn as required in the
process).
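As a sketch of that spreading step (hypothetical helper; the real set_huge_pte_at() also handles PMD-level and contiguous-PMD sizes):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
/* Simplified pfn field: bits 47:12 (ignoring 52-bit PA layouts). */
#define PTE_ADDR_MASK	0x0000fffffffff000ULL

/* Spread one abstract "huge pte" value across ncontig hardware
 * entries, advancing the pfn for each subsequent entry. */
static void set_huge_pte_sketch(uint64_t *hwptes, uint64_t pte, int ncontig)
{
	for (int i = 0; i < ncontig; i++) {
		hwptes[i] = pte;
		pte = (pte & ~PTE_ADDR_MASK) |
		      ((pte & PTE_ADDR_MASK) + (1ULL << PAGE_SHIFT));
	}
}
```

The attribute bits (including uffd-wp and the cleared W bit) are identical in every hardware entry; only the pfn differs.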

> but
> looks not right now.  I'm not sure if the HW is able to identify "the whole
> 64K is wr-protected" in this case, rather than "only the 1st pte is
> wr-protected", as IIUC current "pte" points to only the 1st pte entry.

I believe this works as you intended; there is no bug as far as I can see.

> 
> Thanks,
>
Peter Xu June 20, 2024, 1:31 p.m. UTC | #6
On Thu, Jun 20, 2024 at 11:26:07AM +0100, Ryan Roberts wrote:
> Yes, I think this works as expected. huge_pte_mkuffd_wp() is just modifying the
> bits in a pte passed on the stack and returns the modified pte. The pgtable is
> not touched at this point. The magic happens in set_huge_pte_at() (called by
> huge_ptep_modify_prot_commit() in your case), which knows how the abstract "huge
> pte" maps to real PMDs or PTEs in the pgtable and applies the passed in pte
> value appropriately to all real pmds/ptes (adjusting the pfn as required in the
> process).

Ah indeed!  That's good enough. :)

Thanks,

Patch

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 9943ff0af4c9..1f60aa1bc750 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -170,6 +170,7 @@ 
 #define PTE_CONT		(_AT(pteval_t, 1) << 52)	/* Contiguous range */
 #define PTE_PXN			(_AT(pteval_t, 1) << 53)	/* Privileged XN */
 #define PTE_UXN			(_AT(pteval_t, 1) << 54)	/* User XN */
+#define PTE_SWBITS_MASK		_AT(pteval_t, (BIT(63) | GENMASK(58, 55)))

 #define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
 #ifdef CONFIG_ARM64_PA_BITS_52
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c927e9312f10..353ea5dc32b8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -124,7 +124,8 @@  bool pgattr_change_is_safe(u64 old, u64 new)
 	 * The following mapping attributes may be updated in live
 	 * kernel mappings without the need for break-before-make.
 	 */
-	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
+	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
+			PTE_SWBITS_MASK;

 	/* creating or taking down mappings is always safe */
 	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))