
[v2,06/14] mm: handle_pte_fault() use pte_offset_map_rw_nolock()

Message ID 5acabedfae7ded01b075960b4a91f2e15b4d76b5.1724310149.git.zhengqi.arch@bytedance.com (mailing list archive)
State New, archived
Series introduce pte_offset_map_{ro|rw}_nolock()

Commit Message

Qi Zheng Aug. 22, 2024, 7:13 a.m. UTC
In handle_pte_fault(), we may modify the vmf->pte after acquiring the
vmf->ptl, so convert it to use pte_offset_map_rw_nolock(). But since we
will do the pte_same() check, there is no need to get the pmdval for a
pmd_same() check; just pass a dummy variable to it.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/memory.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
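
For context, the pattern this conversion relies on looks roughly like the
sketch below (an illustrative outline, not the exact handle_pte_fault()
code): the PTE is mapped and vmf->orig_pte is snapshotted without holding
the PTL, and any later modification first takes vmf->ptl and repeats a
pte_same() check, which also catches a page table that was freed or
collapsed in the meantime.

	/* Illustrative sketch only; simplified from the fault path. */
	pmd_t dummy_pmdval;
	pte_t entry;

	/* Map the PTE locklessly and remember which PTL would cover it. */
	vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
					    vmf->address, &dummy_pmdval,
					    &vmf->ptl);
	if (unlikely(!vmf->pte))
		return 0;
	vmf->orig_pte = ptep_get_lockless(vmf->pte);

	/* ... later, just before actually writing the PTE ... */
	spin_lock(vmf->ptl);
	entry = ptep_get(vmf->pte);
	if (unlikely(!pte_same(entry, vmf->orig_pte))) {
		/*
		 * The entry changed (or the page table went away):
		 * bail out and let the fault be retried.
		 */
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		return 0;
	}
	/* Safe to modify the PTE under vmf->ptl from here on. */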

Comments

David Hildenbrand Aug. 26, 2024, 3:36 p.m. UTC | #1
On 22.08.24 09:13, Qi Zheng wrote:
> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
> vmf->ptl, so convert it to use pte_offset_map_rw_nolock(). But since we
> will do the pte_same() check, there is no need to get the pmdval for a
> pmd_same() check; just pass a dummy variable to it.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>   mm/memory.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 93c0c25433d02..7b6071a0e21e2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>   		vmf->pte = NULL;
>   		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
>   	} else {
> +		pmd_t dummy_pmdval;
> +
>   		/*
>   		 * A regular pmd is established and it can't morph into a huge
>   		 * pmd by anon khugepaged, since that takes mmap_lock in write
>   		 * mode; but shmem or file collapse to THP could still morph
>   		 * it into a huge pmd: just retry later if so.
> +		 *
> +		 * Use the maywrite version to indicate that vmf->pte will be
> +		 * modified, but since we will use pte_same() to detect the
> +		 * change of the pte entry, there is no need to get pmdval, so
> +		 * just pass a dummy variable to it.
>   		 */
> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> -						 vmf->address, &vmf->ptl);
> +		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
> +						    vmf->address, &dummy_pmdval,
> +						    &vmf->ptl);
>   		if (unlikely(!vmf->pte))
>   			return 0;
>   		vmf->orig_pte = ptep_get_lockless(vmf->pte);

Now I understand why we don't need the PMD val in these cases ... the PTE 
would also be pte_none() at the point the page table is freed, so we 
would detect the change as well.

I do enjoy documenting why we use a dummy value, though. Likely without 
that, new users will just pass NULL and call it a day.

Acked-by: David Hildenbrand <david@redhat.com>
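
For contrast with the dummy value discussed here: a call site that cannot
rely on a later pte_same()/pte_none() recheck is the kind that actually
consumes the returned pmdval, revalidating the pmd under the PTL before
touching any PTE. A rough sketch of that shape follows; modify_ptes_example
is a made-up name, and the exact recheck form is an assumption based on
this series rather than a quote of any patch in it.

	static void modify_ptes_example(struct mm_struct *mm, pmd_t *pmd,
					unsigned long addr)
	{
		pmd_t pmdval;
		spinlock_t *ptl;
		pte_t *pte;

		/* Map the PTE locklessly; pmdval is the pmd seen at map time. */
		pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
		if (!pte)
			return;		/* no page table here (or it is huge) */

		spin_lock(ptl);
		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
			/* The page table was freed or replaced under us: bail. */
			pte_unmap_unlock(pte, ptl);
			return;
		}
		/* pmd revalidated: PTE modifications under ptl are safe here. */
		pte_unmap_unlock(pte, ptl);
	}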
Qi Zheng Aug. 27, 2024, 4:53 a.m. UTC | #2
On 2024/8/26 23:36, David Hildenbrand wrote:
> On 22.08.24 09:13, Qi Zheng wrote:
>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>> vmf->ptl, so convert it to use pte_offset_map_rw_nolock(). But since we
>> will do the pte_same() check, there is no need to get the pmdval for a
>> pmd_same() check; just pass a dummy variable to it.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   mm/memory.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 93c0c25433d02..7b6071a0e21e2 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>           vmf->pte = NULL;
>>           vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
>>       } else {
>> +        pmd_t dummy_pmdval;
>> +
>>           /*
>>            * A regular pmd is established and it can't morph into a huge
>>            * pmd by anon khugepaged, since that takes mmap_lock in write
>>            * mode; but shmem or file collapse to THP could still morph
>>            * it into a huge pmd: just retry later if so.
>> +         *
>> +         * Use the maywrite version to indicate that vmf->pte will be
>> +         * modified, but since we will use pte_same() to detect the
>> +         * change of the pte entry, there is no need to get pmdval, so
>> +         * just pass a dummy variable to it.
>>            */
>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>> -                         vmf->address, &vmf->ptl);
>> +        vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
>> +                            vmf->address, &dummy_pmdval,
>> +                            &vmf->ptl);
>>           if (unlikely(!vmf->pte))
>>               return 0;
>>           vmf->orig_pte = ptep_get_lockless(vmf->pte);
> 
> Now I understand why we don't need the PMD val in these cases ... the PTE 
> would also be pte_none() at the point the page table is freed, so we 
> would detect the change as well.

Yes.

> 
> I do enjoy documenting why we use a dummy value, though. Likely without 
> that, new users will just pass NULL and call it a day.

OK, how about the following:

Use the maywrite version to indicate that vmf->pte will be
modified, but since we will use pte_same() to detect the
change of the !pte_none() entry, there is no need to recheck
the pmdval. Here we choose to pass a dummy variable instead
of NULL, which helps new users think about why this place is
special.

> 
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

>
Muchun Song Aug. 29, 2024, 7:30 a.m. UTC | #3
> On Aug 22, 2024, at 15:13, Qi Zheng <zhengqi.arch@bytedance.com> wrote:
> 
> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
> vmf->ptl, so convert it to use pte_offset_map_rw_nolock(). But since we
> will do the pte_same() check, there is no need to get the pmdval for a
> pmd_same() check; just pass a dummy variable to it.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

A nit below.

> ---
> mm/memory.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 93c0c25433d02..7b6071a0e21e2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5499,14 +5499,22 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> 		vmf->pte = NULL;
> 		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
> 	} else {
> +		pmd_t dummy_pmdval;
> +
>  		/*
>  		 * A regular pmd is established and it can't morph into a huge
>  		 * pmd by anon khugepaged, since that takes mmap_lock in write
>  		 * mode; but shmem or file collapse to THP could still morph
>  		 * it into a huge pmd: just retry later if so.
> +		 *
> +		 * Use the maywrite version to indicate that vmf->pte will be

Not "will be", should be "may be".

> +		 * modified, but since we will use pte_same() to detect the
> +		 * change of the pte entry, there is no need to get pmdval, so
> +		 * just pass a dummy variable to it.
>  		 */
> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> -						 vmf->address, &vmf->ptl);
> +		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
> +						    vmf->address, &dummy_pmdval,
> +						    &vmf->ptl);
>  		if (unlikely(!vmf->pte))
>  			return 0;
>  		vmf->orig_pte = ptep_get_lockless(vmf->pte);
> -- 
> 2.20.1
>

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 93c0c25433d02..7b6071a0e21e2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5499,14 +5499,22 @@  static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		vmf->pte = NULL;
 		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
 	} else {
+		pmd_t dummy_pmdval;
+
 		/*
 		 * A regular pmd is established and it can't morph into a huge
 		 * pmd by anon khugepaged, since that takes mmap_lock in write
 		 * mode; but shmem or file collapse to THP could still morph
 		 * it into a huge pmd: just retry later if so.
+		 *
+		 * Use the maywrite version to indicate that vmf->pte will be
+		 * modified, but since we will use pte_same() to detect the
+		 * change of the pte entry, there is no need to get pmdval, so
+		 * just pass a dummy variable to it.
 		 */
-		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
-						 vmf->address, &vmf->ptl);
+		vmf->pte = pte_offset_map_rw_nolock(vmf->vma->vm_mm, vmf->pmd,
+						    vmf->address, &dummy_pmdval,
+						    &vmf->ptl);
 		if (unlikely(!vmf->pte))
 			return 0;
 		vmf->orig_pte = ptep_get_lockless(vmf->pte);