diff mbox series

[06/14] mm: handle_pte_fault() use pte_offset_map_maywrite_nolock()

Message ID 239432a0bc56464e58a6baf3622fdc72526c8d57.1724226076.git.zhengqi.arch@bytedance.com (mailing list archive)
State New, archived
Headers show
Series introduce pte_offset_map_{readonly|maywrite}_nolock() | expand

Commit Message

Qi Zheng Aug. 21, 2024, 8:18 a.m. UTC
In handle_pte_fault(), we may modify the vmf->pte after acquiring the
vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). But
since we already do the pte_same() check, so there is no need to get
pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/memory.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

LEROY Christophe Aug. 21, 2024, 9:17 a.m. UTC | #1
Le 21/08/2024 à 10:18, Qi Zheng a écrit :
> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). But
> since we already do the pte_same() check, so there is no need to get
> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>   mm/memory.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 93c0c25433d02..d3378e98faf13 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>   		 * pmd by anon khugepaged, since that takes mmap_lock in write
>   		 * mode; but shmem or file collapse to THP could still morph
>   		 * it into a huge pmd: just retry later if so.
> +		 *
> +		 * Use the maywrite version to indicate that vmf->pte will be
> +		 * modified, but since we will use pte_same() to detect the
> +		 * change of the pte entry, there is no need to get pmdval.
>   		 */
> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> -						 vmf->address, &vmf->ptl);
> +		vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
> +							  vmf->pmd, vmf->address,
> +							  NULL, &vmf->ptl);

This might be the demonstration that the function name is becoming too long.

Can you find shorter names ?

>   		if (unlikely(!vmf->pte))
>   			return 0;
>   		vmf->orig_pte = ptep_get_lockless(vmf->pte);
Qi Zheng Aug. 21, 2024, 9:24 a.m. UTC | #2
On 2024/8/21 17:17, LEROY Christophe wrote:
> 
> 
> Le 21/08/2024 à 10:18, Qi Zheng a écrit :
>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). But
>> since we already do the pte_same() check, so there is no need to get
>> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>    mm/memory.c | 9 +++++++--
>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 93c0c25433d02..d3378e98faf13 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>    		 * pmd by anon khugepaged, since that takes mmap_lock in write
>>    		 * mode; but shmem or file collapse to THP could still morph
>>    		 * it into a huge pmd: just retry later if so.
>> +		 *
>> +		 * Use the maywrite version to indicate that vmf->pte will be
>> +		 * modified, but since we will use pte_same() to detect the
>> +		 * change of the pte entry, there is no need to get pmdval.
>>    		 */
>> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>> -						 vmf->address, &vmf->ptl);
>> +		vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>> +							  vmf->pmd, vmf->address,
>> +							  NULL, &vmf->ptl);
> 
> This might be the demonstration that the function name is becoming too long.
> 
> Can you find shorter names ?

Maybe use abbreviations?

pte_offset_map_ro_nolock()
pte_offset_map_rw_nolock()

> 
>>    		if (unlikely(!vmf->pte))
>>    			return 0;
>>    		vmf->orig_pte = ptep_get_lockless(vmf->pte);
David Hildenbrand Aug. 21, 2024, 9:41 a.m. UTC | #3
On 21.08.24 11:24, Qi Zheng wrote:
> 
> 
> On 2024/8/21 17:17, LEROY Christophe wrote:
>>
>>
>> Le 21/08/2024 à 10:18, Qi Zheng a écrit :
>>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>>> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). But
>>> since we already do the pte_same() check, so there is no need to get
>>> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
>>>
>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>> ---
>>>     mm/memory.c | 9 +++++++--
>>>     1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 93c0c25433d02..d3378e98faf13 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>>     		 * pmd by anon khugepaged, since that takes mmap_lock in write
>>>     		 * mode; but shmem or file collapse to THP could still morph
>>>     		 * it into a huge pmd: just retry later if so.
>>> +		 *
>>> +		 * Use the maywrite version to indicate that vmf->pte will be
>>> +		 * modified, but since we will use pte_same() to detect the
>>> +		 * change of the pte entry, there is no need to get pmdval.
>>>     		 */
>>> -		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>>> -						 vmf->address, &vmf->ptl);
>>> +		vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>> +							  vmf->pmd, vmf->address,
>>> +							  NULL, &vmf->ptl);

I think we discussed that passing NULL should be forbidden for that 
function.

>>
>> This might be the demonstration that the function name is becoming too long.
>>
>> Can you find shorter names ?
> 
> Maybe use abbreviations?
> 
> pte_offset_map_ro_nolock()
> pte_offset_map_rw_nolock()

At least the "ro" is better, but "rw" does not express the "maywrite" -- 
because without taking the lock we are not allowed to write. But maybe 
"rw" is good enough for that if we document it properly.

And you can use up to 100 characters, if it helps readability.
Qi Zheng Aug. 21, 2024, 9:51 a.m. UTC | #4
On 2024/8/21 17:41, David Hildenbrand wrote:
> On 21.08.24 11:24, Qi Zheng wrote:
>>
>>
>> On 2024/8/21 17:17, LEROY Christophe wrote:
>>>
>>>
>>> Le 21/08/2024 à 10:18, Qi Zheng a écrit :
>>>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>>>> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). But
>>>> since we already do the pte_same() check, so there is no need to get
>>>> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
>>>>
>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>> ---
>>>>     mm/memory.c | 9 +++++++--
>>>>     1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 93c0c25433d02..d3378e98faf13 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct 
>>>> vm_fault *vmf)
>>>>              * pmd by anon khugepaged, since that takes mmap_lock in 
>>>> write
>>>>              * mode; but shmem or file collapse to THP could still 
>>>> morph
>>>>              * it into a huge pmd: just retry later if so.
>>>> +         *
>>>> +         * Use the maywrite version to indicate that vmf->pte will be
>>>> +         * modified, but since we will use pte_same() to detect the
>>>> +         * change of the pte entry, there is no need to get pmdval.
>>>>              */
>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>>>> -                         vmf->address, &vmf->ptl);
>>>> +        vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>> +                              vmf->pmd, vmf->address,
>>>> +                              NULL, &vmf->ptl);
> 
> I think we discussed that passing NULL should be forbidden for that 
> function.

Yes, but for some maywrite case, there is no need to get pmdval to
do pmd_same() check. So I passed NULL and added a comment to
explain this.

> 
>>>
>>> This might be the demonstration that the function name is becoming 
>>> too long.
>>>
>>> Can you find shorter names ?
>>
>> Maybe use abbreviations?
>>
>> pte_offset_map_ro_nolock()
>> pte_offset_map_rw_nolock()
> 
> At least the "ro" is better, but "rw" does not express the "maywrite" -- 
> because without taking the lock we are not allowed to write. But maybe 
> "rw" is good enough for that if we document it properly.

OK, will change to it in the next version.

> 
> And you can use up to 100 characters, if it helps readability

Got it.

>
David Hildenbrand Aug. 21, 2024, 9:53 a.m. UTC | #5
On 21.08.24 11:51, Qi Zheng wrote:
> 
> 
> On 2024/8/21 17:41, David Hildenbrand wrote:
>> On 21.08.24 11:24, Qi Zheng wrote:
>>>
>>>
>>> On 2024/8/21 17:17, LEROY Christophe wrote:
>>>>
>>>>
>>>> Le 21/08/2024 à 10:18, Qi Zheng a écrit :
>>>>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>>>>> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). But
>>>>> since we already do the pte_same() check, so there is no need to get
>>>>> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
>>>>>
>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>> ---
>>>>>      mm/memory.c | 9 +++++++--
>>>>>      1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>> index 93c0c25433d02..d3378e98faf13 100644
>>>>> --- a/mm/memory.c
>>>>> +++ b/mm/memory.c
>>>>> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct
>>>>> vm_fault *vmf)
>>>>>               * pmd by anon khugepaged, since that takes mmap_lock in
>>>>> write
>>>>>               * mode; but shmem or file collapse to THP could still
>>>>> morph
>>>>>               * it into a huge pmd: just retry later if so.
>>>>> +         *
>>>>> +         * Use the maywrite version to indicate that vmf->pte will be
>>>>> +         * modified, but since we will use pte_same() to detect the
>>>>> +         * change of the pte entry, there is no need to get pmdval.
>>>>>               */
>>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>>>>> -                         vmf->address, &vmf->ptl);
>>>>> +        vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>>> +                              vmf->pmd, vmf->address,
>>>>> +                              NULL, &vmf->ptl);
>>
>> I think we discussed that passing NULL should be forbidden for that
>> function.
> 
> Yes, but for some maywrite case, there is no need to get pmdval to
> do pmd_same() check. So I passed NULL and added a comment to
> explain this.

I wonder if it's better to pass a dummy variable instead. One has to 
think harder why that is required compared to blindly passing "NULL" :)
Qi Zheng Aug. 21, 2024, 10:03 a.m. UTC | #6
On 2024/8/21 17:53, David Hildenbrand wrote:
> On 21.08.24 11:51, Qi Zheng wrote:
>>
>>
>> On 2024/8/21 17:41, David Hildenbrand wrote:
>>> On 21.08.24 11:24, Qi Zheng wrote:
>>>>
>>>>
>>>> On 2024/8/21 17:17, LEROY Christophe wrote:
>>>>>
>>>>>
>>>>> Le 21/08/2024 à 10:18, Qi Zheng a écrit :
>>>>>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>>>>>> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock(). 
>>>>>> But
>>>>>> since we already do the pte_same() check, so there is no need to get
>>>>>> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
>>>>>>
>>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>>> ---
>>>>>>      mm/memory.c | 9 +++++++--
>>>>>>      1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>>> index 93c0c25433d02..d3378e98faf13 100644
>>>>>> --- a/mm/memory.c
>>>>>> +++ b/mm/memory.c
>>>>>> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct
>>>>>> vm_fault *vmf)
>>>>>>               * pmd by anon khugepaged, since that takes mmap_lock in
>>>>>> write
>>>>>>               * mode; but shmem or file collapse to THP could still
>>>>>> morph
>>>>>>               * it into a huge pmd: just retry later if so.
>>>>>> +         *
>>>>>> +         * Use the maywrite version to indicate that vmf->pte 
>>>>>> will be
>>>>>> +         * modified, but since we will use pte_same() to detect the
>>>>>> +         * change of the pte entry, there is no need to get pmdval.
>>>>>>               */
>>>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>>>>>> -                         vmf->address, &vmf->ptl);
>>>>>> +        vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>>>> +                              vmf->pmd, vmf->address,
>>>>>> +                              NULL, &vmf->ptl);
>>>
>>> I think we discussed that passing NULL should be forbidden for that
>>> function.
>>
>> Yes, but for some maywrite case, there is no need to get pmdval to
>> do pmd_same() check. So I passed NULL and added a comment to
>> explain this.
> 
> I wonder if it's better to pass a dummy variable instead. One has to 
> think harder why that is required compared to blindly passing "NULL" :)

You are afraid that subsequent caller will abuse this function, right?
My initial concern was that this would add a useless local vaiable, but
perhaps that is not a big deal.

Both are fine for me. ;)

>
David Hildenbrand Aug. 22, 2024, 9:29 a.m. UTC | #7
On 21.08.24 12:03, Qi Zheng wrote:
> 
> 
> On 2024/8/21 17:53, David Hildenbrand wrote:
>> On 21.08.24 11:51, Qi Zheng wrote:
>>>
>>>
>>> On 2024/8/21 17:41, David Hildenbrand wrote:
>>>> On 21.08.24 11:24, Qi Zheng wrote:
>>>>>
>>>>>
>>>>> On 2024/8/21 17:17, LEROY Christophe wrote:
>>>>>>
>>>>>>
>>>>>> Le 21/08/2024 à 10:18, Qi Zheng a écrit :
>>>>>>> In handle_pte_fault(), we may modify the vmf->pte after acquiring the
>>>>>>> vmf->ptl, so convert it to using pte_offset_map_maywrite_nolock().
>>>>>>> But
>>>>>>> since we already do the pte_same() check, so there is no need to get
>>>>>>> pmdval to do pmd_same() check, just pass NULL to pmdvalp parameter.
>>>>>>>
>>>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>>>> ---
>>>>>>>       mm/memory.c | 9 +++++++--
>>>>>>>       1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>>>> index 93c0c25433d02..d3378e98faf13 100644
>>>>>>> --- a/mm/memory.c
>>>>>>> +++ b/mm/memory.c
>>>>>>> @@ -5504,9 +5504,14 @@ static vm_fault_t handle_pte_fault(struct
>>>>>>> vm_fault *vmf)
>>>>>>>                * pmd by anon khugepaged, since that takes mmap_lock in
>>>>>>> write
>>>>>>>                * mode; but shmem or file collapse to THP could still
>>>>>>> morph
>>>>>>>                * it into a huge pmd: just retry later if so.
>>>>>>> +         *
>>>>>>> +         * Use the maywrite version to indicate that vmf->pte
>>>>>>> will be
>>>>>>> +         * modified, but since we will use pte_same() to detect the
>>>>>>> +         * change of the pte entry, there is no need to get pmdval.
>>>>>>>                */
>>>>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
>>>>>>> -                         vmf->address, &vmf->ptl);
>>>>>>> +        vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>>>>> +                              vmf->pmd, vmf->address,
>>>>>>> +                              NULL, &vmf->ptl);
>>>>
>>>> I think we discussed that passing NULL should be forbidden for that
>>>> function.
>>>
>>> Yes, but for some maywrite case, there is no need to get pmdval to
>>> do pmd_same() check. So I passed NULL and added a comment to
>>> explain this.
>>
>> I wonder if it's better to pass a dummy variable instead. One has to
>> think harder why that is required compared to blindly passing "NULL" :)
> 
> You are afraid that subsequent caller will abuse this function, right?

Yes! "oh, I don't need a pmdval, why would I? let's just pass NULL, easy" :)

> My initial concern was that this would add a useless local vaiable, but
> perhaps that is not a big deal.

How many of these "special" instances do we have?

> 
> Both are fine for me. ;)

Also no strong opinion, but having to pass a variable makes you think 
what you are supposed to do with it and why it is not optional.
Qi Zheng Aug. 22, 2024, 12:17 p.m. UTC | #8
Hi David,

On 2024/8/22 17:29, David Hildenbrand wrote:
> On 21.08.24 12:03, Qi Zheng wrote:
>>
>>

[...]

>>>>>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, 
>>>>>>>> vmf->pmd,
>>>>>>>> -                         vmf->address, &vmf->ptl);
>>>>>>>> +        vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>>>>>> +                              vmf->pmd, vmf->address,
>>>>>>>> +                              NULL, &vmf->ptl);
>>>>>
>>>>> I think we discussed that passing NULL should be forbidden for that
>>>>> function.
>>>>
>>>> Yes, but for some maywrite case, there is no need to get pmdval to
>>>> do pmd_same() check. So I passed NULL and added a comment to
>>>> explain this.
>>>
>>> I wonder if it's better to pass a dummy variable instead. One has to
>>> think harder why that is required compared to blindly passing "NULL" :)
>>
>> You are afraid that subsequent caller will abuse this function, right?
> 
> Yes! "oh, I don't need a pmdval, why would I? let's just pass NULL, 
> easy" :)
> 
>> My initial concern was that this would add a useless local vaiable, but
>> perhaps that is not a big deal.
> 
> How many of these "special" instances do we have?

We have 5 such special instances.

> 
>>
>> Both are fine for me. ;)
> 
> Also no strong opinion, but having to pass a variable makes you think 
> what you are supposed to do with it and why it is not optional.

Yeah, I added 'BUG_ON(!pmdvalp);' in pte_offset_map_ro_nolock(), and
have updated the v2 version [1].

[1]. 
https://lore.kernel.org/lkml/cover.1724310149.git.zhengqi.arch@bytedance.com/

Thanks,
Qi

>
David Hildenbrand Aug. 22, 2024, 12:19 p.m. UTC | #9
On 22.08.24 14:17, Qi Zheng wrote:
> Hi David,
> 
> On 2024/8/22 17:29, David Hildenbrand wrote:
>> On 21.08.24 12:03, Qi Zheng wrote:
>>>
>>>
> 
> [...]
> 
>>>>>>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm,
>>>>>>>>> vmf->pmd,
>>>>>>>>> -                         vmf->address, &vmf->ptl);
>>>>>>>>> +        vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>>>>>>> +                              vmf->pmd, vmf->address,
>>>>>>>>> +                              NULL, &vmf->ptl);
>>>>>>
>>>>>> I think we discussed that passing NULL should be forbidden for that
>>>>>> function.
>>>>>
>>>>> Yes, but for some maywrite case, there is no need to get pmdval to
>>>>> do pmd_same() check. So I passed NULL and added a comment to
>>>>> explain this.
>>>>
>>>> I wonder if it's better to pass a dummy variable instead. One has to
>>>> think harder why that is required compared to blindly passing "NULL" :)
>>>
>>> You are afraid that subsequent caller will abuse this function, right?
>>
>> Yes! "oh, I don't need a pmdval, why would I? let's just pass NULL,
>> easy" :)
>>
>>> My initial concern was that this would add a useless local vaiable, but
>>> perhaps that is not a big deal.
>>
>> How many of these "special" instances do we have?
> 
> We have 5 such special instances.
> 
>>
>>>
>>> Both are fine for me. ;)
>>
>> Also no strong opinion, but having to pass a variable makes you think
>> what you are supposed to do with it and why it is not optional.
> 
> Yeah, I added 'BUG_ON(!pmdvalp);' in pte_offset_map_ro_nolock(), and
> have updated the v2 version [1].

No BUG_ON please :) VM_WARN_ON_ONCE() is good enough.
Qi Zheng Aug. 22, 2024, 12:22 p.m. UTC | #10
On 2024/8/22 20:19, David Hildenbrand wrote:
> On 22.08.24 14:17, Qi Zheng wrote:
>> Hi David,
>>
>> On 2024/8/22 17:29, David Hildenbrand wrote:
>>> On 21.08.24 12:03, Qi Zheng wrote:
>>>>
>>>>
>>
>> [...]
>>
>>>>>>>>>> -        vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm,
>>>>>>>>>> vmf->pmd,
>>>>>>>>>> -                         vmf->address, &vmf->ptl);
>>>>>>>>>> +        vmf->pte = 
>>>>>>>>>> pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
>>>>>>>>>> +                              vmf->pmd, vmf->address,
>>>>>>>>>> +                              NULL, &vmf->ptl);
>>>>>>>
>>>>>>> I think we discussed that passing NULL should be forbidden for that
>>>>>>> function.
>>>>>>
>>>>>> Yes, but for some maywrite case, there is no need to get pmdval to
>>>>>> do pmd_same() check. So I passed NULL and added a comment to
>>>>>> explain this.
>>>>>
>>>>> I wonder if it's better to pass a dummy variable instead. One has to
>>>>> think harder why that is required compared to blindly passing 
>>>>> "NULL" :)
>>>>
>>>> You are afraid that subsequent caller will abuse this function, right?
>>>
>>> Yes! "oh, I don't need a pmdval, why would I? let's just pass NULL,
>>> easy" :)
>>>
>>>> My initial concern was that this would add a useless local vaiable, but
>>>> perhaps that is not a big deal.
>>>
>>> How many of these "special" instances do we have?
>>
>> We have 5 such special instances.
>>
>>>
>>>>
>>>> Both are fine for me. ;)
>>>
>>> Also no strong opinion, but having to pass a variable makes you think
>>> what you are supposed to do with it and why it is not optional.
>>
>> Yeah, I added 'BUG_ON(!pmdvalp);' in pte_offset_map_ro_nolock(), and
>> have updated the v2 version [1].
> 
> No BUG_ON please :) VM_WARN_ON_ONCE() is good enough.

Got it. Will do in the next version.

>
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 93c0c25433d02..d3378e98faf13 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5504,9 +5504,14 @@  static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		 * pmd by anon khugepaged, since that takes mmap_lock in write
 		 * mode; but shmem or file collapse to THP could still morph
 		 * it into a huge pmd: just retry later if so.
+		 *
+		 * Use the maywrite version to indicate that vmf->pte will be
+		 * modified, but since we will use pte_same() to detect the
+		 * change of the pte entry, there is no need to get pmdval.
 		 */
-		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
-						 vmf->address, &vmf->ptl);
+		vmf->pte = pte_offset_map_maywrite_nolock(vmf->vma->vm_mm,
+							  vmf->pmd, vmf->address,
+							  NULL, &vmf->ptl);
 		if (unlikely(!vmf->pte))
 			return 0;
 		vmf->orig_pte = ptep_get_lockless(vmf->pte);