diff mbox series

[v2] mm/mmap: return early if it can't merge in vma_merge()

Message ID 20240221091453.1785076-1-yajun.deng@linux.dev (mailing list archive)
State New
Headers show
Series [v2] mm/mmap: return early if it can't merge in vma_merge() | expand

Commit Message

Yajun Deng Feb. 21, 2024, 9:14 a.m. UTC
In most cases, the range of the area is valid. But in do_mprotect_pkey(),
the minimum value of end and vma->vm_end is passed to mprotect_fixup().
This will lead to the end is less than the end of prev.

In this case, the curr will be NULL, but the next will be equal to the
prev. So it will attempt to merge before, the vm_pgoff check will cause
this case to fail.

To avoid the process described above and reduce unnecessary operations.
Add a check to immediately return NULL if the end is less than the end of
prev.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
---
v2: remove the case label.
v1: https://lore.kernel.org/all/20240218085028.3294332-1-yajun.deng@linux.dev/
---
 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Liam R. Howlett Feb. 21, 2024, 3:38 p.m. UTC | #1
* Yajun Deng <yajun.deng@linux.dev> [240221 04:15]:
> In most cases, the range of the area is valid. But in do_mprotect_pkey(),
> the minimum value of end and vma->vm_end is passed to mprotect_fixup().
> This will lead to the end is less than the end of prev.
> 
> In this case, the curr will be NULL, but the next will be equal to the
> prev. So it will attempt to merge before, the vm_pgoff check will cause
> this case to fail.
> 
> To avoid the process described above and reduce unnecessary operations.
> Add a check to immediately return NULL if the end is less than the end of
> prev.

If it's only one caller, could we stop that caller instead of checking
an almost never case for all callers?  Would this better fit in
vma_modify()?  Although that's not just for this caller at this point.
Maybe there isn't a good place?

Or are there other reasons this may happen and is better done in this
function?

Often, this is called the "punch a hole" scenario; where an operation
creates two entries from the old data and either leaves an empty space
or fills the space with a new VMA.

> 
> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> ---
> v2: remove the case label.
> v1: https://lore.kernel.org/all/20240218085028.3294332-1-yajun.deng@linux.dev/
> ---
>  mm/mmap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0fccd23f056e..7668854d2246 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -890,6 +890,9 @@ static struct vm_area_struct
>  	if (vm_flags & VM_SPECIAL)
>  		return NULL;
>  
> +	if (prev && end < prev->vm_end)
> +		return NULL;
> +
>  	/* Does the input range span an existing VMA? (cases 5 - 8) */
>  	curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>  
> -- 
> 2.25.1
>
Lorenzo Stoakes Feb. 21, 2024, 8:41 p.m. UTC | #2
On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
> * Yajun Deng <yajun.deng@linux.dev> [240221 04:15]:
> > In most cases, the range of the area is valid. But in do_mprotect_pkey(),
> > the minimum value of end and vma->vm_end is passed to mprotect_fixup().
> > This will lead to the end is less than the end of prev.
> >
> > In this case, the curr will be NULL, but the next will be equal to the
> > prev. So it will attempt to merge before, the vm_pgoff check will cause
> > this case to fail.
> >
> > To avoid the process described above and reduce unnecessary operations.
> > Add a check to immediately return NULL if the end is less than the end of
> > prev.
>
> If it's only one caller, could we stop that caller instead of checking
> an almost never case for all callers?  Would this better fit in
> vma_modify()?  Although that's not just for this caller at this point.
> Maybe there isn't a good place?

I definitely agree with Liam that this should not be in vma_merge(), as
it's not going to be relevant to _most_ callers.

I am not sure vma_modify() is much better, this would be the only early
exit check in that function and makes what is very simple and
straightforward now more confusing.

And I think this is the crux of it - it's confusing that we special case
this one particular non-merge scenario, but no others (all of which we then
deem ok to be caught by the usual rules).

I think it's simpler (and more efficient) to just keep things the way they
are.

>
> Or are there other reasons this may happen and is better done in this
> function?
>
> Often, this is called the "punch a hole" scenario; where an operation
> creates two entries from the old data and either leaves an empty space
> or fills the space with a new VMA.
>
> >
> > Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> > ---
> > v2: remove the case label.
> > v1: https://lore.kernel.org/all/20240218085028.3294332-1-yajun.deng@linux.dev/
> > ---
> >  mm/mmap.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 0fccd23f056e..7668854d2246 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -890,6 +890,9 @@ static struct vm_area_struct
> >  	if (vm_flags & VM_SPECIAL)
> >  		return NULL;
> >
> > +	if (prev && end < prev->vm_end)
> > +		return NULL;
> > +
> >  	/* Does the input range span an existing VMA? (cases 5 - 8) */
> >  	curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
> >
> > --
> > 2.25.1
> >

So overall I don't think this check makes much sense anywhere.

I think a better solution would be to prevent it happening _at source_ if
you can find a logical way of doing so.

I do plan to do some cleanup passes over this stuff once again so maybe I
can figure something out that better handles non-merge scenarios.

This is a great find though overall even if a patch doesn't make sense
Yajun, thanks for this, it's really made me think about this case (+ others
like it) :)
Yajun Deng Feb. 22, 2024, 7:47 a.m. UTC | #3
On 2024/2/22 04:41, Lorenzo Stoakes wrote:
> On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
>> * Yajun Deng <yajun.deng@linux.dev> [240221 04:15]:
>>> In most cases, the range of the area is valid. But in do_mprotect_pkey(),
>>> the minimum value of end and vma->vm_end is passed to mprotect_fixup().
>>> This will lead to the end is less than the end of prev.
>>>
>>> In this case, the curr will be NULL, but the next will be equal to the
>>> prev. So it will attempt to merge before, the vm_pgoff check will cause
>>> this case to fail.
>>>
>>> To avoid the process described above and reduce unnecessary operations.
>>> Add a check to immediately return NULL if the end is less than the end of
>>> prev.
>> If it's only one caller, could we stop that caller instead of checking
>> an almost never case for all callers?  Would this better fit in
>> vma_modify()?  Although that's not just for this caller at this point.
>> Maybe there isn't a good place?
> I definitely agree with Liam that this should not be in vma_merge(), as
> it's not going to be relevant to _most_ callers.
>
> I am not sure vma_modify() is much better, this would be the only early
> exit check in that function and makes what is very simple and
> straightforward now more confusing.


There are two paths that will cause this case. One is in 
mprotect_fixup(), the other is in

madvise_update_vma().


One way is to separate out the split_vma() from vma_modify(). And create 
a new helper function.

We can call it directly at source, but we need to do this in both 
paths.  It's more complicated.


The other way is to add a check in vma_modify(). Like the following:

diff --git a/mm/mmap.c b/mm/mmap.c
index 0fccd23f056e..f93f1d3939f2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2431,11 +2431,15 @@ struct vm_area_struct *vma_modify(struct 
vma_iterator *vmi,
         pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> 
PAGE_SHIFT);
         struct vm_area_struct *merged;

+       if (prev && end < prev->vm_end)
+               goto cannot_merge;
+
         merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
                            pgoff, policy, uffd_ctx, anon_name);
         if (merged)
                 return merged;

+cannot_merge:
         if (vma->vm_start < start) {
                 int err = split_vma(vmi, vma, start, 1);


> And I think this is the crux of it - it's confusing that we special case
> this one particular non-merge scenario, but no others (all of which we then
> deem ok to be caught by the usual rules).
>
> I think it's simpler (and more efficient) to just keep things the way they
> are.
>
>> Or are there other reasons this may happen and is better done in this
>> function?
>>
>> Often, this is called the "punch a hole" scenario; where an operation
>> creates two entries from the old data and either leaves an empty space
>> or fills the space with a new VMA.
>>
>>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>>> ---
>>> v2: remove the case label.
>>> v1: https://lore.kernel.org/all/20240218085028.3294332-1-yajun.deng@linux.dev/
>>> ---
>>>   mm/mmap.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 0fccd23f056e..7668854d2246 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -890,6 +890,9 @@ static struct vm_area_struct
>>>   	if (vm_flags & VM_SPECIAL)
>>>   		return NULL;
>>>
>>> +	if (prev && end < prev->vm_end)
>>> +		return NULL;
>>> +
>>>   	/* Does the input range span an existing VMA? (cases 5 - 8) */
>>>   	curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>>>
>>> --
>>> 2.25.1
>>>
> So overall I don't think this check makes much sense anywhere.
>
> I think a better solution would be to prevent it happening _at source_ if
> you can find a logical way of doing so.
>
> I do plan to do some cleanup passes over this stuff once again so maybe I
> can figure something out that better handles non-merge scenarios.
>
> This is a great find though overall even if a patch doesn't make sense
> Yajun, thanks for this, it's really made me think about this case (+ others
> like it) :)
Lorenzo Stoakes Feb. 22, 2024, 8:31 a.m. UTC | #4
On Thu, Feb 22, 2024 at 03:47:04PM +0800, Yajun Deng wrote:
>
> On 2024/2/22 04:41, Lorenzo Stoakes wrote:
> > On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
> > > * Yajun Deng <yajun.deng@linux.dev> [240221 04:15]:
> > > > In most cases, the range of the area is valid. But in do_mprotect_pkey(),
> > > > the minimum value of end and vma->vm_end is passed to mprotect_fixup().
> > > > This will lead to the end is less than the end of prev.
> > > >
> > > > In this case, the curr will be NULL, but the next will be equal to the
> > > > prev. So it will attempt to merge before, the vm_pgoff check will cause
> > > > this case to fail.
> > > >
> > > > To avoid the process described above and reduce unnecessary operations.
> > > > Add a check to immediately return NULL if the end is less than the end of
> > > > prev.
> > > If it's only one caller, could we stop that caller instead of checking
> > > an almost never case for all callers?  Would this better fit in
> > > vma_modify()?  Although that's not just for this caller at this point.
> > > Maybe there isn't a good place?
> > I definitely agree with Liam that this should not be in vma_merge(), as
> > it's not going to be relevant to _most_ callers.
> >
> > I am not sure vma_modify() is much better, this would be the only early
> > exit check in that function and makes what is very simple and
> > straightforward now more confusing.
>
>
> There are two paths that will cause this case. One is in mprotect_fixup(),
> the other is in
>
> madvise_update_vma().
>
>
> One way is to separate out the split_vma() from vma_modify(). And create a
> new helper function.

Absolutely not. I wrote the vma_modify() patch series explicitly to expose
_less_ not more.

>
> We can call it directly at source, but we need to do this in both paths.
> It's more complicated.
>
>
> The other way is to add a check in vma_modify(). Like the following:

As I said above, I really don't think this is a good idea, you're just
special casing one non-merge case but not any others + adding an
unnecessary branch.

>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0fccd23f056e..f93f1d3939f2 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2431,11 +2431,15 @@ struct vm_area_struct *vma_modify(struct
> vma_iterator *vmi,
>         pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >>
> PAGE_SHIFT);
>         struct vm_area_struct *merged;
>
> +       if (prev && end < prev->vm_end)
> +               goto cannot_merge;
> +
>         merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
>                            pgoff, policy, uffd_ctx, anon_name);
>         if (merged)
>                 return merged;
>
> +cannot_merge:
>         if (vma->vm_start < start) {
>                 int err = split_vma(vmi, vma, start, 1);
>
>
> > And I think this is the crux of it - it's confusing that we special case
> > this one particular non-merge scenario, but no others (all of which we then
> > deem ok to be caught by the usual rules).
> >
> > I think it's simpler (and more efficient) to just keep things the way they
> > are.
> >
> > > Or are there other reasons this may happen and is better done in this
> > > function?
> > >
> > > Often, this is called the "punch a hole" scenario; where an operation
> > > creates two entries from the old data and either leaves an empty space
> > > or fills the space with a new VMA.
> > >
> > > > Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> > > > ---
> > > > v2: remove the case label.
> > > > v1: https://lore.kernel.org/all/20240218085028.3294332-1-yajun.deng@linux.dev/
> > > > ---
> > > >   mm/mmap.c | 3 +++
> > > >   1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 0fccd23f056e..7668854d2246 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -890,6 +890,9 @@ static struct vm_area_struct
> > > >   	if (vm_flags & VM_SPECIAL)
> > > >   		return NULL;
> > > >
> > > > +	if (prev && end < prev->vm_end)
> > > > +		return NULL;
> > > > +
> > > >   	/* Does the input range span an existing VMA? (cases 5 - 8) */
> > > >   	curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > So overall I don't think this check makes much sense anywhere.
> >
> > I think a better solution would be to prevent it happening _at source_ if
> > you can find a logical way of doing so.
> >
> > I do plan to do some cleanup passes over this stuff once again so maybe I
> > can figure something out that better handles non-merge scenarios.
> >
> > This is a great find though overall even if a patch doesn't make sense
> > Yajun, thanks for this, it's really made me think about this case (+ others
> > like it) :)

I guess maybe again I've not been clear enough on this, so unless
compelling reasoning can otherwise be provided, I feel this check should
not be added _anywhere_.

Therefore, NACK.
Yajun Deng Feb. 22, 2024, 8:37 a.m. UTC | #5
On 2024/2/22 16:31, Lorenzo Stoakes wrote:
> On Thu, Feb 22, 2024 at 03:47:04PM +0800, Yajun Deng wrote:
>> On 2024/2/22 04:41, Lorenzo Stoakes wrote:
>>> On Wed, Feb 21, 2024 at 10:38:27AM -0500, Liam R. Howlett wrote:
>>>> * Yajun Deng <yajun.deng@linux.dev> [240221 04:15]:
>>>>> In most cases, the range of the area is valid. But in do_mprotect_pkey(),
>>>>> the minimum value of end and vma->vm_end is passed to mprotect_fixup().
>>>>> This will lead to the end is less than the end of prev.
>>>>>
>>>>> In this case, the curr will be NULL, but the next will be equal to the
>>>>> prev. So it will attempt to merge before, the vm_pgoff check will cause
>>>>> this case to fail.
>>>>>
>>>>> To avoid the process described above and reduce unnecessary operations.
>>>>> Add a check to immediately return NULL if the end is less than the end of
>>>>> prev.
>>>> If it's only one caller, could we stop that caller instead of checking
>>>> an almost never case for all callers?  Would this better fit in
>>>> vma_modify()?  Although that's not just for this caller at this point.
>>>> Maybe there isn't a good place?
>>> I definitely agree with Liam that this should not be in vma_merge(), as
>>> it's not going to be relevant to _most_ callers.
>>>
>>> I am not sure vma_modify() is much better, this would be the only early
>>> exit check in that function and makes what is very simple and
>>> straightforward now more confusing.
>>
>> There are two paths that will cause this case. One is in mprotect_fixup(),
>> the other is in
>>
>> madvise_update_vma().
>>
>>
>> One way is to separate out the split_vma() from vma_modify(). And create a
>> new helper function.
> Absolutely not. I wrote the vma_modify() patch series explicitly to expose
> _less_ not more.
>
>> We can call it directly at source, but we need to do this in both paths.
>> It's more complicated.
>>
>>
>> The other way is to add a check in vma_modify(). Like the following:
> As I said above, I really don't think this is a good idea, you're just
> special casing one non-merge case but not any others + adding an
> unnecessary branch.
>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 0fccd23f056e..f93f1d3939f2 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2431,11 +2431,15 @@ struct vm_area_struct *vma_modify(struct
>> vma_iterator *vmi,
>>          pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >>
>> PAGE_SHIFT);
>>          struct vm_area_struct *merged;
>>
>> +       if (prev && end < prev->vm_end)
>> +               goto cannot_merge;
>> +
>>          merged = vma_merge(vmi, prev, vma, start, end, vm_flags,
>>                             pgoff, policy, uffd_ctx, anon_name);
>>          if (merged)
>>                  return merged;
>>
>> +cannot_merge:
>>          if (vma->vm_start < start) {
>>                  int err = split_vma(vmi, vma, start, 1);
>>
>>
>>> And I think this is the crux of it - it's confusing that we special case
>>> this one particular non-merge scenario, but no others (all of which we then
>>> deem ok to be caught by the usual rules).
>>>
>>> I think it's simpler (and more efficient) to just keep things the way they
>>> are.
>>>
>>>> Or are there other reasons this may happen and is better done in this
>>>> function?
>>>>
>>>> Often, this is called the "punch a hole" scenario; where an operation
>>>> creates two entries from the old data and either leaves an empty space
>>>> or fills the space with a new VMA.
>>>>
>>>>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>>>>> ---
>>>>> v2: remove the case label.
>>>>> v1: https://lore.kernel.org/all/20240218085028.3294332-1-yajun.deng@linux.dev/
>>>>> ---
>>>>>    mm/mmap.c | 3 +++
>>>>>    1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>> index 0fccd23f056e..7668854d2246 100644
>>>>> --- a/mm/mmap.c
>>>>> +++ b/mm/mmap.c
>>>>> @@ -890,6 +890,9 @@ static struct vm_area_struct
>>>>>    	if (vm_flags & VM_SPECIAL)
>>>>>    		return NULL;
>>>>>
>>>>> +	if (prev && end < prev->vm_end)
>>>>> +		return NULL;
>>>>> +
>>>>>    	/* Does the input range span an existing VMA? (cases 5 - 8) */
>>>>>    	curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>>>>>
>>>>> --
>>>>> 2.25.1
>>>>>
>>> So overall I don't think this check makes much sense anywhere.
>>>
>>> I think a better solution would be to prevent it happening _at source_ if
>>> you can find a logical way of doing so.
>>>
>>> I do plan to do some cleanup passes over this stuff once again so maybe I
>>> can figure something out that better handles non-merge scenarios.
>>>
>>> This is a great find though overall even if a patch doesn't make sense
>>> Yajun, thanks for this, it's really made me think about this case (+ others
>>> like it) :)
> I guess maybe again I've not been clear enough on this, so unless
> compelling reasoning can otherwise be provided, I feel this check should
> not be added _anywhere_.


Okay, I got it. Thank you!

> Therefore, NACK.
Lorenzo Stoakes Feb. 22, 2024, 8:41 a.m. UTC | #6
On Thu, Feb 22, 2024 at 04:37:44PM +0800, Yajun Deng wrote:
>
[snip]>
>
> Okay, I got it. Thank you!
>

Please do keep contributing though, you have really raised something
interesting here and I know you have a maple tree patch going also, it's
all very much appreciated :)

Thanks, Lorenzo
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 0fccd23f056e..7668854d2246 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -890,6 +890,9 @@  static struct vm_area_struct
 	if (vm_flags & VM_SPECIAL)
 		return NULL;
 
+	if (prev && end < prev->vm_end)
+		return NULL;
+
 	/* Does the input range span an existing VMA? (cases 5 - 8) */
 	curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);