diff mbox series

[1/2] mm/vma: fix gap check for unmapped_area with VM_GROWSDOWN

Message ID 20250127075527.16614-2-richard.weiyang@gmail.com (mailing list archive)
State New
Headers show
Series vma: fix unmapped_area() | expand

Commit Message

Wei Yang Jan. 27, 2025, 7:55 a.m. UTC
Current unmapped_area() may fail to get the available area.

For example, we have a vma range like below:

    0123456789abcdef
      m  m  A m  m

Let's assume start_gap is 0x2000 and stack_guard_gap is 0x1000. And now
we are looking for free area with size 0x1000 within [0x2000, 0xd000].

The unmapped_area_topdown() could find address at 0x8000, while
unmapped_area() fails.

In original code before commit 3499a13168da ("mm/mmap: use maple tree
for unmapped_area{_topdown}"), the logic is:

  * find a gap with total length, including alignment
  * adjust start address with alignment

What we do now is:

  * find a gap with total length, including alignment
  * adjust start address with alignment
  * then compare the left range with total length

This is not necessary to compare with total length again after start
address adjusted.

Also, it is not correct to minus 1 here. This may not trigger an issue
in real world, since address are usually aligned with page size.

Fixes: 58c5d0d6d522 ("mm/mmap: regression fix for unmapped_area{_topdown}")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
CC: Vlastimil Babka <vbabka@suse.cz>
CC: Jann Horn <jannh@google.com>
CC: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: <stable@vger.kernel.org>
---
 mm/vma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lorenzo Stoakes Jan. 27, 2025, 12:08 p.m. UTC | #1
You have a subject line of 'fix gap check for unmapped_area with
VM_GROWSDOWN'. I'm not sure this is quite accurate.

I don't really have time to do a deep dive (again, this is why it's so
important to give a decent commit message - explaining under what _real
world_ circumstances this will be used etc.).

But anyway, it seems it will only be the case if MMF_TOPDOWN is not set in
the mm flags, which usually requires an mmap compatibility mode to achieve
unless the arch otherwise forces it.

And these arches would be ones where the stack grows UP, right? Or at least
ones where this is possible?

So already we're into specifics - either arches that grow the stack up, or
ones that intentionally use the old mmap compatibility mode are affected.

This happens in:

[ pretty much all unmapped area callers ]
-> vm_unmapped_area()
-> unmapped_area() (if !(info->flags & VM_UNMAPPED_AREA_TOPDOWN)

Where VM_UNMAPPED_AREA_TOPDOWN is only not set in the circumstances
mentioned above.

So, for this issue you claim is the case to happen, you have to:

1. Either be using a stack grows up arch, or enabling an mmap()
compatibility mode.
2. Also set MAP_GROWSDOWN on the mmap() call, which is translated to
VM_GROWSDOWN.

We are already far from 'fix gap check for VM_GROWSDOWN' right? I mean I
don't have the time to absolutely dive into the guts of this, but I assume
this is correct right?

I'm not saying we shouldn't address this, but it's VITAL to clarify what
exactly it is you're tackling.

On Mon, Jan 27, 2025 at 07:55:26AM +0000, Wei Yang wrote:
> Current unmapped_area() may fail to get the available area.
>
> For example, we have a vma range like below:
>
>     0123456789abcdef
>       m  m  A m  m

I don't understand this diagram at all. What is going on here?  What is 'm'
what is 'A' what are these values? page offsets * 0x1000?

is that a page of memory allocated at each 'm'? Is A somehow an address
under consideration?

You _have_ to add a key and explanation here, my mind reading skills are
much diminished in my old age... :P

>
> Let's assume start_gap is 0x2000 and stack_guard_gap is 0x1000. And now
> we are looking for free area with size 0x1000 within [0x2000, 0xd000].

How can start_gap be 0x2000 when it is only ever 0x1000 at most and only
applicable in x86-64 anyway?

>

It'd be good if you'd shown this on the diagram somehow?

Like this:

  <--------->
0123456789abcdef
  m  m  A m  m

But then I'm confused as to what A is once again.

Ideally you'd actually provide what the struct vm_unmapped_area_info fields
actually are with other parameters and _work through_ an example.

Also you're now talking about a stack but you haven't mentioned the word
'stack' anywhere in any part of this series afaict.

'Fix case where the arch grows stacks upwards or we are in legacy mmap mode
but still want to map a grows-down stack' is a LOT more specific than 'fix
unmapped_area()'.

> The unmapped_area_topdown() could find address at 0x8000, while
> unmapped_area() fails.

OK you need to WORK THROUGH why this is. You're putting all the work on me
as a reviewer to go check to see if this is indeed the case. It's not a
fair distribution of work.

>
> In original code before commit 3499a13168da ("mm/mmap: use maple tree
> for unmapped_area{_topdown}"), the logic is:
>
>   * find a gap with total length, including alignment
>   * adjust start address with alignment
>
> What we do now is:
>
>   * find a gap with total length, including alignment
>   * adjust start address with alignment
>   * then compare the left range with total length

What is 'left range'? This explanation is really hard to follow.

>
> This is not necessary to compare with total length again after start
> address adjusted.
>
> Also, it is not correct to minus 1 here. This may not trigger an issue
> in real world, since address are usually aligned with page size.

You aren't saying why.

Also the VMA's start is _always_ page-aligned.

Presumably the minus 1 is intentionally to amke it an inclusive value once
offset by length?

>
> Fixes: 58c5d0d6d522 ("mm/mmap: regression fix for unmapped_area{_topdown}")

Fixes it how?

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> CC: Vlastimil Babka <vbabka@suse.cz>
> CC: Jann Horn <jannh@google.com>
> CC: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/vma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 3f45a245e31b..d82fdbc710b0 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2668,7 +2668,7 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
>  	gap += (info->align_offset - gap) & info->align_mask;
>  	tmp = vma_next(&vmi);
>  	if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
> -		if (vm_start_gap(tmp) < gap + length - 1) {
> +		if (vm_start_gap(tmp) < gap + info->length) {

Have already spent all morning on this :) Sigh.

OK so let's expand this (again - this is the kind of thing you should do
for a tricky change like this).

info->start_gap is set based on stack_guard_placement() and is either
PAGE_SIZE (0x1000) if VM_SHADOW_STACK is set or 0. This is only applicable
in x86-64.

The align mask is likely to be PAGE_SIZE - 1 but can vary.

length = info->length + align_mask + start_gap

This takes into account the worst possible alignment overhead accounting for any following VMA also.

gap is equal to the current start of the range under consideration (via
vma_iter_addr) and as well institutes the appropriate alignment and
accounts for start_gap for any _prior_ VMA.

Then we try to find the vm_start_gap() for a candidate _next_ VMA, which if
a stack, uses stack_guard_gap() to SUBTRACT the stack guard gap from
vma->vm_start or account for the shadow stack.

Then we finally have:

if (next->vm_start - stack gap < start_of_range + [preceeding gap/alignment requirements] + [worst case length] - 1) {
   Try again
}

Or:

start_of_range >= next->vm_start - stack gap + [preceeding gap/alignment requirements] + [worst case length] + 1

start of range
  v
  |[preceeding gap][ VMA, worst length ][following gap]<stack gap>

Surely the + 1 (which is the -1 in the original calculation) is simply
accounting for the fact that the start of the range is _inclusive_?

You're proposing eliminating entirely the after-this-VMA requirements... why?

This just seems incorrect to me?

You really need to argue the case better if this has some validity, otherwise I think it's just wrong?

>  			low_limit = tmp->vm_end;
>  			vma_iter_reset(&vmi);
>  			goto retry;
> --
> 2.34.1
>
Wei Yang Jan. 28, 2025, 3:30 a.m. UTC | #2
On Mon, Jan 27, 2025 at 12:08:04PM +0000, Lorenzo Stoakes wrote:
>You have a subject line of 'fix gap check for unmapped_area with
>VM_GROWSDOWN'. I'm not sure this is quite accurate.
>
>I don't really have time to do a deep dive (again, this is why it's so
>important to give a decent commit message - explaining under what _real
>world_ circumstances this will be used etc.).
>
>But anyway, it seems it will only be the case if MMF_TOPDOWN is not set in
>the mm flags, which usually requires an mmap compatibility mode to achieve
>unless the arch otherwise forces it.
>
>And these arches would be ones where the stack grows UP, right? Or at least
>ones where this is possible?
>
>So already we're into specifics - either arches that grow the stack up, or
>ones that intentionally use the old mmap compatibility mode are affected.
>
>This happens in:
>
>[ pretty much all unmapped area callers ]
>-> vm_unmapped_area()
>-> unmapped_area() (if !(info->flags & VM_UNMAPPED_AREA_TOPDOWN)
>
>Where VM_UNMAPPED_AREA_TOPDOWN is only not set in the circumstances
>mentioned above.
>
>So, for this issue you claim is the case to happen, you have to:
>
>1. Either be using a stack grows up arch, or enabling an mmap()
>compatibility mode.
>2. Also set MAP_GROWSDOWN on the mmap() call, which is translated to
>VM_GROWSDOWN.
>
>We are already far from 'fix gap check for VM_GROWSDOWN' right? I mean I
>don't have the time to absolutely dive into the guts of this, but I assume
>this is correct right?
>
>I'm not saying we shouldn't address this, but it's VITAL to clarify what
>exactly it is you're tackling.
>

Thanks for taking a look.

If my understanding is correct, your concern here is the case here never
happen in real world.

  We are searching a gap bottom-up, while the vma wants top-down.

This maybe possible to me. Here is my understanding, (but maybe not correct).

We have two separate flags affecting the search:

  * mm->flags:      MMF_TOPDOWN  or not
  * vma->vm_flags:  VM_GROWSDOWN or VM_GROWSUP

To me, they are independent.

For mm->flags, arch_pick_mmap_layout() could set/clear MMF_TOPDOWN it based on
the result of mmap_is_legacy(). Even we provide a sysctl file
/proc/sys/vm/legacy_vm_layout for configuration.


For vma->vm_flags, for general, VM_STACK is set to VM_GROWSDOWN by default.
And we use the flag in __bprm_mm_init() and setup_arg_pages().

So to me the case is real and not a very specific one.

But maybe I missed some background. Would you mind telling me the miss part,
if it is not too time wasting?
Lorenzo Stoakes Jan. 28, 2025, 6:32 a.m. UTC | #3
On Tue, Jan 28, 2025 at 03:30:30AM +0000, Wei Yang wrote:
> On Mon, Jan 27, 2025 at 12:08:04PM +0000, Lorenzo Stoakes wrote:
> >You have a subject line of 'fix gap check for unmapped_area with
> >VM_GROWSDOWN'. I'm not sure this is quite accurate.
> >
> >I don't really have time to do a deep dive (again, this is why it's so
> >important to give a decent commit message - explaining under what _real
> >world_ circumstances this will be used etc.).
> >
> >But anyway, it seems it will only be the case if MMF_TOPDOWN is not set in
> >the mm flags, which usually requires an mmap compatibility mode to achieve
> >unless the arch otherwise forces it.
> >
> >And these arches would be ones where the stack grows UP, right? Or at least
> >ones where this is possible?
> >
> >So already we're into specifics - either arches that grow the stack up, or
> >ones that intentionally use the old mmap compatibility mode are affected.
> >
> >This happens in:
> >
> >[ pretty much all unmapped area callers ]
> >-> vm_unmapped_area()
> >-> unmapped_area() (if !(info->flags & VM_UNMAPPED_AREA_TOPDOWN)
> >
> >Where VM_UNMAPPED_AREA_TOPDOWN is only not set in the circumstances
> >mentioned above.
> >
> >So, for this issue you claim is the case to happen, you have to:
> >
> >1. Either be using a stack grows up arch, or enabling an mmap()
> >compatibility mode.
> >2. Also set MAP_GROWSDOWN on the mmap() call, which is translated to
> >VM_GROWSDOWN.
> >
> >We are already far from 'fix gap check for VM_GROWSDOWN' right? I mean I
> >don't have the time to absolutely dive into the guts of this, but I assume
> >this is correct right?
> >
> >I'm not saying we shouldn't address this, but it's VITAL to clarify what
> >exactly it is you're tackling.
> >
>
> Thanks for taking a look.
>
> If my understanding is correct, your concern here is the case here never
> happen in real world.

No, it's not, re-read what I've said.

I'm saying you have completely failed to be specific in your commit message
about how, what where and why the alleged issue happens, forcing me to spend my
entire morning to figure out what on earth it is you're trying to do.

I also (but you've clipped it) say I think your solution is just wrong and that
there isn't an issue here.

I also (but you've clipped it) say you utterly fail to explain what on earth you
are doing.

I also (but you've clipped it) say you assume the start_gap can be 0x2000 even
though it can only ever be 0 or 0x1000.

I have taken a great deal of time to specifically critique this even though
you've given me little to work on.

I simply do not have time to do this and in future if your commit messages are
this bad I will just reject your series.

>
>   We are searching a gap bottom-up, while the vma wants top-down.
>
> This maybe possible to me. Here is my understanding, (but maybe not correct).
>
> We have two separate flags affecting the search:
>
>   * mm->flags:      MMF_TOPDOWN  or not
>   * vma->vm_flags:  VM_GROWSDOWN or VM_GROWSUP
>
> To me, they are independent.

You are simply reiterating things I've told you but you failed to mention in
your series.

>
> For mm->flags, arch_pick_mmap_layout() could set/clear MMF_TOPDOWN it based on
> the result of mmap_is_legacy(). Even we provide a sysctl file
> /proc/sys/vm/legacy_vm_layout for configuration.

Yes I know.

>
>
> For vma->vm_flags, for general, VM_STACK is set to VM_GROWSDOWN by default.
> And we use the flag in __bprm_mm_init() and setup_arg_pages().

Yes I know.

>
> So to me the case is real and not a very specific one.

It's literally very specific to the point where you have to enable a
compatibility mode. As you've just said.

Sorry, this is unacceptable for _core_ mm work. This is not an area where you
can provide nebulous and vague commit messages.

>
> But maybe I missed some background. Would you mind telling me the miss part,
> if it is not too time wasting?
>
> --
> Wei Yang
> Help you, Help me

You have clipped the part where I point out that I think your 'solution' to the
alleged 'problem' is incorrect.

Honestly, drop this series Wei. The finding unmapped area code is necessarily
conservative and fuzzy as it accounts for -worst case- alignment and providing
appropriate gaps.

It failing to find a gap in very specific and awkward circumstances across the
vastness of the virtual address space isn't a bug.
diff mbox series

Patch

diff --git a/mm/vma.c b/mm/vma.c
index 3f45a245e31b..d82fdbc710b0 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2668,7 +2668,7 @@  unsigned long unmapped_area(struct vm_unmapped_area_info *info)
 	gap += (info->align_offset - gap) & info->align_mask;
 	tmp = vma_next(&vmi);
 	if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
-		if (vm_start_gap(tmp) < gap + length - 1) {
+		if (vm_start_gap(tmp) < gap + info->length) {
 			low_limit = tmp->vm_end;
 			vma_iter_reset(&vmi);
 			goto retry;