[1/3] selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB

Message ID 20250107-virtual_address_range-tests-v1-1-3834a2fb47fe@linutronix.de (mailing list archive)
State New
Series selftests/mm: virtual_address_range: Two bugfixes and a cleanup

Commit Message

Thomas Weißschuh Jan. 7, 2025, 3:14 p.m. UTC
If not enough physical memory is available the kernel may fail mmap();
see __vm_enough_memory() and vm_commit_limit().
In that case the logic in validate_complete_va_space() does not make
sense and will even incorrectly fail.
Instead skip the test if no mmap() succeeded.

Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>

---
The logic in __vm_enough_memory() seems weird.
It describes itself as "Check that a process has enough memory to
allocate a new virtual mapping", however it never checks the current
memory usage of the process.
So it only disallows large mappings. But many small mappings taking the
same amount of memory are allowed; and then even automatically merged
into one big mapping.
---
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)
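
For reference, the CommitLimit named in the subject line is exported in /proc/meminfo, in kB. A minimal sketch, not part of the patch, of pulling one field out of meminfo-style text; the helper name meminfo_kb() is hypothetical and it assumes the usual "Key:   value kB" line layout:

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "key:" at the start of a line in /proc/meminfo-style text and
 * return its value in kB, or -1 if the key is absent or malformed.
 * Hypothetical helper, for illustration only.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require an exact key match followed by ':' */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long val;

			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		/* Advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller could read /proc/meminfo into a buffer and compare meminfo_kb(buf, "CommitLimit") against 1024 * 1024 kB, the 1GiB threshold from the subject line.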

Comments

Dev Jain Jan. 8, 2025, 6:16 a.m. UTC | #1
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> If not enough physical memory is available the kernel may fail mmap();
> see __vm_enough_memory() and vm_commit_limit().
> In that case the logic in validate_complete_va_space() does not make
> sense and will even incorrectly fail.
> Instead skip the test if no mmap() succeeded.
>
> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>
> ---
> The logic in __vm_enough_memory() seems weird.
> It describes itself as "Check that a process has enough memory to
> allocate a new virtual mapping", however it never checks the current
> memory usage of the process.
> So it only disallows large mappings. But many small mappings taking the
> same amount of memory are allowed; and then even automatically merged
> into one big mapping.
> ---
>   tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> --- a/tools/testing/selftests/mm/virtual_address_range.c
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>   		validate_addr(ptr[i], 0);
>   	}
>   	lchunks = i;
> +
> +	if (!lchunks) {
> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> +		ksft_finished();
> +	}
> +
>   	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>   	if (hptr == NULL) {
>   		ksft_test_result_skip("Memory constraint not fulfilled\n");
>

I do not know about __vm_enough_memory(), but I am going by your description:
You say that the kernel may fail mmap() when enough physical memory is not
there, but it may happen that we have already done 100 mmap()'s, and then
the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().

The basic assumption of the test is that any process should be able to exhaust
its virtual address space, and running the test under memory pressure and the
kernel violating this behaviour defeats the point of the test I think?
Thomas Weißschuh Jan. 8, 2025, 8:05 a.m. UTC | #2
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
> 
> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > If not enough physical memory is available the kernel may fail mmap();
> > see __vm_enough_memory() and vm_commit_limit().
> > In that case the logic in validate_complete_va_space() does not make
> > sense and will even incorrectly fail.
> > Instead skip the test if no mmap() succeeded.
> > 
> > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > 
> > ---
> > The logic in __vm_enough_memory() seems weird.
> > It describes itself as "Check that a process has enough memory to
> > allocate a new virtual mapping", however it never checks the current
> > memory usage of the process.
> > So it only disallows large mappings. But many small mappings taking the
> > same amount of memory are allowed; and then even automatically merged
> > into one big mapping.
> > ---
> >   tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
> >   		validate_addr(ptr[i], 0);
> >   	}
> >   	lchunks = i;
> > +
> > +	if (!lchunks) {
> > +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> > +		ksft_finished();
> > +	}
> > +
> >   	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
> >   	if (hptr == NULL) {
> >   		ksft_test_result_skip("Memory constraint not fulfilled\n");
> > 
> 
> I do not know about __vm_enough_memory(), but I am going by your description:
> You say that the kernel may fail mmap() when enough physical memory is not
> there, but it may happen that we have already done 100 mmap()'s, and then
> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().

__vm_enough_memory() only checks the size of each single mmap() on its
own. It does not actually check the current memory or address space
usage of the process.
This seems a bit weird, as indicated in my after-the-fold explanation.

> The basic assumption of the test is that any process should be able to exhaust
> its virtual address space, and running the test under memory pressure and the
> kernel violating this behaviour defeats the point of the test I think?

The assumption is correct, as soon as one mapping succeeds the others
will also succeed, until the actual address space is exhausted.

Looking at it again, __vm_enough_memory() is only called for writable
mappings, so it would be possible to use only readable mappings in the
test. The test will still fail with OOM, as the many PTEs need more than
1GiB of physical memory anyways, but at least that produces a usable
error message.
However I'm not sure if this would violate other test assumptions.
David Hildenbrand Jan. 8, 2025, 1:36 p.m. UTC | #3
On 08.01.25 09:05, Thomas Weißschuh wrote:
> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
>>
>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>> If not enough physical memory is available the kernel may fail mmap();
>>> see __vm_enough_memory() and vm_commit_limit().
>>> In that case the logic in validate_complete_va_space() does not make
>>> sense and will even incorrectly fail.
>>> Instead skip the test if no mmap() succeeded.
>>>
>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>> Cc: stable@vger.kernel.org

CC stable on tests is ... odd.

>>> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
>>>
>>> ---
>>> The logic in __vm_enough_memory() seems weird.
>>> It describes itself as "Check that a process has enough memory to
>>> allocate a new virtual mapping", however it never checks the current
>>> memory usage of the process.
>>> So it only disallows large mappings. But many small mappings taking the
>>> same amount of memory are allowed; and then even automatically merged
>>> into one big mapping.
>>> ---
>>>    tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>>> index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
>>> --- a/tools/testing/selftests/mm/virtual_address_range.c
>>> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>>> @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
>>>    		validate_addr(ptr[i], 0);
>>>    	}
>>>    	lchunks = i;
>>> +
>>> +	if (!lchunks) {
>>> +		ksft_test_result_skip("Not enough memory for a single chunk\n");
>>> +		ksft_finished();
>>> +	}
>>> +
>>>    	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
>>>    	if (hptr == NULL) {
>>>    		ksft_test_result_skip("Memory constraint not fulfilled\n");
>>>
>>
>> I do not know about __vm_enough_memory(), but I am going by your description:
>> You say that the kernel may fail mmap() when enough physical memory is not
>> there, but it may happen that we have already done 100 mmap()'s, and then
>> the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
>> Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
> 
> __vm_enough_memory() only checks the size of each single mmap() on its
> own. It does not actually check the current memory or address space
> usage of the process.
> This seems a bit weird, as indicated in my after-the-fold explanation.
> 
>> The basic assumption of the test is that any process should be able to exhaust
>> its virtual address space, and running the test under memory pressure and the
>> kernel violating this behaviour defeats the point of the test I think?
> 
> The assumption is correct, as soon as one mapping succeeds the others
> will also succeed, until the actual address space is exhausted.
> 
> Looking at it again, __vm_enough_memory() is only called for writable
> mappings, so it would be possible to use only readable mappings in the
> test. The test will still fail with OOM, as the many PTEs need more than
> 1GiB of physical memory anyways, but at least that produces a usable
> error message.
> However I'm not sure if this would violate other test assumptions.
> 

Note that with MAP_NORESERVE, most setups we care about will allow
mapping as much as you want, but on access OOM will fire.

So one could require that /proc/sys/vm/overcommit_memory is set up
properly and use MAP_NORESERVE.

Reading from anonymous memory will populate the shared zeropage. To 
mitigate OOM from "too many page tables", one could simply unmap the 
pieces as they are verified (or MAP_FIXED over them, to free page tables).
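
A minimal sketch of the suggestion above, not code from this thread. It assumes that "set up properly" means the overcommit policy is not 2 (strict accounting), since the kernel only honours MAP_NORESERVE when overcommit is allowed. Both helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Read the overcommit policy from procfs.
 * Returns 0 (heuristic), 1 (always) or 2 (never), or -1 on error.
 */
static int read_overcommit_mode(void)
{
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
	int mode = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}

/*
 * Map one anonymous chunk without reserving commit charge.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_chunk_noreserve(size_t size)
{
	return mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}
```

A test built on this could SKIP when read_overcommit_mode() returns 2 or -1, and otherwise treat a failing map_chunk_noreserve() before address space exhaustion as a FAIL.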
Thomas Weißschuh Jan. 8, 2025, 4:13 p.m. UTC | #4
On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
> On 08.01.25 09:05, Thomas Weißschuh wrote:
> > On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
> > > 
> > > On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
> > > > If not enough physical memory is available the kernel may fail mmap();
> > > > see __vm_enough_memory() and vm_commit_limit().
> > > > In that case the logic in validate_complete_va_space() does not make
> > > > sense and will even incorrectly fail.
> > > > Instead skip the test if no mmap() succeeded.
> > > > 
> > > > Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
> > > > Cc: stable@vger.kernel.org
> 
> CC stable on tests is ... odd.

I thought it was fairly common, but it isn't.
Will drop it.

> > > > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > > > 
> > > > ---
> > > > The logic in __vm_enough_memory() seems weird.
> > > > It describes itself as "Check that a process has enough memory to
> > > > allocate a new virtual mapping", however it never checks the current
> > > > memory usage of the process.
> > > > So it only disallows large mappings. But many small mappings taking the
> > > > same amount of memory are allowed; and then even automatically merged
> > > > into one big mapping.
> > > > ---
> > > >    tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
> > > >    1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
> > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > @@ -178,6 +178,12 @@ int main(int argc, char *argv[])
> > > >    		validate_addr(ptr[i], 0);
> > > >    	}
> > > >    	lchunks = i;
> > > > +
> > > > +	if (!lchunks) {
> > > > +		ksft_test_result_skip("Not enough memory for a single chunk\n");
> > > > +		ksft_finished();
> > > > +	}
> > > > +
> > > >    	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
> > > >    	if (hptr == NULL) {
> > > >    		ksft_test_result_skip("Memory constraint not fulfilled\n");
> > > > 
> > > 
> > > I do not know about __vm_enough_memory(), but I am going by your description:
> > > You say that the kernel may fail mmap() when enough physical memory is not
> > > there, but it may happen that we have already done 100 mmap()'s, and then
> > > the kernel fails mmap(), so if (!lchunks) won't be able to handle this case.
> > > Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
> > 
> > __vm_enough_memory() only checks the size of each single mmap() on its
> > own. It does not actually check the current memory or address space
> > usage of the process.
> > This seems a bit weird, as indicated in my after-the-fold explanation.
> > 
> > > The basic assumption of the test is that any process should be able to exhaust
> > > its virtual address space, and running the test under memory pressure and the
> > > kernel violating this behaviour defeats the point of the test I think?
> > 
> > The assumption is correct, as soon as one mapping succeeds the others
> > will also succeed, until the actual address space is exhausted.
> > 
> > Looking at it again, __vm_enough_memory() is only called for writable
> > mappings, so it would be possible to use only readable mappings in the
> > test. The test will still fail with OOM, as the many PTEs need more than
> > 1GiB of physical memory anyways, but at least that produces a usable
> > error message.
> > However I'm not sure if this would violate other test assumptions.
> > 
> 
> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
> much as you want, but on access OOM will fire.

Thanks for the hint.

> So one could require that /proc/sys/vm/overcommit_memory is set up properly
> and use MAP_NORESERVE.

Isn't the check for lchunks == 0 essentially exactly this?

> Reading from anonymous memory will populate the shared zeropage. To mitigate
> OOM from "too many page tables", one could simply unmap the pieces as they
> are verified (or MAP_FIXED over them, to free page tables).

The code has to figure out if a verified region was created by mmap(),
otherwise an munmap() could crash the process.
As the entries from /proc/self/maps may have been merged and (I assume)
the ordering of mappings is not guaranteed, some bespoke logic to establish
the link will be needed.

Is it fine to rely on CONFIG_ANON_VMA_NAME?
That would make it much easier to implement.

Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
in very low physical memory conditions.

Thomas
David Hildenbrand Jan. 8, 2025, 4:46 p.m. UTC | #5
On 08.01.25 17:13, Thomas Weißschuh wrote:
> On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
>> On 08.01.25 09:05, Thomas Weißschuh wrote:
>>> On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
>>>>
>>>> On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
>>>>> If not enough physical memory is available the kernel may fail mmap();
>>>>> see __vm_enough_memory() and vm_commit_limit().
>>>>> In that case the logic in validate_complete_va_space() does not make
>>>>> sense and will even incorrectly fail.
>>>>> Instead skip the test if no mmap() succeeded.
>>>>>
>>>>> Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
>>>>> Cc: stable@vger.kernel.org
>>
>> CC stable on tests is ... odd.
> 
> I thought it was fairly common, but it isn't.
> Will drop it.

As it's not really a "kernel BUG", it's rather uncommon.

>>
>> Note that with MAP_NORESERVE, most setups we care about will allow mapping as
>> much as you want, but on access OOM will fire.
> 
> Thanks for the hint.
> 
>> So one could require that /proc/sys/vm/overcommit_memory is set up properly
>> and use MAP_NORESERVE.
> 
> Isn't the check for lchunks == 0 essentially exactly this?

I assume paired with MAP_NORESERVE?

Maybe, but it could be better to have something that says "if
overcommit_memory is not set up properly I will SKIP this test, but
otherwise I expect this to work and will FAIL if it doesn't".

Or would you expect to run into lchunks == 0 even if overcommit_memory
is set up properly and MAP_NORESERVE is used? (So little memory that
we cannot even create all the VMAs?)

> 
>> Reading from anonymous memory will populate the shared zeropage. To mitigate
>> OOM from "too many page tables", one could simply unmap the pieces as they
>> are verified (or MAP_FIXED over them, to free page tables).
> 
> The code has to figure out if a verified region was created by mmap(),
> otherwise an munmap() could crash the process.
> As the entries from /proc/self/maps may have been merged and (I assume)

Yes, and partial unmap (in chunk granularity?) would split them again.

> the ordering of mappings is not guaranteed, some bespoke logic to establish
> the link will be needed.


My thinking was that you simply process one /proc/self/maps entry in 
some chunks. After processing a chunk, you munmap() it.

So you would process + munmap in chunks.

> 
> Is it fine to rely on CONFIG_ANON_VMA_NAME?
> That would make it much easier to implement.

Can you elaborate how you would do it?

> 
> Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even
> in very low physical memory conditions.

Cool.
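
The "process + munmap in chunks" idea from comment #5 can be sketched as follows. This is an illustration under stated assumptions, not code posted in the thread: the helper name verify_and_unmap() is hypothetical, "processing" is reduced to reading one byte per chunk, and the caller is assumed to own the whole range, with a page-aligned start and a chunk size that is a multiple of the page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Walk [start, start + len) in steps of `chunk` bytes: read one byte
 * from each chunk, then unmap that chunk right away so its page tables
 * can be freed before the next chunk is touched.
 * Returns the number of chunks processed, or -1 if munmap() fails.
 */
static long verify_and_unmap(char *start, size_t len, size_t chunk)
{
	long done = 0;

	while (len > 0) {
		size_t step = len < chunk ? len : chunk;
		/* Reading untouched anonymous memory maps the shared zeropage */
		volatile char probe = *(volatile char *)start;

		(void)probe;
		if (munmap(start, step) != 0)
			return -1;
		start += step;
		len -= step;
		done++;
	}
	return done;
}
```

Because each chunk is released as soon as it has been read, the page tables allocated at any one time stay bounded by a single chunk instead of growing with the whole mapping.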

Patch

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@  int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");