selftests/mm: run_vmtests.sh: add missing tests

Message ID	20240116090641.3411660-1-usama.anjum@collabora.com (mailing list archive)
State	New
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: Andrew Morton <akpm@linux-foundation.org>, Shuah Khan <shuah@kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>, kernel@collabora.com, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] selftests/mm: run_vmtests.sh: add missing tests
Date: Tue, 16 Jan 2024 14:06:40 +0500

Muhammad Usama Anjum Jan. 16, 2024, 9:06 a.m. UTC

Add missing tests to run_vmtests.sh. The mm kselftests are run through
run_vmtests.sh. If a test isn't listed in this script, it won't be run
by run_tests or `make -C tools/testing/selftests/mm run_tests`.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 3 +++
 1 file changed, 3 insertions(+)

Ryan Roberts Jan. 19, 2024, 4:09 p.m. UTC | #1

Hi Muhammad,

Afraid this patch is causing a regression on our CI system when it turned up in
linux-next today. Additionally, 2 of the tests you have added are failing because
the scripts are not exported correctly...

On 16/01/2024 09:06, Muhammad Usama Anjum wrote:
> Add missing tests to run_vmtests.sh. The mm kselftests are run through
> run_vmtests.sh. If a test isn't present in this script, it'll not run
> with run_tests or `make -C tools/testing/selftests/mm run_tests`.
> 
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
>  tools/testing/selftests/mm/run_vmtests.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index 246d53a5d7f2..a5e6ba8d3579 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -248,6 +248,9 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
>  CATEGORY="hugetlb" run_test ./hugepage-mremap
>  CATEGORY="hugetlb" run_test ./hugepage-vmemmap
>  CATEGORY="hugetlb" run_test ./hugetlb-madvise
> +CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh
> +CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh

These 2 tests are failing because the test scripts are not exported. You will
need to add them to the TEST_FILES variable in the Makefile.
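
A quick way to confirm the export problem is to check, in the directory where
the selftests were installed, that each script named in run_vmtests.sh exists
and is executable. The sketch below assumes a POSIX shell; check_exported is a
hypothetical helper, not something provided by the kselftest framework.

```shell
# Hypothetical helper (not part of run_vmtests.sh): print every script
# that is missing or not executable in DIR.
# Usage: check_exported DIR SCRIPT...
# Returns 0 if all scripts are present and executable, 1 otherwise.
check_exported() {
    dir=$1
    shift
    missing=0
    for script in "$@"; do
        if [ ! -x "$dir/$script" ]; then
            echo "not exported: $script"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run from the installed selftests/mm directory):
#   check_exported . charge_reserved_hugetlb.sh hugetlb_reparenting_test.sh
```

If either script is reported, the install step did not copy it, which is
consistent with it not being listed in the Makefile.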

> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison

The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
it's a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
the test name!). Once a page is marked poisoned, is there a way to un-poison it?
If not, I suspect that's why it wasn't part of the standard test script in the
first place.

These are the tests that start failing:

# # ------------------------------------
# # running ./uffd-stress hugetlb 128 32
# # ------------------------------------
# # nr_pages: 64, nr_pages_per_cpu: 8
# # ERROR: context init failed (errno=12, @uffd-stress.c:254)
# # [FAIL]
# not ok 18 uffd-stress hugetlb 128 32 # exit=1
# # --------------------------------------------
# # running ./uffd-stress hugetlb-private 128 32
# # --------------------------------------------
# # nr_pages: 64, nr_pages_per_cpu: 8
# # bounces: 31, mode: rnd racing ver poll, ERROR: UFFDIO_COPY error: -12ERROR:
UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:614)
# #  (errno=12, @uffd-common.c:614)
# # [FAIL]

Quickest way to repro is:

$ sudo ./run_vmtests.sh -t "userfaultfd hugetlb"

Thanks,
Ryan


>  
>  nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
>  # For this test, we need one and just one huge page

Muhammad Usama Anjum Jan. 22, 2024, 8:46 a.m. UTC | #2

On 1/19/24 9:09 PM, Ryan Roberts wrote:
> Hi Muhammad,
> 
> Afraid this patch is causing a regression on our CI system when it turned up in
> linux-next today. Additionally, 2 of thetests you have added are failing because
> the scripts are not exported correctly...
Andrew has dropped this patch for now.

> 
> On 16/01/2024 09:06, Muhammad Usama Anjum wrote:
>> Add missing tests to run_vmtests.sh. The mm kselftests are run through
>> run_vmtests.sh. If a test isn't present in this script, it'll not run
>> with run_tests or `make -C tools/testing/selftests/mm run_tests`.
>>
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>>  tools/testing/selftests/mm/run_vmtests.sh | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
>> index 246d53a5d7f2..a5e6ba8d3579 100755
>> --- a/tools/testing/selftests/mm/run_vmtests.sh
>> +++ b/tools/testing/selftests/mm/run_vmtests.sh
>> @@ -248,6 +248,9 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
>>  CATEGORY="hugetlb" run_test ./hugepage-mremap
>>  CATEGORY="hugetlb" run_test ./hugepage-vmemmap
>>  CATEGORY="hugetlb" run_test ./hugetlb-madvise
>> +CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh
>> +CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh
> 
> These 2 tests are failing because the test scripts are not exported. You will
> need to add them to the TEST_FILES variable in the Makefile.
This must be done. After adding them, I'll also investigate whether these
scripts are robust enough to pass.

> 
>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
> 
> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
> its a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
> If not, I suspect that's why it wasn't part of the standard test script in the
> first place.
hugetlb-read-hwpoison probably failed because the kernel fix for the test
hasn't been merged yet. The other tests (uffd-stress) aren't failing on my
end or on CI [1][2].

[1] https://lava.collabora.dev/scheduler/job/12577207#L3677
[2] https://lava.collabora.dev/scheduler/job/12577229#L4027

Maybe it's a configuration issue that is only exposed now. Not sure. Maybe
hugetlb-read-hwpoison is changing some configuration and not restoring it.
Maybe your system has fewer hugetlb pages.

> 
> These are the tests that start failing:
> 
> # # ------------------------------------
> # # running ./uffd-stress hugetlb 128 32
> # # ------------------------------------
> # # nr_pages: 64, nr_pages_per_cpu: 8
> # # ERROR: context init failed (errno=12, @uffd-stress.c:254)
> # # [FAIL]
> # not ok 18 uffd-stress hugetlb 128 32 # exit=1
> # # --------------------------------------------
> # # running ./uffd-stress hugetlb-private 128 32
> # # --------------------------------------------
> # # nr_pages: 64, nr_pages_per_cpu: 8
> # # bounces: 31, mode: rnd racing ver poll, ERROR: UFFDIO_COPY error: -12ERROR:
> UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:614)
> # #  (errno=12, @uffd-common.c:614)
> # # [FAIL]
> 
> Quickest way to repo is:
> 
> $ sudo ./run_vmtests.sh -t "userfaultfd hugetlb"
> 
> Thanks,
> Ryan
> 
> 
>>  
>>  nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
>>  # For this test, we need one and just one huge page
> 
>

Ryan Roberts Jan. 22, 2024, 9:59 a.m. UTC | #3

On 22/01/2024 08:46, Muhammad Usama Anjum wrote:
> On 1/19/24 9:09 PM, Ryan Roberts wrote:
>> Hi Muhammad,
>>
>> Afraid this patch is causing a regression on our CI system when it turned up in
>> linux-next today. Additionally, 2 of thetests you have added are failing because
>> the scripts are not exported correctly...
> Andrew has dropped this patch for now.
> 
>>
>> On 16/01/2024 09:06, Muhammad Usama Anjum wrote:
>>> Add missing tests to run_vmtests.sh. The mm kselftests are run through
>>> run_vmtests.sh. If a test isn't present in this script, it'll not run
>>> with run_tests or `make -C tools/testing/selftests/mm run_tests`.
>>>
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>> ---
>>>  tools/testing/selftests/mm/run_vmtests.sh | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
>>> index 246d53a5d7f2..a5e6ba8d3579 100755
>>> --- a/tools/testing/selftests/mm/run_vmtests.sh
>>> +++ b/tools/testing/selftests/mm/run_vmtests.sh
>>> @@ -248,6 +248,9 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
>>>  CATEGORY="hugetlb" run_test ./hugepage-mremap
>>>  CATEGORY="hugetlb" run_test ./hugepage-vmemmap
>>>  CATEGORY="hugetlb" run_test ./hugetlb-madvise
>>> +CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh
>>> +CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh
>>
>> These 2 tests are failing because the test scripts are not exported. You will
>> need to add them to the TEST_FILES variable in the Makefile.
> This must be done. I'll investigate even after adding them if these scripts
> are robust enough to pass.

Great thanks!

> 
>>
>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>
>> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
>> its a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
>> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
>> If not, I suspect that's why it wasn't part of the standard test script in the
>> first place.
> hugetlb-read-hwpoison failed as probably the fix in the kernel for the test
> hasn't been merged in the kernel. The other tests (uffd-stress) aren't
> failing on my end and on CI [1][2]

To be clear, hugetlb-read-hwpoison isn't failing for me, it's just causing the
subsequent uffd-stress tests to fail. Both of those subsequent tests are
allocating hugetlbs so my guess is that since this test is marking some hugetlbs
as poisoned, there are no longer enough for the subsequent tests.
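
One way to test this guess is to compare the hugepage counters in
/proc/meminfo before and after running the test. A minimal sketch, assuming a
POSIX shell with awk; meminfo_field is a hypothetical helper.
HugePages_Total, HugePages_Free and HardwareCorrupted are standard
/proc/meminfo fields (HardwareCorrupted only appears when memory-failure
support is built in).

```shell
# Hypothetical helper: print the numeric value of one field from a
# meminfo-style file. The file argument defaults to /proc/meminfo.
# Usage: meminfo_field FIELD [FILE]
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example (as root, from the selftests/mm directory):
#   before=$(meminfo_field HugePages_Free)
#   ./hugetlb-read-hwpoison
#   after=$(meminfo_field HugePages_Free)
#   echo "free hugepages: $before -> $after"
#   echo "HardwareCorrupted: $(meminfo_field HardwareCorrupted) kB"
```

A drop in the free count that persists after the test exits would support the
idea that the later uffd-stress runs are starved of hugepages.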

> 
> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
> 
> Maybe its configurations issue which is exposed now. Not sure. Maybe
> hugetlb-read-hwpoison is changing some configuration and not restoring it.

Well yes - it's marking some hugetlb pages as HWPOISONED.

> Maybe your system has less number of hugetlb pages.

Yes, probably. What is hugetlb-read-hwpoison's requirement for size and number of
hugetlb pages? The run_vmtests.sh script allocates the required number of
default-sized hugetlb pages before running any tests (I guess this value should
be increased for hugetlb-read-hwpoison's requirements?).

Additionally, our CI preallocates non-default sizes from the kernel command line
at boot. Happy to increase these if you can tell me what the new requirement is:

hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2

Thanks,
Ryan

> 
>>
>> These are the tests that start failing:
>>
>> # # ------------------------------------
>> # # running ./uffd-stress hugetlb 128 32
>> # # ------------------------------------
>> # # nr_pages: 64, nr_pages_per_cpu: 8
>> # # ERROR: context init failed (errno=12, @uffd-stress.c:254)
>> # # [FAIL]
>> # not ok 18 uffd-stress hugetlb 128 32 # exit=1
>> # # --------------------------------------------
>> # # running ./uffd-stress hugetlb-private 128 32
>> # # --------------------------------------------
>> # # nr_pages: 64, nr_pages_per_cpu: 8
>> # # bounces: 31, mode: rnd racing ver poll, ERROR: UFFDIO_COPY error: -12ERROR:
>> UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:614)
>> # #  (errno=12, @uffd-common.c:614)
>> # # [FAIL]
>>
>> Quickest way to repo is:
>>
>> $ sudo ./run_vmtests.sh -t "userfaultfd hugetlb"
>>
>> Thanks,
>> Ryan
>>
>>
>>>  
>>>  nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
>>>  # For this test, we need one and just one huge page
>>
>>
>

Muhammad Usama Anjum Jan. 23, 2024, 7:51 a.m. UTC | #4

On 1/22/24 2:59 PM, Ryan Roberts wrote:
>>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>>
>>> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
>>> its a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
>>> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
>>> If not, I suspect that's why it wasn't part of the standard test script in the
>>> first place.
>> hugetlb-read-hwpoison failed as probably the fix in the kernel for the test
>> hasn't been merged in the kernel. The other tests (uffd-stress) aren't
>> failing on my end and on CI [1][2]
> 
> To be clear, hugetlb-read-hwpoison isn't failing for me, its just causing the
> subsequent tests uffd-stress tests to fail. Both of those subsequent tests are
> allocating hugetlbs so my guess is that since this test is marking some hugetlbs
> as poisoned, there are no longer enough for the subsequent tests.
> 
>>
>> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
>> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
>>
>> Maybe its configurations issue which is exposed now. Not sure. Maybe
>> hugetlb-read-hwpoison is changing some configuration and not restoring it.
> 
> Well yes - its marking some hugetlb pages as HWPOISONED.
> 
>> Maybe your system has less number of hugetlb pages.
> 
> YEs probably; What is hugetlb-read-hwpoison's requirement for size and number of
> hugetlb pages? the run_vmtests.sh script allocates the required number of
> default-sized hugetlb pages before running any tests (I guess this value should
> be increased for hugetlb-read-hwpoison's requirements?).
> 
> Additionally, our CI preallocates non-default sizes from the kernel command line
> at boot. Happy to increase these if you can tell me what the new requirement is:
I'm not sure exactly how many hugetlb pages these tests require, but I
specify hugepages=1000 and the tests work for me.

I've sent v2 [1]. Would it be possible to run your CI on that and share
results before we merge that one?

[1]
https://lore.kernel.org/all/20240123073615.920324-1-usama.anjum@collabora.com

> 
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
> 
> Thanks,
> Ryan
>

Ryan Roberts Jan. 23, 2024, 9:23 a.m. UTC | #5

On 23/01/2024 07:51, Muhammad Usama Anjum wrote:
> On 1/22/24 2:59 PM, Ryan Roberts wrote:
>>>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>>>
>>>> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
>>>> its a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
>>>> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
>>>> If not, I suspect that's why it wasn't part of the standard test script in the
>>>> first place.
>>> hugetlb-read-hwpoison failed as probably the fix in the kernel for the test
>>> hasn't been merged in the kernel. The other tests (uffd-stress) aren't
>>> failing on my end and on CI [1][2]
>>
>> To be clear, hugetlb-read-hwpoison isn't failing for me, its just causing the
>> subsequent tests uffd-stress tests to fail. Both of those subsequent tests are
>> allocating hugetlbs so my guess is that since this test is marking some hugetlbs
>> as poisoned, there are no longer enough for the subsequent tests.
>>
>>>
>>> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
>>> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
>>>
>>> Maybe its configurations issue which is exposed now. Not sure. Maybe
>>> hugetlb-read-hwpoison is changing some configuration and not restoring it.
>>
>> Well yes - its marking some hugetlb pages as HWPOISONED.
>>
>>> Maybe your system has less number of hugetlb pages.
>>
>> YEs probably; What is hugetlb-read-hwpoison's requirement for size and number of
>> hugetlb pages? the run_vmtests.sh script allocates the required number of
>> default-sized hugetlb pages before running any tests (I guess this value should
>> be increased for hugetlb-read-hwpoison's requirements?).
>>
>> Additionally, our CI preallocates non-default sizes from the kernel command line
>> at boot. Happy to increase these if you can tell me what the new requirement is:
> I'm not sure about the exact requirement of the number of hugetlb for these
> tests. But I specify hugepages=1000 and tests work for me.

1000 hugepages @2M is ~2G, which is quite a big ask for small arm systems. And
for big arm systems that use 64K base pages, the default hugepage size is 512M,
so 1000 of those is 512G which is also quite a big ask. So I'd prefer not to
make 1000 hugepages the requirement.
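
The arithmetic behind these figures can be reproduced with shell integer
arithmetic. This is only an illustration; hugepage_total_mb is a hypothetical
helper, and sizes are in units of M, counting 1G as 1000M as in the rounded
figures above.

```shell
# Hypothetical helper: total memory, in M, reserved by COUNT huge pages
# of SIZE_MB each.
# Usage: hugepage_total_mb COUNT SIZE_MB
hugepage_total_mb() {
    echo $(( $1 * $2 ))
}

hugepage_total_mb 1000 2     # 2M default hugepage size: prints 2000
hugepage_total_mb 1000 512   # 512M default hugepage size: prints 512000
```

That is about 2G in the first case and about 512G in the second.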

Looking at the test, I think it's using 8 default-sized hugepages. But supporting
it properly is still complex, as the HWPOISON operation is destructive. I'll
reply with more detail against the v2 patch.

> 
> I've sent v2 [1]. Would it be possible to run your CI on that and share
> results before we merge that one?
> 
> [1]
> https://lore.kernel.org/all/20240123073615.920324-1-usama.anjum@collabora.com
> 
>>
>> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
>> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>>
>> Thanks,
>> Ryan
>>
>
