diff mbox series

[v12,14/14] selftests/sgx: Add scripts for EPC cgroup testing

Message ID 20240416032011.58578-15-haitao.huang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Add Cgroup support for SGX EPC memory | expand

Commit Message

Haitao Huang April 16, 2024, 3:20 a.m. UTC
With different cgroups, the script starts one or multiple concurrent SGX
selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
test case, which loads an enclave of EPC size equal to the EPC capacity
available on the platform. The script checks results against the
expectation set for each cgroup and reports success or failure.

The script creates 3 different cgroups at the beginning with following
expectations:

1) SMALL - intentionally small enough to fail the test loading an
enclave of size equal to the capacity.
2) LARGE - large enough to run up to 4 concurrent tests but fail some if
more than 4 concurrent tests are run. The script starts 4 expecting at
least one test to pass, and then starts 5 expecting at least one test
to fail.
3) LARGER - limit is the same as the capacity, large enough to run lots of
concurrent tests. The script starts 8 of them and expects all pass.
Then it reruns the same test with one process randomly killed and
usage checked to be zero after all processes exit.

The script also includes a test with low mem_cg limit and LARGE sgx_epc
limit to verify that the RAM used for per-cgroup reclamation is charged
to a proper mem_cg. For this test, it turns off swapping before start,
and turns swapping back on afterwards.

Add README to document how to run the tests.

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
---
V12:
- Integrate the scripts to the "run_tests" target. (Jarkko)

V11:
- Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko)
- Drop support for cgroup v1 and simplify. (Michal, Jarkko)
- Add documentation for functions. (Jarkko)
- Turn off swapping before memcontrol tests and back on after
- Format and style fixes, name for hard coded values

V7:
- Added memcontrol test.

V5:
- Added script with automatic results checking, remove the interactive
script.
- The script can run independent from the series below.
---
 tools/testing/selftests/sgx/Makefile          |   3 +-
 tools/testing/selftests/sgx/README            | 116 +++++++
 tools/testing/selftests/sgx/ash_cgexec.sh     |  16 +
 .../selftests/sgx/run_epc_cg_selftests.sh     | 283 ++++++++++++++++++
 .../selftests/sgx/watch_misc_for_tests.sh     |  11 +
 5 files changed, 428 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/sgx/README
 create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
 create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
 create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh

Comments

Haitao Huang April 16, 2024, 5:16 a.m. UTC | #1
Missed adding the config file:

index 000000000000..e7f1db1d3eff
--- /dev/null
+++ b/tools/testing/selftests/sgx/config
@@ -0,0 +1,4 @@
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_MISC=y
+CONFIG_MEMCG=y
+CONFIG_X86_SGX=y

I'll send a fixup for this patch or another version of the series if more  
changes are needed.

Thanks
Haitao
Huang, Kai April 16, 2024, 5:42 a.m. UTC | #2
> 
> I'll send a fixup for this patch or another version of the series if more  
> changes are needed.

Hi Haitao,

I don't like to say but in general I think you are sending too frequently.  The
last version was sent April, 11th (my time), so considering the weekend it has
only been 3 or at most 4 days.  

Please slow down a little bit to give people more time.

More information please also see:

https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders
Jarkko Sakkinen April 16, 2024, 2:05 p.m. UTC | #3
On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote:
> With different cgroups, the script starts one or multiple concurrent SGX
> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> test case, which loads an enclave of EPC size equal to the EPC capacity
> available on the platform. The script checks results against the
> expectation set for each cgroup and reports success or failure.
>
> The script creates 3 different cgroups at the beginning with following
> expectations:
>
> 1) SMALL - intentionally small enough to fail the test loading an
> enclave of size equal to the capacity.
> 2) LARGE - large enough to run up to 4 concurrent tests but fail some if
> more than 4 concurrent tests are run. The script starts 4 expecting at
> least one test to pass, and then starts 5 expecting at least one test
> to fail.
> 3) LARGER - limit is the same as the capacity, large enough to run lots of
> concurrent tests. The script starts 8 of them and expects all pass.
> Then it reruns the same test with one process randomly killed and
> usage checked to be zero after all processes exit.
>
> The script also includes a test with low mem_cg limit and LARGE sgx_epc
> limit to verify that the RAM used for per-cgroup reclamation is charged
> to a proper mem_cg. For this test, it turns off swapping before start,
> and turns swapping back on afterwards.
>
> Add README to document how to run the tests.
>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>

jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C tools/testing/selftests/sgx run_tests
make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'
gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o
gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o
gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sigstruct.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o
gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o
gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sign_key.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o
gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z noexecstack -lcrypto
gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE -fno-stack-protector -mrdrnd -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include test_encl.c test_encl_bootstrap.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf -Wl,-T,test_encl.lds,--build-id=none
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
TAP version 13
1..2
# timeout set to 45
# selftests: sgx: test_sgx
# TAP version 13
# 1..16
# # Starting 16 tests from 1 test cases.
# #  RUN           enclave.unclobbered_vdso ...
# #            OK  enclave.unclobbered_vdso
# ok 1 enclave.unclobbered_vdso
# #  RUN           enclave.unclobbered_vdso_oversubscribed ...
# #            OK  enclave.unclobbered_vdso_oversubscribed
# ok 2 enclave.unclobbered_vdso_oversubscribed
# #  RUN           enclave.unclobbered_vdso_oversubscribed_remove ...
# # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ...
# # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ...
# # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ...
# # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ...
# #            OK  enclave.unclobbered_vdso_oversubscribed_remove
# ok 3 enclave.unclobbered_vdso_oversubscribed_remove
# #  RUN           enclave.clobbered_vdso ...
# #            OK  enclave.clobbered_vdso
# ok 4 enclave.clobbered_vdso
# #  RUN           enclave.clobbered_vdso_and_user_function ...
# #            OK  enclave.clobbered_vdso_and_user_function
# ok 5 enclave.clobbered_vdso_and_user_function
# #  RUN           enclave.tcs_entry ...
# #            OK  enclave.tcs_entry
# ok 6 enclave.tcs_entry
# #  RUN           enclave.pte_permissions ...
# #            OK  enclave.pte_permissions
# ok 7 enclave.pte_permissions
# #  RUN           enclave.tcs_permissions ...
# #            OK  enclave.tcs_permissions
# ok 8 enclave.tcs_permissions
# #  RUN           enclave.epcm_permissions ...
# #            OK  enclave.epcm_permissions
# ok 9 enclave.epcm_permissions
# #  RUN           enclave.augment ...
# #            OK  enclave.augment
# ok 10 enclave.augment
# #  RUN           enclave.augment_via_eaccept ...
# #            OK  enclave.augment_via_eaccept
# ok 11 enclave.augment_via_eaccept
# #  RUN           enclave.tcs_create ...
# #            OK  enclave.tcs_create
# ok 12 enclave.tcs_create
# #  RUN           enclave.remove_added_page_no_eaccept ...
# #            OK  enclave.remove_added_page_no_eaccept
# ok 13 enclave.remove_added_page_no_eaccept
# #  RUN           enclave.remove_added_page_invalid_access ...
# #            OK  enclave.remove_added_page_invalid_access
# ok 14 enclave.remove_added_page_invalid_access
# #  RUN           enclave.remove_added_page_invalid_access_after_eaccept ...
# #            OK  enclave.remove_added_page_invalid_access_after_eaccept
# ok 15 enclave.remove_added_page_invalid_access_after_eaccept
# #  RUN           enclave.remove_untouched_page ...
# #            OK  enclave.remove_untouched_page
# ok 16 enclave.remove_untouched_page
# # PASSED: 16 / 16 tests passed.
# # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 1 selftests: sgx: test_sgx
# timeout set to 45
# selftests: sgx: run_epc_cg_selftests.sh
# # Setting up limits.
# ./run_epc_cg_selftests.sh: line 50: echo: write error: Invalid argument
# # Failed setting up misc limits.
not ok 2 selftests: sgx: run_epc_cg_selftests.sh # exit=1
make: Leaving directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'

This is what happens now.

BTW, I noticed a file that should not exist, i.e. README. Only thing
that should exist is the tests for kselftest and anything else should
not exist at all, so this file by definiton should not exist.

BR, Jarkko
Jarkko Sakkinen April 16, 2024, 2:10 p.m. UTC | #4
On Tue Apr 16, 2024 at 5:05 PM EEST, Jarkko Sakkinen wrote:
> On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote:
> > With different cgroups, the script starts one or multiple concurrent SGX
> > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> > test case, which loads an enclave of EPC size equal to the EPC capacity
> > available on the platform. The script checks results against the
> > expectation set for each cgroup and reports success or failure.
> >
> > The script creates 3 different cgroups at the beginning with following
> > expectations:
> >
> > 1) SMALL - intentionally small enough to fail the test loading an
> > enclave of size equal to the capacity.
> > 2) LARGE - large enough to run up to 4 concurrent tests but fail some if
> > more than 4 concurrent tests are run. The script starts 4 expecting at
> > least one test to pass, and then starts 5 expecting at least one test
> > to fail.
> > 3) LARGER - limit is the same as the capacity, large enough to run lots of
> > concurrent tests. The script starts 8 of them and expects all pass.
> > Then it reruns the same test with one process randomly killed and
> > usage checked to be zero after all processes exit.
> >
> > The script also includes a test with low mem_cg limit and LARGE sgx_epc
> > limit to verify that the RAM used for per-cgroup reclamation is charged
> > to a proper mem_cg. For this test, it turns off swapping before start,
> > and turns swapping back on afterwards.
> >
> > Add README to document how to run the tests.
> >
> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>
> jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C tools/testing/selftests/sgx run_tests
> make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'
> gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o
> gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o
> gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sigstruct.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o
> gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o
> gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sign_key.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o
> gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z noexecstack -lcrypto
> gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE -fno-stack-protector -mrdrnd -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include test_encl.c test_encl_bootstrap.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf -Wl,-T,test_encl.lds,--build-id=none
> /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable stack
> /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
> TAP version 13
> 1..2
> # timeout set to 45
> # selftests: sgx: test_sgx
> # TAP version 13
> # 1..16
> # # Starting 16 tests from 1 test cases.
> # #  RUN           enclave.unclobbered_vdso ...
> # #            OK  enclave.unclobbered_vdso
> # ok 1 enclave.unclobbered_vdso
> # #  RUN           enclave.unclobbered_vdso_oversubscribed ...
> # #            OK  enclave.unclobbered_vdso_oversubscribed
> # ok 2 enclave.unclobbered_vdso_oversubscribed
> # #  RUN           enclave.unclobbered_vdso_oversubscribed_remove ...
> # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ...
> # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ...
> # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ...
> # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ...
> # #            OK  enclave.unclobbered_vdso_oversubscribed_remove
> # ok 3 enclave.unclobbered_vdso_oversubscribed_remove
> # #  RUN           enclave.clobbered_vdso ...
> # #            OK  enclave.clobbered_vdso
> # ok 4 enclave.clobbered_vdso
> # #  RUN           enclave.clobbered_vdso_and_user_function ...
> # #            OK  enclave.clobbered_vdso_and_user_function
> # ok 5 enclave.clobbered_vdso_and_user_function
> # #  RUN           enclave.tcs_entry ...
> # #            OK  enclave.tcs_entry
> # ok 6 enclave.tcs_entry
> # #  RUN           enclave.pte_permissions ...
> # #            OK  enclave.pte_permissions
> # ok 7 enclave.pte_permissions
> # #  RUN           enclave.tcs_permissions ...
> # #            OK  enclave.tcs_permissions
> # ok 8 enclave.tcs_permissions
> # #  RUN           enclave.epcm_permissions ...
> # #            OK  enclave.epcm_permissions
> # ok 9 enclave.epcm_permissions
> # #  RUN           enclave.augment ...
> # #            OK  enclave.augment
> # ok 10 enclave.augment
> # #  RUN           enclave.augment_via_eaccept ...
> # #            OK  enclave.augment_via_eaccept
> # ok 11 enclave.augment_via_eaccept
> # #  RUN           enclave.tcs_create ...
> # #            OK  enclave.tcs_create
> # ok 12 enclave.tcs_create
> # #  RUN           enclave.remove_added_page_no_eaccept ...
> # #            OK  enclave.remove_added_page_no_eaccept
> # ok 13 enclave.remove_added_page_no_eaccept
> # #  RUN           enclave.remove_added_page_invalid_access ...
> # #            OK  enclave.remove_added_page_invalid_access
> # ok 14 enclave.remove_added_page_invalid_access
> # #  RUN           enclave.remove_added_page_invalid_access_after_eaccept ...
> # #            OK  enclave.remove_added_page_invalid_access_after_eaccept
> # ok 15 enclave.remove_added_page_invalid_access_after_eaccept
> # #  RUN           enclave.remove_untouched_page ...
> # #            OK  enclave.remove_untouched_page
> # ok 16 enclave.remove_untouched_page
> # # PASSED: 16 / 16 tests passed.
> # # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0
> ok 1 selftests: sgx: test_sgx
> # timeout set to 45
> # selftests: sgx: run_epc_cg_selftests.sh
> # # Setting up limits.
> # ./run_epc_cg_selftests.sh: line 50: echo: write error: Invalid argument
> # # Failed setting up misc limits.
> not ok 2 selftests: sgx: run_epc_cg_selftests.sh # exit=1
> make: Leaving directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'
>
> This is what happens now.
>
> BTW, I noticed a file that should not exist, i.e. README. Only thing
> that should exist is the tests for kselftest and anything else should
> not exist at all, so this file by definiton should not exist.

I'd suggest to sanity-check the kselftest with a person from Intel who
has worked with kselftest before the next version so that it will be
nailed next time. Or better internal review this single patch with a 
person with expertise on kernel QA.

I did not check this but I have also suspicion that it might have some
checks whetehr it is run as root or not. If there are any, those should
be removed too. Let people set their environment however want...

BR, Jarkko
Jarkko Sakkinen April 16, 2024, 2:15 p.m. UTC | #5
On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
> > 
> > I'll send a fixup for this patch or another version of the series if more  
> > changes are needed.
>
> Hi Haitao,
>
> I don't like to say but in general I think you are sending too frequently.  The
> last version was sent April, 11th (my time), so considering the weekend it has
> only been 3 or at most 4 days.  
>
> Please slow down a little bit to give people more time.
>
> More information please also see:
>
> https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders

+1

Yes, exactly. I'd take one week break and cycle the kselftest part
internally a bit as I said my previous response. I'm sure that there
is experise inside Intel how to implement it properly. I.e. take some
time to find the right person, and wait as long as that person has a
bit of bandwidth to go through the test and suggest modifications.

Cannot blame, as I've done the same mistake a few times in past but
yeah this would be the best possible corrective action to take.

BR, Jarkko
Haitao Huang April 16, 2024, 2:54 p.m. UTC | #6
On Tue, 16 Apr 2024 09:10:12 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Tue Apr 16, 2024 at 5:05 PM EEST, Jarkko Sakkinen wrote:
>> On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote:
>> > With different cgroups, the script starts one or multiple concurrent  
>> SGX
>> > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
>> > test case, which loads an enclave of EPC size equal to the EPC  
>> capacity
>> > available on the platform. The script checks results against the
>> > expectation set for each cgroup and reports success or failure.
>> >
>> > The script creates 3 different cgroups at the beginning with following
>> > expectations:
>> >
>> > 1) SMALL - intentionally small enough to fail the test loading an
>> > enclave of size equal to the capacity.
>> > 2) LARGE - large enough to run up to 4 concurrent tests but fail some  
>> if
>> > more than 4 concurrent tests are run. The script starts 4 expecting at
>> > least one test to pass, and then starts 5 expecting at least one test
>> > to fail.
>> > 3) LARGER - limit is the same as the capacity, large enough to run  
>> lots of
>> > concurrent tests. The script starts 8 of them and expects all pass.
>> > Then it reruns the same test with one process randomly killed and
>> > usage checked to be zero after all processes exit.
>> >
>> > The script also includes a test with low mem_cg limit and LARGE  
>> sgx_epc
>> > limit to verify that the RAM used for per-cgroup reclamation is  
>> charged
>> > to a proper mem_cg. For this test, it turns off swapping before start,
>> > and turns swapping back on afterwards.
>> >
>> > Add README to document how to run the tests.
>> >
>> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>>
>> jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C  
>> tools/testing/selftests/sgx run_tests
>> make: Entering directory  
>> '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'
>> gcc -Wall -Werror -g  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> -fPIC -c main.c -o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o
>> gcc -Wall -Werror -g  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> -fPIC -c load.c -o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o
>> gcc -Wall -Werror -g  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> -fPIC -c sigstruct.c -o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o
>> gcc -Wall -Werror -g  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> -fPIC -c call.S -o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o
>> gcc -Wall -Werror -g  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> -fPIC -c sign_key.S -o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o
>> gcc -Wall -Werror -g  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z  
>> noexecstack -lcrypto
>> gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE  
>> -fno-stack-protector -mrdrnd  
>> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include  
>> test_encl.c test_encl_bootstrap.S -o  
>> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf  
>> -Wl,-T,test_encl.lds,--build-id=none
>> /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld:  
>> warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies  
>> executable stack
>> /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld:  
>> NOTE: This behaviour is deprecated and will be removed in a future  
>> version of the linker
>> TAP version 13
>> 1..2
>> # timeout set to 45
>> # selftests: sgx: test_sgx
>> # TAP version 13
>> # 1..16
>> # # Starting 16 tests from 1 test cases.
>> # #  RUN           enclave.unclobbered_vdso ...
>> # #            OK  enclave.unclobbered_vdso
>> # ok 1 enclave.unclobbered_vdso
>> # #  RUN           enclave.unclobbered_vdso_oversubscribed ...
>> # #            OK  enclave.unclobbered_vdso_oversubscribed
>> # ok 2 enclave.unclobbered_vdso_oversubscribed
>> # #  RUN           enclave.unclobbered_vdso_oversubscribed_remove ...
>> # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an  
>> enclave with 98566144 bytes heap may take a while ...
>> # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of  
>> 98566144 bytes to trimmed may take a while ...
>> # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave  
>> to run EACCEPT for each page of 98566144 bytes may take a while ...
>> # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144  
>> bytes from enclave may take a while ...
>> # #            OK  enclave.unclobbered_vdso_oversubscribed_remove
>> # ok 3 enclave.unclobbered_vdso_oversubscribed_remove
>> # #  RUN           enclave.clobbered_vdso ...
>> # #            OK  enclave.clobbered_vdso
>> # ok 4 enclave.clobbered_vdso
>> # #  RUN           enclave.clobbered_vdso_and_user_function ...
>> # #            OK  enclave.clobbered_vdso_and_user_function
>> # ok 5 enclave.clobbered_vdso_and_user_function
>> # #  RUN           enclave.tcs_entry ...
>> # #            OK  enclave.tcs_entry
>> # ok 6 enclave.tcs_entry
>> # #  RUN           enclave.pte_permissions ...
>> # #            OK  enclave.pte_permissions
>> # ok 7 enclave.pte_permissions
>> # #  RUN           enclave.tcs_permissions ...
>> # #            OK  enclave.tcs_permissions
>> # ok 8 enclave.tcs_permissions
>> # #  RUN           enclave.epcm_permissions ...
>> # #            OK  enclave.epcm_permissions
>> # ok 9 enclave.epcm_permissions
>> # #  RUN           enclave.augment ...
>> # #            OK  enclave.augment
>> # ok 10 enclave.augment
>> # #  RUN           enclave.augment_via_eaccept ...
>> # #            OK  enclave.augment_via_eaccept
>> # ok 11 enclave.augment_via_eaccept
>> # #  RUN           enclave.tcs_create ...
>> # #            OK  enclave.tcs_create
>> # ok 12 enclave.tcs_create
>> # #  RUN           enclave.remove_added_page_no_eaccept ...
>> # #            OK  enclave.remove_added_page_no_eaccept
>> # ok 13 enclave.remove_added_page_no_eaccept
>> # #  RUN           enclave.remove_added_page_invalid_access ...
>> # #            OK  enclave.remove_added_page_invalid_access
>> # ok 14 enclave.remove_added_page_invalid_access
>> # #  RUN            
>> enclave.remove_added_page_invalid_access_after_eaccept ...
>> # #            OK   
>> enclave.remove_added_page_invalid_access_after_eaccept
>> # ok 15 enclave.remove_added_page_invalid_access_after_eaccept
>> # #  RUN           enclave.remove_untouched_page ...
>> # #            OK  enclave.remove_untouched_page
>> # ok 16 enclave.remove_untouched_page
>> # # PASSED: 16 / 16 tests passed.
>> # # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0
>> ok 1 selftests: sgx: test_sgx
>> # timeout set to 45
>> # selftests: sgx: run_epc_cg_selftests.sh
>> # # Setting up limits.
>> # ./run_epc_cg_selftests.sh: line 50: echo: write error: Invalid  
>> argument
>> # # Failed setting up misc limits.
>> not ok 2 selftests: sgx: run_epc_cg_selftests.sh # exit=1
>> make: Leaving directory

This means no sgx cgroup turned on. (echoing sgx_epc entries into misc.max  
not allowed)
v12 removed the need for config CGROUP_SGX_EPC.
Did you by chance running on a previous kernel build without the sgx  
cgroup configured?

I did declare the configs in the config file but I missed it in my patch  
as stated earlier. IIUC, that would not cause this error though.

Maybe I should exit with the skip code if no CGROUP_MISC (no more  
CGROUP_SGX_EPC) is configured?

>> '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'
>>
>> This is what happens now.
>>
>> BTW, I noticed a file that should not exist, i.e. README. Only thing
>> that should exist is the tests for kselftest and anything else should
>> not exist at all, so this file by definiton should not exist.
>
Could you point me to ths rule?

I felt some instructions needed as tests getting more complex, and was  
following examples:

tools/testing/selftests$ find . -name README
./futex/README
./tc-testing/README
./net/forwarding/README
./powerpc/nx-gzip/README
./ftrace/README
./arm64/signal/README
./arm64/fp/README
./arm64/README
./zram/README
./livepatch/README
./resctrl/README

> I'd suggest to sanity-check the kselftest with a person from Intel who
> has worked with kselftest before the next version so that it will be
> nailed next time. Or better internal review this single patch with a
> person with expertise on kernel QA.
>
I'll double check.

> I did not check this but I have also suspicion that it might have some
> checks whetehr it is run as root or not. If there are any, those should
> be removed too. Let people set their environment however want...
>
Do you mean this part?

+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+if [ "$(id -u)" -ne 0 ]; then
+    echo "SKIP: SGX Cgroup tests need root privileges."
+    exit $ksft_skip
+fi

I saw lots of similar code reported when I ran following in the selftests  
directory:

tools/testing/selftests$ grep -C 5 -r "root" */*.sh

Thanks
Haitao
Haitao Huang April 16, 2024, 3 p.m. UTC | #7
On Tue, 16 Apr 2024 00:42:41 -0500, Huang, Kai <kai.huang@intel.com> wrote:

>>
>> I'll send a fixup for this patch or another version of the series if  
>> more
>> changes are needed.
>
> Hi Haitao,
>
> I don't like to say but in general I think you are sending too  
> frequently.  The
> last version was sent April, 11th (my time), so considering the weekend  
> it has
> only been 3 or at most 4 days.  
> Please slow down a little bit to give people more time.
>
> More information please also see:
>
> https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders

Thanks for the feedback. I'll slow down sending new versions.

Haitao
Jarkko Sakkinen April 16, 2024, 4:08 p.m. UTC | #8
On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
> I did declare the configs in the config file but I missed it in my patch  
> as stated earlier. IIUC, that would not cause this error though.
>
> Maybe I should exit with the skip code if no CGROUP_MISC (no more  
> CGROUP_SGX_EPC) is configured?

OK, so I wanted to do a distro kernel test here, and used the default
OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.

> tools/testing/selftests$ find . -name README
> ./futex/README
> ./tc-testing/README
> ./net/forwarding/README
> ./powerpc/nx-gzip/README
> ./ftrace/README
> ./arm64/signal/README
> ./arm64/fp/README
> ./arm64/README
> ./zram/README
> ./livepatch/README
> ./resctrl/README

So is there a README because of override timeout parameter? Maybe it
should be just set to a high enough value?

BR, Jarkko
Haitao Huang April 16, 2024, 10:04 p.m. UTC | #9
On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
>> I did declare the configs in the config file but I missed it in my patch
>> as stated earlier. IIUC, that would not cause this error though.
>>
>> Maybe I should exit with the skip code if no CGROUP_MISC (no more
>> CGROUP_SGX_EPC) is configured?
>
> OK, so I wanted to do a distro kernel test here, and used the default
> OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.
>
>> tools/testing/selftests$ find . -name README
>> ./futex/README
>> ./tc-testing/README
>> ./net/forwarding/README
>> ./powerpc/nx-gzip/README
>> ./ftrace/README
>> ./arm64/signal/README
>> ./arm64/fp/README
>> ./arm64/README
>> ./zram/README
>> ./livepatch/README
>> ./resctrl/README
>
> So is there a README because of override timeout parameter? Maybe it
> should be just set to a high enough value?
>
> BR, Jarkko
>


 From the docs, I think we are supposed to use override.
See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests

Thanks
Haitao
Haitao Huang April 16, 2024, 10:21 p.m. UTC | #10
On Tue, 16 Apr 2024 17:04:21 -0500, Haitao Huang  
<haitao.huang@linux.intel.com> wrote:

> On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
> wrote:
>
>> On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
>>> I did declare the configs in the config file but I missed it in my  
>>> patch
>>> as stated earlier. IIUC, that would not cause this error though.
>>>
>>> Maybe I should exit with the skip code if no CGROUP_MISC (no more
>>> CGROUP_SGX_EPC) is configured?
>>
>> OK, so I wanted to do a distro kernel test here, and used the default
>> OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.
>>
>>> tools/testing/selftests$ find . -name README
>>> ./futex/README
>>> ./tc-testing/README
>>> ./net/forwarding/README
>>> ./powerpc/nx-gzip/README
>>> ./ftrace/README
>>> ./arm64/signal/README
>>> ./arm64/fp/README
>>> ./arm64/README
>>> ./zram/README
>>> ./livepatch/README
>>> ./resctrl/README
>>
>> So is there a README because of override timeout parameter? Maybe it
>> should be just set to a high enough value?
>>
>> BR, Jarkko
>>
>
>
>  From the docs, I think we are supposed to use override.
> See:  
> https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests
>
> Thanks
> Haitao
>

Maybe you are suggesting we add settings file? I can do that.
README also explains what the tests do though. Do you still think they  
should not exist?
I was mostly following resctrl as example.

Thanks
Haitao
Haitao Huang April 17, 2024, 3:05 a.m. UTC | #11
On Tue, 16 Apr 2024 17:21:24 -0500, Haitao Huang  
<haitao.huang@linux.intel.com> wrote:

> On Tue, 16 Apr 2024 17:04:21 -0500, Haitao Huang  
> <haitao.huang@linux.intel.com> wrote:
>
>> On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
>> wrote:
>>
>>> On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
>>>> I did declare the configs in the config file but I missed it in my  
>>>> patch
>>>> as stated earlier. IIUC, that would not cause this error though.
>>>>
>>>> Maybe I should exit with the skip code if no CGROUP_MISC (no more
>>>> CGROUP_SGX_EPC) is configured?
>>>
>>> OK, so I wanted to do a distro kernel test here, and used the default
>>> OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.
>>>
>>>> tools/testing/selftests$ find . -name README
>>>> ./futex/README
>>>> ./tc-testing/README
>>>> ./net/forwarding/README
>>>> ./powerpc/nx-gzip/README
>>>> ./ftrace/README
>>>> ./arm64/signal/README
>>>> ./arm64/fp/README
>>>> ./arm64/README
>>>> ./zram/README
>>>> ./livepatch/README
>>>> ./resctrl/README
>>>
>>> So is there a README because of override timeout parameter? Maybe it
>>> should be just set to a high enough value?
>>>
>>> BR, Jarkko
>>>
>>
>>
>>  From the docs, I think we are supposed to use override.
>> See:  
>> https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests
>>
>> Thanks
>> Haitao
>>
>
> Maybe you are suggesting we add settings file? I can do that.
> README also explains what the tests do though. Do you still think they  
> should not exist?
> I was mostly following resctrl as example.
>
> Thanks
> Haitao
>

With the settings I shortened the README quite bit. Now I also lean  
towards removing it. Let me know your preference. You can check the latest  
at my branch for reference:
https://github.com/haitaohuang/linux/tree/sgx_cg_upstream_v12_plus

Thanks
Haitao
Jarkko Sakkinen April 17, 2024, 10:46 p.m. UTC | #12
On Wed Apr 17, 2024 at 1:04 AM EEST, Haitao Huang wrote:
> On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
> wrote:
>
> > On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
> >> I did declare the configs in the config file but I missed it in my patch
> >> as stated earlier. IIUC, that would not cause this error though.
> >>
> >> Maybe I should exit with the skip code if no CGROUP_MISC (no more
> >> CGROUP_SGX_EPC) is configured?
> >
> > OK, so I wanted to do a distro kernel test here, and used the default
> > OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.
> >
> >> tools/testing/selftests$ find . -name README
> >> ./futex/README
> >> ./tc-testing/README
> >> ./net/forwarding/README
> >> ./powerpc/nx-gzip/README
> >> ./ftrace/README
> >> ./arm64/signal/README
> >> ./arm64/fp/README
> >> ./arm64/README
> >> ./zram/README
> >> ./livepatch/README
> >> ./resctrl/README
> >
> > So is there a README because of override timeout parameter? Maybe it
> > should be just set to a high enough value?
> >
> > BR, Jarkko
> >
>
>
>  From the docs, I think we are supposed to use override.
> See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests

OK, fair enough :-) I did not know this.

BR, Jarkko
Haitao Huang April 24, 2024, 7:42 p.m. UTC | #13
Hi Jarkko
On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
>> I did declare the configs in the config file but I missed it in my patch
>> as stated earlier. IIUC, that would not cause this error though.
>>
>> Maybe I should exit with the skip code if no CGROUP_MISC (no more
>> CGROUP_SGX_EPC) is configured?
> OK, so I wanted to do a distro kernel test here, and used the default
> OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.

I couldn't figure out why this failure you have encountered. I think  
OpenSUSE kernel most likely config CGROUP_MISC.

Also if CGROUP_MISC not set, then there should be error happen earlier on  
echoing "+misc" to cgroup.subtree_control at line 20. But your log  
indicates only error on echoing "sgx_epc ..." to  
/sys/fs/cgroup/...//misc.max.

I can only speculate that can could happen (if sgx epc cgroup was compiled  
in) when the cgroup-fs subdirectories in question already have populated  
config that is conflicting with the scripts.

Could you double check or start from a clean environment?
Thanks
Haitao
Jarkko Sakkinen April 25, 2024, 4:51 a.m. UTC | #14
On Wed Apr 24, 2024 at 10:42 PM EEST, Haitao Huang wrote:
> Hi Jarkko
> On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
> wrote:
>
> > On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote:
> >> I did declare the configs in the config file but I missed it in my patch
> >> as stated earlier. IIUC, that would not cause this error though.
> >>
> >> Maybe I should exit with the skip code if no CGROUP_MISC (no more
> >> CGROUP_SGX_EPC) is configured?
> > OK, so I wanted to do a distro kernel test here, and used the default
> > OpenSUSE kernel config. I need to check if it has CGROUP_MISC set.
>
> I couldn't figure out why this failure you have encountered. I think  
> OpenSUSE kernel most likely config CGROUP_MISC.
>
> Also if CGROUP_MISC not set, then there should be error happen earlier on  
> echoing "+misc" to cgroup.subtree_control at line 20. But your log  
> indicates only error on echoing "sgx_epc ..." to  
> /sys/fs/cgroup/...//misc.max.
>
> I can only speculate that can could happen (if sgx epc cgroup was compiled  
> in) when the cgroup-fs subdirectories in question already have populated  
> config that is conflicting with the scripts.
>
> Could you double check or start from a clean environment?
> Thanks
> Haitao

I can re-check next week once I'm back from Estonia. I'm travelling now
to https://lu.ma/uncloud.

BR, Jarkko
Dave Hansen April 26, 2024, 2:28 p.m. UTC | #15
On 4/16/24 07:15, Jarkko Sakkinen wrote:
> On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
> Yes, exactly. I'd take one week break and cycle the kselftest part
> internally a bit as I said my previous response. I'm sure that there
> is experise inside Intel how to implement it properly. I.e. take some
> time to find the right person, and wait as long as that person has a
> bit of bandwidth to go through the test and suggest modifications.

Folks, I worry that this series is getting bogged down in the selftests.
 Yes, selftests are important.  But getting _some_ tests in the kernel
is substantially more important than getting perfect tests.

I don't think Haitao needs to "cycle" this back inside Intel.
Jarkko Sakkinen April 28, 2024, 10:03 p.m. UTC | #16
On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote:
> On 4/16/24 07:15, Jarkko Sakkinen wrote:
> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
> > Yes, exactly. I'd take one week break and cycle the kselftest part
> > internally a bit as I said my previous response. I'm sure that there
> > is experise inside Intel how to implement it properly. I.e. take some
> > time to find the right person, and wait as long as that person has a
> > bit of bandwidth to go through the test and suggest modifications.
>
> Folks, I worry that this series is getting bogged down in the selftests.
>  Yes, selftests are important.  But getting _some_ tests in the kernel
> is substantially more important than getting perfect tests.
>
> I don't think Haitao needs to "cycle" this back inside Intel.

The problem with the tests was that they are hard to run anything else
than Ubuntu (and perhaps Debian). It is hopefully now taken care of.
Selftests do not have to be perfect but at minimum they need to be
runnable.

I need ret-test the latest series because it is possible that I did not
have right flags (I was travelling few days thus have not done it yet).

BR, Jarkko
Haitao Huang April 29, 2024, 4:18 p.m. UTC | #17
Hi Jarkko

On Sun, 28 Apr 2024 17:03:17 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote:
>> On 4/16/24 07:15, Jarkko Sakkinen wrote:
>> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
>> > Yes, exactly. I'd take one week break and cycle the kselftest part
>> > internally a bit as I said my previous response. I'm sure that there
>> > is experise inside Intel how to implement it properly. I.e. take some
>> > time to find the right person, and wait as long as that person has a
>> > bit of bandwidth to go through the test and suggest modifications.
>>
>> Folks, I worry that this series is getting bogged down in the selftests.
>>  Yes, selftests are important.  But getting _some_ tests in the kernel
>> is substantially more important than getting perfect tests.
>>
>> I don't think Haitao needs to "cycle" this back inside Intel.
>
> The problem with the tests was that they are hard to run anything else
> than Ubuntu (and perhaps Debian). It is hopefully now taken care of.
> Selftests do not have to be perfect but at minimum they need to be
> runnable.
>
> I need ret-test the latest series because it is possible that I did not
> have right flags (I was travelling few days thus have not done it yet).
>
> BR, Jarkko
>

Let me know if you want me to send v13 before testing or you can just use  
the sgx_cg_upstream_v12_plus branch in my repo.

Also thanks for the "Reviewed-by" tags for other patches. But I've not got  
"Reviewed-by" from you for patches #8-12 (not sure I missed). Could you go  
through those alsoe when you get chance?

Thanks
Haitao
Jarkko Sakkinen April 29, 2024, 4:43 p.m. UTC | #18
On Mon Apr 29, 2024 at 7:18 PM EEST, Haitao Huang wrote:
> Hi Jarkko
>
> On Sun, 28 Apr 2024 17:03:17 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
> wrote:
>
> > On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote:
> >> On 4/16/24 07:15, Jarkko Sakkinen wrote:
> >> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
> >> > Yes, exactly. I'd take one week break and cycle the kselftest part
> >> > internally a bit as I said my previous response. I'm sure that there
> >> > is experise inside Intel how to implement it properly. I.e. take some
> >> > time to find the right person, and wait as long as that person has a
> >> > bit of bandwidth to go through the test and suggest modifications.
> >>
> >> Folks, I worry that this series is getting bogged down in the selftests.
> >>  Yes, selftests are important.  But getting _some_ tests in the kernel
> >> is substantially more important than getting perfect tests.
> >>
> >> I don't think Haitao needs to "cycle" this back inside Intel.
> >
> > The problem with the tests was that they are hard to run anything else
> > than Ubuntu (and perhaps Debian). It is hopefully now taken care of.
> > Selftests do not have to be perfect but at minimum they need to be
> > runnable.
> >
> > I need ret-test the latest series because it is possible that I did not
> > have right flags (I was travelling few days thus have not done it yet).
> >
> > BR, Jarkko
> >
>
> Let me know if you want me to send v13 before testing or you can just use  
> the sgx_cg_upstream_v12_plus branch in my repo.
>
> Also thanks for the "Reviewed-by" tags for other patches. But I've not got  
> "Reviewed-by" from you for patches #8-12 (not sure I missed). Could you go  
> through those alsoe when you get chance?

So, I compiled v12 branch. Was the only difference in selftests?

I can just copy them to the device.

BR, Jarkko
Haitao Huang April 29, 2024, 5:14 p.m. UTC | #19
On Mon, 29 Apr 2024 11:43:05 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Mon Apr 29, 2024 at 7:18 PM EEST, Haitao Huang wrote:
>> Hi Jarkko
>>
>> On Sun, 28 Apr 2024 17:03:17 -0500, Jarkko Sakkinen <jarkko@kernel.org>
>> wrote:
>>
>> > On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote:
>> >> On 4/16/24 07:15, Jarkko Sakkinen wrote:
>> >> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
>> >> > Yes, exactly. I'd take one week break and cycle the kselftest part
>> >> > internally a bit as I said my previous response. I'm sure that  
>> there
>> >> > is experise inside Intel how to implement it properly. I.e. take  
>> some
>> >> > time to find the right person, and wait as long as that person has  
>> a
>> >> > bit of bandwidth to go through the test and suggest modifications.
>> >>
>> >> Folks, I worry that this series is getting bogged down in the  
>> selftests.
>> >>  Yes, selftests are important.  But getting _some_ tests in the  
>> kernel
>> >> is substantially more important than getting perfect tests.
>> >>
>> >> I don't think Haitao needs to "cycle" this back inside Intel.
>> >
>> > The problem with the tests was that they are hard to run anything else
>> > than Ubuntu (and perhaps Debian). It is hopefully now taken care of.
>> > Selftests do not have to be perfect but at minimum they need to be
>> > runnable.
>> >
>> > I need ret-test the latest series because it is possible that I did  
>> not
>> > have right flags (I was travelling few days thus have not done it  
>> yet).
>> >
>> > BR, Jarkko
>> >
>>
>> Let me know if you want me to send v13 before testing or you can just  
>> use
>> the sgx_cg_upstream_v12_plus branch in my repo.
>>
>> Also thanks for the "Reviewed-by" tags for other patches. But I've not  
>> got
>> "Reviewed-by" from you for patches #8-12 (not sure I missed). Could you  
>> go
>> through those alsoe when you get chance?
>
> So, I compiled v12 branch. Was the only difference in selftests?
>
> I can just copy them to the device.
>
> BR, Jarkko
>
The only other functional change is using BUG_ON() when workqueue  
allocation fails in sgx_cgroup_init(). It should not affect testing  
results.

Thanks
Haitao
diff mbox series

Patch

diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 867f88ce2570..739376af9e33 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -20,7 +20,8 @@  ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none
 
 ifeq ($(CAN_BUILD_X86_64), 1)
 TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
-TEST_FILES := $(OUTPUT)/test_encl.elf
+TEST_FILES := $(OUTPUT)/test_encl.elf ash_cgexec.sh
+TEST_PROGS := run_epc_cg_selftests.sh
 
 all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
 endif
diff --git a/tools/testing/selftests/sgx/README b/tools/testing/selftests/sgx/README
new file mode 100644
index 000000000000..dfc0c74ce99d
--- /dev/null
+++ b/tools/testing/selftests/sgx/README
@@ -0,0 +1,116 @@ 
+SGX selftests
+
+The SGX selftests includes a c program (test_sgx) that covers basic user space
+facing APIs and a shell scripts (run_sgx_cg_selftests.sh) testing SGX misc
+cgroup. The SGX cgroup test script requires root privileges and runs a
+specific test case of  the test_sgx in different cgroups configured by the
+script. More details about the cgroup test can be found below.
+
+All SGX selftests can run with or without kselftest framework.
+
+WITH KSELFTEST FRAMEWORK
+=======================
+
+BUILD
+-----
+
+Build executable file "test_sgx" from top level directory of the kernel source:
+ $ make -C tools/testing/selftests TARGETS=sgx
+
+RUN
+---
+
+Run all sgx tests as sudo or root since the cgroup tests need to configure cgroup
+limits in files under /sys/fs/cgroup.
+
+ $ sudo make -C tools/testing/selftests/sgx run_tests
+
+Without sudo, SGX cgroup tests will be skipped.
+
+On platforms with large Enclave Page Cache (EPC) and/or less cpu cores, the
+tests may need run longer than default timeout of 45 seconds. To avoid
+timeouts, set a value for kselftest_override_timeout for the make command:
+
+ $ sudo kselftest_override_timeout=165 make -C tools/testing/selftests/sgx run_tests
+
+Or use --override-timeout option if running the installed kselftests from the
+installation root directory:
+
+  $sudo ./run_kselftest.sh --override-timeout 165 -c sgx
+
+More details about kselftest framework can be found in
+Documentation/dev-tools/kselftest.rst.
+
+WITHOUT KSELFTEST FRAMEWORK
+===========================
+
+BUILD
+-----
+
+Build executable file "test_sgx" from this
+directory(tools/testing/selftests/sgx/):
+
+  $ make
+
+RUN
+---
+
+Run all non-cgroup tests:
+
+ $ ./test_sgx
+
+To test SGX cgroup:
+
+ $ sudo ./run_sgx_cg_selftests.sh
+
+THE SGX CGROUP TEST SCRIPTS
+===========================
+
+Overview of the main cgroup test script
+---------------------------------------
+
+With different cgroups, the script (run_sgx_cg_selftests.sh)  starts one or
+multiple concurrent SGX selftests (test_sgx), each to run the
+unclobbered_vdso_oversubscribed test case, which loads an enclave of EPC size
+equal to the EPC capacity available on the platform. The script checks results
+against the expectation set for each cgroup and reports success or failure.
+
+The script creates 3 different cgroups at the beginning with following
+expectations:
+
+  1) SMALL - intentionally small enough to fail the test loading an enclave of
+             size equal to the capacity.
+
+  2) LARGE - large enough to run up to 4 concurrent tests but fail some if more
+	     than 4 concurrent tests are run. The script starts 4 expecting at
+	     least one test to pass, and then starts 5 expecting at least one
+             test to fail.
+
+  3) LARGER - limit is the same as the capacity, large enough to run lots of
+	      concurrent tests. The script starts 8 of them and expects all
+	      pass.  Then it reruns the same test with one process randomly
+	      killed and usage checked to be zero after all processes exit.
+
+The script also includes a test with low mem_cg limit (memory.max) and LARGE
+sgx_epc limit to verify that the RAM used for per-cgroup reclamation is charged
+to a proper mem_cg. To validate mem_cg OOM-kills processes when its memory.max
+limit is reached due to SGX EPC reclamation, the script turns off swapping
+before start, and turns swapping back on afterwards for this particular test.
+
+The helper scripts for monitoring and trouble-shooting
+------------------------------------------------------
+
+For developer/tester to monitor the SGX cgroup settings and behavior or
+trouble-shoot any issues encountered during testing, a helper script,
+watch_misc_for_tests.sh, is provided, which can watch relevant entries in
+cgroupfs files. For example, to watch the SGX cgroup 'current' counter changes
+during testing, run this in a separate terminal from this directory:
+
+  $ ./watch_misc_for_tests.sh current
+
+For more details about SGX cgroups, see "Cgroup Support" in
+Documentation/arch/x86/sgx.rst.
+
+The scripts require cgroup v2 support. More details about cgroup v2 can be found
+in Documentation/admin-guide/cgroup-v2.rst.
+
diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh b/tools/testing/selftests/sgx/ash_cgexec.sh
new file mode 100755
index 000000000000..cfa5d2b0e795
--- /dev/null
+++ b/tools/testing/selftests/sgx/ash_cgexec.sh
@@ -0,0 +1,16 @@ 
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2024 Intel Corporation.
+
+# Start a program in a given cgroup.
+# Supports V2 cgroup paths, relative to /sys/fs/cgroup
+if [ "$#" -lt 2 ]; then
+    echo "Usage: $0 <v2 cgroup path> <command> [args...]"
+    exit 1
+fi
+# Move this shell to the cgroup.
+echo 0 >/sys/fs/cgroup/$1/cgroup.procs
+shift
+# Execute the command within the cgroup
+exec "$@"
+
diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
new file mode 100755
index 000000000000..cd2911e865f0
--- /dev/null
+++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
@@ -0,0 +1,283 @@ 
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+if [ "$(id -u)" -ne 0 ]; then
+    echo "SKIP: SGX Cgroup tests need root privileges."
+    exit $ksft_skip
+fi
+
+TEST_ROOT_CG=selftest
+TEST_CG_SUB1=$TEST_ROOT_CG/test1
+TEST_CG_SUB2=$TEST_ROOT_CG/test2
+# We will only set limit in test1 and run tests in test3
+TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
+TEST_CG_SUB4=$TEST_ROOT_CG/test4
+
+# Cgroup v2 only
+CG_ROOT=/sys/fs/cgroup
+mkdir -p $CG_ROOT/$TEST_CG_SUB1
+mkdir -p $CG_ROOT/$TEST_CG_SUB2
+mkdir -p $CG_ROOT/$TEST_CG_SUB3
+mkdir -p $CG_ROOT/$TEST_CG_SUB4
+
+# Turn on misc and memory controller in non-leaf nodes
+echo "+misc" >  $CG_ROOT/cgroup.subtree_control && \
+echo "+memory" > $CG_ROOT/cgroup.subtree_control && \
+echo "+misc" >  $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
+echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
+echo "+misc" >  $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control
+if [ $? -ne 0 ]; then
+    echo "# Failed setting up cgroups, make sure misc and memory cgroups are enabled."
+    exit 1
+fi
+
+CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
+# This is below number of VA pages needed for enclave of capacity size. So
+# should fail oversubscribed cases
+SMALL=$(( CAPACITY / 512 ))
+
+# At least load one enclave of capacity size successfully, maybe up to 4.
+# But some may fail if we run more than 4 concurrent enclaves of capacity size.
+LARGE=$(( SMALL * 4 ))
+
+# Load lots of enclaves
+LARGER=$CAPACITY
+echo "# Setting up limits."
+echo "sgx_epc $SMALL" > $CG_ROOT/$TEST_CG_SUB1/misc.max && \
+echo "sgx_epc $LARGE" >  $CG_ROOT/$TEST_CG_SUB2/misc.max && \
+echo "sgx_epc $LARGER" > $CG_ROOT/$TEST_CG_SUB4/misc.max
+if [ $? -ne 0 ]; then
+    echo "# Failed setting up misc limits."
+    exit 1
+fi
+
+clean_up()
+{
+    sleep 2
+    rmdir $CG_ROOT/$TEST_CG_SUB2
+    rmdir $CG_ROOT/$TEST_CG_SUB3
+    rmdir $CG_ROOT/$TEST_CG_SUB4
+    rmdir $CG_ROOT/$TEST_CG_SUB1
+    rmdir $CG_ROOT/$TEST_ROOT_CG
+}
+
+timestamp=$(date +%Y%m%d_%H%M%S)
+
+test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed"
+
+PROCESS_SUCCESS=1
+PROCESS_FAILURE=0
+
+# Wait for a process and check for expected exit status.
+#
+# Arguments:
+#	$1 - the pid of the process to wait and check.
+#	$2 - 1 if expecting success, 0 for failure.
+#
+# Return:
+#	0 if the exit status of the process matches the expectation.
+#	1 otherwise.
+wait_check_process_status() {
+    pid=$1
+    check_for_success=$2
+
+    wait "$pid"
+    status=$?
+
+    if [ $check_for_success -eq $PROCESS_SUCCESS ] && [ $status -eq 0 ]; then
+        echo "# Process $pid succeeded."
+        return 0
+    elif [ $check_for_success -eq $PROCESS_FAILURE ] && [ $status -ne 0 ]; then
+        echo "# Process $pid returned failure."
+        return 0
+    fi
+    return 1
+}
+
+# Wait for a set of processes and check for expected exit status
+#
+# Arguments:
+#	$1 - 1 if expecting success, 0 for failure.
+#	remaining args - The pids of the processes
+#
+# Return:
+#	0 if exit status of any process matches the expectation.
+#	1 otherwise.
+wait_and_detect_for_any() {
+    check_for_success=$1
+
+    shift
+    detected=1 # 0 for success detection
+
+    for pid in $@; do
+        if wait_check_process_status "$pid" "$check_for_success"; then
+            detected=0
+            # Wait for other processes to exit
+        fi
+    done
+
+    return $detected
+}
+
+echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..."
+# Always use leaf node of misc cgroups
+# these may fail on OOM
+./ash_cgexec.sh $TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1
+if [ $? -eq 0 ]; then
+    echo "# Fail on SMALL limit, not expecting any test passes."
+    clean_up
+    exit 1
+else
+    echo "# Test failed as expected."
+fi
+
+echo "# PASSED SMALL limit."
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+        expecting at least one success...."
+
+pids=""
+for i in 1 2 3 4; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+
+if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
+    echo "# PASSED LARGE limit positive testing."
+else
+    echo "# Failed on LARGE limit positive testing, no test passes."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+        expecting at least one failure...."
+pids=""
+for i in 1 2 3 4 5; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
+    echo "# PASSED LARGE limit negative testing."
+else
+    echo "# Failed on LARGE limit negative testing, no test fails."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
+        expecting no failure...."
+pids=""
+for i in 1 2 3 4 5 6 7 8; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
+    echo "# Failed on LARGER limit, at least one test fails."
+    clean_up
+    exit 1
+else
+    echo "# PASSED LARGER limit tests."
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
+      randomly kill one, expecting no failure...."
+pids=""
+for i in 1 2 3 4 5 6 7 8; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_kill_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+random_number=$(awk 'BEGIN{srand();print int(rand()*5)}')
+sleep $((random_number + 1))
+
+# Randomly select a process to kill
+# Make sure usage counter not leaked at the end.
+RANDOM_INDEX=$(awk 'BEGIN{srand();print int(rand()*8)}')
+counter=0
+for pid in $pids; do
+    if [ "$counter" -eq "$RANDOM_INDEX" ]; then
+        PID_TO_KILL=$pid
+        break
+    fi
+    counter=$((counter + 1))
+done
+
+kill $PID_TO_KILL
+echo "# Killed process with PID: $PID_TO_KILL"
+
+any_failure=0
+for pid in $pids; do
+    wait "$pid"
+    status=$?
+    if [ "$pid" != "$PID_TO_KILL" ]; then
+        if [ $status -ne 0 ]; then
+	    echo "# Process $pid returned failure."
+            any_failure=1
+        fi
+    fi
+done
+
+if [ $any_failure -ne 0 ]; then
+    echo "# Failed on random killing, at least one test fails."
+    clean_up
+    exit 1
+fi
+echo "# PASSED LARGER limit test with a process randomly killed."
+
+MEM_LIMIT_TOO_SMALL=$((CAPACITY - 2 * LARGE))
+
+echo "$MEM_LIMIT_TOO_SMALL" > $CG_ROOT/$TEST_CG_SUB2/memory.max
+if [ $? -ne 0 ]; then
+    echo "# Failed creating memory controller."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE EPC limit,
+        and too small RAM limit, expecting all failures...."
+# Ensure swapping off so the OOM killer is activated when mem_cgroup limit is hit.
+swapoff -a
+pids=""
+for i in 1 2 3 4; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_oom_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
+    echo "# Failed on tests with memcontrol, some tests did not fail."
+    clean_up
+    swapon -a
+    exit 1
+else
+    swapon -a
+    echo "# PASSED LARGE limit tests with memcontrol."
+fi
+
+sleep 2
+
+USAGE=$(grep '^sgx_epc' "$CG_ROOT/$TEST_ROOT_CG/misc.current" | awk '{print $2}')
+if [ "$USAGE" -ne 0 ]; then
+    echo "# Failed: Final usage is $USAGE, not 0."
+else
+    echo "# PASSED leakage check."
+    echo "# PASSED ALL cgroup limit tests, cleanup cgroups..."
+fi
+clean_up
+echo "# done."
diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
new file mode 100755
index 000000000000..1c9985726ace
--- /dev/null
+++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
@@ -0,0 +1,11 @@ 
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+if [ -z "$1" ]; then
+    echo "No argument supplied, please provide 'max', 'current', or 'events'"
+    exit 1
+fi
+
+watch -n 1 'find /sys/fs/cgroup -wholename "*/test*/misc.'$1'" -exec \
+    sh -c '\''echo "$1:"; cat "$1"'\'' _ {} \;'