Message ID | 20240416032011.58578-15-haitao.huang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add Cgroup support for SGX EPC memory | expand |
Missed adding the config file: index 000000000000..e7f1db1d3eff --- /dev/null +++ b/tools/testing/selftests/sgx/config @@ -0,0 +1,4 @@ +CONFIG_CGROUPS=y +CONFIG_CGROUP_MISC=y +CONFIG_MEMCG=y +CONFIG_X86_SGX=y I'll send a fixup for this patch or another version of the series if more changes are needed. Thanks Haitao
> > I'll send a fixup for this patch or another version of the series if more > changes are needed. Hi Haitao, I don't like to say but in general I think you are sending too frequently. The last version was sent April, 11th (my time), so considering the weekend it has only been 3 or at most 4 days. Please slow down a little bit to give people more time. More information please also see: https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders
On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote: > With different cgroups, the script starts one or multiple concurrent SGX > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed > test case, which loads an enclave of EPC size equal to the EPC capacity > available on the platform. The script checks results against the > expectation set for each cgroup and reports success or failure. > > The script creates 3 different cgroups at the beginning with following > expectations: > > 1) SMALL - intentionally small enough to fail the test loading an > enclave of size equal to the capacity. > 2) LARGE - large enough to run up to 4 concurrent tests but fail some if > more than 4 concurrent tests are run. The script starts 4 expecting at > least one test to pass, and then starts 5 expecting at least one test > to fail. > 3) LARGER - limit is the same as the capacity, large enough to run lots of > concurrent tests. The script starts 8 of them and expects all pass. > Then it reruns the same test with one process randomly killed and > usage checked to be zero after all processes exit. > > The script also includes a test with low mem_cg limit and LARGE sgx_epc > limit to verify that the RAM used for per-cgroup reclamation is charged > to a proper mem_cg. For this test, it turns off swapping before start, > and turns swapping back on afterwards. > > Add README to document how to run the tests. > > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C tools/testing/selftests/sgx run_tests make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sigstruct.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sign_key.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z noexecstack -lcrypto gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE -fno-stack-protector -mrdrnd -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include test_encl.c test_encl_bootstrap.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf -Wl,-T,test_encl.lds,--build-id=none /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable stack /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker TAP version 13 1..2 # timeout set to 45 # selftests: sgx: test_sgx # TAP version 13 # 1..16 # # Starting 16 tests from 1 test cases. # # RUN enclave.unclobbered_vdso ... # # OK enclave.unclobbered_vdso # ok 1 enclave.unclobbered_vdso # # RUN enclave.unclobbered_vdso_oversubscribed ... # # OK enclave.unclobbered_vdso_oversubscribed # ok 2 enclave.unclobbered_vdso_oversubscribed # # RUN enclave.unclobbered_vdso_oversubscribed_remove ... # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ... # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ... # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ... # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ... # # OK enclave.unclobbered_vdso_oversubscribed_remove # ok 3 enclave.unclobbered_vdso_oversubscribed_remove # # RUN enclave.clobbered_vdso ... # # OK enclave.clobbered_vdso # ok 4 enclave.clobbered_vdso # # RUN enclave.clobbered_vdso_and_user_function ... # # OK enclave.clobbered_vdso_and_user_function # ok 5 enclave.clobbered_vdso_and_user_function # # RUN enclave.tcs_entry ... # # OK enclave.tcs_entry # ok 6 enclave.tcs_entry # # RUN enclave.pte_permissions ... # # OK enclave.pte_permissions # ok 7 enclave.pte_permissions # # RUN enclave.tcs_permissions ... # # OK enclave.tcs_permissions # ok 8 enclave.tcs_permissions # # RUN enclave.epcm_permissions ... # # OK enclave.epcm_permissions # ok 9 enclave.epcm_permissions # # RUN enclave.augment ... # # OK enclave.augment # ok 10 enclave.augment # # RUN enclave.augment_via_eaccept ... # # OK enclave.augment_via_eaccept # ok 11 enclave.augment_via_eaccept # # RUN enclave.tcs_create ... # # OK enclave.tcs_create # ok 12 enclave.tcs_create # # RUN enclave.remove_added_page_no_eaccept ... # # OK enclave.remove_added_page_no_eaccept # ok 13 enclave.remove_added_page_no_eaccept # # RUN enclave.remove_added_page_invalid_access ... # # OK enclave.remove_added_page_invalid_access # ok 14 enclave.remove_added_page_invalid_access # # RUN enclave.remove_added_page_invalid_access_after_eaccept ... # # OK enclave.remove_added_page_invalid_access_after_eaccept # ok 15 enclave.remove_added_page_invalid_access_after_eaccept # # RUN enclave.remove_untouched_page ... # # OK enclave.remove_untouched_page # ok 16 enclave.remove_untouched_page # # PASSED: 16 / 16 tests passed. # # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 1 selftests: sgx: test_sgx # timeout set to 45 # selftests: sgx: run_epc_cg_selftests.sh # # Setting up limits. # ./run_epc_cg_selftests.sh: line 50: echo: write error: Invalid argument # # Failed setting up misc limits. not ok 2 selftests: sgx: run_epc_cg_selftests.sh # exit=1 make: Leaving directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' This is what happens now. BTW, I noticed a file that should not exist, i.e. README. Only thing that should exist is the tests for kselftest and anything else should not exist at all, so this file by definiton should not exist. BR, Jarkko
On Tue Apr 16, 2024 at 5:05 PM EEST, Jarkko Sakkinen wrote: > On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote: > > With different cgroups, the script starts one or multiple concurrent SGX > > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed > > test case, which loads an enclave of EPC size equal to the EPC capacity > > available on the platform. The script checks results against the > > expectation set for each cgroup and reports success or failure. > > > > The script creates 3 different cgroups at the beginning with following > > expectations: > > > > 1) SMALL - intentionally small enough to fail the test loading an > > enclave of size equal to the capacity. > > 2) LARGE - large enough to run up to 4 concurrent tests but fail some if > > more than 4 concurrent tests are run. The script starts 4 expecting at > > least one test to pass, and then starts 5 expecting at least one test > > to fail. > > 3) LARGER - limit is the same as the capacity, large enough to run lots of > > concurrent tests. The script starts 8 of them and expects all pass. > > Then it reruns the same test with one process randomly killed and > > usage checked to be zero after all processes exit. > > > > The script also includes a test with low mem_cg limit and LARGE sgx_epc > > limit to verify that the RAM used for per-cgroup reclamation is charged > > to a proper mem_cg. For this test, it turns off swapping before start, > > and turns swapping back on afterwards. > > > > Add README to document how to run the tests. > > > > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> > > jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C tools/testing/selftests/sgx run_tests > make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' > gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o > gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o > gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sigstruct.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o > gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o > gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sign_key.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o > gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z noexecstack -lcrypto > gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE -fno-stack-protector -mrdrnd -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include test_encl.c test_encl_bootstrap.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf -Wl,-T,test_encl.lds,--build-id=none > /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable stack > /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker > TAP version 13 > 1..2 > # timeout set to 45 > # selftests: sgx: test_sgx > # TAP version 13 > # 1..16 > # # Starting 16 tests from 1 test cases. > # # RUN enclave.unclobbered_vdso ... > # # OK enclave.unclobbered_vdso > # ok 1 enclave.unclobbered_vdso > # # RUN enclave.unclobbered_vdso_oversubscribed ... > # # OK enclave.unclobbered_vdso_oversubscribed > # ok 2 enclave.unclobbered_vdso_oversubscribed > # # RUN enclave.unclobbered_vdso_oversubscribed_remove ... > # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ... > # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ... > # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ... > # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ... > # # OK enclave.unclobbered_vdso_oversubscribed_remove > # ok 3 enclave.unclobbered_vdso_oversubscribed_remove > # # RUN enclave.clobbered_vdso ... > # # OK enclave.clobbered_vdso > # ok 4 enclave.clobbered_vdso > # # RUN enclave.clobbered_vdso_and_user_function ... > # # OK enclave.clobbered_vdso_and_user_function > # ok 5 enclave.clobbered_vdso_and_user_function > # # RUN enclave.tcs_entry ... > # # OK enclave.tcs_entry > # ok 6 enclave.tcs_entry > # # RUN enclave.pte_permissions ... > # # OK enclave.pte_permissions > # ok 7 enclave.pte_permissions > # # RUN enclave.tcs_permissions ... > # # OK enclave.tcs_permissions > # ok 8 enclave.tcs_permissions > # # RUN enclave.epcm_permissions ... > # # OK enclave.epcm_permissions > # ok 9 enclave.epcm_permissions > # # RUN enclave.augment ... > # # OK enclave.augment > # ok 10 enclave.augment > # # RUN enclave.augment_via_eaccept ... > # # OK enclave.augment_via_eaccept > # ok 11 enclave.augment_via_eaccept > # # RUN enclave.tcs_create ... > # # OK enclave.tcs_create > # ok 12 enclave.tcs_create > # # RUN enclave.remove_added_page_no_eaccept ... > # # OK enclave.remove_added_page_no_eaccept > # ok 13 enclave.remove_added_page_no_eaccept > # # RUN enclave.remove_added_page_invalid_access ... > # # OK enclave.remove_added_page_invalid_access > # ok 14 enclave.remove_added_page_invalid_access > # # RUN enclave.remove_added_page_invalid_access_after_eaccept ... > # # OK enclave.remove_added_page_invalid_access_after_eaccept > # ok 15 enclave.remove_added_page_invalid_access_after_eaccept > # # RUN enclave.remove_untouched_page ... > # # OK enclave.remove_untouched_page > # ok 16 enclave.remove_untouched_page > # # PASSED: 16 / 16 tests passed. > # # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0 > ok 1 selftests: sgx: test_sgx > # timeout set to 45 > # selftests: sgx: run_epc_cg_selftests.sh > # # Setting up limits. > # ./run_epc_cg_selftests.sh: line 50: echo: write error: Invalid argument > # # Failed setting up misc limits. > not ok 2 selftests: sgx: run_epc_cg_selftests.sh # exit=1 > make: Leaving directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' > > This is what happens now. > > BTW, I noticed a file that should not exist, i.e. README. Only thing > that should exist is the tests for kselftest and anything else should > not exist at all, so this file by definiton should not exist. I'd suggest to sanity-check the kselftest with a person from Intel who has worked with kselftest before the next version so that it will be nailed next time. Or better internal review this single patch with a person with expertise on kernel QA. I did not check this but I have also suspicion that it might have some checks whetehr it is run as root or not. If there are any, those should be removed too. Let people set their environment however want... BR, Jarkko
On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: > > > > I'll send a fixup for this patch or another version of the series if more > > changes are needed. > > Hi Haitao, > > I don't like to say but in general I think you are sending too frequently. The > last version was sent April, 11th (my time), so considering the weekend it has > only been 3 or at most 4 days. > > Please slow down a little bit to give people more time. > > More information please also see: > > https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders +1 Yes, exactly. I'd take one week break and cycle the kselftest part internally a bit as I said my previous response. I'm sure that there is experise inside Intel how to implement it properly. I.e. take some time to find the right person, and wait as long as that person has a bit of bandwidth to go through the test and suggest modifications. Cannot blame, as I've done the same mistake a few times in past but yeah this would be the best possible corrective action to take. BR, Jarkko
On Tue, 16 Apr 2024 09:10:12 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Tue Apr 16, 2024 at 5:05 PM EEST, Jarkko Sakkinen wrote: >> On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote: >> > With different cgroups, the script starts one or multiple concurrent >> SGX >> > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed >> > test case, which loads an enclave of EPC size equal to the EPC >> capacity >> > available on the platform. The script checks results against the >> > expectation set for each cgroup and reports success or failure. >> > >> > The script creates 3 different cgroups at the beginning with following >> > expectations: >> > >> > 1) SMALL - intentionally small enough to fail the test loading an >> > enclave of size equal to the capacity. >> > 2) LARGE - large enough to run up to 4 concurrent tests but fail some >> if >> > more than 4 concurrent tests are run. The script starts 4 expecting at >> > least one test to pass, and then starts 5 expecting at least one test >> > to fail. >> > 3) LARGER - limit is the same as the capacity, large enough to run >> lots of >> > concurrent tests. The script starts 8 of them and expects all pass. >> > Then it reruns the same test with one process randomly killed and >> > usage checked to be zero after all processes exit. >> > >> > The script also includes a test with low mem_cg limit and LARGE >> sgx_epc >> > limit to verify that the RAM used for per-cgroup reclamation is >> charged >> > to a proper mem_cg. For this test, it turns off swapping before start, >> > and turns swapping back on afterwards. >> > >> > Add README to document how to run the tests. >> > >> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> >> >> jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C >> tools/testing/selftests/sgx run_tests >> make: Entering directory >> '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' >> gcc -Wall -Werror -g >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> -fPIC -c main.c -o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o >> gcc -Wall -Werror -g >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> -fPIC -c load.c -o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o >> gcc -Wall -Werror -g >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> -fPIC -c sigstruct.c -o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o >> gcc -Wall -Werror -g >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> -fPIC -c call.S -o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o >> gcc -Wall -Werror -g >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> -fPIC -c sign_key.S -o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o >> gcc -Wall -Werror -g >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z >> noexecstack -lcrypto >> gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE >> -fno-stack-protector -mrdrnd >> -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include >> test_encl.c test_encl_bootstrap.S -o >> /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf >> -Wl,-T,test_encl.lds,--build-id=none >> /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: >> warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies >> executable stack >> /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: >> NOTE: This behaviour is deprecated and will be removed in a future >> version of the linker >> TAP version 13 >> 1..2 >> # timeout set to 45 >> # selftests: sgx: test_sgx >> # TAP version 13 >> # 1..16 >> # # Starting 16 tests from 1 test cases. >> # # RUN enclave.unclobbered_vdso ... >> # # OK enclave.unclobbered_vdso >> # ok 1 enclave.unclobbered_vdso >> # # RUN enclave.unclobbered_vdso_oversubscribed ... >> # # OK enclave.unclobbered_vdso_oversubscribed >> # ok 2 enclave.unclobbered_vdso_oversubscribed >> # # RUN enclave.unclobbered_vdso_oversubscribed_remove ... >> # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an >> enclave with 98566144 bytes heap may take a while ... >> # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of >> 98566144 bytes to trimmed may take a while ... >> # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave >> to run EACCEPT for each page of 98566144 bytes may take a while ... >> # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 >> bytes from enclave may take a while ... >> # # OK enclave.unclobbered_vdso_oversubscribed_remove >> # ok 3 enclave.unclobbered_vdso_oversubscribed_remove >> # # RUN enclave.clobbered_vdso ... >> # # OK enclave.clobbered_vdso >> # ok 4 enclave.clobbered_vdso >> # # RUN enclave.clobbered_vdso_and_user_function ... >> # # OK enclave.clobbered_vdso_and_user_function >> # ok 5 enclave.clobbered_vdso_and_user_function >> # # RUN enclave.tcs_entry ... >> # # OK enclave.tcs_entry >> # ok 6 enclave.tcs_entry >> # # RUN enclave.pte_permissions ... >> # # OK enclave.pte_permissions >> # ok 7 enclave.pte_permissions >> # # RUN enclave.tcs_permissions ... >> # # OK enclave.tcs_permissions >> # ok 8 enclave.tcs_permissions >> # # RUN enclave.epcm_permissions ... >> # # OK enclave.epcm_permissions >> # ok 9 enclave.epcm_permissions >> # # RUN enclave.augment ... >> # # OK enclave.augment >> # ok 10 enclave.augment >> # # RUN enclave.augment_via_eaccept ... >> # # OK enclave.augment_via_eaccept >> # ok 11 enclave.augment_via_eaccept >> # # RUN enclave.tcs_create ... >> # # OK enclave.tcs_create >> # ok 12 enclave.tcs_create >> # # RUN enclave.remove_added_page_no_eaccept ... >> # # OK enclave.remove_added_page_no_eaccept >> # ok 13 enclave.remove_added_page_no_eaccept >> # # RUN enclave.remove_added_page_invalid_access ... >> # # OK enclave.remove_added_page_invalid_access >> # ok 14 enclave.remove_added_page_invalid_access >> # # RUN >> enclave.remove_added_page_invalid_access_after_eaccept ... >> # # OK >> enclave.remove_added_page_invalid_access_after_eaccept >> # ok 15 enclave.remove_added_page_invalid_access_after_eaccept >> # # RUN enclave.remove_untouched_page ... >> # # OK enclave.remove_untouched_page >> # ok 16 enclave.remove_untouched_page >> # # PASSED: 16 / 16 tests passed. >> # # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0 >> ok 1 selftests: sgx: test_sgx >> # timeout set to 45 >> # selftests: sgx: run_epc_cg_selftests.sh >> # # Setting up limits. >> # ./run_epc_cg_selftests.sh: line 50: echo: write error: Invalid >> argument >> # # Failed setting up misc limits. >> not ok 2 selftests: sgx: run_epc_cg_selftests.sh # exit=1 >> make: Leaving directory This means no sgx cgroup turned on. (echoing sgx_epc entries into misc.max not allowed) v12 removed the need for config CGROUP_SGX_EPC. Did you by chance running on a previous kernel build without the sgx cgroup configured? I did declare the configs in the config file but I missed it in my patch as stated earlier. IIUC, that would not cause this error though. Maybe I should exit with the skip code if no CGROUP_MISC (no more CGROUP_SGX_EPC) is configured? >> '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' >> >> This is what happens now. >> >> BTW, I noticed a file that should not exist, i.e. README. Only thing >> that should exist is the tests for kselftest and anything else should >> not exist at all, so this file by definiton should not exist. > Could you point me to ths rule? I felt some instructions needed as tests getting more complex, and was following examples: tools/testing/selftests$ find . -name README ./futex/README ./tc-testing/README ./net/forwarding/README ./powerpc/nx-gzip/README ./ftrace/README ./arm64/signal/README ./arm64/fp/README ./arm64/README ./zram/README ./livepatch/README ./resctrl/README > I'd suggest to sanity-check the kselftest with a person from Intel who > has worked with kselftest before the next version so that it will be > nailed next time. Or better internal review this single patch with a > person with expertise on kernel QA. > I'll double check. > I did not check this but I have also suspicion that it might have some > checks whetehr it is run as root or not. If there are any, those should > be removed too. Let people set their environment however want... > Do you mean this part? +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 +if [ "$(id -u)" -ne 0 ]; then + echo "SKIP: SGX Cgroup tests need root privileges." + exit $ksft_skip +fi I saw lots of similar code reported when I ran following in the selftests directory: tools/testing/selftests$ grep -C 5 -r "root" */*.sh Thanks Haitao
On Tue, 16 Apr 2024 00:42:41 -0500, Huang, Kai <kai.huang@intel.com> wrote: >> >> I'll send a fixup for this patch or another version of the series if >> more >> changes are needed. > > Hi Haitao, > > I don't like to say but in general I think you are sending too > frequently. The > last version was sent April, 11th (my time), so considering the weekend > it has > only been 3 or at most 4 days. > Please slow down a little bit to give people more time. > > More information please also see: > > https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders Thanks for the feedback. I'll slow down sending new versions. Haitao
On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: > I did declare the configs in the config file but I missed it in my patch > as stated earlier. IIUC, that would not cause this error though. > > Maybe I should exit with the skip code if no CGROUP_MISC (no more > CGROUP_SGX_EPC) is configured? OK, so I wanted to do a distro kernel test here, and used the default OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. > tools/testing/selftests$ find . -name README > ./futex/README > ./tc-testing/README > ./net/forwarding/README > ./powerpc/nx-gzip/README > ./ftrace/README > ./arm64/signal/README > ./arm64/fp/README > ./arm64/README > ./zram/README > ./livepatch/README > ./resctrl/README So is there a README because of override timeout parameter? Maybe it should be just set to a high enough value? BR, Jarkko
On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: >> I did declare the configs in the config file but I missed it in my patch >> as stated earlier. IIUC, that would not cause this error though. >> >> Maybe I should exit with the skip code if no CGROUP_MISC (no more >> CGROUP_SGX_EPC) is configured? > > OK, so I wanted to do a distro kernel test here, and used the default > OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. > >> tools/testing/selftests$ find . -name README >> ./futex/README >> ./tc-testing/README >> ./net/forwarding/README >> ./powerpc/nx-gzip/README >> ./ftrace/README >> ./arm64/signal/README >> ./arm64/fp/README >> ./arm64/README >> ./zram/README >> ./livepatch/README >> ./resctrl/README > > So is there a README because of override timeout parameter? Maybe it > should be just set to a high enough value? > > BR, Jarkko > From the docs, I think we are supposed to use override. See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests Thanks Haitao
On Tue, 16 Apr 2024 17:04:21 -0500, Haitao Huang <haitao.huang@linux.intel.com> wrote: > On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > >> On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: >>> I did declare the configs in the config file but I missed it in my >>> patch >>> as stated earlier. IIUC, that would not cause this error though. >>> >>> Maybe I should exit with the skip code if no CGROUP_MISC (no more >>> CGROUP_SGX_EPC) is configured? >> >> OK, so I wanted to do a distro kernel test here, and used the default >> OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. >> >>> tools/testing/selftests$ find . -name README >>> ./futex/README >>> ./tc-testing/README >>> ./net/forwarding/README >>> ./powerpc/nx-gzip/README >>> ./ftrace/README >>> ./arm64/signal/README >>> ./arm64/fp/README >>> ./arm64/README >>> ./zram/README >>> ./livepatch/README >>> ./resctrl/README >> >> So is there a README because of override timeout parameter? Maybe it >> should be just set to a high enough value? >> >> BR, Jarkko >> > > > From the docs, I think we are supposed to use override. > See: > https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests > > Thanks > Haitao > Maybe you are suggesting we add settings file? I can do that. README also explains what the tests do though. Do you still think they should not exist? I was mostly following resctrl as example. Thanks Haitao
On Tue, 16 Apr 2024 17:21:24 -0500, Haitao Huang <haitao.huang@linux.intel.com> wrote: > On Tue, 16 Apr 2024 17:04:21 -0500, Haitao Huang > <haitao.huang@linux.intel.com> wrote: > >> On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org> >> wrote: >> >>> On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: >>>> I did declare the configs in the config file but I missed it in my >>>> patch >>>> as stated earlier. IIUC, that would not cause this error though. >>>> >>>> Maybe I should exit with the skip code if no CGROUP_MISC (no more >>>> CGROUP_SGX_EPC) is configured? >>> >>> OK, so I wanted to do a distro kernel test here, and used the default >>> OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. >>> >>>> tools/testing/selftests$ find . -name README >>>> ./futex/README >>>> ./tc-testing/README >>>> ./net/forwarding/README >>>> ./powerpc/nx-gzip/README >>>> ./ftrace/README >>>> ./arm64/signal/README >>>> ./arm64/fp/README >>>> ./arm64/README >>>> ./zram/README >>>> ./livepatch/README >>>> ./resctrl/README >>> >>> So is there a README because of override timeout parameter? Maybe it >>> should be just set to a high enough value? >>> >>> BR, Jarkko >>> >> >> >> From the docs, I think we are supposed to use override. >> See: >> https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests >> >> Thanks >> Haitao >> > > Maybe you are suggesting we add settings file? I can do that. > README also explains what the tests do though. Do you still think they > should not exist? > I was mostly following resctrl as example. > > Thanks > Haitao > With the settings I shortened the README quite bit. Now I also lean towards removing it. Let me know your preference. You can check the latest at my branch for reference: https://github.com/haitaohuang/linux/tree/sgx_cg_upstream_v12_plus Thanks Haitao
On Wed Apr 17, 2024 at 1:04 AM EEST, Haitao Huang wrote: > On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: > >> I did declare the configs in the config file but I missed it in my patch > >> as stated earlier. IIUC, that would not cause this error though. > >> > >> Maybe I should exit with the skip code if no CGROUP_MISC (no more > >> CGROUP_SGX_EPC) is configured? > > > > OK, so I wanted to do a distro kernel test here, and used the default > > OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. > > > >> tools/testing/selftests$ find . -name README > >> ./futex/README > >> ./tc-testing/README > >> ./net/forwarding/README > >> ./powerpc/nx-gzip/README > >> ./ftrace/README > >> ./arm64/signal/README > >> ./arm64/fp/README > >> ./arm64/README > >> ./zram/README > >> ./livepatch/README > >> ./resctrl/README > > > > So is there a README because of override timeout parameter? Maybe it > > should be just set to a high enough value? > > > > BR, Jarkko > > > > > From the docs, I think we are supposed to use override. > See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests OK, fair enough :-) I did not know this. BR, Jarkko
Hi Jarkko On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: >> I did declare the configs in the config file but I missed it in my patch >> as stated earlier. IIUC, that would not cause this error though. >> >> Maybe I should exit with the skip code if no CGROUP_MISC (no more >> CGROUP_SGX_EPC) is configured? > OK, so I wanted to do a distro kernel test here, and used the default > OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. I couldn't figure out why this failure you have encountered. I think OpenSUSE kernel most likely config CGROUP_MISC. Also if CGROUP_MISC not set, then there should be error happen earlier on echoing "+misc" to cgroup.subtree_control at line 20. But your log indicates only error on echoing "sgx_epc ..." to /sys/fs/cgroup/...//misc.max. I can only speculate that can could happen (if sgx epc cgroup was compiled in) when the cgroup-fs subdirectories in question already have populated config that is conflicting with the scripts. Could you double check or start from a clean environment? Thanks Haitao
On Wed Apr 24, 2024 at 10:42 PM EEST, Haitao Huang wrote: > Hi Jarkko > On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: > >> I did declare the configs in the config file but I missed it in my patch > >> as stated earlier. IIUC, that would not cause this error though. > >> > >> Maybe I should exit with the skip code if no CGROUP_MISC (no more > >> CGROUP_SGX_EPC) is configured? > > OK, so I wanted to do a distro kernel test here, and used the default > > OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. > > I couldn't figure out why this failure you have encountered. I think > OpenSUSE kernel most likely config CGROUP_MISC. > > Also if CGROUP_MISC not set, then there should be error happen earlier on > echoing "+misc" to cgroup.subtree_control at line 20. But your log > indicates only error on echoing "sgx_epc ..." to > /sys/fs/cgroup/...//misc.max. > > I can only speculate that can could happen (if sgx epc cgroup was compiled > in) when the cgroup-fs subdirectories in question already have populated > config that is conflicting with the scripts. > > Could you double check or start from a clean environment? > Thanks > Haitao I can re-check next week once I'm back from Estonia. I'm travelling now to https://lu.ma/uncloud. BR, Jarkko
On 4/16/24 07:15, Jarkko Sakkinen wrote: > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: > Yes, exactly. I'd take one week break and cycle the kselftest part > internally a bit as I said my previous response. I'm sure that there > is experise inside Intel how to implement it properly. I.e. take some > time to find the right person, and wait as long as that person has a > bit of bandwidth to go through the test and suggest modifications. Folks, I worry that this series is getting bogged down in the selftests. Yes, selftests are important. But getting _some_ tests in the kernel is substantially more important than getting perfect tests. I don't think Haitao needs to "cycle" this back inside Intel.
On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote: > On 4/16/24 07:15, Jarkko Sakkinen wrote: > > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: > > Yes, exactly. I'd take one week break and cycle the kselftest part > > internally a bit as I said my previous response. I'm sure that there > > is experise inside Intel how to implement it properly. I.e. take some > > time to find the right person, and wait as long as that person has a > > bit of bandwidth to go through the test and suggest modifications. > > Folks, I worry that this series is getting bogged down in the selftests. > Yes, selftests are important. But getting _some_ tests in the kernel > is substantially more important than getting perfect tests. > > I don't think Haitao needs to "cycle" this back inside Intel. The problem with the tests was that they are hard to run anything else than Ubuntu (and perhaps Debian). It is hopefully now taken care of. Selftests do not have to be perfect but at minimum they need to be runnable. I need ret-test the latest series because it is possible that I did not have right flags (I was travelling few days thus have not done it yet). BR, Jarkko
Hi Jarkko On Sun, 28 Apr 2024 17:03:17 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote: >> On 4/16/24 07:15, Jarkko Sakkinen wrote: >> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: >> > Yes, exactly. I'd take one week break and cycle the kselftest part >> > internally a bit as I said my previous response. I'm sure that there >> > is experise inside Intel how to implement it properly. I.e. take some >> > time to find the right person, and wait as long as that person has a >> > bit of bandwidth to go through the test and suggest modifications. >> >> Folks, I worry that this series is getting bogged down in the selftests. >> Yes, selftests are important. But getting _some_ tests in the kernel >> is substantially more important than getting perfect tests. >> >> I don't think Haitao needs to "cycle" this back inside Intel. > > The problem with the tests was that they are hard to run anything else > than Ubuntu (and perhaps Debian). It is hopefully now taken care of. > Selftests do not have to be perfect but at minimum they need to be > runnable. > > I need ret-test the latest series because it is possible that I did not > have right flags (I was travelling few days thus have not done it yet). > > BR, Jarkko > Let me know if you want me to send v13 before testing or you can just use the sgx_cg_upstream_v12_plus branch in my repo. Also thanks for the "Reviewed-by" tags for other patches. But I've not got "Reviewed-by" from you for patches #8-12 (not sure I missed). Could you go through those alsoe when you get chance? Thanks Haitao
On Mon Apr 29, 2024 at 7:18 PM EEST, Haitao Huang wrote: > Hi Jarkko > > On Sun, 28 Apr 2024 17:03:17 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote: > >> On 4/16/24 07:15, Jarkko Sakkinen wrote: > >> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: > >> > Yes, exactly. I'd take one week break and cycle the kselftest part > >> > internally a bit as I said my previous response. I'm sure that there > >> > is experise inside Intel how to implement it properly. I.e. take some > >> > time to find the right person, and wait as long as that person has a > >> > bit of bandwidth to go through the test and suggest modifications. > >> > >> Folks, I worry that this series is getting bogged down in the selftests. > >> Yes, selftests are important. But getting _some_ tests in the kernel > >> is substantially more important than getting perfect tests. > >> > >> I don't think Haitao needs to "cycle" this back inside Intel. > > > > The problem with the tests was that they are hard to run anything else > > than Ubuntu (and perhaps Debian). It is hopefully now taken care of. > > Selftests do not have to be perfect but at minimum they need to be > > runnable. > > > > I need ret-test the latest series because it is possible that I did not > > have right flags (I was travelling few days thus have not done it yet). > > > > BR, Jarkko > > > > Let me know if you want me to send v13 before testing or you can just use > the sgx_cg_upstream_v12_plus branch in my repo. > > Also thanks for the "Reviewed-by" tags for other patches. But I've not got > "Reviewed-by" from you for patches #8-12 (not sure I missed). Could you go > through those alsoe when you get chance? So, I compiled v12 branch. Was the only difference in selftests? I can just copy them to the device. BR, Jarkko
On Mon, 29 Apr 2024 11:43:05 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Mon Apr 29, 2024 at 7:18 PM EEST, Haitao Huang wrote: >> Hi Jarkko >> >> On Sun, 28 Apr 2024 17:03:17 -0500, Jarkko Sakkinen <jarkko@kernel.org> >> wrote: >> >> > On Fri Apr 26, 2024 at 5:28 PM EEST, Dave Hansen wrote: >> >> On 4/16/24 07:15, Jarkko Sakkinen wrote: >> >> > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: >> >> > Yes, exactly. I'd take one week break and cycle the kselftest part >> >> > internally a bit as I said my previous response. I'm sure that >> there >> >> > is experise inside Intel how to implement it properly. I.e. take >> some >> >> > time to find the right person, and wait as long as that person has >> a >> >> > bit of bandwidth to go through the test and suggest modifications. >> >> >> >> Folks, I worry that this series is getting bogged down in the >> selftests. >> >> Yes, selftests are important. But getting _some_ tests in the >> kernel >> >> is substantially more important than getting perfect tests. >> >> >> >> I don't think Haitao needs to "cycle" this back inside Intel. >> > >> > The problem with the tests was that they are hard to run anything else >> > than Ubuntu (and perhaps Debian). It is hopefully now taken care of. >> > Selftests do not have to be perfect but at minimum they need to be >> > runnable. >> > >> > I need ret-test the latest series because it is possible that I did >> not >> > have right flags (I was travelling few days thus have not done it >> yet). >> > >> > BR, Jarkko >> > >> >> Let me know if you want me to send v13 before testing or you can just >> use >> the sgx_cg_upstream_v12_plus branch in my repo. >> >> Also thanks for the "Reviewed-by" tags for other patches. But I've not >> got >> "Reviewed-by" from you for patches #8-12 (not sure I missed). Could you >> go >> through those alsoe when you get chance? > > So, I compiled v12 branch. Was the only difference in selftests? > > I can just copy them to the device. > > BR, Jarkko > The only other functional change is using BUG_ON() when workqueue allocation fails in sgx_cgroup_init(). It should not affect testing results. Thanks Haitao
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile index 867f88ce2570..739376af9e33 100644 --- a/tools/testing/selftests/sgx/Makefile +++ b/tools/testing/selftests/sgx/Makefile @@ -20,7 +20,8 @@ ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none ifeq ($(CAN_BUILD_X86_64), 1) TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx -TEST_FILES := $(OUTPUT)/test_encl.elf +TEST_FILES := $(OUTPUT)/test_encl.elf ash_cgexec.sh +TEST_PROGS := run_epc_cg_selftests.sh all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf endif diff --git a/tools/testing/selftests/sgx/README b/tools/testing/selftests/sgx/README new file mode 100644 index 000000000000..dfc0c74ce99d --- /dev/null +++ b/tools/testing/selftests/sgx/README @@ -0,0 +1,116 @@ +SGX selftests + +The SGX selftests includes a c program (test_sgx) that covers basic user space +facing APIs and a shell scripts (run_sgx_cg_selftests.sh) testing SGX misc +cgroup. The SGX cgroup test script requires root privileges and runs a +specific test case of the test_sgx in different cgroups configured by the +script. More details about the cgroup test can be found below. + +All SGX selftests can run with or without kselftest framework. + +WITH KSELFTEST FRAMEWORK +======================= + +BUILD +----- + +Build executable file "test_sgx" from top level directory of the kernel source: + $ make -C tools/testing/selftests TARGETS=sgx + +RUN +--- + +Run all sgx tests as sudo or root since the cgroup tests need to configure cgroup +limits in files under /sys/fs/cgroup. + + $ sudo make -C tools/testing/selftests/sgx run_tests + +Without sudo, SGX cgroup tests will be skipped. + +On platforms with large Enclave Page Cache (EPC) and/or less cpu cores, the +tests may need run longer than default timeout of 45 seconds. To avoid +timeouts, set a value for kselftest_override_timeout for the make command: + + $ sudo kselftest_override_timeout=165 make -C tools/testing/selftests/sgx run_tests + +Or use --override-timeout option if running the installed kselftests from the +installation root directory: + + $sudo ./run_kselftest.sh --override-timeout 165 -c sgx + +More details about kselftest framework can be found in +Documentation/dev-tools/kselftest.rst. + +WITHOUT KSELFTEST FRAMEWORK +=========================== + +BUILD +----- + +Build executable file "test_sgx" from this +directory(tools/testing/selftests/sgx/): + + $ make + +RUN +--- + +Run all non-cgroup tests: + + $ ./test_sgx + +To test SGX cgroup: + + $ sudo ./run_sgx_cg_selftests.sh + +THE SGX CGROUP TEST SCRIPTS +=========================== + +Overview of the main cgroup test script +--------------------------------------- + +With different cgroups, the script (run_sgx_cg_selftests.sh) starts one or +multiple concurrent SGX selftests (test_sgx), each to run the +unclobbered_vdso_oversubscribed test case, which loads an enclave of EPC size +equal to the EPC capacity available on the platform. The script checks results +against the expectation set for each cgroup and reports success or failure. + +The script creates 3 different cgroups at the beginning with following +expectations: + + 1) SMALL - intentionally small enough to fail the test loading an enclave of + size equal to the capacity. + + 2) LARGE - large enough to run up to 4 concurrent tests but fail some if more + than 4 concurrent tests are run. The script starts 4 expecting at + least one test to pass, and then starts 5 expecting at least one + test to fail. + + 3) LARGER - limit is the same as the capacity, large enough to run lots of + concurrent tests. The script starts 8 of them and expects all + pass. Then it reruns the same test with one process randomly + killed and usage checked to be zero after all processes exit. + +The script also includes a test with low mem_cg limit (memory.max) and LARGE +sgx_epc limit to verify that the RAM used for per-cgroup reclamation is charged +to a proper mem_cg. To validate mem_cg OOM-kills processes when its memory.max +limit is reached due to SGX EPC reclamation, the script turns off swapping +before start, and turns swapping back on afterwards for this particular test. + +The helper scripts for monitoring and trouble-shooting +------------------------------------------------------ + +For developer/tester to monitor the SGX cgroup settings and behavior or +trouble-shoot any issues encountered during testing, a helper script, +watch_misc_for_tests.sh, is provided, which can watch relevant entries in +cgroupfs files. For example, to watch the SGX cgroup 'current' counter changes +during testing, run this in a separate terminal from this directory: + + $ ./watch_misc_for_tests.sh current + +For more details about SGX cgroups, see "Cgroup Support" in +Documentation/arch/x86/sgx.rst. + +The scripts require cgroup v2 support. More details about cgroup v2 can be found +in Documentation/admin-guide/cgroup-v2.rst. + diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh b/tools/testing/selftests/sgx/ash_cgexec.sh new file mode 100755 index 000000000000..cfa5d2b0e795 --- /dev/null +++ b/tools/testing/selftests/sgx/ash_cgexec.sh @@ -0,0 +1,16 @@ +#!/usr/bin/env sh +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2024 Intel Corporation. + +# Start a program in a given cgroup. +# Supports V2 cgroup paths, relative to /sys/fs/cgroup +if [ "$#" -lt 2 ]; then + echo "Usage: $0 <v2 cgroup path> <command> [args...]" + exit 1 +fi +# Move this shell to the cgroup. +echo 0 >/sys/fs/cgroup/$1/cgroup.procs +shift +# Execute the command within the cgroup +exec "$@" + diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh new file mode 100755 index 000000000000..cd2911e865f0 --- /dev/null +++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh @@ -0,0 +1,283 @@ +#!/usr/bin/env sh +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2023, 2024 Intel Corporation. + + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 +if [ "$(id -u)" -ne 0 ]; then + echo "SKIP: SGX Cgroup tests need root privileges." + exit $ksft_skip +fi + +TEST_ROOT_CG=selftest +TEST_CG_SUB1=$TEST_ROOT_CG/test1 +TEST_CG_SUB2=$TEST_ROOT_CG/test2 +# We will only set limit in test1 and run tests in test3 +TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3 +TEST_CG_SUB4=$TEST_ROOT_CG/test4 + +# Cgroup v2 only +CG_ROOT=/sys/fs/cgroup +mkdir -p $CG_ROOT/$TEST_CG_SUB1 +mkdir -p $CG_ROOT/$TEST_CG_SUB2 +mkdir -p $CG_ROOT/$TEST_CG_SUB3 +mkdir -p $CG_ROOT/$TEST_CG_SUB4 + +# Turn on misc and memory controller in non-leaf nodes +echo "+misc" > $CG_ROOT/cgroup.subtree_control && \ +echo "+memory" > $CG_ROOT/cgroup.subtree_control && \ +echo "+misc" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \ +echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \ +echo "+misc" > $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control +if [ $? -ne 0 ]; then + echo "# Failed setting up cgroups, make sure misc and memory cgroups are enabled." + exit 1 +fi + +CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}') +# This is below number of VA pages needed for enclave of capacity size. So +# should fail oversubscribed cases +SMALL=$(( CAPACITY / 512 )) + +# At least load one enclave of capacity size successfully, maybe up to 4. +# But some may fail if we run more than 4 concurrent enclaves of capacity size. +LARGE=$(( SMALL * 4 )) + +# Load lots of enclaves +LARGER=$CAPACITY +echo "# Setting up limits." +echo "sgx_epc $SMALL" > $CG_ROOT/$TEST_CG_SUB1/misc.max && \ +echo "sgx_epc $LARGE" > $CG_ROOT/$TEST_CG_SUB2/misc.max && \ +echo "sgx_epc $LARGER" > $CG_ROOT/$TEST_CG_SUB4/misc.max +if [ $? -ne 0 ]; then + echo "# Failed setting up misc limits." + exit 1 +fi + +clean_up() +{ + sleep 2 + rmdir $CG_ROOT/$TEST_CG_SUB2 + rmdir $CG_ROOT/$TEST_CG_SUB3 + rmdir $CG_ROOT/$TEST_CG_SUB4 + rmdir $CG_ROOT/$TEST_CG_SUB1 + rmdir $CG_ROOT/$TEST_ROOT_CG +} + +timestamp=$(date +%Y%m%d_%H%M%S) + +test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed" + +PROCESS_SUCCESS=1 +PROCESS_FAILURE=0 + +# Wait for a process and check for expected exit status. +# +# Arguments: +# $1 - the pid of the process to wait and check. +# $2 - 1 if expecting success, 0 for failure. +# +# Return: +# 0 if the exit status of the process matches the expectation. +# 1 otherwise. +wait_check_process_status() { + pid=$1 + check_for_success=$2 + + wait "$pid" + status=$? + + if [ $check_for_success -eq $PROCESS_SUCCESS ] && [ $status -eq 0 ]; then + echo "# Process $pid succeeded." + return 0 + elif [ $check_for_success -eq $PROCESS_FAILURE ] && [ $status -ne 0 ]; then + echo "# Process $pid returned failure." + return 0 + fi + return 1 +} + +# Wait for a set of processes and check for expected exit status +# +# Arguments: +# $1 - 1 if expecting success, 0 for failure. +# remaining args - The pids of the processes +# +# Return: +# 0 if exit status of any process matches the expectation. +# 1 otherwise. +wait_and_detect_for_any() { + check_for_success=$1 + + shift + detected=1 # 0 for success detection + + for pid in $@; do + if wait_check_process_status "$pid" "$check_for_success"; then + detected=0 + # Wait for other processes to exit + fi + done + + return $detected +} + +echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..." +# Always use leaf node of misc cgroups +# these may fail on OOM +./ash_cgexec.sh $TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1 +if [ $? -eq 0 ]; then + echo "# Fail on SMALL limit, not expecting any test passes." + clean_up + exit 1 +else + echo "# Test failed as expected." +fi + +echo "# PASSED SMALL limit." + +echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit, + expecting at least one success...." + +pids="" +for i in 1 2 3 4; do + ( + ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1 + ) & + pids="$pids $!" +done + + +if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then + echo "# PASSED LARGE limit positive testing." +else + echo "# Failed on LARGE limit positive testing, no test passes." + clean_up + exit 1 +fi + +echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit, + expecting at least one failure...." +pids="" +for i in 1 2 3 4 5; do + ( + ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1 + ) & + pids="$pids $!" +done + +if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then + echo "# PASSED LARGE limit negative testing." +else + echo "# Failed on LARGE limit negative testing, no test fails." + clean_up + exit 1 +fi + +echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit, + expecting no failure...." +pids="" +for i in 1 2 3 4 5 6 7 8; do + ( + ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1 + ) & + pids="$pids $!" +done + +if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then + echo "# Failed on LARGER limit, at least one test fails." + clean_up + exit 1 +else + echo "# PASSED LARGER limit tests." +fi + +echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit, + randomly kill one, expecting no failure...." +pids="" +for i in 1 2 3 4 5 6 7 8; do + ( + ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_kill_$timestamp.$i.log 2>&1 + ) & + pids="$pids $!" +done +random_number=$(awk 'BEGIN{srand();print int(rand()*5)}') +sleep $((random_number + 1)) + +# Randomly select a process to kill +# Make sure usage counter not leaked at the end. +RANDOM_INDEX=$(awk 'BEGIN{srand();print int(rand()*8)}') +counter=0 +for pid in $pids; do + if [ "$counter" -eq "$RANDOM_INDEX" ]; then + PID_TO_KILL=$pid + break + fi + counter=$((counter + 1)) +done + +kill $PID_TO_KILL +echo "# Killed process with PID: $PID_TO_KILL" + +any_failure=0 +for pid in $pids; do + wait "$pid" + status=$? + if [ "$pid" != "$PID_TO_KILL" ]; then + if [ $status -ne 0 ]; then + echo "# Process $pid returned failure." + any_failure=1 + fi + fi +done + +if [ $any_failure -ne 0 ]; then + echo "# Failed on random killing, at least one test fails." + clean_up + exit 1 +fi +echo "# PASSED LARGER limit test with a process randomly killed." + +MEM_LIMIT_TOO_SMALL=$((CAPACITY - 2 * LARGE)) + +echo "$MEM_LIMIT_TOO_SMALL" > $CG_ROOT/$TEST_CG_SUB2/memory.max +if [ $? -ne 0 ]; then + echo "# Failed creating memory controller." + clean_up + exit 1 +fi + +echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE EPC limit, + and too small RAM limit, expecting all failures...." +# Ensure swapping off so the OOM killer is activated when mem_cgroup limit is hit. +swapoff -a +pids="" +for i in 1 2 3 4; do + ( + ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_oom_$timestamp.$i.log 2>&1 + ) & + pids="$pids $!" +done + +if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then + echo "# Failed on tests with memcontrol, some tests did not fail." + clean_up + swapon -a + exit 1 +else + swapon -a + echo "# PASSED LARGE limit tests with memcontrol." +fi + +sleep 2 + +USAGE=$(grep '^sgx_epc' "$CG_ROOT/$TEST_ROOT_CG/misc.current" | awk '{print $2}') +if [ "$USAGE" -ne 0 ]; then + echo "# Failed: Final usage is $USAGE, not 0." +else + echo "# PASSED leakage check." + echo "# PASSED ALL cgroup limit tests, cleanup cgroups..." +fi +clean_up +echo "# done." diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh new file mode 100755 index 000000000000..1c9985726ace --- /dev/null +++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh @@ -0,0 +1,11 @@ +#!/usr/bin/env sh +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2023, 2024 Intel Corporation. + +if [ -z "$1" ]; then + echo "No argument supplied, please provide 'max', 'current', or 'events'" + exit 1 +fi + +watch -n 1 'find /sys/fs/cgroup -wholename "*/test*/misc.'$1'" -exec \ + sh -c '\''echo "$1:"; cat "$1"'\'' _ {} \;'
With different cgroups, the script starts one or multiple concurrent SGX selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed test case, which loads an enclave of EPC size equal to the EPC capacity available on the platform. The script checks results against the expectation set for each cgroup and reports success or failure. The script creates 3 different cgroups at the beginning with following expectations: 1) SMALL - intentionally small enough to fail the test loading an enclave of size equal to the capacity. 2) LARGE - large enough to run up to 4 concurrent tests but fail some if more than 4 concurrent tests are run. The script starts 4 expecting at least one test to pass, and then starts 5 expecting at least one test to fail. 3) LARGER - limit is the same as the capacity, large enough to run lots of concurrent tests. The script starts 8 of them and expects all pass. Then it reruns the same test with one process randomly killed and usage checked to be zero after all processes exit. The script also includes a test with low mem_cg limit and LARGE sgx_epc limit to verify that the RAM used for per-cgroup reclamation is charged to a proper mem_cg. For this test, it turns off swapping before start, and turns swapping back on afterwards. Add README to document how to run the tests. Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> --- V12: - Integrate the scripts to the "run_tests" target. (Jarkko) V11: - Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko) - Drop support for cgroup v1 and simplify. (Michal, Jarkko) - Add documentation for functions. (Jarkko) - Turn off swapping before memcontrol tests and back on after - Format and style fixes, name for hard coded values V7: - Added memcontrol test. V5: - Added script with automatic results checking, remove the interactive script. - The script can run independent from the series below. --- tools/testing/selftests/sgx/Makefile | 3 +- tools/testing/selftests/sgx/README | 116 +++++++ tools/testing/selftests/sgx/ash_cgexec.sh | 16 + .../selftests/sgx/run_epc_cg_selftests.sh | 283 ++++++++++++++++++ .../selftests/sgx/watch_misc_for_tests.sh | 11 + 5 files changed, 428 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/sgx/README create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh