diff mbox series

[v14,14/14] selftests/sgx: Add scripts for EPC cgroup testing

Message ID 20240531222630.4634-15-haitao.huang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Add Cgroup support for SGX EPC memory | expand

Commit Message

Haitao Huang May 31, 2024, 10:26 p.m. UTC
With different cgroups, the script starts one or multiple concurrent SGX
selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
test case, which loads an enclave of EPC size equal to the EPC capacity
available on the platform. The script checks results against the
expectation set for each cgroup and reports success or failure.

The script creates 3 different cgroups at the beginning with following
expectations:

1) small - intentionally small enough to fail the test loading an
enclave of size equal to the capacity.
2) large - large enough to run up to 4 concurrent tests but fail some if
more than 4 concurrent tests are run. The script starts 4 expecting at
least one test to pass, and then starts 5 expecting at least one test
to fail.
3) larger - limit is the same as the capacity, large enough to run lots of
concurrent tests. The script starts 8 of them and expects all pass.
Then it reruns the same test with one process randomly killed and
usage checked to be zero after all processes exit.

The script also includes a test with low mem_cg limit and large sgx_epc
limit to verify that the RAM used for per-cgroup reclamation is charged
to a proper mem_cg. For this test, it turns off swapping before start,
and turns swapping back on afterwards.

Add README to document how to run the tests.

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
---
V13:
- More improvement on handling error cases and style fixes.
- Add settings file for custom timeout

V12:
- Integrate the scripts to the "run_tests" target. (Jarkko)

V11:
- Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko)
- Drop support for cgroup v1 and simplify. (Michal, Jarkko)
- Add documentation for functions. (Jarkko)
- Turn off swapping before memcontrol tests and back on after
- Format and style fixes, name for hard coded values

V7:
- Added memcontrol test.

V5:
- Added script with automatic results checking, remove the interactive
script.
- The script can run independent from the series below.
---
 tools/testing/selftests/sgx/Makefile          |   3 +-
 tools/testing/selftests/sgx/README            | 109 +++++++
 tools/testing/selftests/sgx/ash_cgexec.sh     |  16 +
 tools/testing/selftests/sgx/config            |   4 +
 .../selftests/sgx/run_epc_cg_selftests.sh     | 295 ++++++++++++++++++
 tools/testing/selftests/sgx/settings          |   2 +
 .../selftests/sgx/watch_misc_for_tests.sh     |  11 +
 7 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/sgx/README
 create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
 create mode 100644 tools/testing/selftests/sgx/config
 create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
 create mode 100644 tools/testing/selftests/sgx/settings
 create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh

Comments

Jarkko Sakkinen June 4, 2024, 10 p.m. UTC | #1
On Sat Jun 1, 2024 at 1:26 AM EEST, Haitao Huang wrote:
> With different cgroups, the script starts one or multiple concurrent SGX
> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> test case, which loads an enclave of EPC size equal to the EPC capacity
> available on the platform. The script checks results against the
> expectation set for each cgroup and reports success or failure.
>
> The script creates 3 different cgroups at the beginning with following
> expectations:
>
> 1) small - intentionally small enough to fail the test loading an
> enclave of size equal to the capacity.
> 2) large - large enough to run up to 4 concurrent tests but fail some if
> more than 4 concurrent tests are run. The script starts 4 expecting at
> least one test to pass, and then starts 5 expecting at least one test
> to fail.
> 3) larger - limit is the same as the capacity, large enough to run lots of
> concurrent tests. The script starts 8 of them and expects all pass.
> Then it reruns the same test with one process randomly killed and
> usage checked to be zero after all processes exit.
>
> The script also includes a test with low mem_cg limit and large sgx_epc
> limit to verify that the RAM used for per-cgroup reclamation is charged
> to a proper mem_cg. For this test, it turns off swapping before start,
> and turns swapping back on afterwards.
>
> Add README to document how to run the tests.
>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>

Reorg:

void sgx_cgroup_init(void)
{
	struct workqueue_struct *wq;

	/* eagerly allocate the workqueue: */
	wq = alloc_workqueue("sgx_cg_wq", wq_unbound | wq_freezable, wq_unbound_max_active);
	if (!wq) {
		pr_warn("sgx_cg_wq creation failed\n");
		return;
	}

	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_cgroup_ops);
	sgx_cgroup_misc_init(misc_cg_root(), &sgx_cg_root);

	/* Depending on misc state, keep or destory the workqueue: */
	if (cgroup_subsys_enabled(misc_cgrp_subsys))
		sgx_cg_wq = wq;
	else
		destroy_workqueue(wq);
}

BTW, why two previous operations are performed if subsystem is not
enabled?

I.e. why not instead:

void sgx_cgroup_init(void)
{
	struct workqueue_struct *wq;

	/* Eagerly allocate the workqueue: */
	wq = alloc_workqueue("sgx_cg_wq", wq_unbound | wq_freezable, wq_unbound_max_active);
	if (!wq) {
		pr_warn("sgx_cg_wq creation failed\n");
		return;
	}

	if (!cgroup_subsys_enabled(misc_cgrp_subsys)) {
		destroy_workqueue(wq);
		return;
	}

	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_cgroup_ops);
	sgx_cgroup_misc_init(misc_cg_root(), &sgx_cg_root);
	sgx_cg_wq = wq;
}

Finally, why this does not have __init? And neither sgx_cgroup_misc_init().

The names for these are also somewhat confusing, maybe something like:

* __sgx_cgroups_misc_init()
* sgx_cgroups_misc_init()

And both with __init.

I just made a trivial checkpatch run as a final check, and spotted the
warning on BUG_ON(), and noticed that this can't be right as it is but
please comment and correct where I might have gotten something wrong.

With "--strict" flag I also catched these:

CHECK: spinlock_t definition without comment
#1308: FILE: arch/x86/kernel/cpu/sgx/sgx.h:122:
+	spinlock_t lock;

CHECK: multiple assignments should be avoided
#444: FILE: kernel/cgroup/misc.c:450:
+		parent_cg = cg = &root_cg;

BR, Jarkko
Haitao Huang June 5, 2024, 3:33 p.m. UTC | #2
Hi Jarkko

Thanks for your review.

On Tue, 04 Jun 2024 17:00:34 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Sat Jun 1, 2024 at 1:26 AM EEST, Haitao Huang wrote:
>> With different cgroups, the script starts one or multiple concurrent SGX
>> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
>> test case, which loads an enclave of EPC size equal to the EPC capacity
>> available on the platform. The script checks results against the
>> expectation set for each cgroup and reports success or failure.
>>
>> The script creates 3 different cgroups at the beginning with following
>> expectations:
>>
>> 1) small - intentionally small enough to fail the test loading an
>> enclave of size equal to the capacity.
>> 2) large - large enough to run up to 4 concurrent tests but fail some if
>> more than 4 concurrent tests are run. The script starts 4 expecting at
>> least one test to pass, and then starts 5 expecting at least one test
>> to fail.
>> 3) larger - limit is the same as the capacity, large enough to run lots  
>> of
>> concurrent tests. The script starts 8 of them and expects all pass.
>> Then it reruns the same test with one process randomly killed and
>> usage checked to be zero after all processes exit.
>>
>> The script also includes a test with low mem_cg limit and large sgx_epc
>> limit to verify that the RAM used for per-cgroup reclamation is charged
>> to a proper mem_cg. For this test, it turns off swapping before start,
>> and turns swapping back on afterwards.
>>
>> Add README to document how to run the tests.
>>
>> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
>> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
>
> Reorg:
>
> void sgx_cgroup_init(void)
> {
> 	struct workqueue_struct *wq;
>
> 	/* eagerly allocate the workqueue: */
> 	wq = alloc_workqueue("sgx_cg_wq", wq_unbound | wq_freezable,  
> wq_unbound_max_active);
> 	if (!wq) {
> 		pr_warn("sgx_cg_wq creation failed\n");
> 		return;

sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we  
check and return 0 which was the initially implemented in v12. But then  
Kai had some concern on that we expose all the interface files to allow  
user to set limits but we don't enforce. To keep it simple we settled down  
back to BUG_ON(). This would only happen rarely and user can add  
command-line to disable SGX if s/he really wants to start kernel in this  
case, just can't do SGX.

Another thought we could just guard queue_work() with a null check for  
sgx_cg_wq, but I thought that's also not good for user as it basically  
disables all async global and per-cgroup reclamation. If user needs run  
SGX in such a case, better have async anyways.

See previous discusion:  
https://lore.kernel.org/linux-sgx/e56216ee4d5f7ef6d97c70f56243a5ddc8dea19d.camel@intel.com/

> 	}
>
> 	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_cgroup_ops);
> 	sgx_cgroup_misc_init(misc_cg_root(), &sgx_cg_root);
>
> 	/* Depending on misc state, keep or destory the workqueue: */
> 	if (cgroup_subsys_enabled(misc_cgrp_subsys))
> 		sgx_cg_wq = wq;
> 	else
> 		destroy_workqueue(wq);
> }
>
> BTW, why two previous operations are performed if subsystem is not
> enabled?
>

Note this file is conditionally compiled on CGROUP_MISC KConfig. Even  
though subsystem can be 'disabled' from the command line when CGROUP_MISC  
is on, we still need initialize the misc root node which is static for  
below reason.

All process will be assigned to misc root if misc is disabled and  
get_current_misc_cg() will return the root. So we need initialize ->lru  
and ->reclaim_work for the root sgx cgroup.

> I.e. why not instead:
>
> void sgx_cgroup_init(void)
> {
> 	struct workqueue_struct *wq;
>
> 	/* Eagerly allocate the workqueue: */
> 	wq = alloc_workqueue("sgx_cg_wq", wq_unbound | wq_freezable,  
> wq_unbound_max_active);
> 	if (!wq) {
> 		pr_warn("sgx_cg_wq creation failed\n");
> 		return;
> 	}
>

Same comment as above

> 	if (!cgroup_subsys_enabled(misc_cgrp_subsys)) {
> 		destroy_workqueue(wq);
> 		return;
> 	}
>
> 	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_cgroup_ops);
> 	sgx_cgroup_misc_init(misc_cg_root(), &sgx_cg_root);
> 	sgx_cg_wq = wq;
> }
>
> Finally, why this does not have __init?

This was my misunderstanding that __init only applies to the main init  
function. After looking at some actual usages, I tend to agree __init is  
appropriate. Thanks for catching this.

> And neither sgx_cgroup_misc_init().

This one is called when a specific misc cgroup is created (a new  
sub-folder in sysfs paraent in which misc is enabled), by the callback  
sgx_cgroup_alloc(). Not necessarily when subsys init.

>
> The names for these are also somewhat confusing, maybe something like:
>
> * __sgx_cgroups_misc_init()
> * sgx_cgroups_misc_init()
>
> And both with __init.
>
sgx_cgroup_init() is for the whole sgx cgroup support so original name is  
OK?
similar to sgx_drv_init(), sgx_vepc_init()?

I'm fine to rename the other one and add __init for sgx_cgroup().

> I just made a trivial checkpatch run as a final check, and spotted the
> warning on BUG_ON(), and noticed that this can't be right as it is but
> please comment and correct where I might have gotten something wrong.
>

See above

> With "--strict" flag I also catched these:
>
> CHECK: spinlock_t definition without comment
> #1308: FILE: arch/x86/kernel/cpu/sgx/sgx.h:122:
> +	spinlock_t lock;
>
Yes I had a comment but Kai thought it was too obvious and I can't think  
of a better one that's not obvious so I removed:

https://lore.kernel.org/linux-sgx/9ffb02a3344807f2c173fe8c7cb000cd6c7843b6.camel@intel.com/


> CHECK: multiple assignments should be avoided
> #444: FILE: kernel/cgroup/misc.c:450:
> +		parent_cg = cg = &root_cg;
>

This was also suggested by Kai a few versions back:
https://lore.kernel.org/linux-sgx/8f08f0b0f2b04b90d7cdb7b628f16f9080687c43.camel@intel.com/

Thanks
Haitao
Kai Huang June 5, 2024, 10:30 p.m. UTC | #3
>> Reorg:
>>
>> void sgx_cgroup_init(void)
>> {
>>     struct workqueue_struct *wq;
>>
>>     /* eagerly allocate the workqueue: */
>>     wq = alloc_workqueue("sgx_cg_wq", wq_unbound | wq_freezable, 
>> wq_unbound_max_active);
>>     if (!wq) {
>>         pr_warn("sgx_cg_wq creation failed\n");
>>         return;
> 
> sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we 
> check and return 0 which was the initially implemented in v12. But then 
> Kai had some concern on that we expose all the interface files to allow 
> user to set limits but we don't enforce. To keep it simple we settled 
> down back to BUG_ON(). 

[...]

> This would only happen rarely and user can add 
> command-line to disable SGX if s/he really wants to start kernel in this 
> case, just can't do SGX.

Just to be clear that I don't like BUG_ON() either.  It's inevitable you 
will get attention because of using it.

This is a compromise that you don't want to reset "capacity" to 0 when 
alloc_workqueue() fails.

There are existing code where BUG_ON() is used during the kernel early 
boot code when memory allocation fails (e.g., see cgroup_init_subsys()), 
so it might be acceptable to use BUG_ON() here, but it's up to 
maintainers to decide whether it is OK.

[...]

>> With "--strict" flag I also catched these:
>>
>> CHECK: spinlock_t definition without comment
>> #1308: FILE: arch/x86/kernel/cpu/sgx/sgx.h:122:
>> +    spinlock_t lock;
>>
> Yes I had a comment but Kai thought it was too obvious and I can't think 
> of a better one that's not obvious so I removed:
> 
> https://lore.kernel.org/linux-sgx/9ffb02a3344807f2c173fe8c7cb000cd6c7843b6.camel@intel.com/
>
To be clear, my reply was really about the comment itself isn't useful, 
but didn't say we shouldn't use a comment here.

> 
>> CHECK: multiple assignments should be avoided
>> #444: FILE: kernel/cgroup/misc.c:450:
>> +        parent_cg = cg = &root_cg;
>>
> 
> This was also suggested by Kai a few versions back:
> https://lore.kernel.org/linux-sgx/8f08f0b0f2b04b90d7cdb7b628f16f9080687c43.camel@intel.com/
> 

I didn't know checkpatch complains this.  Feel free to revert back as it 
is trivial to me.
Jarkko Sakkinen June 6, 2024, 5:32 a.m. UTC | #4
On Wed Jun 5, 2024 at 6:33 PM EEST, Haitao Huang wrote:

> sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we  
> check and return 0 which was the initially implemented in v12. But then  
> Kai had some concern on that we expose all the interface files to allow  
> user to set limits but we don't enforce. To keep it simple we settled down  
                                              ~~~~~~~~~~~~~~

Sure:

"Keep it simple and corpse"

> back to BUG_ON(). This would only happen rarely and user can add  
> command-line to disable SGX if s/he really wants to start kernel in this  
> case, just can't do SGX.

Even disabling all of SGX would be a less catastrophical measure.

> Yes I had a comment but Kai thought it was too obvious and I can't think  
> of a better one that's not obvious so I removed:

Not great advice given. Please just document it. In patch, which
BUG_ON() I don't want to see my R-by in it, until I've reviewed an
updated version.

BR, Jarkko
Jarkko Sakkinen June 6, 2024, 6:20 a.m. UTC | #5
On Thu Jun 6, 2024 at 1:30 AM EEST, Huang, Kai wrote:
>
> >> Reorg:
> >>
> >> void sgx_cgroup_init(void)
> >> {
> >>     struct workqueue_struct *wq;
> >>
> >>     /* eagerly allocate the workqueue: */
> >>     wq = alloc_workqueue("sgx_cg_wq", wq_unbound | wq_freezable, 
> >> wq_unbound_max_active);
> >>     if (!wq) {
> >>         pr_warn("sgx_cg_wq creation failed\n");
> >>         return;
> > 
> > sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we 
> > check and return 0 which was the initially implemented in v12. But then 
> > Kai had some concern on that we expose all the interface files to allow 
> > user to set limits but we don't enforce. To keep it simple we settled 
> > down back to BUG_ON(). 
>
> [...]
>
> > This would only happen rarely and user can add 
> > command-line to disable SGX if s/he really wants to start kernel in this 
> > case, just can't do SGX.
>
> Just to be clear that I don't like BUG_ON() either.  It's inevitable you 
> will get attention because of using it.

Just then plain disable SGX if it fails to start.

Do not take down the whole system.

> This is a compromise that you don't want to reset "capacity" to 0 when 
> alloc_workqueue() fails.

BUG_ON() is not a "compromise".

> There are existing code where BUG_ON() is used during the kernel early 
> boot code when memory allocation fails (e.g., see cgroup_init_subsys()), 
> so it might be acceptable to use BUG_ON() here, but it's up to 
> maintainers to decide whether it is OK.

When it is not possible continue to run the system at all, and only
then.

Here it is possible. Without SGX.

BR, Jarkko
Jarkko Sakkinen June 6, 2024, 6:23 a.m. UTC | #6
On Thu Jun 6, 2024 at 9:20 AM EEST, Jarkko Sakkinen wrote:
> > There are existing code where BUG_ON() is used during the kernel early 
> > boot code when memory allocation fails (e.g., see cgroup_init_subsys()), 
> > so it might be acceptable to use BUG_ON() here, but it's up to 
> > maintainers to decide whether it is OK.
>
> When it is not possible continue to run the system at all, and only
> then.
>
> Here it is possible. Without SGX.

With this logic sgx_init() should call BUG() in the error paths.

BR, Jarkko
Haitao Huang June 6, 2024, 1:13 p.m. UTC | #7
On Thu, 06 Jun 2024 00:32:55 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Wed Jun 5, 2024 at 6:33 PM EEST, Haitao Huang wrote:
>
>> sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we
>> check and return 0 which was the initially implemented in v12. But then
>> Kai had some concern on that we expose all the interface files to allow
>> user to set limits but we don't enforce. To keep it simple we settled  
>> down
>                                               ~~~~~~~~~~~~~~
>
> Sure:
>
> "Keep it simple and corpse"
>
>> back to BUG_ON(). This would only happen rarely and user can add
>> command-line to disable SGX if s/he really wants to start kernel in this
>> case, just can't do SGX.
>
> Even disabling all of SGX would be a less catastrophical measure.
>
>> Yes I had a comment but Kai thought it was too obvious and I can't think
>> of a better one that's not obvious so I removed:
>
> Not great advice given. Please just document it. In patch, which
> BUG_ON() I don't want to see my R-by in it, until I've reviewed an
> updated version.
>
> BR, Jarkko
>

Sure. that makes sens.  So far I have following changes. I did not do the  
renaming for the functions as I am not sure it is still needed. Let me  
know otherwise and if I missed any more.

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -329,7 +329,7 @@ const struct misc_res_ops sgx_cgroup_ops = {
      .free = sgx_cgroup_free,
  };

-void sgx_cgroup_init(void)
+int __init sgx_cgroup_init(void)
  {
      /*
       * misc root always exists even if misc is disabled from command line.
@@ -345,7 +345,8 @@ void sgx_cgroup_init(void)
      if (cgroup_subsys_enabled(misc_cgrp_subsys)) {
          sgx_cg_wq = alloc_workqueue("sgx_cg_wq", WQ_UNBOUND |  
WQ_FREEZABLE,
                          WQ_UNBOUND_MAX_ACTIVE);
-        BUG_ON(!sgx_cg_wq);
+        return -ENOMEM;
      }

+    return 0;
  }

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -99,7 +99,7 @@ bool sgx_cgroup_should_reclaim(struct sgx_cgroup  
*sgx_cg);
  struct misc_cg *sgx_cgroup_reclaim_pages(struct misc_cg *root,
                       struct misc_cg *next_cg,
                       struct mm_struct *charge_mm);
-void sgx_cgroup_init(void);
+int __init sgx_cgroup_init(void);

  #endif /* CONFIG_CGROUP_MISC */


--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1045,7 +1045,7 @@ static int __init sgx_init(void)
      if (!sgx_page_cache_init())
          return -ENOMEM;

-    if (!sgx_page_reclaimer_init()) {
+    if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
          ret = -ENOMEM;
          goto err_page_cache;
      }
@@ -1067,9 +1067,6 @@ static int __init sgx_init(void)
      if (sgx_vepc_init() && ret)
          goto err_provision;

-    /* Setup cgroup if either the native or vepc driver is active */
-    sgx_cgroup_init();
-
      return 0;

  err_provision:

--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -119,6 +119,7 @@ static inline void *sgx_get_epc_virt_addr(struct  
sgx_epc_page *page)
   * cgroup.
   */
  struct sgx_epc_lru_list {
+    /* Use this lock to protect access from multiple reclamation worker  
threads */
      spinlock_t lock;
      struct list_head reclaimable;
  };

--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -446,7 +446,8 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
      int ret;

      if (!parent_css) {
-        parent_cg = cg = &root_cg;
+        parent_cg = &root_cg;
+        cg = &root_cg;
      } else {
          cg = kzalloc(sizeof(*cg), GFP_KERNEL);
          if (!cg)

Thanks
Haitao
Kai Huang June 10, 2024, 10:39 p.m. UTC | #8
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -1045,7 +1045,7 @@ static int __init sgx_init(void)
>        if (!sgx_page_cache_init())
>            return -ENOMEM;
> 
> -    if (!sgx_page_reclaimer_init()) {
> +    if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
>            ret = -ENOMEM;
>            goto err_page_cache;
>        }

Does it make more sense to move the sgx_cgroup_init() to the 
sgx_drv_init()?  The SGX cgroup only works for the driver side anyway. 
In this case, if something went wrong in sgx_cgroup_init(), the 
sgx_vepc_init() could still have a chance to work.

And IIUC we need to reset the "capacity" to 0 if sgx_cgroup_init() 
fails, no matter it is called inside sgx_drv_init() or sgx_init(), 
otherwise the "epc" would appear in the cgroup hierarchy as a misc 
cgroup resource.

Another option is to defer setting the capacity to the point where we 
have made sure sgx_drv_init() and sgx_cgroup_init() cannot fail.

Btw, I plan to review the rest from late of this week or next week 
because this week I have some other staff needs to be finished first.
Haitao Huang June 11, 2024, 12:57 p.m. UTC | #9
On Mon, 10 Jun 2024 17:39:53 -0500, Huang, Kai <kai.huang@intel.com> wrote:

>
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -1045,7 +1045,7 @@ static int __init sgx_init(void)
>>        if (!sgx_page_cache_init())
>>            return -ENOMEM;
>>  -    if (!sgx_page_reclaimer_init()) {
>> +    if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
>>            ret = -ENOMEM;
>>            goto err_page_cache;
>>        }
>
> Does it make more sense to move the sgx_cgroup_init() to the  
> sgx_drv_init()?  The SGX cgroup only works for the driver side anyway.  
> In this case, if something went wrong in sgx_cgroup_init(), the  
> sgx_vepc_init() could still have a chance to work.
>

vepc reclamation is not done by cgroup/ksgxd but try_charge() won't work  
if user expecting cgroup to limit vepc allocation. Would it be more  
consistent to just disable vepc, i.e., on system with MISC, sgx/vepc  
always go with cgroup enabled?

> And IIUC we need to reset the "capacity" to 0 if sgx_cgroup_init()  
> fails, no matter it is called inside sgx_drv_init() or sgx_init(),  
> otherwise the "epc" would appear in the cgroup hierarchy as a misc  
> cgroup resource.
>
> Another option is to defer setting the capacity to the point where we  
> have made sure sgx_drv_init() and sgx_cgroup_init() cannot fail.
>

Yes agree we need do this.
> Btw, I plan to review the rest from late of this week or next week  
> because this week I have some other staff needs to be finished first.
>

Sure. Thanks
Haitao
Kai Huang June 11, 2024, 9:56 p.m. UTC | #10
On Tue, 2024-06-11 at 07:57 -0500, Haitao Huang wrote:
> On Mon, 10 Jun 2024 17:39:53 -0500, Huang, Kai <kai.huang@intel.com> wrote:
> 
> > 
> > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > @@ -1045,7 +1045,7 @@ static int __init sgx_init(void)
> > >        if (!sgx_page_cache_init())
> > >            return -ENOMEM;
> > >  -    if (!sgx_page_reclaimer_init()) {
> > > +    if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
> > >            ret = -ENOMEM;
> > >            goto err_page_cache;
> > >        }
> > 
> > Does it make more sense to move the sgx_cgroup_init() to the  
> > sgx_drv_init()?  The SGX cgroup only works for the driver side anyway.  
> > In this case, if something went wrong in sgx_cgroup_init(), the  
> > sgx_vepc_init() could still have a chance to work.
> > 
> 
> vepc reclamation is not done by cgroup/ksgxd but try_charge() won't work  
> if user expecting cgroup to limit vepc allocation. 
> 

Oh ok.

> Would it be more  
> consistent to just disable vepc, i.e., on system with MISC, sgx/vepc  
> always go with cgroup enabled?
> 

Yes fine to me.
>
diff mbox series

Patch

diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 867f88ce2570..739376af9e33 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -20,7 +20,8 @@  ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none
 
 ifeq ($(CAN_BUILD_X86_64), 1)
 TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
-TEST_FILES := $(OUTPUT)/test_encl.elf
+TEST_FILES := $(OUTPUT)/test_encl.elf ash_cgexec.sh
+TEST_PROGS := run_epc_cg_selftests.sh
 
 all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
 endif
diff --git a/tools/testing/selftests/sgx/README b/tools/testing/selftests/sgx/README
new file mode 100644
index 000000000000..f84406bf29a4
--- /dev/null
+++ b/tools/testing/selftests/sgx/README
@@ -0,0 +1,109 @@ 
+SGX selftests
+
+The SGX selftests includes a c program (test_sgx) that covers basic user space
+facing APIs and a shell scripts (run_sgx_cg_selftests.sh) testing SGX misc
+cgroup. The SGX cgroup test script requires root privileges and runs a
+specific test case of  the test_sgx in different cgroups configured by the
+script. More details about the cgroup test can be found below.
+
+All SGX selftests can run with or without kselftest framework.
+
+WITH KSELFTEST FRAMEWORK
+=======================
+
+BUILD
+-----
+
+Build executable file "test_sgx" from top level directory of the kernel source:
+ $ make -C tools/testing/selftests TARGETS=sgx
+
+RUN
+---
+
+Run all sgx tests as sudo or root since the cgroup tests need to configure cgroup
+limits in files under /sys/fs/cgroup.
+
+ $ sudo make -C tools/testing/selftests/sgx run_tests
+
+Without sudo, SGX cgroup tests will be skipped.
+
+On platforms with large Enclave Page Cache (EPC) and/or less cpu cores, you may
+need adjust the timeout in 'settings' to avoid timeouts.
+
+More details about kselftest framework can be found in
+Documentation/dev-tools/kselftest.rst.
+
+WITHOUT KSELFTEST FRAMEWORK
+===========================
+
+BUILD
+-----
+
+Build executable file "test_sgx" from this
+directory(tools/testing/selftests/sgx/):
+
+  $ make
+
+RUN
+---
+
+Run all non-cgroup tests:
+
+ $ ./test_sgx
+
+To test SGX cgroup:
+
+ $ sudo ./run_sgx_cg_selftests.sh
+
+THE SGX CGROUP TEST SCRIPTS
+===========================
+
+Overview of the main cgroup test script
+---------------------------------------
+
+With different cgroups, the script (run_sgx_cg_selftests.sh) starts one or
+multiple concurrent SGX selftests (test_sgx), each to run the
+unclobbered_vdso_oversubscribed test case, which loads an enclave of EPC size
+equal to the EPC capacity available on the platform. The script checks results
+against the expectation set for each cgroup and reports success or failure.
+
+The script creates 3 different cgroups at the beginning with following
+expectations:
+
+  1) small - intentionally small enough to fail the test loading an enclave of
+             size equal to the capacity.
+
+  2) large - large enough to run up to 4 concurrent tests but fail some if more
+	     than 4 concurrent tests are run. The script starts 4 expecting at
+	     least one test to pass, and then starts 5 expecting at least one
+             test to fail.
+
+  3) larger - limit is the same as the capacity, large enough to run lots of
+	      concurrent tests. The script starts 8 of them and expects all
+	      pass.  Then it reruns the same test with one process randomly
+	      killed and usage checked to be zero after all processes exit.
+
+The script also includes a test with low mem_cg limit (memory.max) and the
+'large' sgx_epc limit to verify that the RAM used for per-cgroup reclamation is
+charged to a proper mem_cg. To validate mem_cg OOM-kills processes when its
+memory.max limit is reached due to SGX EPC reclamation, the script turns off
+swapping before start, and turns swapping back on afterwards for this particular
+test.
+
+The helper script
+------------------------------------------------------
+
+To monitor the SGX cgroup settings and behaviors or trouble-shoot during
+testing, the helper script, watch_misc_for_tests.sh, can be used to watch
+relevant entries in cgroupfs files. For example, to watch the SGX cgroup
+'current' counter changes during testing, run this in a separate terminal from
+this directory:
+
+  $ ./watch_misc_for_tests.sh current
+
+For more details about SGX cgroups, see "Cgroup Support" in
+Documentation/arch/x86/sgx.rst.
+
+The scripts require cgroup v2 support. More details about cgroup v2 can be found
+in Documentation/admin-guide/cgroup-v2.rst.
+
diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh b/tools/testing/selftests/sgx/ash_cgexec.sh
new file mode 100755
index 000000000000..cfa5d2b0e795
--- /dev/null
+++ b/tools/testing/selftests/sgx/ash_cgexec.sh
@@ -0,0 +1,16 @@ 
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2024 Intel Corporation.
+
+# Start a program in a given cgroup.
+# Supports V2 cgroup paths, relative to /sys/fs/cgroup
+if [ "$#" -lt 2 ]; then
+    echo "Usage: $0 <v2 cgroup path> <command> [args...]"
+    exit 1
+fi
+# Move this shell to the cgroup.
+echo 0 >/sys/fs/cgroup/$1/cgroup.procs
+shift
+# Execute the command within the cgroup
+exec "$@"
+
diff --git a/tools/testing/selftests/sgx/config b/tools/testing/selftests/sgx/config
new file mode 100644
index 000000000000..e7f1db1d3eff
--- /dev/null
+++ b/tools/testing/selftests/sgx/config
@@ -0,0 +1,4 @@ 
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_MISC=y
+CONFIG_MEMCG=y
+CONFIG_X86_SGX=y
diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
new file mode 100755
index 000000000000..f3d463c09cc2
--- /dev/null
+++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
@@ -0,0 +1,295 @@ 
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+PROCESS_SUCCESS=1
+PROCESS_FAILURE=0
+# Wait for a process and check for expected exit status.
+#
+# Arguments:
+#	$1 - the pid of the process to wait and check.
+#	$2 - 1 if expecting success, 0 for failure.
+#
+# Return:
+#	0 if the exit status of the process matches the expectation.
+#	1 otherwise.
+wait_check_process_status() {
+    pid=$1
+    check_for_success=$2
+
+    wait "$pid"
+    status=$?
+
+    if [ $check_for_success -eq $PROCESS_SUCCESS ] && [ $status -eq 0 ]; then
+        echo "# Process $pid succeeded."
+        return 0
+    elif [ $check_for_success -eq $PROCESS_FAILURE ] && [ $status -ne 0 ]; then
+        echo "# Process $pid returned failure."
+        return 0
+    fi
+    return 1
+}
+
+# Wait for a set of processes and check for expected exit status
+#
+# Arguments:
+#	$1 - 1 if expecting success, 0 for failure.
+#	remaining args - The pids of the processes
+#
+# Return:
+#	0 if exit status of any process matches the expectation.
+#	1 otherwise.
+wait_and_detect_for_any() {
+    check_for_success=$1
+
+    shift
+    detected=1 # 0 for success detection
+
+    for pid in $@; do
+        if wait_check_process_status "$pid" "$check_for_success"; then
+            detected=0
+            # Wait for other processes to exit
+        fi
+    done
+
+    return $detected
+}
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+if [ "$(id -u)" -ne 0 ]; then
+    echo "SKIP: SGX cgroup tests need root privileges."
+    exit $ksft_skip
+fi
+
+cg_root=/sys/fs/cgroup
+if [ ! -d "$cg_root/$test_root_cg" ]; then
+    echo "SKIP: SGX cgroup tests require v2 cgroups."
+    exit $ksft_skip
+fi
+test_root_cg=sgx_kselftest
+#make sure we start clean
+if [ -d "$cg_root/$test_root_cg" ]; then
+    echo "SKIP: Please clean up $cg_root/$test_root_cg."
+    exit $ksft_skip
+fi
+
+test_cg_small_parent=$test_root_cg/sgx_test_small_parent
+test_cg_large=$test_root_cg/sgx_test_large
+# We will only set limit in test1 and run tests in test3
+test_cg_small=$test_cg_small_parent/sgx_test_small
+test_cg_larger=$test_root_cg/sgx_test_larger
+
+clean_up()
+{
+    # Wait a little for cgroups to reset counters for dead processes.
+    sleep 2
+    rmdir $cg_root/$test_cg_large
+    rmdir $cg_root/$test_cg_small
+    rmdir $cg_root/$test_cg_larger
+    rmdir $cg_root/$test_cg_small_parent
+    rmdir $cg_root/$test_root_cg
+}
+
+mkdir $cg_root/$test_root_cg && \
+mkdir $cg_root/$test_cg_small_parent && \
+mkdir $cg_root/$test_cg_large && \
+mkdir $cg_root/$test_cg_small && \
+mkdir $cg_root/$test_cg_larger
+if [ $? -ne 0 ]; then
+    echo "FAIL: Failed creating cgroups."
+    exit 1
+fi
+
+# Turn on misc and memory controller in non-leaf nodes
+echo "+misc" >  $cg_root/cgroup.subtree_control && \
+echo "+memory" > $cg_root/cgroup.subtree_control && \
+echo "+misc" >  $cg_root/$test_root_cg/cgroup.subtree_control && \
+echo "+memory" > $cg_root/$test_root_cg/cgroup.subtree_control && \
+echo "+misc" >  $cg_root/$test_cg_small_parent/cgroup.subtree_control
+if [ $? -ne 0 ]; then
+    echo "FAIL: can't set up cgroups, make sure misc and memory cgroups are enabled."
+    clean_up
+    exit 1
+fi
+
+epc_capacity=$(grep "sgx_epc" "$cg_root/misc.capacity" | awk '{print $2}')
+
+# This is below number of VA pages needed for enclave of capacity size. So
+# should fail oversubscribed cases
+epc_small_limit=$(( epc_capacity / 512 ))
+
+# At least load one enclave of capacity size successfully, maybe up to 4.
+# But some may fail if we run more than 4 concurrent enclaves of capacity size.
+epc_large_limit=$(( epc_small_limit * 4 ))
+
+# Load lots of enclaves
+epc_larger_limit=$epc_capacity
+echo "# Setting up SGX cgroup limits."
+echo "sgx_epc $epc_small_limit" > $cg_root/$test_cg_small_parent/misc.max && \
+echo "sgx_epc $epc_large_limit" >  $cg_root/$test_cg_large/misc.max && \
+echo "sgx_epc $epc_larger_limit" > $cg_root/$test_cg_larger/misc.max
+if [ $? -ne 0 ]; then
+    echo "# Failed setting up misc limits for sgx_epc."
+    echo "SKIP: Kernel does not support SGX cgroup."
+    clean_up
+    exit $ksft_skip
+fi
+
+test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed"
+
+echo "# Start unclobbered_vdso_oversubscribed with small EPC limit, expecting failure..."
+./ash_cgexec.sh $test_cg_small $test_cmd >/dev/null 2>&1
+if [ $? -eq 0 ]; then
+    echo "FAIL: Fail on small EPC limit, not expecting any test passes."
+    clean_up
+    exit 1
+else
+    echo "# Test failed as expected."
+fi
+
+echo "PASS: small EPC limit test."
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with large EPC limit, \
+expecting at least one success...."
+
+pids=""
+for i in 1 2 3 4; do
+    (
+        ./ash_cgexec.sh $test_cg_large $test_cmd >/dev/null 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
+    echo "PASS: large EPC limit positive testing."
+else
+    echo "FAIL: Failed on large EPC limit positive testing, no test passes."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with large EPC limit, \
+expecting at least one failure...."
+pids=""
+for i in 1 2 3 4 5; do
+    (
+        ./ash_cgexec.sh $test_cg_large $test_cmd >/dev/null 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
+    echo "PASS: large EPC limit negative testing."
+else
+    echo "FAIL: Failed on large EPC limit negative testing, no test fails."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with larger EPC limit, \
+expecting no failure...."
+pids=""
+for i in 1 2 3 4 5 6 7 8; do
+    (
+        ./ash_cgexec.sh $test_cg_larger $test_cmd >/dev/null 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
+    echo "FAIL: Failed on larger EPC limit, at least one test fails."
+    clean_up
+    exit 1
+else
+    echo "PASS: larger EPC limit tests."
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with larger EPC limit,\
+ randomly kill one, expecting no failure...."
+pids=""
+for i in 1 2 3 4 5 6 7 8; do
+    (
+        ./ash_cgexec.sh $test_cg_larger $test_cmd >/dev/null 2>&1
+    ) &
+    pids="$pids $!"
+done
+random_number=$(awk 'BEGIN{srand();print int(rand()*5)}')
+sleep $((random_number + 1))
+
+# Randomly select a process to kill
+# Make sure usage counter not leaked at the end.
+random_index=$(awk 'BEGIN{srand();print int(rand()*8)}')
+counter=0
+for pid in $pids; do
+    if [ "$counter" -eq "$random_index" ]; then
+        pid_to_kill=$pid
+        break
+    fi
+    counter=$((counter + 1))
+done
+
+kill $pid_to_kill
+echo "# Killed process with PID: $pid_to_kill"
+
+any_failure=0
+for pid in $pids; do
+    wait "$pid"
+    status=$?
+    if [ "$pid" != "$pid_to_kill" ]; then
+        if [ $status -ne 0 ]; then
+	    echo "# Process $pid returned failure."
+            any_failure=1
+        fi
+    fi
+done
+
+if [ $any_failure -ne 0 ]; then
+    echo "FAIL: Failed on random killing, at least one test fails."
+    clean_up
+    exit 1
+fi
+echo "PASS: larger EPC limit test with a process randomly killed."
+
+mem_limit_too_small=$((epc_capacity - 2 * epc_large_limit))
+
+echo "$mem_limit_too_small" > $cg_root/$test_cg_large/memory.max
+if [ $? -ne 0 ]; then
+    echo "FAIL: Failed setting up memory controller."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with large EPC limit, \
+and too small RAM limit, expecting all failures...."
+# Ensure swapping off so the OOM killer is activated when mem_cgroup limit is hit.
+swapoff -a
+pids=""
+for i in 1 2 3 4; do
+    (
+        ./ash_cgexec.sh $test_cg_large $test_cmd >/dev/null 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
+    echo "FAIL: Failed on tests with memcontrol, some tests did not fail."
+    clean_up
+    swapon -a
+    exit 1
+else
+    swapon -a
+    echo "PASS: large EPC limit tests with memcontrol."
+fi
+
+sleep 2
+
+epc_usage=$(grep '^sgx_epc' "$cg_root/$test_root_cg/misc.current" | awk '{print $2}')
+if [ "$epc_usage" -ne 0 ]; then
+    echo "FAIL: Final usage is $epc_usage, not 0."
+else
+    echo "PASS: leakage check."
+    echo "PASS: ALL cgroup limit tests, cleanup cgroups..."
+fi
+clean_up
+echo "# Done SGX cgroup tests."
diff --git a/tools/testing/selftests/sgx/settings b/tools/testing/selftests/sgx/settings
new file mode 100644
index 000000000000..4bf7dcbf9fa8
--- /dev/null
+++ b/tools/testing/selftests/sgx/settings
@@ -0,0 +1,2 @@ 
+# This timeout may need be increased for platforms with EPC larger than 4G
+timeout=140
diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
new file mode 100755
index 000000000000..9280a5e0962b
--- /dev/null
+++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
@@ -0,0 +1,11 @@ 
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+if [ -z "$1" ]; then
+    echo "No argument supplied, please provide 'max', 'current', or 'events'"
+    exit 1
+fi
+
+watch -n 1 'find /sys/fs/cgroup -wholename "*/sgx_test*/misc.'$1'" -exec \
+    sh -c '\''echo "$1:"; cat "$1"'\'' _ {} \;'