Message ID | 20240416032011.58578-9-haitao.huang@linux.intel.com (mailing list archive) |
---|---
State | New |
Series | Add Cgroup support for SGX EPC memory
On 16/04/2024 3:20 pm, Haitao Huang wrote:
> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>
> Currently in the EPC page allocation, the kernel simply fails the
> allocation when the current EPC cgroup fails to charge due to its usage
> reaching limit. This is not ideal. When that happens, a better way is
> to reclaim EPC page(s) from the current EPC cgroup (and/or its
> descendants) to reduce its usage so the new allocation can succeed.
>
> Add the basic building blocks to support per-cgroup reclamation.
>
> Currently the kernel only has one place to reclaim EPC pages: the global
> EPC LRU list. To support the "per-cgroup" EPC reclaim, maintain an LRU
> list for each EPC cgroup, and introduce a "cgroup" variant function to
> reclaim EPC pages from a given EPC cgroup and its descendants.
>
> Currently the kernel does the global EPC reclaim in sgx_reclaim_pages().
> It always tries to reclaim EPC pages in batches of SGX_NR_TO_SCAN (16)
> pages. Specifically, it always "scans", or "isolates", SGX_NR_TO_SCAN
> pages from the global LRU, and then tries to reclaim these pages at once
> for better performance.
>
> Implement the "cgroup" variant EPC reclaim in a similar way, but keep
> the implementation simple: 1) change sgx_reclaim_pages() to take an LRU
> as input, and return the pages that are "scanned" and attempted for
> reclamation (but not necessarily reclaimed successfully); 2) loop over
> the given EPC cgroup and its descendants and do the new
> sgx_reclaim_pages() until SGX_NR_TO_SCAN pages are "scanned".
>
> This implementation, encapsulated in sgx_cgroup_reclaim_pages(), always
> tries to reclaim SGX_NR_TO_SCAN pages from the LRU of the given EPC
> cgroup, and only moves to its descendants when there are not enough
> reclaimable EPC pages to "scan" in its LRU. It should be enough for
> most cases.
>
> Note, this simple implementation doesn't _exactly_ mimic the current
> global EPC reclaim (which always tries to do the actual reclaim in a
> batch of SGX_NR_TO_SCAN pages): when LRUs have less than SGX_NR_TO_SCAN
> reclaimable pages, the actual reclaim of EPC pages will be split into
> smaller batches _across_ multiple LRUs, with each batch smaller than
> SGX_NR_TO_SCAN pages.
>
> A more precise way to mimic the current global EPC reclaim would be to
> have a new function to only "scan" (or "isolate") SGX_NR_TO_SCAN pages
> _across_ the given EPC cgroup _AND_ its descendants, and then do the
> actual reclaim in one batch. But this is unnecessarily complicated at
> this stage.
>
> Alternatively, the current sgx_reclaim_pages() could be changed to
> return the actually "reclaimed" pages rather than the "scanned" pages.
> However, reclamation is a lengthy process, and forcing a successful
> reclamation of a predetermined number of pages may block the caller for
> too long. That may not be acceptable in some synchronous contexts,
> e.g., when serving an ioctl().
>
> With this building block in place, add synchronous reclamation support
> in sgx_cgroup_try_charge(): trigger a call to
> sgx_cgroup_reclaim_pages() if the cgroup reaches its limit and the
> caller allows synchronous reclaim, as indicated by a newly added
> parameter.
>
> A later patch will add support for asynchronous reclamation reusing
> sgx_cgroup_reclaim_pages().
>
> Note all reclaimable EPC pages are still tracked in the global LRU,
> thus no per-cgroup reclamation is actually active at the moment.
> Per-cgroup tracking and reclamation will be turned on in the end after
> all necessary infrastructure is in place.

Nit:

"all necessary infrastructures are in place", or, "all necessary
building blocks are in place".

?

> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---

Reviewed-by: Kai Huang <kai.huang@intel.com>

More nitpickings below:

[...]

> -static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg)
> +static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim reclaim)

Let's still wrap the text on an 80-character basis.

I guess most people are more used to that.

[...]

> -	epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
> -					    struct sgx_epc_page, list);
> +	epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list);

Ditto.
On Wed, 17 Apr 2024 18:51:28 -0500, Huang, Kai <kai.huang@intel.com> wrote:

> On 16/04/2024 3:20 pm, Haitao Huang wrote:
>> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>>
>> Currently in the EPC page allocation, the kernel simply fails the
>> allocation when the current EPC cgroup fails to charge due to its
>> usage reaching limit. This is not ideal. When that happens, a better
>> way is to reclaim EPC page(s) from the current EPC cgroup (and/or its
>> descendants) to reduce its usage so the new allocation can succeed.
>>
[...]
>>
>> Note all reclaimable EPC pages are still tracked in the global LRU,
>> thus no per-cgroup reclamation is actually active at the moment.
>> Per-cgroup tracking and reclamation will be turned on in the end after
>> all necessary infrastructure is in place.
>
> Nit:
>
> "all necessary infrastructures are in place", or, "all necessary
> building blocks are in place".
>
> ?
>
>> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
>> Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com>
>> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
>> ---
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>

Thanks

> More nitpickings below:
>
> [...]
>
>> -static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg)
>> +static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg,
>> enum sgx_reclaim reclaim)
>
> Let's still wrap the text on an 80-character basis.
>
> I guess most people are more used to that.
>
> [...]
>
>> -	epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
>> -					    struct sgx_epc_page, list);
>> +	epc_page = list_first_entry_or_null(&lru->reclaimable, struct
>> sgx_epc_page, list);
>
> Ditto.

Actually I changed to 100-char width based on comments from Jarkko,
IIRC. I don't have a personal preference, but will not change back to
80 unless Jarkko also agrees.

Thanks
Haitao
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index ff4d4a25dbe7..74d403d1e0d4 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -9,16 +9,128 @@
 static struct sgx_cgroup sgx_cg_root;
 
 /**
- * sgx_cgroup_try_charge() - try to charge cgroup for a single EPC page
+ * sgx_cgroup_lru_empty() - check if a cgroup tree has no pages on its LRUs
+ * @root:	Root of the tree to check
  *
+ * Return: %true if all cgroups under the specified root have empty LRU lists.
+ */
+static bool sgx_cgroup_lru_empty(struct misc_cg *root)
+{
+	struct cgroup_subsys_state *css_root;
+	struct cgroup_subsys_state *pos;
+	struct sgx_cgroup *sgx_cg;
+	bool ret = true;
+
+	/*
+	 * Caller must ensure css_root ref acquired
+	 */
+	css_root = &root->css;
+
+	rcu_read_lock();
+	css_for_each_descendant_pre(pos, css_root) {
+		if (!css_tryget(pos))
+			break;
+
+		rcu_read_unlock();
+
+		sgx_cg = sgx_cgroup_from_misc_cg(css_misc(pos));
+
+		spin_lock(&sgx_cg->lru.lock);
+		ret = list_empty(&sgx_cg->lru.reclaimable);
+		spin_unlock(&sgx_cg->lru.lock);
+
+		rcu_read_lock();
+		css_put(pos);
+		if (!ret)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/**
+ * sgx_cgroup_reclaim_pages() - reclaim EPC from a cgroup tree
+ * @root:	The root of cgroup tree to reclaim from.
+ *
+ * This function performs a pre-order walk in the cgroup tree under the given
+ * root, attempting to reclaim pages at each node until a fixed number of pages
+ * (%SGX_NR_TO_SCAN) are attempted for reclamation. No guarantee of success on
+ * the actual reclamation process. In extreme cases, if all pages in front of
+ * the LRUs are recently accessed, i.e., considered "too young" to reclaim, no
+ * page will actually be reclaimed after walking the whole tree.
+ */
+static void sgx_cgroup_reclaim_pages(struct misc_cg *root)
+{
+	struct cgroup_subsys_state *css_root;
+	struct cgroup_subsys_state *pos;
+	struct sgx_cgroup *sgx_cg;
+	unsigned int cnt = 0;
+
+	/* Caller must ensure css_root ref acquired */
+	css_root = &root->css;
+
+	rcu_read_lock();
+	css_for_each_descendant_pre(pos, css_root) {
+		if (!css_tryget(pos))
+			break;
+		rcu_read_unlock();
+
+		sgx_cg = sgx_cgroup_from_misc_cg(css_misc(pos));
+		cnt += sgx_reclaim_pages(&sgx_cg->lru);
+
+		rcu_read_lock();
+		css_put(pos);
+
+		if (cnt >= SGX_NR_TO_SCAN)
+			break;
+	}
+
+	rcu_read_unlock();
+}
+
+static int __sgx_cgroup_try_charge(struct sgx_cgroup *epc_cg)
+{
+	if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE))
+		return 0;
+
+	/* No reclaimable pages left in the cgroup */
+	if (sgx_cgroup_lru_empty(epc_cg->cg))
+		return -ENOMEM;
+
+	if (signal_pending(current))
+		return -ERESTARTSYS;
+
+	return -EBUSY;
+}
+
+/**
+ * sgx_cgroup_try_charge() - try to charge cgroup for a single EPC page
  * @sgx_cg:	The EPC cgroup to be charged for the page.
+ * @reclaim:	Whether or not synchronous EPC reclaim is allowed.
  * Return:
  * * %0 - If successfully charged.
  * * -errno - for failures.
  */
-int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg)
+int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim reclaim)
 {
-	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, sgx_cg->cg, PAGE_SIZE);
+	int ret;
+
+	for (;;) {
+		ret = __sgx_cgroup_try_charge(sgx_cg);
+
+		if (ret != -EBUSY)
+			return ret;
+
+		if (reclaim == SGX_NO_RECLAIM)
+			return -ENOMEM;
+
+		sgx_cgroup_reclaim_pages(sgx_cg->cg);
+		cond_resched();
+	}
+
+	return 0;
 }
 
 /**
@@ -43,6 +155,7 @@ static void sgx_cgroup_free(struct misc_cg *cg)
 
 static void sgx_cgroup_misc_init(struct misc_cg *cg, struct sgx_cgroup *sgx_cg)
 {
+	sgx_lru_init(&sgx_cg->lru);
 	cg->res[MISC_CG_RES_SGX_EPC].priv = sgx_cg;
 	sgx_cg->cg = cg;
 }
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
index bd9606479e67..538524f5669d 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -20,7 +20,7 @@ static inline struct sgx_cgroup *sgx_get_current_cg(void)
 
 static inline void sgx_put_cg(struct sgx_cgroup *sgx_cg) { }
 
-static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg)
+static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim reclaim)
 {
 	return 0;
 }
@@ -33,6 +33,7 @@ static inline void sgx_cgroup_init(void) { }
 
 struct sgx_cgroup {
 	struct misc_cg *cg;
+	struct sgx_epc_lru_list lru;
 };
 
 static inline struct sgx_cgroup *sgx_cgroup_from_misc_cg(struct misc_cg *cg)
@@ -63,7 +64,7 @@ static inline void sgx_put_cg(struct sgx_cgroup *sgx_cg)
 	put_misc_cg(sgx_cg->cg);
 }
 
-int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg);
+int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg, enum sgx_reclaim reclaim);
 void sgx_cgroup_uncharge(struct sgx_cgroup *sgx_cg);
 void sgx_cgroup_init(void);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 552455365761..b79c1d6cdc23 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -286,11 +286,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }
 
-/*
- * Take a fixed number of pages from the head of the active page pool and
- * reclaim them to the enclave's private shmem files. Skip the pages, which have
- * been accessed since the last scan. Move those pages to the tail of active
- * page pool so that the pages get scanned in LRU like fashion.
+/**
+ * sgx_reclaim_pages() - Attempt to reclaim a fixed number of pages from an LRU
+ * @lru:	The LRU from which pages are reclaimed.
+ *
+ * Take a fixed number of pages from the head of a given LRU and reclaim them to
+ * the enclave's private shmem files. Skip the pages, which have been accessed
+ * since the last scan. Move those pages to the tail of the list so that the
+ * pages get scanned in LRU like fashion.
  *
  * Batch process a chunk of pages (at the moment 16) in order to degrade amount
  * of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit
@@ -298,8 +301,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * + EWB) but not sufficiently. Reclaiming one page at a time would also be
  * problematic as it would increase the lock contention too much, which would
  * halt forward progress.
+ *
+ * Return: Number of pages attempted for reclamation.
  */
-static void sgx_reclaim_pages(void)
+unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru)
 {
 	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
 	struct sgx_backing backing[SGX_NR_TO_SCAN];
@@ -310,10 +315,9 @@ static void sgx_reclaim_pages(void)
 	int ret;
 	int i;
 
-	spin_lock(&sgx_global_lru.lock);
+	spin_lock(&lru->lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
-						    struct sgx_epc_page, list);
+		epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list);
 		if (!epc_page)
 			break;
 
@@ -328,7 +332,7 @@ static void sgx_reclaim_pages(void)
 		 */
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_global_lru.lock);
+	spin_unlock(&lru->lock);
 
 	for (i = 0; i < cnt; i++) {
 		epc_page = chunk[i];
@@ -351,9 +355,9 @@ static void sgx_reclaim_pages(void)
 		continue;
 
 skip:
-		spin_lock(&sgx_global_lru.lock);
-		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
-		spin_unlock(&sgx_global_lru.lock);
+		spin_lock(&lru->lock);
+		list_add_tail(&epc_page->list, &lru->reclaimable);
+		spin_unlock(&lru->lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 
@@ -379,6 +383,8 @@ static void sgx_reclaim_pages(void)
 
 		sgx_free_epc_page(epc_page);
 	}
+
+	return cnt;
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
@@ -387,6 +393,11 @@ static bool sgx_should_reclaim(unsigned long watermark)
 	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
+static void sgx_reclaim_pages_global(void)
+{
+	sgx_reclaim_pages(&sgx_global_lru);
+}
+
 /*
  * sgx_reclaim_direct() should be called (without enclave's mutex held)
  * in locations where SGX memory resources might be low and might be
@@ -395,7 +406,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages();
+		sgx_reclaim_pages_global();
 }
 
 static int ksgxd(void *p)
@@ -418,7 +429,7 @@ static int ksgxd(void *p)
 					     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages();
+			sgx_reclaim_pages_global();
 
 		cond_resched();
 	}
@@ -572,7 +583,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_reclaim reclaim)
 	int ret;
 
 	sgx_cg = sgx_get_current_cg();
-	ret = sgx_cgroup_try_charge(sgx_cg);
+	ret = sgx_cgroup_try_charge(sgx_cg, reclaim);
 	if (ret) {
 		sgx_put_cg(sgx_cg);
 		return ERR_PTR(ret);
 	}
@@ -600,7 +611,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_reclaim reclaim)
 			break;
 		}
 
-		sgx_reclaim_pages();
+		sgx_reclaim_pages_global();
 
 		cond_resched();
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 3cf5a59a4eac..89adac646381 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -135,6 +135,7 @@
 void sgx_reclaim_direct(void);
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_reclaim reclaim);
+unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru);
 
 void sgx_ipi_cb(void *info);
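For context, a minimal sketch of how an allocation site behaves with the new parameter after this patch. The wrapper function below is hypothetical and only for illustration; sgx_alloc_epc_page(), SGX_RECLAIM, and SGX_NO_RECLAIM come from the series itself:

/*
 * Hypothetical caller, for illustration only.
 *
 * With SGX_RECLAIM, a failed charge at the cgroup limit triggers
 * sgx_cgroup_reclaim_pages() on the caller's cgroup subtree and the
 * charge is retried, so the call may sleep and may fail with
 * -ERESTARTSYS if a signal is pending. With SGX_NO_RECLAIM (for
 * non-sleepable contexts), it fails fast with -ENOMEM instead.
 */
static struct sgx_epc_page *example_add_page(void *owner)
{
	struct sgx_epc_page *epc_page;

	epc_page = sgx_alloc_epc_page(owner, SGX_RECLAIM);
	if (IS_ERR(epc_page))
		return epc_page;	/* e.g. ERR_PTR(-ENOMEM) or ERR_PTR(-ERESTARTSYS) */

	return epc_page;
}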