[v14,08/14] x86/sgx: Add basic EPC reclamation flow for cgroup

From: Kristen Carlson Accardi <kristen@linux.intel.com>

Currently, during EPC page allocation, the kernel simply fails the
allocation when the current EPC cgroup fails to charge because its usage
has reached the limit.  This is not ideal. When that happens, a better
way is to reclaim EPC page(s) from the current EPC cgroup (and/or its
descendants) to reduce its usage so the new allocation can succeed.

Add the basic building blocks to support per-cgroup reclamation.

Currently the kernel only has one place to reclaim EPC pages: the global
EPC LRU list.  To support the "per-cgroup" EPC reclaim, maintain an LRU
list for each EPC cgroup, and introduce a "cgroup" variant function to
reclaim EPC pages from a given EPC cgroup and its descendants.

Currently the kernel does the global EPC reclaim in sgx_reclaim_pages().
It always tries to reclaim EPC pages in batches of SGX_NR_TO_SCAN (16)
pages.  Specifically, it always "scans", or "isolates", SGX_NR_TO_SCAN
pages from the global LRU, and then tries to reclaim these pages at once
for better performance.

Implement the "cgroup" variant EPC reclaim in a similar way, but keep
the implementation simple: 1) change sgx_reclaim_pages() to take an LRU
as input, and return the number of pages that are "scanned" and
attempted for reclamation (but not necessarily reclaimed successfully);
2) loop over the given EPC cgroup and its descendants and call the new
sgx_reclaim_pages() until SGX_NR_TO_SCAN pages are "scanned".
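
The two steps above can be pictured with a small userspace sketch. It is
not the kernel code: each LRU is reduced to a plain page counter, the
cgroup subtree is flattened into an array in walk order, and the toy_*
names are hypothetical. Only SGX_NR_TO_SCAN and its value of 16 come
from this commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SGX_NR_TO_SCAN 16 /* batch size quoted in the commit message */

/*
 * Toy stand-in for the reworked sgx_reclaim_pages(): take one LRU (here
 * just a count of reclaimable pages), isolate at most SGX_NR_TO_SCAN
 * pages from it, and return how many were scanned, not how many were
 * successfully reclaimed.
 */
static unsigned int toy_reclaim_pages(unsigned int *lru)
{
	unsigned int n = *lru < SGX_NR_TO_SCAN ? *lru : SGX_NR_TO_SCAN;

	*lru -= n;
	return n;
}

/*
 * Toy cgroup variant. lru[0] is the given cgroup and lru[1..] are its
 * descendants in walk order (a flattening assumed for illustration).
 * Keep calling toy_reclaim_pages() until SGX_NR_TO_SCAN pages have been
 * scanned or the walk runs out of cgroups.
 */
static unsigned int toy_cgroup_reclaim_pages(unsigned int *lru, size_t nr_lrus)
{
	unsigned int scanned = 0;
	size_t i;

	assert(lru != NULL || nr_lrus == 0);
	for (i = 0; i < nr_lrus && scanned < SGX_NR_TO_SCAN; i++)
		scanned += toy_reclaim_pages(&lru[i]);

	return scanned;
}
```

In this sketch, LRUs holding 5, 4 and 20 pages yield 25 scanned pages
(5 + 4 + 16), because each per-LRU call may itself scan up to a full
batch. LRUs holding 30 and 7 pages stop after the first LRU with 16
scanned.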

This implementation, encapsulated in sgx_cgroup_reclaim_pages(), always
tries to reclaim SGX_NR_TO_SCAN pages from the LRU of the given EPC
cgroup, and only moves to its descendants when there are not enough
reclaimable EPC pages to "scan" in its LRU.  It should be enough for
most cases. In other cases, the caller may invoke this function in a
loop to ensure enough pages are reclaimed for its usage. To ensure all
descendant groups are scanned in a round-robin fashion in those cases,
sgx_cgroup_reclaim_pages() takes in a starting cgroup and returns the
next cgroup that the caller can pass in as the new starting cgroup for a
subsequent call.
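
The start/next contract can be sketched as follows. This is a userspace
toy under stated assumptions: the subtree is a flat array with the root
at index 0, an index stands in for the cgroup pointer, and the walk
wraps back to the root within a single call, which may differ from the
patch. toy_reclaim_from() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define SGX_NR_TO_SCAN 16 /* batch size quoted in the commit message */

/*
 * Scan from lru[start], visiting each LRU at most once per call, until
 * SGX_NR_TO_SCAN pages have been scanned. Stores the scanned count in
 * *scanned and returns the index the next call should start from.
 */
static size_t toy_reclaim_from(unsigned int *lru, size_t nr_lrus,
			       size_t start, unsigned int *scanned)
{
	size_t pos = start;
	size_t visited;

	assert(nr_lrus > 0 && start < nr_lrus);
	*scanned = 0;
	for (visited = 0; visited < nr_lrus && *scanned < SGX_NR_TO_SCAN;
	     visited++) {
		unsigned int n = lru[pos] < SGX_NR_TO_SCAN ?
				 lru[pos] : SGX_NR_TO_SCAN;

		lru[pos] -= n;
		*scanned += n;
		pos = (pos + 1) % nr_lrus; /* wrap back to the root */
	}
	return pos;
}
```

With LRUs of 40, 10 and 20 pages, a first call starting at index 0 scans
16 pages from the root and returns 1. Feeding that value back in makes
the second call begin at the first descendant instead of draining the
root again, which is the round-robin behavior the paragraph above
describes.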

Note, this simple implementation doesn't _exactly_ mimic the current
global EPC reclaim (which always tries to do the actual reclaim in
batches of SGX_NR_TO_SCAN pages): when LRUs have fewer than
SGX_NR_TO_SCAN reclaimable pages, the actual reclaim of EPC pages will
be split into smaller batches _across_ multiple LRUs, with each batch
being smaller than SGX_NR_TO_SCAN pages.
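
A worked example of the batch splitting, as a userspace sketch only:
LRUs are plain page counters, and toy_batch_sizes() is a hypothetical
helper that records how large each per-LRU reclaim batch would be in one
walk.

```c
#include <assert.h>
#include <stddef.h>

#define SGX_NR_TO_SCAN 16 /* batch size quoted in the commit message */

/*
 * Record the size of each per-LRU reclaim batch produced by one walk.
 * batch[] must have room for nr_lrus entries. Returns the batch count.
 */
static size_t toy_batch_sizes(const unsigned int *lru, size_t nr_lrus,
			      unsigned int *batch)
{
	unsigned int scanned = 0;
	size_t nr_batches = 0;
	size_t i;

	assert(lru != NULL && batch != NULL);
	for (i = 0; i < nr_lrus && scanned < SGX_NR_TO_SCAN; i++) {
		unsigned int n = lru[i] < SGX_NR_TO_SCAN ?
				 lru[i] : SGX_NR_TO_SCAN;

		if (n == 0)
			continue; /* empty LRU: nothing to reclaim here */
		batch[nr_batches++] = n;
		scanned += n;
	}
	return nr_batches;
}
```

Twenty pages on a single LRU give one batch of 16. The same twenty pages
spread over three LRUs as 6, 5 and 9 give three batches of 6, 5 and 9
pages, each smaller than SGX_NR_TO_SCAN.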

A more precise way to mimic the current global EPC reclaim would be to
have a new function to only "scan" (or "isolate") SGX_NR_TO_SCAN pages
_across_ the given EPC cgroup _AND_ its descendants, and then do the
actual reclaim in one batch.  But this is unnecessarily complicated at
this stage.

Alternatively, the current sgx_reclaim_pages() could be changed to
return the number of pages actually "reclaimed" rather than "scanned".
However, reclamation is a lengthy process, and forcing a successful
reclamation of a predetermined number of pages may block the caller for
too long. That may not be acceptable in some synchronous contexts, e.g.,
in serving an ioctl().

With this building block in place, add synchronous reclamation support
in sgx_cgroup_try_charge(): trigger a call to
sgx_cgroup_reclaim_pages() if the cgroup reaches its limit and the
caller allows synchronous reclaim as indicated by a newly added
parameter.
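
A rough userspace sketch of that charge-or-reclaim loop follows. The
struct, the enum, the error codes and the assumption that every scanned
page is successfully reclaimed are all illustrative; none of them are
taken from the patch itself.

```c
#include <assert.h>
#include <errno.h>

#define SGX_NR_TO_SCAN 16 /* batch size quoted in the commit message */

/* Hypothetical flag type; the real parameter's name and type may differ. */
enum toy_reclaim { TOY_NO_RECLAIM, TOY_DO_RECLAIM };

struct toy_cg {
	unsigned int usage;       /* pages currently charged */
	unsigned int limit;       /* maximum pages allowed */
	unsigned int reclaimable; /* charged pages sitting on the LRU */
};

/* Reclaim one batch; assumes every scanned page is reclaimed. */
static unsigned int toy_reclaim_batch(struct toy_cg *cg)
{
	unsigned int n = cg->reclaimable < SGX_NR_TO_SCAN ?
			 cg->reclaimable : SGX_NR_TO_SCAN;

	cg->reclaimable -= n;
	cg->usage -= n;
	return n;
}

/* Charge one page, reclaiming synchronously only if the caller allows. */
static int toy_try_charge(struct toy_cg *cg, enum toy_reclaim reclaim)
{
	assert(cg->reclaimable <= cg->usage);
	for (;;) {
		if (cg->usage < cg->limit) {
			cg->usage++;
			return 0;
		}
		if (reclaim == TOY_NO_RECLAIM)
			return -ENOMEM; /* at limit, caller forbids reclaim */
		if (toy_reclaim_batch(cg) == 0)
			return -ENOMEM; /* at limit, nothing left to reclaim */
	}
}
```

A cgroup at its limit with four reclaimable pages fails the charge when
reclaim is disallowed, and succeeds after one reclaim pass when it is
allowed.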

A later patch will add support for asynchronous reclamation reusing
sgx_cgroup_reclaim_pages().

Note all reclaimable EPC pages are still tracked in the global LRU, so
no per-cgroup reclamation is actually active at the moment. Per-cgroup
tracking and reclamation will be turned on at the end, after all
necessary infrastructure is in place.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
---
V14:
- Allow sgx_cgroup_reclaim_pages() to continue from previous tree-walk.
It takes in a 'start' node and returns the 'next' node for the caller to
use as the new 'start'. This is to ensure pages in lower level cgroups
can be reclaimed if all pages in upper level nodes are "too young".
(Kai)
- Move renaming sgx_should_reclaim() to sgx_should_reclaim_global() from
a later patch to this one. (Kai)

V11:
- Use commit message suggested by Kai
- Remove "usage" comments for functions. (Kai)

V10:
- Simplify the signature by removing a pointer to nr_to_scan (Kai)
- Return pages attempted instead of reclaimed as it is really what the
cgroup caller needs to track progress. This further simplifies the design.
- Merge patch for exposing sgx_reclaim_pages() with basic synchronous
reclamation. (Kai)
- Shorten names for EPC cgroup functions. (Jarkko)
- Fix/add comments to justify the design (Kai)
- Separate out a helper for handling a single iteration of the loop
in sgx_cgroup_try_charge(). (Jarkko)

V9:
- Add comments for static variables. (Jarkko)

V8:
- Use width of 80 characters in text paragraphs. (Jarkko)
- Remove alignment for substructure variables. (Jarkko)

V7:
- Reworked from patch 9 of V6, "x86/sgx: Restructure top-level EPC reclaim
function". Do not split the top level function (Kai)
- Dropped patches 7 and 8 of V6.
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 149 ++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |   5 +-
 arch/x86/kernel/cpu/sgx/main.c       |  55 ++++++----
 arch/x86/kernel/cpu/sgx/sgx.h        |   1 +
 4 files changed, 183 insertions(+), 27 deletions(-)

Message ID	20240531222630.4634-9-haitao.huang@linux.intel.com (mailing list archive)
State	New, archived
Headers	From: Haitao Huang <haitao.huang@linux.intel.com>
	To: jarkko@kernel.org, dave.hansen@linux.intel.com, kai.huang@intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com, tim.c.chen@linux.intel.com
	Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com
	Subject: [PATCH v14 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
	Date: Fri, 31 May 2024 15:26:24 -0700
	Message-Id: <20240531222630.4634-9-haitao.huang@linux.intel.com>
	In-Reply-To: <20240531222630.4634-1-haitao.huang@linux.intel.com>
	References: <20240531222630.4634-1-haitao.huang@linux.intel.com>
	X-Mailing-List: linux-sgx@vger.kernel.org
Series	Add Cgroup support for SGX EPC memory
	[v14,00/14] Add Cgroup support for SGX EPC memory
	[v14,01/14] x86/sgx: Replace boolean parameters with enums
	[v14,02/14] cgroup/misc: Add per resource callbacks for CSS events
	[v14,03/14] cgroup/misc: Export APIs for SGX driver
	[v14,04/14] cgroup/misc: Add SGX EPC resource type
	[v14,05/14] x86/sgx: Implement basic EPC misc cgroup functionality
	[v14,06/14] x86/sgx: Add sgx_epc_lru_list to encapsulate LRU list
	[v14,07/14] x86/sgx: Abstract tracking reclaimable pages in LRU
	[v14,08/14] x86/sgx: Add basic EPC reclamation flow for cgroup
	[v14,09/14] x86/sgx: Abstract check for global reclaimable pages
	[v14,10/14] x86/sgx: Implement async reclamation for cgroup
	[v14,11/14] x86/sgx: Charge mem_cgroup for per-cgroup reclamation
	[v14,12/14] x86/sgx: Turn on per-cgroup EPC reclamation
	[v14,13/14] Docs/x86/sgx: Add description for cgroup support
	[v14,14/14] selftests/sgx: Add scripts for EPC cgroup testing
