From patchwork Thu Sep 22 17:10:38 2022
X-Patchwork-Submitter: Kristen Carlson Accardi
X-Patchwork-Id: 12985584
From: Kristen Carlson Accardi
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 cgroups@vger.kernel.org, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Cc: Kristen Carlson Accardi, Sean Christopherson
Subject: [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of
 sgx_reclaim_pages()
Date: Thu, 22 Sep 2022 10:10:38 -0700
Message-Id: <20220922171057.1236139-2-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>
References: <20220922171057.1236139-1-kristen@linux.intel.com>

From: Sean Christopherson

Move the invocation of post-reclaim cond_resched() from the callers of
sgx_reclaim_pages() into the reclaim path itself.  sgx_reclaim_pages()
is always called in a loop and is always followed by a call to
cond_resched().  This will hold true for the EPC cgroup as well, which
adds even more calls to sgx_reclaim_pages() and thus cond_resched().
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..4cdeb915dc86 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -367,6 +367,8 @@ static void sgx_reclaim_pages(void)
 
 		sgx_free_epc_page(epc_page);
 	}
+
+	cond_resched();
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
@@ -410,8 +412,6 @@ static int ksgxd(void *p)
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
 			sgx_reclaim_pages();
-
-		cond_resched();
 	}
 
 	return 0;
@@ -578,7 +578,6 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 		}
 
 		sgx_reclaim_pages();
-		cond_resched();
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
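
A note on the resulting shape (a sketch only, with the function bodies
elided; not a verbatim copy of the file): every current and future
reclaim loop now gets its yield point from the callee, so no caller has
to remember it.

	/* Sketch of the post-patch structure; bodies elided. */
	static void sgx_reclaim_pages(void)
	{
		/* ... scan, block and write back up to SGX_NR_TO_SCAN pages ... */

		cond_resched();		/* single yield point, shared by all callers */
	}

	static int ksgxd(void *p)
	{
		while (!kthread_should_stop()) {
			if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
				sgx_reclaim_pages();	/* no per-caller cond_resched() */
		}

		return 0;
	}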
Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users Date: Thu, 22 Sep 2022 10:10:39 -0700 Message-Id: <20220922171057.1236139-3-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson A future patch will use the owner field for either a pointer to a struct sgx_encl, or a struct sgx_encl_page. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/sgx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 0f2020653fba..5a7e858a8f98 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -33,7 +33,7 @@ struct sgx_epc_page { unsigned int section; u16 flags; u16 poison; - struct sgx_encl_page *owner; + void *owner; struct list_head list; }; From patchwork Thu Sep 22 17:10:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985586 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25821C6FA92 for ; Thu, 22 Sep 2022 17:11:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231539AbiIVRLQ (ORCPT ); Thu, 22 Sep 2022 13:11:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231455AbiIVRLP (ORCPT ); Thu, 22 Sep 2022 13:11:15 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D09EE723E; Thu, 22 Sep 2022 10:11:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866674; x=1695402674; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DjiUzJ8L4GVZjCfQCnWcxNqSYGYXwpjlzhw0010ck3A=; b=PbjQlYpsf+qDXLzF2/dd4XfKEeoiHYi9V5msC0vyxwFEC401SsFyAAFD GjYkkYcqW9TOWASeSad3sMZSko8Ay71r7zPQqN+PQQUK+R+wAeiXqAxPp BeFTijve4laZfTdcNpYnAtP3UE+DD0nGDFDQYzpgSyzvHSPE1LGyLi0Tr pFKggs52Nx5KX7Av8VjP41wnhKiMYEXB1BRaCq4BMu7/IM7Ry33gWhyDV /F4gGGPqCHQ3Z5GAODOpwP4FRYL7ens2AgIzBrERc5252aedC/tiJFHSA mEsx0AFzVjELePWjtENLozal1Rtj0lBuZCIyfjBGTthMNF325f6UD1Sgv Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="326689836" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="326689836" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:13 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762269880" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:12 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar 
From patchwork Thu Sep 22 17:10:40 2022
X-Patchwork-Submitter: Kristen Carlson Accardi
X-Patchwork-Id: 12985586
From: Kristen Carlson Accardi
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 cgroups@vger.kernel.org, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Cc: Kristen Carlson Accardi, Sean Christopherson
Subject: [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages
Date: Thu, 22 Sep 2022 10:10:40 -0700
Message-Id: <20220922171057.1236139-4-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>
References: <20220922171057.1236139-1-kristen@linux.intel.com>

From: Sean Christopherson

In order to fully account for an enclave's EPC page usage, store the
owning enclave of a VA EPC page.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/encl.c  | 5 ++++-
 arch/x86/kernel/cpu/sgx/encl.h  | 2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index f40d64206ded..a18f1311b57d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -1193,6 +1193,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @encl:    The enclave that this page is allocated to.
  * @reclaim: Reclaim EPC pages directly if none available. Enclave
  *           mutex should not be held if this is set.
  *
@@ -1202,7 +1203,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
@@ -1218,6 +1219,8 @@ struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
 		return ERR_PTR(-EFAULT);
 	}
 
+	epc_page->owner = encl;
+
 	return epc_page;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..831d63f80f5a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -116,7 +116,7 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
 					  unsigned long offset,
 					  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ebe79d60619f..9a1bb3c3211a 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page(reclaim);
+		va_page->epc_page = sgx_alloc_va_page(encl, reclaim);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
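
The point of the change, sketched (the helper below is hypothetical and
not part of the patch): once every VA page records its enclave,
per-enclave accounting can attribute VA pages too, not just regular
enclave pages.

	/* Hypothetical accounting helper; not in this series as-is. */
	static struct sgx_encl *sgx_va_page_to_encl(struct sgx_epc_page *epc_page)
	{
		/* Valid only for VA pages, whose owner is set by sgx_alloc_va_page(). */
		return (struct sgx_encl *)epc_page->owner;
	}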
From patchwork Thu Sep 22 17:10:41 2022
X-Patchwork-Submitter: Kristen Carlson Accardi
X-Patchwork-Id: 12985587
From: Kristen Carlson Accardi
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 cgroups@vger.kernel.org, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Cc: Kristen Carlson Accardi, Sean Christopherson
Subject: [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate
 lru list(s)
Date: Thu, 22 Sep 2022 10:10:41 -0700
Message-Id: <20220922171057.1236139-5-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>
References: <20220922171057.1236139-1-kristen@linux.intel.com>

From: Sean Christopherson

Wrap the existing reclaimable list and its spinlock in a struct to
minimize the code changes needed to handle multiple LRUs as well as
reclaimable and non-reclaimable lists, both of which will be introduced
and used by SGX EPC cgroups.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 37 +++++++++++++++++-----------------
 arch/x86/kernel/cpu/sgx/sgx.h  | 11 ++++++++++
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4cdeb915dc86..af68dc1c677b 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -26,10 +26,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
+ * with sgx_global_lru.lock acquired.
  */
-static LIST_HEAD(sgx_active_page_list);
-static DEFINE_SPINLOCK(sgx_reclaimer_lock);
+static struct sgx_epc_lru sgx_global_lru;
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
@@ -298,12 +297,12 @@ static void sgx_reclaim_pages(void)
 	int ret;
 	int i;
 
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		if (list_empty(&sgx_active_page_list))
+		if (list_empty(&sgx_global_lru.reclaimable))
 			break;
 
-		epc_page = list_first_entry(&sgx_active_page_list,
+		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
 					    struct sgx_epc_page, list);
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
@@ -316,7 +315,7 @@ static void sgx_reclaim_pages(void)
 		 */
 			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	for (i = 0; i < cnt; i++) {
 		epc_page = chunk[i];
@@ -339,9 +338,9 @@ static void sgx_reclaim_pages(void)
 		continue;
 
 skip:
-		spin_lock(&sgx_reclaimer_lock);
-		list_add_tail(&epc_page->list, &sgx_active_page_list);
-		spin_unlock(&sgx_reclaimer_lock);
+		spin_lock(&sgx_global_lru.lock);
+		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 
@@ -374,7 +373,7 @@ static void sgx_reclaim_pages(void)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
 /*
@@ -427,6 +426,8 @@ static bool __init sgx_page_reclaimer_init(void)
 
 	ksgxd_tsk = tsk;
 
+	sgx_lru_init(&sgx_global_lru);
+
 	return true;
 }
 
@@ -502,10 +503,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_active_page_list);
-	spin_unlock(&sgx_reclaimer_lock);
+	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
@@ -520,18 +521,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_reclaimer_lock);
+			spin_unlock(&sgx_global_lru.lock);
 			return -EBUSY;
 		}
 
 		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
 }
@@ -564,7 +565,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_active_page_list))
+		if (list_empty(&sgx_global_lru.reclaimable))
 			return ERR_PTR(-ENOMEM);
 
 		if (!reclaim) {
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5a7e858a8f98..7b208ee8eb45 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -83,6 +83,17 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
+struct sgx_epc_lru {
+	spinlock_t lock;
+	struct list_head reclaimable;
+};
+
+static inline void sgx_lru_init(struct sgx_epc_lru *lru)
+{
+	spin_lock_init(&lru->lock);
+	INIT_LIST_HEAD(&lru->reclaimable);
+}
+
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
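
To see why the wrapper helps (a sketch assuming the EPC cgroup work
later in this series; 'struct sgx_epc_cgroup' is hypothetical at this
point): additional LRUs become one-liners to instantiate.

	/* Sketch: a second LRU instance costs only an embed plus sgx_lru_init(). */
	struct sgx_epc_cgroup {			/* hypothetical, from later patches */
		struct sgx_epc_lru lru;
		/* ... */
	};

	static void sgx_epc_cgroup_lru_init(struct sgx_epc_cgroup *epc_cg)
	{
		sgx_lru_init(&epc_cg->lru);	/* same helper as for sgx_global_lru */
	}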
x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists Date: Thu, 22 Sep 2022 10:10:42 -0700 Message-Id: <20220922171057.1236139-6-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add code to keep track of pages that are not tracked by the reclaimer in the LRU's "unreclaimable" list. When there is an OOM event and an enclave must be OOM killed, the EPC pages which are not tracked by the reclaimer can still be freed. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/encl.c | 10 +++++++--- arch/x86/kernel/cpu/sgx/ioctl.c | 11 +++++++---- arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++----------- arch/x86/kernel/cpu/sgx/sgx.h | 7 ++++--- arch/x86/kernel/cpu/sgx/virt.c | 28 ++++++++++++++++++++-------- 5 files changed, 53 insertions(+), 29 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index a18f1311b57d..ad611c06798f 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -252,6 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl, epc_page = sgx_encl_eldu(&encl->secs, NULL); if (IS_ERR(epc_page)) return ERR_CAST(epc_page); + sgx_record_epc_page(epc_page, 0); } epc_page = sgx_encl_eldu(entry, encl->secs.epc_page); @@ -259,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl, return ERR_CAST(epc_page); encl->secs_child_cnt++; - sgx_mark_page_reclaimable(entry->epc_page); + sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); return entry; } @@ -375,7 +376,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, encl_page->type = SGX_PAGE_TYPE_REG; encl->secs_child_cnt++; - sgx_mark_page_reclaimable(encl_page->epc_page); + sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); phys_addr = sgx_get_epc_phys_addr(epc_page); /* @@ -687,7 +688,7 @@ void sgx_encl_release(struct kref *ref) * The page and its radix tree entry cannot be freed * if the page is being held by the reclaimer. 
*/ - if (sgx_unmark_page_reclaimable(entry->epc_page)) + if (sgx_drop_epc_page(entry->epc_page)) continue; sgx_encl_free_epc_page(entry->epc_page); @@ -703,6 +704,7 @@ void sgx_encl_release(struct kref *ref) xa_destroy(&encl->page_array); if (!encl->secs_child_cnt && encl->secs.epc_page) { + sgx_drop_epc_page(encl->secs.epc_page); sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; } @@ -711,6 +713,7 @@ void sgx_encl_release(struct kref *ref) va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, list); list_del(&va_page->list); + sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); kfree(va_page); } @@ -1218,6 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim) sgx_encl_free_epc_page(epc_page); return ERR_PTR(-EFAULT); } + sgx_record_epc_page(epc_page, 0); epc_page->owner = encl; diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 9a1bb3c3211a..aca80a3f38a1 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -48,6 +48,7 @@ void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page) encl->page_cnt--; if (va_page) { + sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); list_del(&va_page->list); kfree(va_page); @@ -113,6 +114,8 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) encl->attributes = secs->attributes; encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS; + sgx_record_epc_page(encl->secs.epc_page, 0); + /* Set only after completion, as encl->lock has not been taken. */ set_bit(SGX_ENCL_CREATED, &encl->flags); @@ -322,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src, goto err_out; } - sgx_mark_page_reclaimable(encl_page->epc_page); + sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); mutex_unlock(&encl->lock); mmap_read_unlock(current->mm); return ret; @@ -958,7 +961,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl, * Prevent page from being reclaimed while mutex * is released. */ - if (sgx_unmark_page_reclaimable(entry->epc_page)) { + if (sgx_drop_epc_page(entry->epc_page)) { ret = -EAGAIN; goto out_entry_changed; } @@ -973,7 +976,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl, mutex_lock(&encl->lock); - sgx_mark_page_reclaimable(entry->epc_page); + sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); } /* Change EPC type */ @@ -1130,7 +1133,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl, goto out_unlock; } - if (sgx_unmark_page_reclaimable(entry->epc_page)) { + if (sgx_drop_epc_page(entry->epc_page)) { ret = -EBUSY; goto out_unlock; } diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index af68dc1c677b..543bc5b20508 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -262,7 +262,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, goto out; sgx_encl_ewb(encl->secs.epc_page, &secs_backing); - + sgx_drop_epc_page(encl->secs.epc_page); sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; @@ -495,31 +495,35 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) } /** - * sgx_mark_page_reclaimable() - Mark a page as reclaimable + * sgx_record_epc_page() - Add a page to the LRU tracking * @page: EPC page * - * Mark a page as reclaimable and add it to the active page list. 
Pages - * are automatically removed from the active list when freed. + * Mark a page with the specified flags and add it to the appropriate + * (un)reclaimable list. */ -void sgx_mark_page_reclaimable(struct sgx_epc_page *page) +void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) { spin_lock(&sgx_global_lru.lock); - page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED; - list_add_tail(&page->list, &sgx_global_lru.reclaimable); + WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); + page->flags |= flags; + if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) + list_add_tail(&page->list, &sgx_global_lru.reclaimable); + else + list_add_tail(&page->list, &sgx_global_lru.unreclaimable); spin_unlock(&sgx_global_lru.lock); } /** - * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list + * sgx_drop_epc_page() - Remove a page from a LRU list * @page: EPC page * - * Clear the reclaimable flag and remove the page from the active page list. + * Clear the reclaimable flag if set and remove the page from its LRU. * * Return: * 0 on success, * -EBUSY if the page is in the process of being reclaimed */ -int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) +int sgx_drop_epc_page(struct sgx_epc_page *page) { spin_lock(&sgx_global_lru.lock); if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { @@ -529,9 +533,9 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) return -EBUSY; } - list_del(&page->list); page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; } + list_del(&page->list); spin_unlock(&sgx_global_lru.lock); return 0; diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 7b208ee8eb45..65625ea8fd6e 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -86,20 +86,21 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page) struct sgx_epc_lru { spinlock_t lock; struct list_head reclaimable; + struct list_head unreclaimable; }; static inline void sgx_lru_init(struct sgx_epc_lru *lru) { spin_lock_init(&lru->lock); INIT_LIST_HEAD(&lru->reclaimable); + INIT_LIST_HEAD(&lru->unreclaimable); } struct sgx_epc_page *__sgx_alloc_epc_page(void); void sgx_free_epc_page(struct sgx_epc_page *page); - void sgx_reclaim_direct(void); -void sgx_mark_page_reclaimable(struct sgx_epc_page *page); -int sgx_unmark_page_reclaimable(struct sgx_epc_page *page); +void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); +int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); void sgx_ipi_cb(void *info); diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c index 6a77a14eee38..287e235bc3c1 100644 --- a/arch/x86/kernel/cpu/sgx/virt.c +++ b/arch/x86/kernel/cpu/sgx/virt.c @@ -62,6 +62,8 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc, goto err_delete; } + sgx_record_epc_page(epc_page, 0); + return 0; err_delete: @@ -146,6 +148,7 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page) return ret; } + sgx_drop_epc_page(epc_page); sgx_free_epc_page(epc_page); return 0; } @@ -218,8 +221,15 @@ static int sgx_vepc_release(struct inode *inode, struct file *file) * have been removed, the SECS page must have a child on * another instance. */ - if (sgx_vepc_free_page(epc_page)) + if (sgx_vepc_free_page(epc_page)) { + /* + * Drop the page before adding it to the list of SECS + * pages. Moving the page off the unreclaimable list + * needs to be done under the LRU's spinlock. 
+ */ + sgx_drop_epc_page(epc_page); list_add_tail(&epc_page->list, &secs_pages); + } xa_erase(&vepc->page_array, index); } @@ -234,15 +244,17 @@ static int sgx_vepc_release(struct inode *inode, struct file *file) mutex_lock(&zombie_secs_pages_lock); list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) { /* - * Speculatively remove the page from the list of zombies, - * if the page is successfully EREMOVE'd it will be added to - * the list of free pages. If EREMOVE fails, throw the page - * on the local list, which will be spliced on at the end. + * If EREMOVE fails, throw the page on the local list, which + * will be spliced on at the end. + * + * Note, this abuses sgx_drop_epc_page() to delete the page off + * the list of zombies, but this is a very rare path (probably + * never hit in production). It's not worth special casing the + * free path for this super rare case just to avoid taking the + * LRU's spinlock. */ - list_del(&epc_page->list); - if (sgx_vepc_free_page(epc_page)) - list_add_tail(&epc_page->list, &secs_pages); + list_move_tail(&epc_page->list, &secs_pages); } if (!list_empty(&secs_pages)) From patchwork Thu Sep 22 17:10:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985602 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1AAEC6FA82 for ; Thu, 22 Sep 2022 17:16:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230141AbiIVRQM (ORCPT ); Thu, 22 Sep 2022 13:16:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43228 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbiIVRPz (ORCPT ); Thu, 22 Sep 2022 13:15:55 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCE6A8169C; Thu, 22 Sep 2022 10:15:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866936; x=1695402936; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Nj39Q9ceqr5A+bt2PwpCeJxtAg2UJZ0zT/MG8hez1vE=; b=dHm3Vzz3l5arsdDq+EDfzMzGSzpvwUmJ83p720XVbIHPoJ7XVlS3MSBo iB6vjClvWBvamj0q0Jh9AfIdp0q5bLxxWSpZXQAdgDNoPDPenkpd2JJ1j uqEMRw7YA1hJkdwH5fJZhSK1IkVHjgtL9/3h6Epha/sH1mFS0USAuRHFN hrop34GTvoE4hD3pdG2+j8DF9PpyUEa/2N8cxQYhYxwnyivDS2Ob54MPl l44lH9HyVeqYOaq7erLidypc6RQY9JVTA4HxOyQ8r5ESvOU3MPjdnXisQ pKt22qyjUW+l1/heF5jhs/wxWfbiwqazc8baKqFkLvuSpU2t4hNP5BkgJ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="364351930" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="364351930" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:41 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270003" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:23 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , 
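
The payoff, sketched (illustration only, not code from this patch; the
actual OOM walk lands later in the series): every live EPC page is now
discoverable through one of the two lists.

	/* Sketch: a future OOM path can find even non-reclaimable pages. */
	struct sgx_epc_page *epc_page;

	spin_lock(&sgx_global_lru.lock);
	list_for_each_entry(epc_page, &sgx_global_lru.unreclaimable, list) {
		/* SECS, VA and vEPC pages all show up here */
	}
	spin_unlock(&sgx_global_lru.lock);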
x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 06/20] x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages Date: Thu, 22 Sep 2022 10:10:43 -0700 Message-Id: <20220922171057.1236139-7-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Keep track of whether the EPC page is in the middle of being reclaimed and do not delete the page off the it's LRU if it has not yet finished being reclaimed. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++----- arch/x86/kernel/cpu/sgx/sgx.h | 5 +++++ 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 543bc5b20508..93aa9e09c26d 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -307,13 +307,15 @@ static void sgx_reclaim_pages(void) list_del_init(&epc_page->list); encl_page = epc_page->owner; - if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) + if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { + epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; chunk[cnt++] = epc_page; - else + } else { /* The owner is freeing the page. No need to add the * page back to the list of reclaimable pages. */ epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + } } spin_unlock(&sgx_global_lru.lock); @@ -339,6 +341,7 @@ static void sgx_reclaim_pages(void) skip: spin_lock(&sgx_global_lru.lock); + epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable); spin_unlock(&sgx_global_lru.lock); @@ -362,7 +365,8 @@ static void sgx_reclaim_pages(void) sgx_reclaimer_write(epc_page, &backing[i]); kref_put(&encl_page->encl->refcount, sgx_encl_release); - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED | + SGX_EPC_PAGE_RECLAIM_IN_PROGRESS); sgx_free_epc_page(epc_page); } @@ -504,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) { spin_lock(&sgx_global_lru.lock); - WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); + WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIM_FLAGS); page->flags |= flags; if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) list_add_tail(&page->list, &sgx_global_lru.reclaimable); @@ -528,7 +532,7 @@ int sgx_drop_epc_page(struct sgx_epc_page *page) spin_lock(&sgx_global_lru.lock); if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { /* The page is being reclaimed. 
*/ - if (list_empty(&page->list)) { + if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) { spin_unlock(&sgx_global_lru.lock); return -EBUSY; } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 65625ea8fd6e..284d0cda9e36 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -29,6 +29,11 @@ /* Pages on free list */ #define SGX_EPC_PAGE_IS_FREE BIT(1) +/* page flag to indicate reclaim is in progress */ +#define SGX_EPC_PAGE_RECLAIM_IN_PROGRESS BIT(2) +#define SGX_EPC_PAGE_RECLAIM_FLAGS (SGX_EPC_PAGE_RECLAIMER_TRACKED | \ + SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) + struct sgx_epc_page { unsigned int section; u16 flags; From patchwork Thu Sep 22 17:10:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985603 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02151C6FA8B for ; Thu, 22 Sep 2022 17:16:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230419AbiIVRQO (ORCPT ); Thu, 22 Sep 2022 13:16:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229713AbiIVRPz (ORCPT ); Thu, 22 Sep 2022 13:15:55 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 204C42CCAB; Thu, 22 Sep 2022 10:15:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866936; x=1695402936; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DRy35tf5iLyrHuxp2SgL7lHDqjPxBWrPxC87VIa2/uA=; b=m0HVHg6povi8OQ8CI7/MQbPJwtOS+85+4I0bgmZgszRbvpObJJ9Xcxuo FyUZ7FZEeRY8Yfd3OEtEx1ppOrnDiwmL/4c+aPJAY4bC95on7k0BH2Hja 6IB1J5w+d3ovgfroMB4gOkdEl2kiLBrPF7gqD7ldTjbfNPGW6HdBCbXnc mI1uoi0c+9T3MdDkr3ccSdE4drKQGlZHOsmTfaMAf6CQ3JqsxTFjQFCE8 a3op0OBa+8DVAYxnRVLexeZ2m1fAZQtkI9xVk5vzeNLaLC+rUHTXVxSF0 WFw1zSdJydqsmYXpP2kBzH3YrrdcfHf92Ed5jEaa2RFafI6vycI79Q2FZ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="364351940" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="364351940" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:43 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270053" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:25 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. 
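
The contract this creates for callers of sgx_drop_epc_page(), sketched
(the caller code here is hypothetical, not from the patch):

	/* Sketch: a page mid-reclaim cannot be dropped; callers must back off. */
	if (sgx_drop_epc_page(entry->epc_page) == -EBUSY) {
		/* the reclaimer still holds the page; retry or fail, caller's choice */
	}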
Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 07/20] x86/sgx: Use a list to track to-be-reclaimed pages during reclaim Date: Thu, 22 Sep 2022 10:10:44 -0700 Message-Id: <20220922171057.1236139-8-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Change sgx_reclaim_pages() to use a list rather than an array for storing the epc_pages which will be reclaimed. This change is needed to transition to the LRU implementation for EPC cgroup support. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 43 +++++++++++++++------------------- 1 file changed, 19 insertions(+), 24 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 93aa9e09c26d..085c06fdc359 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -288,12 +288,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, */ static void sgx_reclaim_pages(void) { - struct sgx_epc_page *chunk[SGX_NR_TO_SCAN]; struct sgx_backing backing[SGX_NR_TO_SCAN]; struct sgx_encl_page *encl_page; - struct sgx_epc_page *epc_page; + struct sgx_epc_page *epc_page, *tmp; pgoff_t page_index; - int cnt = 0; + LIST_HEAD(iso); int ret; int i; @@ -304,23 +303,26 @@ static void sgx_reclaim_pages(void) epc_page = list_first_entry(&sgx_global_lru.reclaimable, struct sgx_epc_page, list); - list_del_init(&epc_page->list); encl_page = epc_page->owner; if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; - chunk[cnt++] = epc_page; + list_move_tail(&epc_page->list, &iso); } else { - /* The owner is freeing the page. No need to add the - * page back to the list of reclaimable pages. 
+ /* The owner is freeing the page, remove it from the + * LRU list */ epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + list_del_init(&epc_page->list); } } spin_unlock(&sgx_global_lru.lock); - for (i = 0; i < cnt; i++) { - epc_page = chunk[i]; + if (list_empty(&iso)) + goto out; + + i = 0; + list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->owner; if (!sgx_reclaimer_age(epc_page)) @@ -335,6 +337,7 @@ static void sgx_reclaim_pages(void) goto skip; } + i++; encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl_page->encl->lock); continue; @@ -342,27 +345,19 @@ static void sgx_reclaim_pages(void) skip: spin_lock(&sgx_global_lru.lock); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; - list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable); + list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable); spin_unlock(&sgx_global_lru.lock); kref_put(&encl_page->encl->refcount, sgx_encl_release); - - chunk[i] = NULL; - } - - for (i = 0; i < cnt; i++) { - epc_page = chunk[i]; - if (epc_page) - sgx_reclaimer_block(epc_page); } - for (i = 0; i < cnt; i++) { - epc_page = chunk[i]; - if (!epc_page) - continue; + list_for_each_entry(epc_page, &iso, list) + sgx_reclaimer_block(epc_page); + i= 0; + list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->owner; - sgx_reclaimer_write(epc_page, &backing[i]); + sgx_reclaimer_write(epc_page, &backing[i++]); kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED | @@ -370,7 +365,7 @@ static void sgx_reclaim_pages(void) sgx_free_epc_page(epc_page); } - +out: cond_resched(); } From patchwork Thu Sep 22 17:10:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985604 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFAF7C6FA91 for ; Thu, 22 Sep 2022 17:16:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229740AbiIVRQO (ORCPT ); Thu, 22 Sep 2022 13:16:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231960AbiIVRPz (ORCPT ); Thu, 22 Sep 2022 13:15:55 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0146F267A; Thu, 22 Sep 2022 10:15:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866937; x=1695402937; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RZlp1H2OsjXwhzzSjzzWVpctKJLOlloT8euGikMJFHM=; b=biIPUgJxGAHggXeEp2S8kM0xOA79muhQaNsIWCY3OzDbNQry/R3WdXjj VfMK7Lii6ZapTJxHPQ20pTR0CYy8nRd35uNpfEQ5plNmHRqWTJtYG4iEy JoTsDm8huwN1poAO1MLnQZekK74DA2vYV+LMdx6RxnWJ+5tYtWHiAFGz9 B+S6mqUVU+GCQPA12bJaSdQS9V6i+UqJHX4wzsIvjXKFhE7fZLnD8kckV VUaRXOQLgrxcOPoR8f1ENUHkHiPgMJJtXQcxwCcu0cHkC7wWxNsneGuzS fL/XGt6IhRBWAPfVL815dtbR0SSJGAAlnuNlu/+mQFEmHKIILRPCLHdpn w==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="364351968" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="364351968" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 
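
The underlying pattern, reduced to a sketch (names local to this
example): isolate a batch onto a private list while holding the lock,
then process the batch without holding it.

	/* Sketch of the isolate-then-process pattern this patch adopts. */
	struct sgx_epc_page *page, *tmp;
	LIST_HEAD(iso);
	int i;

	spin_lock(&lru->lock);
	for (i = 0; i < nr_to_scan && !list_empty(&lru->reclaimable); i++)
		list_move_tail(lru->reclaimable.next, &iso);
	spin_unlock(&lru->lock);

	list_for_each_entry_safe(page, tmp, &iso, list) {
		/* reclaim 'page'; the _safe variant allows deleting as we go */
	}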
From patchwork Thu Sep 22 17:10:45 2022
X-Patchwork-Submitter: Kristen Carlson Accardi
X-Patchwork-Id: 12985604
From: Kristen Carlson Accardi
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 cgroups@vger.kernel.org, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Cc: Kristen Carlson Accardi, Sean Christopherson
Subject: [RFC PATCH 08/20] x86/sgx: Add EPC page flags to identify type of
 page
Date: Thu, 22 Sep 2022 10:10:45 -0700
Message-Id: <20220922171057.1236139-9-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>
References: <20220922171057.1236139-1-kristen@linux.intel.com>

From: Sean Christopherson

Create new flags to identify whether a page is an enclave page or a VA
page, and save the page type when the page is recorded.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
 arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c  | 20 ++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  8 +++++++-
 4 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index ad611c06798f..672b302f3688 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -252,7 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		epc_page = sgx_encl_eldu(&encl->secs, NULL);
 		if (IS_ERR(epc_page))
 			return ERR_CAST(epc_page);
-		sgx_record_epc_page(epc_page, 0);
+		sgx_record_epc_page(epc_page, SGX_EPC_PAGE_ENCLAVE);
 	}
 
 	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
@@ -260,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		return ERR_CAST(epc_page);
 
 	encl->secs_child_cnt++;
-	sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE);
 
 	return entry;
 }
@@ -1221,7 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
-	sgx_record_epc_page(epc_page, 0);
+	sgx_record_epc_page(epc_page, SGX_EPC_PAGE_VERSION_ARRAY);
 
 	epc_page->owner = encl;
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index aca80a3f38a1..c91cc6a01232 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -114,7 +114,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	encl->attributes = secs->attributes;
 	encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS;
 
-	sgx_record_epc_page(encl->secs.epc_page, 0);
+	sgx_record_epc_page(encl->secs.epc_page, SGX_EPC_PAGE_ENCLAVE);
 
 	/* Set only after completion, as encl->lock has not been taken. */
 	set_bit(SGX_ENCL_CREATED, &encl->flags);
@@ -325,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 		goto err_out;
 	}
 
-	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 085c06fdc359..3c0d33b72896 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -304,6 +304,8 @@ static void sgx_reclaim_pages(void)
 		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
 					    struct sgx_epc_page, list);
 		encl_page = epc_page->owner;
+		if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE)))
+			continue;
 
 		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
 			epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
@@ -360,8 +362,7 @@ static void sgx_reclaim_pages(void)
 		sgx_reclaimer_write(epc_page, &backing[i++]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-		epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
-				     SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
+		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
 
 		sgx_free_epc_page(epc_page);
 	}
@@ -496,6 +497,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 /**
  * sgx_record_epc_page() - Add a page to the LRU tracking
  * @page:	EPC page
+ * @flags:	Reclaim flags for the page.
  *
  * Mark a page with the specified flags and add it to the appropriate
  * (un)reclaimable list.
@@ -525,18 +527,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
-	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
-		/* The page is being reclaimed. */
-		if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) {
-			spin_unlock(&sgx_global_lru.lock);
-			return -EBUSY;
-		}
-
-		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+	if ((page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) &&
+	    (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)) {
+		spin_unlock(&sgx_global_lru.lock);
+		return -EBUSY;
 	}
 	list_del(&page->list);
 	spin_unlock(&sgx_global_lru.lock);
 
+	page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
+
 	return 0;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 284d0cda9e36..76eae4ecbf87 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,8 +31,14 @@
 /* page flag to indicate reclaim is in progress */
 #define SGX_EPC_PAGE_RECLAIM_IN_PROGRESS BIT(2)
+#define SGX_EPC_PAGE_ENCLAVE		BIT(3)
+#define SGX_EPC_PAGE_VERSION_ARRAY	BIT(4)
+#define SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE (SGX_EPC_PAGE_ENCLAVE | \
+					  SGX_EPC_PAGE_RECLAIMER_TRACKED)
 #define SGX_EPC_PAGE_RECLAIM_FLAGS	(SGX_EPC_PAGE_RECLAIMER_TRACKED | \
-					 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)
+					 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS | \
+					 SGX_EPC_PAGE_ENCLAVE | \
+					 SGX_EPC_PAGE_VERSION_ARRAY)
 
 struct sgx_epc_page {
 	unsigned int section;
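
What the flags buy, sketched (the decoder below is hypothetical, not
from the patch): the type-erased 'owner' field from patch 02 becomes
decodable at runtime.

	/* Hypothetical decoder for the void *owner field, keyed on page type. */
	static struct sgx_encl *sgx_epc_page_to_encl(struct sgx_epc_page *page)
	{
		if (page->flags & SGX_EPC_PAGE_ENCLAVE)
			return ((struct sgx_encl_page *)page->owner)->encl;
		if (page->flags & SGX_EPC_PAGE_VERSION_ARRAY)
			return (struct sgx_encl *)page->owner;
		return NULL;	/* e.g. vEPC pages, recorded with flags == 0 */
	}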
From patchwork Thu Sep 22 17:10:46 2022
X-Patchwork-Submitter: Kristen Carlson Accardi
X-Patchwork-Id: 12985589
From: Kristen Carlson Accardi
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 cgroups@vger.kernel.org, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin"
Cc: Kristen Carlson Accardi, Sean Christopherson
Subject: [RFC PATCH 09/20] x86/sgx: Allow reclaiming up to 32 pages, but
 scan 16 by default
Date: Thu, 22 Sep 2022 10:10:46 -0700
Message-Id: <20220922171057.1236139-10-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>
References: <20220922171057.1236139-1-kristen@linux.intel.com>

From: Sean Christopherson

Modify sgx_reclaim_pages() to take a parameter that specifies the
number of pages to scan for reclaiming. Specify a max value of 32, but
scan 16 in the usual case. This allows the number of pages
sgx_reclaim_pages() scans to be specified by the caller, and adjusted
in future patches.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 3c0d33b72896..0010ed1b2e98 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -18,6 +18,8 @@
 #include "encl.h"
 #include "encls.h"
 
+#define SGX_MAX_NR_TO_RECLAIM	32
+
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
@@ -273,7 +275,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }
 
-/*
+/**
+ * sgx_reclaim_pages() - Reclaim EPC pages from the consumers
+ * @nr_to_scan:		Number of EPC pages to scan for reclaim
+ *
 * Take a fixed number of pages from the head of the active page pool and
 * reclaim them to the enclave's private shmem files. Skip the pages, which have
 * been accessed since the last scan. Move those pages to the tail of active
@@ -286,9 +291,9 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 * problematic as it would increase the lock contention too much, which would
 * halt forward progress.
 */
-static void sgx_reclaim_pages(void)
+static void sgx_reclaim_pages(int nr_to_scan)
 {
-	struct sgx_backing backing[SGX_NR_TO_SCAN];
+	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page, *tmp;
 	pgoff_t page_index;
@@ -297,7 +302,7 @@ static void sgx_reclaim_pages(void)
 	int i;
 
 	spin_lock(&sgx_global_lru.lock);
-	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
+	for (i = 0; i < nr_to_scan; i++) {
 		if (list_empty(&sgx_global_lru.reclaimable))
 			break;
 
@@ -327,7 +332,7 @@ static void sgx_reclaim_pages(void)
 	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
-		if (!sgx_reclaimer_age(epc_page))
+		if (i == SGX_MAX_NR_TO_RECLAIM || !sgx_reclaimer_age(epc_page))
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
@@ -384,7 +389,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages();
+		sgx_reclaim_pages(SGX_NR_TO_SCAN);
 }
 
 static int ksgxd(void *p)
@@ -410,7 +415,7 @@ static int ksgxd(void *p)
 			sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages();
+			sgx_reclaim_pages(SGX_NR_TO_SCAN);
 	}
 
 	return 0;
@@ -581,7 +586,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		sgx_reclaim_pages();
+		sgx_reclaim_pages(SGX_NR_TO_SCAN);
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:31 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 10/20] x86/sgx: Return the number of EPC pages that were successfully reclaimed Date: Thu, 22 Sep 2022 10:10:47 -0700 Message-Id: <20220922171057.1236139-11-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Return the number of reclaimed pages from sgx_reclaim_pages(); the EPC cgroup will use the result to track the success rate of its reclaim calls, e.g. to escalate to a more forceful reclaiming mode if necessary. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 0010ed1b2e98..fc5aed813834 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -290,8 +290,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * + EWB) but not sufficiently. Reclaiming one page at a time would also be * problematic as it would increase the lock contention too much, which would * halt forward progress. + * + * Return: number of EPC pages reclaimed */ -static void sgx_reclaim_pages(int nr_to_scan) +static int sgx_reclaim_pages(int nr_to_scan) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_encl_page *encl_page; @@ -373,6 +375,7 @@ static void sgx_reclaim_pages(int nr_to_scan) } out: cond_resched(); + return i; } static bool sgx_should_reclaim(unsigned long watermark) From patchwork Thu Sep 22 17:10:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985592 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0FC4C6FA8B for ; Thu, 22 Sep 2022 17:13:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231915AbiIVRNH (ORCPT ); Thu, 22 Sep 2022 13:13:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41778 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231715AbiIVRNG (ORCPT ); Thu, 22 Sep 2022 13:13:06 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 073A8F85AB; Thu, 22 Sep 2022 10:13:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866785; x=1695402785; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fdMNPny7GFohOVYJu7HIsWNUmP6XOgvNz9VT3/oKdyc=; b=UezCxTzDIhyqcXaUbH5n4Lkr34G/Qfd/SblliyyA0p7yJ3NWnMajB7fx FRDj83Iy2UAFKJLIyvVdcWLidkxely3pQAGNppg+S1CqkBB2f7rfuHCjS
rV9aGaRIC7LVZyjrYGhNwPq+CH6mSUUZgMiTbI2iwLCMtiCIJLrtIqmWY Gh1l6ysVff76Jjq27ki3cfnrGBLYn76dpMp/ckS8Yx1yMQEbiFsiSWZMb 8ltHspwPMt2MprREQcoABQRcohCZTwxY/t5sxBcSXe4plB0IlaAv7vaWm vewp2sFwrIcWE2LFhIeRt4JNQ1zZh5FjQgQiiLbsStM3HWxz+mnjHPw2I Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="283421367" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421367" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:48 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270140" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:33 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 11/20] x86/sgx: Add option to ignore age of page during EPC reclaim Date: Thu, 22 Sep 2022 10:10:48 -0700 Message-Id: <20220922171057.1236139-12-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add a flag to sgx_reclaim_pages() to instruct it to ignore the age of a page, i.e. reclaim the page even if it's young. The EPC cgroup will use the flag to enforce its limits by draining the reclaimable lists before resorting to other measures, e.g. forcefully reclaiming "unreclaimable" pages by killing enclaves. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index fc5aed813834..98531f6fb448 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -278,6 +278,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, /** * sgx_reclaim_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim + * @ignore_age: Reclaim a page even if it is young * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files.
Skip the pages, which have @@ -293,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * * Return: number of EPC pages reclaimed */ -static int sgx_reclaim_pages(int nr_to_scan) +static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_encl_page *encl_page; @@ -334,7 +335,8 @@ static int sgx_reclaim_pages(int nr_to_scan) list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->owner; - if (i == SGX_MAX_NR_TO_RECLAIM || !sgx_reclaimer_age(epc_page)) + if (i == SGX_MAX_NR_TO_RECLAIM || + (!ignore_age && !sgx_reclaimer_age(epc_page))) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); @@ -392,7 +394,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_pages(SGX_NR_TO_SCAN); + sgx_reclaim_pages(SGX_NR_TO_SCAN, false); } static int ksgxd(void *p) @@ -418,7 +420,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(SGX_NR_TO_SCAN); + sgx_reclaim_pages(SGX_NR_TO_SCAN, false); } return 0; @@ -589,7 +591,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_pages(SGX_NR_TO_SCAN); + sgx_reclaim_pages(SGX_NR_TO_SCAN, false); } if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) From patchwork Thu Sep 22 17:10:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985593 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 360E8C6FA82 for ; Thu, 22 Sep 2022 17:13:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231934AbiIVRNJ (ORCPT ); Thu, 22 Sep 2022 13:13:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231838AbiIVRNH (ORCPT ); Thu, 22 Sep 2022 13:13:07 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B8EE2EF2D; Thu, 22 Sep 2022 10:13:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866786; x=1695402786; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uAqBkTUmYYX0pPDN5Vz9uab6QodYLXhHPk1Zsj7TxBE=; b=Rph/LjkQFeQlhV2N9AiBG6/YOR618v073CZiFSpwl4DbfSV5QETh98AJ x7lNyG0dtr8G985Fbmb0utazLPmgy0vwsUfShXhe/6Ek2JpbKh9kjXe66 zXsZQ6oguTSXlUd+OzOc7eFHYf72ZqcvJzUiR9H1c89wPkFJ5cQP/Mwtn UQbfA19YZBNYi8Imei/hKA5M7YBWAMq0SdcCSTNa8J8nX14TTOqLjGkdk QtIhAnpNo39P9UXuScZXaKkko1Zq3XK5b5jV0ZUIDE84dsxfGHtrFU3Xp kcOopBzT8MIhmLPn6wF9ZqSrQG/fELliZ4WX6s3Oe53Hxt6i5dQMwMHhz g==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="283421376" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421376" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:49 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270151" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by 
fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:34 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 12/20] x86/sgx: Add helper to retrieve SGX EPC LRU given an EPC page Date: Thu, 22 Sep 2022 10:10:49 -0700 Message-Id: <20220922171057.1236139-13-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Introduce a function that will be used to retrieve an LRU from an EPC page. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 98531f6fb448..9f2cb264a347 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -31,6 +31,10 @@ static DEFINE_XARRAY(sgx_epc_address_space); * with sgx_global_lru.lock acquired. */ static struct sgx_epc_lru sgx_global_lru; +static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page) +{ + return &sgx_global_lru; +} static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0); @@ -299,6 +303,7 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_encl_page *encl_page; struct sgx_epc_page *epc_page, *tmp; + struct sgx_epc_lru *lru; pgoff_t page_index; LIST_HEAD(iso); int ret; @@ -354,10 +359,11 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) continue; skip: - spin_lock(&sgx_global_lru.lock); + lru = sgx_lru(epc_page); + spin_lock(&lru->lock); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; - list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable); - spin_unlock(&sgx_global_lru.lock); + list_move_tail(&epc_page->list, &lru->reclaimable); + spin_unlock(&lru->lock); kref_put(&encl_page->encl->refcount, sgx_encl_release); } @@ -514,14 +520,16 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) */ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) { - spin_lock(&sgx_global_lru.lock); + struct sgx_epc_lru *lru = sgx_lru(page); + + spin_lock(&lru->lock); WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIM_FLAGS); page->flags |= flags; if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) - list_add_tail(&page->list, &sgx_global_lru.reclaimable); + list_add_tail(&page->list, &lru->reclaimable); else - list_add_tail(&page->list, &sgx_global_lru.unreclaimable); - spin_unlock(&sgx_global_lru.lock); + list_add_tail(&page->list, &lru->unreclaimable); + spin_unlock(&lru->lock); } /** @@ -536,14 +544,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) */ int sgx_drop_epc_page(struct sgx_epc_page *page) { - spin_lock(&sgx_global_lru.lock); + struct sgx_epc_lru *lru = sgx_lru(page); + + spin_lock(&lru->lock); if ((page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) && (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)) { - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); return -EBUSY; } list_del(&page->list); - 
spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS; From patchwork Thu Sep 22 17:10:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985591 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B380BC6FA92 for ; Thu, 22 Sep 2022 17:13:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231910AbiIVRNI (ORCPT ); Thu, 22 Sep 2022 13:13:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41784 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231896AbiIVRNG (ORCPT ); Thu, 22 Sep 2022 13:13:06 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29C9DFFA4A; Thu, 22 Sep 2022 10:13:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866785; x=1695402785; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=n2WN4hJfwaj5ptNYeIEmphrPKVncHIzDa6421dNQKGI=; b=Rnn9FtAG7nyMCpIAzBbAQokX0UvsOOHxXmGOB+Fl1xf/FwWS2CrdQ6yT 4627OM/+9OgTfRvzzxFi+4EzguxLcMC9EAMZloCAfXVErx7RttH+2Qa4S Or2zewLtA+UtSEv2NLdSNvFQD2/zbA/ttPHPTZSktaJmwGJ7tSTkMd6YJ TDmhGih/+BKw7WPmgskNOIdtdMgjkvwUFyvcyxaGS6tmWU7Z2wzBwuizx TQusCoPQOBNWzOt8AG5Eybnu5xRCb+s/YeFWZKQbfd1Zw4IuDLKnAVq2i xqkx/ShVvbUcNAwJrEGZh9UD6hgmKwils6Rk1vgflj8lJFSQaG5kYHhfO Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="283421372" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421372" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:48 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270157" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:36 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 13/20] x86/sgx: Prepare for multiple LRUs Date: Thu, 22 Sep 2022 10:10:50 -0700 Message-Id: <20220922171057.1236139-14-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add sgx_can_reclaim() wrapper so that in a subsequent patch, multiple LRUs can be used cleanly. 
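For illustration, a minimal sketch (not part of this patch) of what the wrapper might become once per-cgroup LRUs exist; sgx_epc_cgroup_lru_empty() and the NULL-means-root-cgroup convention are taken from the EPC cgroup patch later in this series, and wiring it up behind CONFIG_CGROUP_SGX_EPC is an assumption:

static bool sgx_can_reclaim(void)
{
#ifdef CONFIG_CGROUP_SGX_EPC
	/* Reclaimable pages may sit on any per-cgroup LRU; passing NULL
	 * asks the cgroup code to check the whole hierarchy from root.
	 */
	return !sgx_epc_cgroup_lru_empty(NULL);
#else
	return !list_empty(&sgx_global_lru.reclaimable);
#endif
}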
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 9f2cb264a347..ac49346302ed 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -386,10 +386,15 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) return i; } +static bool sgx_can_reclaim(void) +{ + return !list_empty(&sgx_global_lru.reclaimable); +} + static bool sgx_should_reclaim(unsigned long watermark) { return atomic_long_read(&sgx_nr_free_pages) < watermark && - !list_empty(&sgx_global_lru.reclaimable); + sgx_can_reclaim(); } /* @@ -588,7 +593,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (list_empty(&sgx_global_lru.reclaimable)) + if (!sgx_can_reclaim()) return ERR_PTR(-ENOMEM); if (!reclaim) { From patchwork Thu Sep 22 17:10:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985595 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74676C54EE9 for ; Thu, 22 Sep 2022 17:13:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230073AbiIVRNL (ORCPT ); Thu, 22 Sep 2022 13:13:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231913AbiIVRNH (ORCPT ); Thu, 22 Sep 2022 13:13:07 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0CF4F8FAF; Thu, 22 Sep 2022 10:13:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866786; x=1695402786; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9bTNtbUwu0kZhlILKHr0tCSy+BWIa2iLIfhpMM9AQno=; b=jynNwmPx9NL7ypLTViDkhkpeMfvtxMYIPJBeskfqrC6lypCE3UxYZY+S iLfYO7GioUP6k+hoeqjOtIZmobHM/5KwN/6iUyl8QPCDf11vgY8j5ABjq wbNzN1Dl7Nmi/7qh1XvcNuS6rW/F8ES+Zz3tpSdi8RM3MG/aQQQOVGg/d 3xPqXVfOl0O8rTWRyxoVkjJle3OjTvXKVTaj3T6Go7PLtMZaHYFElWfr0 VR9vyj58ukH88drGLm4BTGCCWzeUbqgkUhu5YYt67zxWAOp/0bqKm+xCV Vdik5I7sOF4K/lw+vqqbqsBX7vfMaqGL38tch93DcuKTEp0Up1Gac5JbY w==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="283421386" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421386" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:49 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270162" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:38 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. 
Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 14/20] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup Date: Thu, 22 Sep 2022 10:10:51 -0700 Message-Id: <20220922171057.1236139-15-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Expose the top-level reclaim function as sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will initiate reclaim to enforce changes to high/max limits. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 9 +++++---- arch/x86/kernel/cpu/sgx/sgx.h | 1 + 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index ac49346302ed..1791881aa1b1 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -281,6 +281,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, /** * sgx_reclaim_pages() - Reclaim EPC pages from the consumers + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim * @ignore_age: Reclaim a page even if it is young * @@ -298,7 +299,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * * Return: number of EPC pages reclaimed */ -static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) +int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_encl_page *encl_page; @@ -405,7 +406,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } static int ksgxd(void *p) @@ -431,7 +432,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } return 0; @@ -606,7 +607,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 76eae4ecbf87..a2042303a666 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -113,6 +113,7 @@ void sgx_reclaim_direct(void); void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); +int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); void sgx_ipi_cb(void *info); From patchwork Thu Sep 22 17:10:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985594 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93E71C6FA92 for ; Thu, 22 Sep 2022 17:13:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) 
by vger.kernel.org via listexpand id S231946AbiIVRNK (ORCPT ); Thu, 22 Sep 2022 13:13:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41806 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231904AbiIVRNH (ORCPT ); Thu, 22 Sep 2022 13:13:07 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C3DDF8595; Thu, 22 Sep 2022 10:13:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866786; x=1695402786; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=op2/5yqxd7J9ovTWhDUl48f07ozoWvtTyl9KC7Oy26o=; b=niBZMfobb9IJr4uj6nd1v5YrrGcPHm0dy5USD244PtxSe8gBA+e06JQD 3kRtYCla4Lb1nUXO8UuRzwmXRy/oIBr4KGrVFlLr3kbe8u9u9iVzst+ly eDr8Gz9bXtM6d67HuCJiensDQwqNtyUXJpCFAzlKJE/B8SAfeg9iuNmpT FZz0doqGBDQZbmMWRBFgg+Ea7wuItW9zFmDxlxHfuIO5SQdvSf9C9qNfO kaBEv6wrxFA+vZaUttw5Zh6eD6hiOPqPcdeJFdhiCTMxei7ILnbgWZkho oHoLVubcHnS5a7ZW4x3fOuO8Xew7HI1x5mdFM3o5WVik4Vwx884DspHOk g==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="283421379" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421379" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:49 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270184" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:42 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 15/20] x86/sgx: Add helper to grab pages from an arbitrary EPC LRU Date: Thu, 22 Sep 2022 10:10:52 -0700 Message-Id: <20220922171057.1236139-16-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Move the isolation loop into a standalone helper, sgx_isolate_epc_pages(), in preparation for the existence of multiple LRUs. Expose the helper to other SGX code so that it can be called from the EPC cgroup code, e.g. to isolate pages from a single cgroup LRU. Exposing the isolation loop allows the cgroup iteration logic to be wholly encapsulated within the cgroup code.
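Since @nr_to_scan is passed by reference and decremented as pages are scanned, a single scan budget can be spread across several LRUs. A usage sketch (lru_a and lru_b are hypothetical stand-ins for whatever LRUs a caller walks):

	LIST_HEAD(iso);
	int nr_to_scan = SGX_NR_TO_SCAN;

	sgx_isolate_epc_pages(lru_a, &nr_to_scan, &iso);
	if (nr_to_scan)
		sgx_isolate_epc_pages(lru_b, &nr_to_scan, &iso);

	/* "iso" now holds at most SGX_NR_TO_SCAN pages, each flagged
	 * SGX_EPC_PAGE_RECLAIM_IN_PROGRESS and holding a reference to
	 * its owning enclave.
	 */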
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 72 ++++++++++++++++++++-------------- arch/x86/kernel/cpu/sgx/sgx.h | 2 + 2 files changed, 45 insertions(+), 29 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 1791881aa1b1..151ad720a4ec 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -280,10 +280,47 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, } /** - * sgx_reclaim_pages() - Reclaim EPC pages from the consumers + * sgx_isolate_epc_pages - Isolate pages from an LRU for reclaim + * @lru: LRU from which to reclaim + * @nr_to_scan: Number of pages to scan for reclaim + * @dst: Destination list to hold the isolated pages + */ +void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, + struct list_head *dst) +{ + struct sgx_encl_page *encl_page; + struct sgx_epc_page *epc_page; + + spin_lock(&lru->lock); + for (; *nr_to_scan > 0; --(*nr_to_scan)) { + if (list_empty(&lru->reclaimable)) + break; + + epc_page = list_first_entry(&lru->reclaimable, + struct sgx_epc_page, list); + + encl_page = epc_page->owner; + if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE))) + continue; + + if (kref_get_unless_zero(&encl_page->encl->refcount)) { + epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; + list_move_tail(&epc_page->list, dst); + } else { + /* The owner is freeing the page, remove it from the + * LRU list + */ + epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + list_del_init(&epc_page->list); + } + } + spin_unlock(&lru->lock); +} + +/** * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers - * @nr_to_scan: Number of EPC pages to scan for reclaim - * @ignore_age: Reclaim a page even if it is young + * @nr_to_scan: Number of EPC pages to scan for reclaim + * @ignore_age: Reclaim a page even if it is young * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files.
Skip the pages, which have @@ -302,42 +339,19 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; - struct sgx_encl_page *encl_page; struct sgx_epc_page *epc_page, *tmp; + struct sgx_encl_page *encl_page; struct sgx_epc_lru *lru; pgoff_t page_index; LIST_HEAD(iso); + int i = 0; int ret; - int i; - - spin_lock(&sgx_global_lru.lock); - for (i = 0; i < nr_to_scan; i++) { - if (list_empty(&sgx_global_lru.reclaimable)) - break; - - epc_page = list_first_entry(&sgx_global_lru.reclaimable, - struct sgx_epc_page, list); - encl_page = epc_page->owner; - if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE))) - continue; - if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { - epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; - list_move_tail(&epc_page->list, &iso); - } else { - /* The owner is freeing the page, remove it from the - * LRU list - */ - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; - list_del_init(&epc_page->list); - } - } - spin_unlock(&sgx_global_lru.lock); + sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso); if (list_empty(&iso)) goto out; - i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->owner; diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index a2042303a666..0598d534371b 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -114,6 +114,8 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); +void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, + struct list_head *dst); void sgx_ipi_cb(void *info); From patchwork Thu Sep 22 17:10:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985596 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB74EC6FA8B for ; Thu, 22 Sep 2022 17:13:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229794AbiIVRNM (ORCPT ); Thu, 22 Sep 2022 13:13:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230094AbiIVRNI (ORCPT ); Thu, 22 Sep 2022 13:13:08 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 390ACFF3D8; Thu, 22 Sep 2022 10:13:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866787; x=1695402787; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9mFYnX2TOYVcASIcWsVppJeAe7R/9ztKVDQfD22SjbY=; b=PyJ3lDqozoWG3frJU3f4rRXYxAEzRRsIQOdPqjLPYOJfYk12oQkW4V5Z O6Q0oN/GjhJTU6JPGRjcaGlzygE4CHxbMc4+khrmhe2YNlw0v+8IzaNv9 Do585ESEc82eO+HyQJnrdSdhBTvwQfIAbisFLeI5urUrgFG6+pOBTF9PC mbisa0u2v6ybMxVLKmbkHA8Dt3voFHdUT6uOuFuNpObFpJ8xwR7fTxxiO 7OWrQZI6WfIbzKXC02BFbHIeSP4RUNhC2DW9wPi/zEkI4qlEjOJMrMbIF zao/t+JqESfx4Pdz1PgPn6PmLbvdc09ElpltAuwwfJSNE/YPAaOHLcu5Y g==; X-IronPort-AV: 
E=McAfee;i="6500,9779,10478"; a="283421390" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421390" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:49 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270194" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:47 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 16/20] x86/sgx: Add EPC OOM path to forcefully reclaim EPC Date: Thu, 22 Sep 2022 10:10:53 -0700 Message-Id: <20220922171057.1236139-17-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Introduce the OOM path for killing an enclave when the reclaimer is no longer able to reclaim enough EPC pages. Find a victim enclave, which will be an enclave with EPC pages remaining that are not accessible to the reclaimer ("unreclaimable"). Once a victim is identified, mark the enclave as OOM and zap the enclave's entire page range. Release all the enclave's resources except for the struct sgx_encl memory itself. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/encl.c | 74 +++++++++++++++--- arch/x86/kernel/cpu/sgx/encl.h | 2 + arch/x86/kernel/cpu/sgx/main.c | 135 +++++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/sgx.h | 1 + 4 files changed, 201 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 672b302f3688..fe6f0a62c4f1 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -622,7 +622,8 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr, if (!encl) return -EFAULT; - if (!test_bit(SGX_ENCL_DEBUG, &encl->flags)) + if (!test_bit(SGX_ENCL_DEBUG, &encl->flags) || + test_bit(SGX_ENCL_OOM, &encl->flags)) return -EFAULT; for (i = 0; i < len; i += cnt) { @@ -668,16 +669,8 @@ const struct vm_operations_struct sgx_vm_ops = { .access = sgx_vma_access, }; -/** - * sgx_encl_release - Destroy an enclave instance - * @ref: address of a kref inside &sgx_encl - * - * Used together with kref_put(). Frees all the resources associated with the - * enclave and the instance itself. - */ -void sgx_encl_release(struct kref *ref) +static void __sgx_encl_release(struct sgx_encl *encl) { - struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); struct sgx_va_page *va_page; struct sgx_encl_page *entry; unsigned long index; @@ -712,7 +705,7 @@ void sgx_encl_release(struct kref *ref) while (!list_empty(&encl->va_pages)) { va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, list); - list_del(&va_page->list); + list_del_init(&va_page->list); sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); kfree(va_page); @@ -728,10 +721,66 @@ void sgx_encl_release(struct kref *ref) /* Detect EPC page leak's.
*/ WARN_ON_ONCE(encl->secs_child_cnt); WARN_ON_ONCE(encl->secs.epc_page); +} + +/** + * sgx_encl_release - Destroy an enclave instance + * @ref: address of a kref inside &sgx_encl + * + * Used together with kref_put(). Frees all the resources associated with the + * enclave and the instance itself. + */ +void sgx_encl_release(struct kref *ref) +{ + struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); + + /* if the enclave was OOM killed previously, it just needs to be freed */ + if (!test_bit(SGX_ENCL_OOM, &encl->flags)) + __sgx_encl_release(encl); kfree(encl); } +/** + * sgx_encl_destroy - prepare the enclave for release + * @encl: address of the sgx_encl to drain + * + * Used during oom kill to empty the mm_list entries after they have + * been zapped. Release the remaining enclave resources without freeing + * struct sgx_encl. + */ +void sgx_encl_destroy(struct sgx_encl *encl) +{ + struct sgx_encl_mm *encl_mm; + + for ( ; ; ) { + spin_lock(&encl->mm_lock); + + if (list_empty(&encl->mm_list)) { + encl_mm = NULL; + } else { + encl_mm = list_first_entry(&encl->mm_list, + struct sgx_encl_mm, list); + list_del_rcu(&encl_mm->list); + } + + spin_unlock(&encl->mm_lock); + + /* The enclave is no longer mapped by any mm. */ + if (!encl_mm) + break; + + synchronize_srcu(&encl->srcu); + mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm); + kfree(encl_mm); + + /* 'encl_mm' is gone, put encl_mm->encl reference: */ + kref_put(&encl->refcount, sgx_encl_release); + } + + __sgx_encl_release(encl); +} + /* * 'mm' is exiting and no longer needs mmu notifications. */ @@ -801,6 +850,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm) struct sgx_encl_mm *encl_mm; int ret; + if (test_bit(SGX_ENCL_OOM, &encl->flags)) + return -ENOMEM; + /* * Even though a single enclave may be mapped into an mm more than once, * each 'mm' only appears once on encl->mm_list. 
This is guaranteed by diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index 831d63f80f5a..f4935632e53a 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -39,6 +39,7 @@ enum sgx_encl_flags { SGX_ENCL_DEBUG = BIT(1), SGX_ENCL_CREATED = BIT(2), SGX_ENCL_INITIALIZED = BIT(3), + SGX_ENCL_OOM = BIT(4), }; struct sgx_encl_mm { @@ -125,5 +126,6 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl, unsigned long addr); struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim); void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page); +void sgx_encl_destroy(struct sgx_encl *encl); #endif /* _X86_ENCL_H */ diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 151ad720a4ec..082c08228840 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -657,6 +657,141 @@ void sgx_free_epc_page(struct sgx_epc_page *page) atomic_long_inc(&sgx_nr_free_pages); } +static bool sgx_oom_get_ref(struct sgx_epc_page *epc_page) +{ + struct sgx_encl *encl; + + if (epc_page->flags & SGX_EPC_PAGE_ENCLAVE) + encl = ((struct sgx_encl_page *)epc_page->owner)->encl; + else if (epc_page->flags & SGX_EPC_PAGE_VERSION_ARRAY) + encl = epc_page->owner; + else + return false; + + return kref_get_unless_zero(&encl->refcount); +} + +static struct sgx_epc_page *sgx_oom_get_victim(struct sgx_epc_lru *lru) +{ + struct sgx_epc_page *epc_page, *tmp; + + if (list_empty(&lru->unreclaimable)) + return NULL; + + list_for_each_entry_safe(epc_page, tmp, &lru->unreclaimable, list) { + list_del_init(&epc_page->list); + + if (sgx_oom_get_ref(epc_page)) + return epc_page; + } + return NULL; +} + +static void sgx_epc_oom_zap(void *owner, struct mm_struct *mm, unsigned long start, + unsigned long end, const struct vm_operations_struct *ops) +{ + struct vm_area_struct *vma, *tmp; + unsigned long vm_end; + + vma = find_vma(mm, start); + if (!vma || vma->vm_ops != ops || vma->vm_private_data != owner || + vma->vm_start >= end) + return; + + for (tmp = vma; tmp->vm_start < end; tmp = tmp->vm_next) { + do { + vm_end = tmp->vm_end; + tmp = tmp->vm_next; + } while (tmp && tmp->vm_ops == ops && + vma->vm_private_data == owner && tmp->vm_start < end); + + zap_page_range(vma, vma->vm_start, vm_end - vma->vm_start); + + if (!tmp) + break; + } +} + +static void sgx_oom_encl(struct sgx_encl *encl) +{ + unsigned long mm_list_version; + struct sgx_encl_mm *encl_mm; + int idx; + + set_bit(SGX_ENCL_OOM, &encl->flags); + + if (!test_bit(SGX_ENCL_CREATED, &encl->flags)) + goto out; + + do { + mm_list_version = encl->mm_list_version; + + /* Pairs with smp_rmb() in sgx_encl_mm_add(). */ + smp_rmb(); + + idx = srcu_read_lock(&encl->srcu); + + list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) { + if (!mmget_not_zero(encl_mm->mm)) + continue; + + mmap_read_lock(encl_mm->mm); + + sgx_epc_oom_zap(encl, encl_mm->mm, encl->base, + encl->base + encl->size, &sgx_vm_ops); + + mmap_read_unlock(encl_mm->mm); + + mmput_async(encl_mm->mm); + } + + srcu_read_unlock(&encl->srcu, idx); + } while (WARN_ON_ONCE(encl->mm_list_version != mm_list_version)); + + mutex_lock(&encl->lock); + sgx_encl_destroy(encl); + mutex_unlock(&encl->lock); + +out: + /* + * This puts the refcount we took when we identified this enclave as + * an OOM victim. 
+ */ + kref_put(&encl->refcount, sgx_encl_release); +} + +static inline void sgx_oom_encl_page(struct sgx_encl_page *encl_page) +{ + return sgx_oom_encl(encl_page->encl); +} + +/** + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU + * @lru: LRU that is low + * + * Return: %true if a victim was found and kicked. + */ +bool sgx_epc_oom(struct sgx_epc_lru *lru) +{ + struct sgx_epc_page *victim; + + spin_lock(&lru->lock); + victim = sgx_oom_get_victim(lru); + spin_unlock(&lru->lock); + + if (!victim) + return false; + + if (victim->flags & SGX_EPC_PAGE_ENCLAVE) + sgx_oom_encl_page(victim->owner); + else if (victim->flags & SGX_EPC_PAGE_VERSION_ARRAY) + sgx_oom_encl(victim->owner); + else + WARN_ON_ONCE(1); + + return true; +} + static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, unsigned long index, struct sgx_epc_section *section) diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 0598d534371b..a4c7ee0a4958 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -116,6 +116,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, struct list_head *dst); +bool sgx_epc_oom(struct sgx_epc_lru *lru); void sgx_ipi_cb(void *info); From patchwork Thu Sep 22 17:10:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985599 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16B3FC6FA8B for ; Thu, 22 Sep 2022 17:14:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231974AbiIVROb (ORCPT ); Thu, 22 Sep 2022 13:14:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231551AbiIVROC (ORCPT ); Thu, 22 Sep 2022 13:14:02 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C0C9103FFA; Thu, 22 Sep 2022 10:13:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663866839; x=1695402839; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SG+j0Ze6lCik/OeaYt6iz9BizoJpGhJOsGDfe2GFcks=; b=LViibgw+AljhGceNauUL9QYi5ahsRLAxDyCGC8nK+q9p2Yo4Zmd0/ziN Hg0l/15MvIVp8R6R1m9V5Ko11w2Fjw2RIKdC2Mu1KFtNuVZnzt21YU44Y pM8qDpRmwQsJ8HTl7BHnIgS/rDOAs61Nf44jq3ucCtRHcIqRrs+kWfjRO WhXQeMRcwCm0gytSODPVIDGMkh09rr8MJxI+IRaOiXyTkd5aGHzYDJWXD R9KY1V/2ZkQStwvVXaWsJDxJLgDU//9l6iYFhYMEd3ALLQaR/Qj/+rgBm 1aelrzUlP1q83DZsOioYdsbSzVxWmCQ3ia6K1qgC1gjWnYQZxquDMeA65 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10478"; a="283421611" X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="283421611" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:12:31 -0700 X-IronPort-AV: E=Sophos;i="5.93,337,1654585200"; d="scan'208";a="762270229" Received: from sknaidu-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.165.187]) by fmsmga001-auth.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2022 10:11:49 -0700 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" , Tejun Heo , Zefan Li , Johannes Weiner Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 17/20] cgroup, x86/sgx: Add SGX EPC cgroup controller Date: Thu, 22 Sep 2022 10:10:54 -0700 Message-Id: <20220922171057.1236139-18-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Implement a cgroup controller, sgx_epc, which regulates distribution of SGX Enclave Page Cache (EPC) memory. EPC memory is independent from normal system memory, e.g. must be reserved at boot from RAM and cannot be converted between EPC and normal memory while the system is running. EPC is managed by the SGX subsystem and is not accounted by the memory controller. Much like normal system memory, EPC memory can be overcommitted via virtual memory techniques and pages can be swapped out of the EPC to their backing store (normal system memory, e.g. shmem). The SGX EPC subsystem is analogous to the memory subsystem and the SGX EPC controller is in turn analogous to the memory controller; it implements limit and protection models for EPC memory. "sgx_epc.high" and "sgx_epc.low" are the main mechanisms to control EPC usage, while "sgx_epc.max" is a last line of defense mechanism. "sgx_epc.high" is a best-effort limit of EPC usage. "sgx_epc.low" is a best-effort protection of EPC usage. "sgx_epc.max" is a hard limit of EPC usage. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/Makefile | 1 + arch/x86/kernel/cpu/sgx/epc_cgroup.c | 830 +++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/epc_cgroup.h | 37 ++ include/linux/cgroup_subsys.h | 4 + init/Kconfig | 12 + 5 files changed, 884 insertions(+) create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 9c1656779b2a..12901a488da7 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -4,3 +4,4 @@ obj-y += \ ioctl.o \ main.o obj-$(CONFIG_X86_SGX_KVM) += virt.o +obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c new file mode 100644 index 000000000000..0a61bb8548ff --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -0,0 +1,830 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2022 Intel Corporation.
+ +#include +#include +#include +#include +#include +#include + +#include "epc_cgroup.h" + +#define SGX_EPC_RECLAIM_MIN_PAGES 16UL +#define SGX_EPC_RECLAIM_MAX_PAGES 64UL +#define SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD 5 +#define SGX_EPC_RECLAIM_OOM_THRESHOLD 5 + +struct sgx_epc_reclaim_control { + struct sgx_epc_cgroup *epc_cg; + int nr_fails; + bool ignore_age; +}; + +static struct sgx_epc_cgroup *root_epc_cgroup __read_mostly; +static struct workqueue_struct *sgx_epc_cg_wq; + +static int __init sgx_epc_cgroup_init(void) +{ + if (!boot_cpu_has(X86_FEATURE_SGX)) + return 0; + + sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq", + WQ_UNBOUND | WQ_FREEZABLE, + WQ_UNBOUND_MAX_ACTIVE); + BUG_ON(!sgx_epc_cg_wq); + + return 0; +} +subsys_initcall(sgx_epc_cgroup_init); + +static inline bool sgx_epc_cgroup_disabled(void) +{ + return !cgroup_subsys_enabled(sgx_epc_cgrp_subsys); +} + +static +struct sgx_epc_cgroup *sgx_epc_cgroup_from_css(struct cgroup_subsys_state *css) +{ + return container_of(css, struct sgx_epc_cgroup, css); +} + +static +struct sgx_epc_cgroup *sgx_epc_cgroup_from_task(struct task_struct *task) +{ + if (unlikely(!task)) + return NULL; + return sgx_epc_cgroup_from_css(task_css(task, sgx_epc_cgrp_id)); +} + +static struct sgx_epc_cgroup *sgx_epc_cgroup_from_mm(struct mm_struct *mm) +{ + struct sgx_epc_cgroup *epc_cg; + + rcu_read_lock(); + do { + epc_cg = sgx_epc_cgroup_from_task(rcu_dereference(mm->owner)); + if (unlikely(!epc_cg)) + epc_cg = root_epc_cgroup; + } while (!css_tryget_online(&epc_cg->css)); + rcu_read_unlock(); + + return epc_cg; +} + +static struct sgx_epc_cgroup *parent_epc_cgroup(struct sgx_epc_cgroup *epc_cg) +{ + return sgx_epc_cgroup_from_css(epc_cg->css.parent); +} + +/** + * sgx_epc_cgroup_iter - iterate over the EPC cgroup hierarchy + * @root: hierarchy root + * @prev: previously returned epc_cg, NULL on first invocation + * @reclaim_epoch: epoch for shared reclaim walks, NULL for full walks + * + * Return: references to children of the hierarchy below @root, or + * @root itself, or %NULL after a full round-trip. + * + * Caller must pass the return value in @prev on subsequent invocations + * for reference counting, or use sgx_epc_cgroup_iter_break() to cancel + * a hierarchy walk before the round-trip is complete. + */ +static struct sgx_epc_cgroup *sgx_epc_cgroup_iter(struct sgx_epc_cgroup *prev, + struct sgx_epc_cgroup *root, + unsigned long *reclaim_epoch) +{ + struct cgroup_subsys_state *css = NULL; + struct sgx_epc_cgroup *epc_cg = NULL; + struct sgx_epc_cgroup *pos = NULL; + bool inc_epoch = false; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + if (!root) + root = root_epc_cgroup; + + if (prev && !reclaim_epoch) + pos = prev; + + rcu_read_lock(); + +start: + if (reclaim_epoch) { + /* + * Abort the walk if a reclaimer working from the same root has + * started a new walk after this reclaimer has already scanned + * at least one cgroup. + */ + if (prev && *reclaim_epoch != root->epoch) + goto out; + + while (1) { + pos = READ_ONCE(root->reclaim_iter); + if (!pos || css_tryget(&pos->css)) + break; + + /* + * The css is dying, clear the reclaim_iter immediately + * instead of waiting for ->css_released to be called. + * Busy waiting serves no purpose and attempting to wait + * for ->css_released may actually block it from being + * called. 
+ */ + (void)cmpxchg(&root->reclaim_iter, pos, NULL); + } + } + + if (pos) + css = &pos->css; + + while (!epc_cg) { + css = css_next_descendant_pre(css, &root->css); + if (!css) { + /* + * Increment the epoch as we've reached the end of the + * tree and the next call to css_next_descendant_pre + * will restart at root. Do not update root->epoch + * directly as we should only do so if we update the + * reclaim_iter, i.e. a different thread may win the + * race and update the epoch for us. + */ + inc_epoch = true; + + /* + * Reclaimers share the hierarchy walk, and a new one + * might jump in at the end of the hierarchy. Restart + * at root so that we don't return NULL on a thread's + * initial call. + */ + if (!prev) + continue; + break; + } + + /* + * Verify the css and acquire a reference. Don't take an + * extra reference to root as it's either the global root + * or is provided by the caller and so is guaranteed to be + * alive. Keep walking if this css is dying. + */ + if (css != &root->css && !css_tryget(css)) + continue; + + epc_cg = sgx_epc_cgroup_from_css(css); + } + + if (reclaim_epoch) { + /* + * reclaim_iter could have already been updated by a competing + * thread; check that the value hasn't changed since we read + * it to avoid reclaiming from the same cgroup twice. If the + * value did change, put all of our references and restart the + * entire process, for all intents and purposes we're making a + * new call. + */ + if (cmpxchg(&root->reclaim_iter, pos, epc_cg) != pos) { + if (epc_cg && epc_cg != root) + css_put(&epc_cg->css); + if (pos) + css_put(&pos->css); + css = NULL; + epc_cg = NULL; + inc_epoch = false; + goto start; + } + + if (inc_epoch) + root->epoch++; + if (!prev) + *reclaim_epoch = root->epoch; + + if (pos) + css_put(&pos->css); + } + +out: + rcu_read_unlock(); + if (prev && prev != root) + css_put(&prev->css); + + return epc_cg; +} + +/** + * sgx_epc_cgroup_iter_break - abort a hierarchy walk prematurely + * @prev: last visited cgroup as returned by sgx_epc_cgroup_iter() + * @root: hierarchy root + */ +static void sgx_epc_cgroup_iter_break(struct sgx_epc_cgroup *prev, + struct sgx_epc_cgroup *root) +{ + if (!root) + root = root_epc_cgroup; + if (prev && prev != root) + css_put(&prev->css); +} + +/** + * sgx_epc_cgroup_lru_empty - check if a cgroup tree has no pages on its lrus + * @root: root of the tree to check + * + * Return: %true if all cgroups under the specified root have empty LRU lists. + * Used to avoid livelocks due to a cgroup having a non-zero charge count but + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or + * because all pages in the cgroup are unreclaimable. 
+ */ +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) +{ + struct sgx_epc_cgroup *epc_cg; + + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) { + if (!list_empty(&epc_cg->lru.reclaimable)) { + sgx_epc_cgroup_iter_break(epc_cg, root); + return false; + } + } + return true; +} + +static inline bool __sgx_epc_cgroup_is_low(struct sgx_epc_cgroup *epc_cg) +{ + unsigned long cur = page_counter_read(&epc_cg->pc); + + return cur < epc_cg->pc.low && + cur < epc_cg->high && + cur < epc_cg->pc.max; +} + +/** + * sgx_epc_cgroup_is_low - check if EPC consumption is below the normal range + * @epc_cg: the EPC cgroup to check + * @root: the top ancestor of the sub-tree being checked + * + * Returns %true if EPC consumption of @epc_cg, and that of all + * ancestors up to (but not including) @root, is below the normal range. + * + * @root is exclusive; it is never low when looked at directly and isn't + * checked when traversing the hierarchy. + * + * Excluding @root enables using sgx_epc.low to prioritize EPC usage + * between cgroups within a subtree of the hierarchy that is limited + * by sgx_epc.high or sgx_epc.max. + * + * For example, given cgroup A with children B and C: + * + * A + * / \ + * B C + * + * and + * + * 1. A/sgx_epc.current > A/sgx_epc.high + * 2. A/B/sgx_epc.current < A/B/sgx_epc.low + * 3. A/C/sgx_epc.current >= A/C/sgx_epc.low + * + * As 'A' is high, i.e. triggers reclaim from 'A', and 'B' is low, we + * should reclaim from 'C' until 'A' is no longer high or until we can + * no longer reclaim from 'C'. If 'A', i.e. @root, isn't excluded + * when reclaiming from 'A', then 'B' will not be considered low and we + * will reclaim indiscriminately from both 'B' and 'C'. + */ +static bool sgx_epc_cgroup_is_low(struct sgx_epc_cgroup *epc_cg, + struct sgx_epc_cgroup *root) +{ + if (sgx_epc_cgroup_disabled()) + return false; + + if (!root) + root = root_epc_cgroup; + if (epc_cg == root) + return false; + + for (; epc_cg != root; epc_cg = parent_epc_cgroup(epc_cg)) { + if (!__sgx_epc_cgroup_is_low(epc_cg)) + return false; + } + + return true; +} + +/** + * sgx_epc_cgroup_all_in_use_are_low - check if all cgroups in a tree are low + * @root: the root EPC cgroup of the hierarchy to check + * + * Returns true if all cgroups in a hierarchy are either low + * or do not have any pages on their LRU. + */ +static bool sgx_epc_cgroup_all_in_use_are_low(struct sgx_epc_cgroup *root) +{ + struct sgx_epc_cgroup *epc_cg; + + if (sgx_epc_cgroup_disabled()) + return false; + + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) { + if (!list_empty(&epc_cg->lru.reclaimable) && + !__sgx_epc_cgroup_is_low(epc_cg)) { + sgx_epc_cgroup_iter_break(epc_cg, root); + return false; + } + } + + return true; +} + +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + int *nr_to_scan, struct list_head *dst) +{ + struct sgx_epc_cgroup *epc_cg; + unsigned long epoch; + bool do_high; + + if (!*nr_to_scan) + return; + + /* + * If we're not targeting a specific cgroup, try to reclaim only from + * cgroups that are above their high limit. If there are none, then go + * ahead and grab anything available.
+ */ + do_high = !root; +retry: + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, &epoch); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, &epoch)) { + if (do_high && page_counter_read(&epc_cg->pc) < epc_cg->high) + continue; + + if (sgx_epc_cgroup_is_low(epc_cg, root)) { + /* + * Ignore low if all cgroups below @root are low, + * in which case low is "normal". + */ + if (!sgx_epc_cgroup_all_in_use_are_low(root)) + continue; + } + + sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst); + if (!*nr_to_scan) { + sgx_epc_cgroup_iter_break(epc_cg, root); + break; + } + } + if (*nr_to_scan && do_high) { + do_high = false; + goto retry; + } +} + +static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages, + struct sgx_epc_reclaim_control *rc) +{ + /* + * Ensure sgx_reclaim_epc_pages() is called with a minimum and maximum + * number of pages. Attempting to reclaim only a few pages will + * often fail and is inefficient, while reclaiming a huge number + * of pages can result in soft lockups due to holding various + * locks for an extended duration. This also bounds nr_pages so + * that it's guaranteed not to overflow 'int nr_to_scan'. + */ + nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES); + nr_pages = min(nr_pages, SGX_EPC_RECLAIM_MAX_PAGES); + + return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age); +} + +static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc) +{ + if (sgx_epc_cgroup_lru_empty(rc->epc_cg)) + return -ENOMEM; + + ++rc->nr_fails; + if (rc->nr_fails > SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD) + rc->ignore_age = true; + + return 0; +} + +static inline +void sgx_epc_reclaim_control_init(struct sgx_epc_reclaim_control *rc, + struct sgx_epc_cgroup *epc_cg) +{ + rc->epc_cg = epc_cg; + rc->nr_fails = 0; + rc->ignore_age = false; +} + +static inline void __sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg) +{ + struct sgx_epc_reclaim_control rc; + unsigned long cur, high; + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + high = READ_ONCE(epc_cg->high); + + cur = page_counter_read(&epc_cg->pc); + if (cur <= high) + break; + + if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) + break; + } + } +} + +static void sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg) +{ + for (; epc_cg; epc_cg = parent_epc_cgroup(epc_cg)) + __sgx_epc_cgroup_reclaim_high(epc_cg); +} + +/* + * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the + * cgroup, either when the cgroup is at/near its maximum capacity or + * when the cgroup is above its high threshold. + */ +static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) +{ + struct sgx_epc_reclaim_control rc; + struct sgx_epc_cgroup *epc_cg; + unsigned long cur, max; + + epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work); + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + max = READ_ONCE(epc_cg->pc.max); + + /* + * Adjust the limit down by one page, the goal is to free up + * pages for fault allocations, not to simply obey the limit. + * Conditionally decrementing max also means the cur vs. max + * check will correctly handle the case where both are zero. + */ + if (max) + max--; + + /* + * Unless the limit is extremely low, in which case forcing + * reclaim will likely cause thrashing, force the cgroup to + * reclaim at least once if it's operating *near* its maximum + * limit by adjusting @max down by half the min reclaim size.
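+ * For example (illustrative arithmetic only, assuming hypothetical + * values of 8 for SGX_EPC_RECLAIM_MIN_PAGES and 64 for + * SGX_EPC_RECLAIM_MAX_PAGES): a limit of 1024 pages yields an + * effective target of 1024 - 1 - 8/2 = 1019 pages, so reclaim kicks + * in slightly before the hard limit is actually hit.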
+ * This work func is scheduled by sgx_epc_cgroup_try_charge + * when it cannot directly reclaim due to being in an atomic + * context, e.g. EPC allocation in a fault handler. Waiting + * to reclaim until the cgroup is actually at its limit is less + * performant as it means the faulting task is effectively + * blocked until a worker makes its way through the global work + * queue. + */ + if (max > SGX_EPC_RECLAIM_MAX_PAGES) + max -= (SGX_EPC_RECLAIM_MIN_PAGES / 2); + + cur = page_counter_read(&epc_cg->pc); + if (cur <= max) + break; + + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) + break; + } + } + + sgx_epc_cgroup_reclaim_high(epc_cg); +} + +static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, + unsigned long nr_pages, bool reclaim) +{ + struct sgx_epc_reclaim_control rc; + unsigned long cur, max, over; + unsigned int nr_empty = 0; + struct page_counter *fail; + + if (epc_cg == root_epc_cgroup) { + page_counter_charge(&epc_cg->pc, nr_pages); + return 0; + } + + sgx_epc_reclaim_control_init(&rc, NULL); + + for (;;) { + if (page_counter_try_charge(&epc_cg->pc, nr_pages, &fail)) + break; + + rc.epc_cg = container_of(fail, struct sgx_epc_cgroup, pc); + max = READ_ONCE(rc.epc_cg->pc.max); + if (nr_pages > max) + return -ENOMEM; + + if (signal_pending(current)) + return -ERESTARTSYS; + + if (!reclaim) { + queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work); + return -EBUSY; + } + + cur = page_counter_read(&rc.epc_cg->pc); + over = ((cur + nr_pages) > max) ? + (cur + nr_pages) - max : SGX_EPC_RECLAIM_MIN_PAGES; + + if (!sgx_epc_cgroup_reclaim_pages(over, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) { + if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) + return -ENOMEM; + schedule(); + } + } + } + + css_get_many(&epc_cg->css, nr_pages); + + for (; epc_cg; epc_cg = parent_epc_cgroup(epc_cg)) { + if (page_counter_read(&epc_cg->pc) >= epc_cg->high) { + if (!reclaim) + queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work); + else + sgx_epc_cgroup_reclaim_high(epc_cg); + break; + } + } + return 0; +} + +/** + * sgx_epc_cgroup_try_charge - hierarchically try to charge a single EPC page + * @mm: the mm_struct of the process to charge + * @reclaim: whether or not synchronous reclaim is allowed + * + * Return: the charged EPC cgroup on success, NULL if the EPC cgroup + * controller is disabled, or an ERR_PTR-encoded -errno on failure.
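+ * + * Illustrative call pattern (an editorial sketch mirroring how + * sgx_alloc_epc_page() uses this helper later in the series): + * + *	epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim); + *	if (IS_ERR(epc_cg)) + *		return ERR_CAST(epc_cg); + *	... on success, stash @epc_cg in the allocated page; on failure + *	or page free, call sgx_epc_cgroup_uncharge() ...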
+ */ +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, + bool reclaim) +{ + struct sgx_epc_cgroup *epc_cg; + int ret; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + epc_cg = sgx_epc_cgroup_from_mm(mm); + ret = __sgx_epc_cgroup_try_charge(epc_cg, 1, reclaim); + css_put(&epc_cg->css); + + if (ret) + return ERR_PTR(ret); + return epc_cg; +} + +/** + * sgx_epc_cgroup_uncharge - hierarchically uncharge a single EPC page + * @epc_cg: the charged epc cgroup + */ +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) +{ + if (sgx_epc_cgroup_disabled()) + return; + + page_counter_uncharge(&epc_cg->pc, 1); + + if (epc_cg != root_epc_cgroup) + css_put_many(&epc_cg->css, 1); +} + +static void sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root) +{ + struct sgx_epc_cgroup *epc_cg; + + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) { + if (sgx_epc_oom(&epc_cg->lru)) { + sgx_epc_cgroup_iter_break(epc_cg, root); + return; + } + } +} + +static struct cgroup_subsys_state * +sgx_epc_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) +{ + struct sgx_epc_cgroup *parent = sgx_epc_cgroup_from_css(parent_css); + struct sgx_epc_cgroup *epc_cg; + + epc_cg = kzalloc(sizeof(struct sgx_epc_cgroup), GFP_KERNEL); + if (!epc_cg) + return ERR_PTR(-ENOMEM); + + if (!parent) + root_epc_cgroup = epc_cg; + + epc_cg->high = PAGE_COUNTER_MAX; + sgx_lru_init(&epc_cg->lru); + page_counter_init(&epc_cg->pc, parent ? &parent->pc : NULL); + INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func); + + return &epc_cg->css; +} + +static void sgx_epc_cgroup_css_released(struct cgroup_subsys_state *css) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(css); + struct sgx_epc_cgroup *dead_cg = epc_cg; + + while ((epc_cg = parent_epc_cgroup(epc_cg))) + cmpxchg(&epc_cg->reclaim_iter, dead_cg, NULL); +} + +static void sgx_epc_cgroup_css_free(struct cgroup_subsys_state *css) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(css); + + cancel_work_sync(&epc_cg->reclaim_work); + kfree(epc_cg); +} + +static u64 sgx_epc_current_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(css); + + return (u64)page_counter_read(&epc_cg->pc) * PAGE_SIZE; +} + +static int sgx_epc_low_show(struct seq_file *m, void *v) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m)); + unsigned long low = READ_ONCE(epc_cg->pc.low); + + if (low == PAGE_COUNTER_MAX) + seq_puts(m, "max\n"); + else + seq_printf(m, "%llu\n", (u64)low * PAGE_SIZE); + + return 0; +} + +static ssize_t sgx_epc_low_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of)); + unsigned long low; + int err; + + buf = strstrip(buf); + err = page_counter_memparse(buf, "max", &low); + if (err) + return err; + + page_counter_set_low(&epc_cg->pc, low); + + return nbytes; +} + +static int sgx_epc_high_show(struct seq_file *m, void *v) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m)); + unsigned long high = READ_ONCE(epc_cg->high); + + if (high == PAGE_COUNTER_MAX) + seq_puts(m, "max\n"); + else + seq_printf(m, "%llu\n", (u64)high * PAGE_SIZE); + + return 0; +} + +static ssize_t sgx_epc_high_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of));
+ struct sgx_epc_reclaim_control rc; + unsigned long cur, high; + int err; + + buf = strstrip(buf); + err = page_counter_memparse(buf, "max", &high); + if (err) + return err; + + epc_cg->high = high; + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + cur = page_counter_read(&epc_cg->pc); + if (cur <= high) + break; + + if (signal_pending(current)) + break; + + if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) + break; + } + } + + return nbytes; +} + +static int sgx_epc_max_show(struct seq_file *m, void *v) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m)); + unsigned long max = READ_ONCE(epc_cg->pc.max); + + if (max == PAGE_COUNTER_MAX) + seq_puts(m, "max\n"); + else + seq_printf(m, "%llu\n", (u64)max * PAGE_SIZE); + + return 0; +} + + +static ssize_t sgx_epc_max_write(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of)); + struct sgx_epc_reclaim_control rc; + unsigned int nr_empty = 0; + unsigned long cur, max; + int err; + + buf = strstrip(buf); + err = page_counter_memparse(buf, "max", &max); + if (err) + return err; + + xchg(&epc_cg->pc.max, max); + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + cur = page_counter_read(&epc_cg->pc); + if (cur <= max) + break; + + if (signal_pending(current)) + break; + + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) { + if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) + sgx_epc_cgroup_oom(epc_cg); + schedule(); + } + } + } + + return nbytes; +} + +static struct cftype sgx_epc_cgroup_files[] = { + { + .name = "current", + .read_u64 = sgx_epc_current_read, + }, + { + .name = "low", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = sgx_epc_low_show, + .write = sgx_epc_low_write, + }, + { + .name = "high", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = sgx_epc_high_show, + .write = sgx_epc_high_write, + }, + { + .name = "max", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = sgx_epc_max_show, + .write = sgx_epc_max_write, + }, + { } /* terminate */ +}; + +struct cgroup_subsys sgx_epc_cgrp_subsys = { + .css_alloc = sgx_epc_cgroup_css_alloc, + .css_free = sgx_epc_cgroup_css_free, + .css_released = sgx_epc_cgroup_css_released, + + .legacy_cftypes = sgx_epc_cgroup_files, + .dfl_cftypes = sgx_epc_cgroup_files, +}; diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h new file mode 100644 index 000000000000..226304a3d523 --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2022 Intel Corporation. 
*/ +#ifndef _INTEL_SGX_EPC_CGROUP_H_ +#define _INTEL_SGX_EPC_CGROUP_H_ + +#include <asm/sgx.h> +#include <linux/cgroup.h> +#include <linux/list.h> +#include <linux/page_counter.h> +#include <linux/workqueue.h> + +#include "sgx.h" + +#ifndef CONFIG_CGROUP_SGX_EPC +struct sgx_epc_cgroup; +#else +struct sgx_epc_cgroup { + struct cgroup_subsys_state css; + + struct page_counter pc; + unsigned long high; + + struct sgx_epc_lru lru; + struct sgx_epc_cgroup *reclaim_iter; + struct work_struct reclaim_work; + unsigned int epoch; +}; + +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, + bool reclaim); +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root); +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + int *nr_to_scan, struct list_head *dst); +#endif + +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */ diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index 445235487230..ff7fbb3e057a 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -65,6 +65,10 @@ SUBSYS(rdma) SUBSYS(misc) #endif +#if IS_ENABLED(CONFIG_CGROUP_SGX_EPC) +SUBSYS(sgx_epc) +#endif + /* * The following subsystems are not supported on the default hierarchy. */ diff --git a/init/Kconfig b/init/Kconfig index 80fe60fa77fb..aba7502b40b0 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1178,6 +1178,18 @@ config CGROUP_MISC For more information, please check misc cgroup section in /Documentation/admin-guide/cgroup-v2.rst. +config CGROUP_SGX_EPC + bool "Enclave Page Cache (EPC) controller for Intel SGX" + depends on X86_SGX && MEMCG + select PAGE_COUNTER + help + Provides control over the EPC footprint of tasks in a cgroup. + EPC is a subset of regular memory that is usable only by SGX + enclaves and is very limited in quantity, e.g. less than 1% + of total DRAM. + + Say N if unsure.
+ config CGROUP_DEBUG bool "Debug controller" default n From patchwork Thu Sep 22 17:10:55 2022 X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985597 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 18/20] x86/sgx: Enable EPC cgroup controller in SGX core Date: Thu, 22 Sep 2022 10:10:55 -0700 Message-Id: <20220922171057.1236139-19-kristen@linux.intel.com> In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add the appropriate calls to (un)charge a cgroup during EPC page allocation and free, and to isolate pages for reclaim based on the provided cgroup.
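In outline, the allocation path pairs every successful charge with either a page assignment or an uncharge on failure. An editorial sketch of that contract, simplified from the diff below (the reclaim loop, CONFIG_CGROUP_SGX_EPC guards and ksgxd wakeup are elided):

	struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
	{
		struct sgx_epc_cgroup *epc_cg;
		struct sgx_epc_page *page;

		/* Charge the current task's EPC cgroup before allocating. */
		epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim);
		if (IS_ERR(epc_cg))
			return ERR_CAST(epc_cg);

		page = __sgx_alloc_epc_page();	/* reclaim loop elided */
		if (!IS_ERR(page))
			page->epc_cg = epc_cg;	/* uncharged at free/reclaim */
		else
			sgx_epc_cgroup_uncharge(epc_cg);

		return page;
	}

sgx_free_epc_page() performs the matching uncharge and clears page->epc_cg, so the cgroup reference lives exactly as long as the page allocation.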
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/epc_cgroup.c | 2 +- arch/x86/kernel/cpu/sgx/main.c | 65 +++++++++++++++++++++++++--- arch/x86/kernel/cpu/sgx/sgx.h | 7 ++- 3 files changed, 65 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c index 0a61bb8548ff..71da3b499950 100644 --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -396,7 +396,7 @@ static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages, nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES); nr_pages = min(nr_pages, SGX_EPC_RECLAIM_MAX_PAGES); - return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age); + return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age, rc->epc_cg); } static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 082c08228840..29653a0d4670 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -17,6 +17,7 @@ #include "driver.h" #include "encl.h" #include "encls.h" +#include "epc_cgroup.h" #define SGX_MAX_NR_TO_RECLAIM 32 @@ -33,6 +34,10 @@ static DEFINE_XARRAY(sgx_epc_address_space); static struct sgx_epc_lru sgx_global_lru; static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page) { +#ifdef CONFIG_CGROUP_SGX_EPC + if (epc_page->epc_cg) + return &epc_page->epc_cg->lru; +#endif return &sgx_global_lru; } @@ -321,6 +326,7 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim * @ignore_age: Reclaim a page even if it is young + * @epc_cg: EPC cgroup from which to reclaim * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files. Skip the pages, which have @@ -336,7 +342,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, * * Return: number of EPC pages reclaimed */ -int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) +int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age, + struct sgx_epc_cgroup *epc_cg) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_epc_page *epc_page, *tmp; @@ -347,8 +354,17 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) int i = 0; int ret; - sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso); + /* + * If a specific cgroup is not being targeted, take from the global + * list first, even when cgroups are enabled. If there are + * pages on the global LRU then they should get reclaimed ASAP.
+ */ + if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg) + sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso); +#ifdef CONFIG_CGROUP_SGX_EPC + sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso); +#endif if (list_empty(&iso)) goto out; @@ -394,6 +410,12 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS; +#ifdef CONFIG_CGROUP_SGX_EPC + if (epc_page->epc_cg) { + sgx_epc_cgroup_uncharge(epc_page->epc_cg); + epc_page->epc_cg = NULL; + } +#endif sgx_free_epc_page(epc_page); } out: @@ -403,7 +425,11 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) static bool sgx_can_reclaim(void) { +#ifdef CONFIG_CGROUP_SGX_EPC + return !sgx_epc_cgroup_lru_empty(NULL); +#else return !list_empty(&sgx_global_lru.reclaimable); +#endif } static bool sgx_should_reclaim(unsigned long watermark) @@ -420,7 +446,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); } static int ksgxd(void *p) @@ -446,7 +472,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); } return 0; @@ -600,7 +626,13 @@ int sgx_drop_epc_page(struct sgx_epc_page *page) struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) { struct sgx_epc_page *page; +#ifdef CONFIG_CGROUP_SGX_EPC + struct sgx_epc_cgroup *epc_cg; + epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim); + if (IS_ERR(epc_cg)) + return ERR_CAST(epc_cg); +#endif for ( ; ; ) { page = __sgx_alloc_epc_page(); if (!IS_ERR(page)) { @@ -608,8 +640,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (!sgx_can_reclaim()) - return ERR_PTR(-ENOMEM); + if (!sgx_can_reclaim()) { + page = ERR_PTR(-ENOMEM); + break; + } if (!reclaim) { page = ERR_PTR(-EBUSY); @@ -621,9 +655,17 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); } +#ifdef CONFIG_CGROUP_SGX_EPC + if (!IS_ERR(page)) { + WARN_ON(page->epc_cg); + page->epc_cg = epc_cg; + } else { + sgx_epc_cgroup_uncharge(epc_cg); + } +#endif if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) wake_up(&ksgxd_waitq); @@ -654,6 +696,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page) page->flags = SGX_EPC_PAGE_IS_FREE; spin_unlock(&node->lock); +#ifdef CONFIG_CGROUP_SGX_EPC + if (page->epc_cg) { + sgx_epc_cgroup_uncharge(page->epc_cg); + page->epc_cg = NULL; + } +#endif atomic_long_inc(&sgx_nr_free_pages); } @@ -818,6 +866,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, section->pages[i].flags = 0; section->pages[i].owner = NULL; section->pages[i].poison = 0; +#ifdef CONFIG_CGROUP_SGX_EPC + section->pages[i].epc_cg = NULL; +#endif list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index a4c7ee0a4958..3ea96779dd28 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -39,6 +39,7 @@ SGX_EPC_PAGE_RECLAIM_IN_PROGRESS | \ SGX_EPC_PAGE_ENCLAVE | \ SGX_EPC_PAGE_VERSION_ARRAY) +struct sgx_epc_cgroup; struct sgx_epc_page { unsigned int section; @@ -46,6 +47,9 @@ struct sgx_epc_page { u16 
poison; void *owner; struct list_head list; +#ifdef CONFIG_CGROUP_SGX_EPC + struct sgx_epc_cgroup *epc_cg; +#endif }; /* @@ -113,7 +117,8 @@ void sgx_reclaim_direct(void); void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); -int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); +int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age, + struct sgx_epc_cgroup *epc_cg); void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, struct list_head *dst); bool sgx_epc_oom(struct sgx_epc_lru *lru); From patchwork Thu Sep 22 17:10:56 2022 X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985598 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Jarkko Sakkinen , Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H.
Peter Anvin" Cc: Kristen Carlson Accardi , Sean Christopherson Subject: [RFC PATCH 19/20] x86/sgx: Add stats and events interfaces to EPC cgroup controller Date: Thu, 22 Sep 2022 10:10:56 -0700 Message-Id: <20220922171057.1236139-20-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Enable the cgroup sgx_epc.stats and sgx_epc.events files and associated counters. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/epc_cgroup.c | 134 +++++++++++++++++++++++++-- arch/x86/kernel/cpu/sgx/epc_cgroup.h | 16 +++- arch/x86/kernel/cpu/sgx/main.c | 6 +- 3 files changed, 145 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c index 71da3b499950..8541029b86be 100644 --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -77,6 +77,43 @@ static struct sgx_epc_cgroup *parent_epc_cgroup(struct sgx_epc_cgroup *epc_cg) return sgx_epc_cgroup_from_css(epc_cg->css.parent); } +static inline unsigned long sgx_epc_cgroup_cnt_read(struct sgx_epc_cgroup *epc_cg, + enum sgx_epc_cgroup_counter i) +{ + return atomic_long_read(&epc_cg->cnt[i]); +} + +static inline void sgx_epc_cgroup_cnt_reset(struct sgx_epc_cgroup *epc_cg, + enum sgx_epc_cgroup_counter i) +{ + atomic_long_set(&epc_cg->cnt[i], 0); +} + +static inline void sgx_epc_cgroup_cnt_add(struct sgx_epc_cgroup *epc_cg, + enum sgx_epc_cgroup_counter i, + unsigned long cnt) +{ + atomic_long_add(cnt, &epc_cg->cnt[i]); +} + +static inline void sgx_epc_cgroup_event(struct sgx_epc_cgroup *epc_cg, + enum sgx_epc_cgroup_counter i, + unsigned long cnt) +{ + sgx_epc_cgroup_cnt_add(epc_cg, i, cnt); + + if (i == SGX_EPC_CGROUP_LOW || i == SGX_EPC_CGROUP_HIGH || + i == SGX_EPC_CGROUP_MAX) + cgroup_file_notify(&epc_cg->events_file); +} + +static inline void sgx_epc_cgroup_cnt_sub(struct sgx_epc_cgroup *epc_cg, + enum sgx_epc_cgroup_counter i, + unsigned long cnt) +{ + atomic_long_sub(cnt, &epc_cg->cnt[i]); +} + /** * sgx_epc_cgroup_iter - iterate over the EPC cgroup hierarchy * @root: hierarchy root @@ -368,7 +405,9 @@ void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, */ if (!sgx_epc_cgroup_all_in_use_are_low(root)) continue; + sgx_epc_cgroup_event(epc_cg, SGX_EPC_CGROUP_LOW, 1); } + sgx_epc_cgroup_event(epc_cg, SGX_EPC_CGROUP_RECLAMATIONS, 1); sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst); if (!*nr_to_scan) { @@ -383,8 +422,11 @@ void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, } static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages, - struct sgx_epc_reclaim_control *rc) + struct sgx_epc_reclaim_control *rc, + enum sgx_epc_cgroup_counter c) { + sgx_epc_cgroup_event(rc->epc_cg, c, 1); + /* * Ensure sgx_reclaim_pages is called with a minimum and maximum * number of pages. 
Attempting to reclaim only a few pages will @@ -434,7 +476,8 @@ static inline void __sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg) if (cur <= high) break; - if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) { + if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc, + SGX_EPC_CGROUP_HIGH)) { if (sgx_epc_cgroup_reclaim_failed(&rc)) break; } @@ -494,7 +537,8 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) if (cur <= max) break; - if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc, + SGX_EPC_CGROUP_MAX)) { if (sgx_epc_cgroup_reclaim_failed(&rc)) break; } @@ -539,7 +583,8 @@ static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, over = ((cur + nr_pages) > max) ? (cur + nr_pages) - max : SGX_EPC_RECLAIM_MIN_PAGES; - if (!sgx_epc_cgroup_reclaim_pages(over, &rc)) { + if (!sgx_epc_cgroup_reclaim_pages(over, &rc, + SGX_EPC_CGROUP_MAX)) { if (sgx_epc_cgroup_reclaim_failed(&rc)) { if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) return -ENOMEM; @@ -586,6 +631,8 @@ struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, if (ret) return ERR_PTR(ret); + + sgx_epc_cgroup_cnt_add(epc_cg, SGX_EPC_CGROUP_PAGES, 1); return epc_cg; } @@ -593,13 +640,17 @@ struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, * sgx_epc_cgroup_uncharge - hierarchically uncharge EPC pages * @epc_cg: the charged epc cgroup + * @reclaimed: whether the pages were reclaimed (vs. freed) */ -void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg, bool reclaimed) { if (sgx_epc_cgroup_disabled()) return; page_counter_uncharge(&epc_cg->pc, 1); + sgx_epc_cgroup_cnt_sub(epc_cg, SGX_EPC_CGROUP_PAGES, 1); + if (reclaimed) + sgx_epc_cgroup_event(epc_cg, SGX_EPC_CGROUP_RECLAIMED, 1); if (epc_cg != root_epc_cgroup) css_put_many(&epc_cg->css, 1); @@ -665,6 +716,61 @@ static u64 sgx_epc_current_read(struct cgroup_subsys_state *css, return (u64)page_counter_read(&epc_cg->pc) * PAGE_SIZE; } +static int sgx_epc_stats_show(struct seq_file *m, void *v) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m)); + + unsigned long cur, dir, rec, recs; + cur = page_counter_read(&epc_cg->pc); + dir = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_PAGES); + rec = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_RECLAIMED); + recs = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_RECLAMATIONS); + + seq_printf(m, "pages %lu\n", cur); + seq_printf(m, "direct %lu\n", dir); + seq_printf(m, "indirect %lu\n", (cur - dir)); + seq_printf(m, "reclaimed %lu\n", rec); + seq_printf(m, "reclamations %lu\n", recs); + + return 0; +} + +static ssize_t sgx_epc_stats_reset(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of)); + sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_RECLAIMED); + sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_RECLAMATIONS); + return nbytes; +} + + +static int sgx_epc_events_show(struct seq_file *m, void *v) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m)); + + unsigned long low, high, max; + low = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_LOW); + high = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_HIGH); + max = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_MAX); + + seq_printf(m, "low %lu\n", low); + seq_printf(m, "high %lu\n", high); + seq_printf(m, "max %lu\n", max); + +
return 0; +} + +static ssize_t sgx_epc_events_reset(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of)); + sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_LOW); + sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_HIGH); + sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_MAX); + return nbytes; +} + static int sgx_epc_low_show(struct seq_file *m, void *v) { struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m)); @@ -733,7 +839,8 @@ static ssize_t sgx_epc_high_write(struct kernfs_open_file *of, if (signal_pending(current)) break; - if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) { + if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc, + SGX_EPC_CGROUP_HIGH)) { if (sgx_epc_cgroup_reclaim_failed(&rc)) break; } @@ -782,7 +889,8 @@ static ssize_t sgx_epc_max_write(struct kernfs_open_file *of, char *buf, if (signal_pending(current)) break; - if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc, + SGX_EPC_CGROUP_MAX)) { if (sgx_epc_cgroup_reclaim_failed(&rc)) { if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) sgx_epc_cgroup_oom(epc_cg); @@ -799,6 +907,18 @@ static struct cftype sgx_epc_cgroup_files[] = { .name = "current", .read_u64 = sgx_epc_current_read, }, + { + .name = "stats", + .seq_show = sgx_epc_stats_show, + .write = sgx_epc_stats_reset, + }, + { + .name = "events", + .flags = CFTYPE_NOT_ON_ROOT, + .file_offset = offsetof(struct sgx_epc_cgroup, events_file), + .seq_show = sgx_epc_events_show, + .write = sgx_epc_events_reset, + }, { .name = "low", .flags = CFTYPE_NOT_ON_ROOT, diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h index 226304a3d523..656c9f386b48 100644 --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h @@ -14,6 +14,16 @@ #ifndef CONFIG_CGROUP_SGX_EPC struct sgx_epc_cgroup; #else +enum sgx_epc_cgroup_counter { + SGX_EPC_CGROUP_PAGES, + SGX_EPC_CGROUP_RECLAIMED, + SGX_EPC_CGROUP_RECLAMATIONS, + SGX_EPC_CGROUP_LOW, + SGX_EPC_CGROUP_HIGH, + SGX_EPC_CGROUP_MAX, + SGX_EPC_CGROUP_NR_COUNTERS, +}; + struct sgx_epc_cgroup { struct cgroup_subsys_state css; @@ -24,11 +34,15 @@ struct sgx_epc_cgroup { struct sgx_epc_cgroup *reclaim_iter; struct work_struct reclaim_work; unsigned int epoch; + + atomic_long_t cnt[SGX_EPC_CGROUP_NR_COUNTERS]; + + struct cgroup_file events_file; }; struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, bool reclaim); -void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg, bool reclaimed); bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root); void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, int *nr_to_scan, struct list_head *dst); diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 29653a0d4670..3330ed4d0d43 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -412,7 +412,7 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age, #ifdef CONFIG_CGROUP_SGX_EPC if (epc_page->epc_cg) { - sgx_epc_cgroup_uncharge(epc_page->epc_cg); + sgx_epc_cgroup_uncharge(epc_page->epc_cg, true); epc_page->epc_cg = NULL; } #endif @@ -663,7 +663,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) WARN_ON(page->epc_cg); page->epc_cg = epc_cg; } else { - sgx_epc_cgroup_uncharge(epc_cg); + sgx_epc_cgroup_uncharge(epc_cg, false); } #endif if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) 
@@ -698,7 +698,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) spin_unlock(&node->lock); #ifdef CONFIG_CGROUP_SGX_EPC if (page->epc_cg) { - sgx_epc_cgroup_uncharge(page->epc_cg); + sgx_epc_cgroup_uncharge(page->epc_cg, false); page->epc_cg = NULL; } #endif From patchwork Thu Sep 22 17:10:57 2022 X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 12985600 From: Kristen Carlson Accardi To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Tejun Heo , Zefan Li , Johannes Weiner , Jonathan Corbet Cc: Kristen Carlson Accardi , Sean Christopherson , linux-doc@vger.kernel.org Subject: [RFC PATCH 20/20] docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation Date: Thu, 22 Sep 2022 10:10:57 -0700 Message-Id: <20220922171057.1236139-21-kristen@linux.intel.com> In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com> References: <20220922171057.1236139-1-kristen@linux.intel.com> List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add initial documentation for the SGX EPC cgroup controller, which regulates distribution of SGX Enclave Page Cache (EPC) memory.
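For illustration only (an editorial sketch, not part of the documented text; the cgroup2 mount point and group name below are assumptions), a management process could cap a group's EPC consumption by writing a byte value to its sgx_epc.max interface file:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* Assumes cgroup2 is mounted at /sys/fs/cgroup. */
		int fd = open("/sys/fs/cgroup/enclave-app/sgx_epc.max", O_WRONLY);

		if (fd < 0)
			return 1;

		/* Limits are in bytes; writing "max" removes the limit. */
		dprintf(fd, "%llu\n", 64ULL << 20);	/* 64 MiB */
		close(fd);
		return 0;
	}

The sgx_epc.current, sgx_epc.low, sgx_epc.high, sgx_epc.stats and sgx_epc.events files introduced earlier in the series follow the same cgroup-v2 conventions.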
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- Documentation/admin-guide/cgroup-v2.rst | 201 ++++++++++++++++++++++++ 1 file changed, 201 insertions(+) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index be4a77baf784..c355cb08fc18 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -71,6 +71,10 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst