From patchwork Fri Nov 11 18:35:06 2022
From: Kristen Carlson Accardi
Subject: [PATCH 01/26] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages()
Date: Fri, 11 Nov 2022 10:35:06 -0800
Message-Id: <20221111183532.3676646-2-kristen@linux.intel.com>

From: Sean Christopherson

In order to avoid repetition of cond_resched() in ksgxd() and
sgx_alloc_epc_page(), move the invocation of post-reclaim cond_resched()
inside sgx_reclaim_pages(). Except in the case of sgx_reclaim_direct(),
sgx_reclaim_pages() is always called in a loop and is always followed
by a call to cond_resched(). This will hold true for the EPC cgroup as
well, which adds even more calls to sgx_reclaim_pages() and thus
cond_resched().

Calls to sgx_reclaim_direct() may be performance sensitive. Allow
sgx_reclaim_direct() to avoid the cond_resched() call by moving the
original sgx_reclaim_pages() into __sgx_reclaim_pages() and then making
sgx_reclaim_pages() a wrapper around that call which adds a
cond_resched().

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 160c8dbee0ab..ffce6fc70a1f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -287,7 +287,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * problematic as it would increase the lock contention too much, which would
  * halt forward progress.
  */
-static void sgx_reclaim_pages(void)
+static void __sgx_reclaim_pages(void)
 {
        struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
        struct sgx_backing backing[SGX_NR_TO_SCAN];
@@ -369,6 +369,12 @@ static void sgx_reclaim_pages(void)
        }
 }

+static void sgx_reclaim_pages(void)
+{
+       __sgx_reclaim_pages();
+       cond_resched();
+}
+
 static bool sgx_should_reclaim(unsigned long watermark)
 {
        return atomic_long_read(&sgx_nr_free_pages) < watermark &&
@@ -378,12 +384,14 @@ static bool sgx_should_reclaim(unsigned long watermark)
 /*
  * sgx_reclaim_direct() should be called (without enclave's mutex held)
  * in locations where SGX memory resources might be low and might be
- * needed in order to make forward progress.
+ * needed in order to make forward progress. This call to
+ * __sgx_reclaim_pages() avoids the cond_resched() in sgx_reclaim_pages()
+ * to improve performance.
  */
 void sgx_reclaim_direct(void)
 {
        if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-               sgx_reclaim_pages();
+               __sgx_reclaim_pages();
 }

 static int ksgxd(void *p)
@@ -410,8 +418,6 @@ static int ksgxd(void *p)

                if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
                        sgx_reclaim_pages();
-
-               cond_resched();
        }

        return 0;
@@ -582,7 +588,6 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
                }

                sgx_reclaim_pages();
-               cond_resched();
        }

        if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
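The split is easiest to see from the caller's side. Below is a minimal
sketch (illustrative, not part of the patch; example_reclaim_loop() is a
made-up caller modeled on ksgxd()) of how the two entry points are meant
to be used:

        /*
         * Looping callers use the wrapper and get the scheduling point
         * for free; the latency-sensitive direct-reclaim path calls
         * __sgx_reclaim_pages() and skips it.
         */
        static void example_reclaim_loop(void)
        {
                while (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
                        sgx_reclaim_pages();    /* reclaims, then cond_resched() */
        }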
From patchwork Fri Nov 11 18:35:07 2022
From: Kristen Carlson Accardi
Subject: [PATCH 02/26] x86/sgx: Store struct sgx_encl when allocating new va pages
Date: Fri, 11 Nov 2022 10:35:07 -0800
Message-Id: <20221111183532.3676646-3-kristen@linux.intel.com>

From: Sean Christopherson

When allocating new VA pages, pass the struct sgx_encl of the enclave
that is allocating the page. sgx_alloc_epc_page() will store this value
in the encl_owner field of the struct sgx_epc_page. In a later patch,
version array pages will be placed in an unreclaimable queue; when the
cgroup max limit is reached, no reclaimable pages remain, and the
enclave must be OOM-killed, all the VA pages associated with that
enclave can then be uncharged and freed.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/encl.c  | 5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  | 2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
 arch/x86/kernel/cpu/sgx/sgx.h   | 2 ++
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index f40d64206ded..4eaf9d21e71b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -1193,6 +1193,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)

 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @encl:    The enclave that this page is allocated to.
  * @reclaim: Reclaim EPC pages directly if none available. Enclave
  *           mutex should not be held if this is set.
  *
@@ -1202,12 +1203,12 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 {
        struct sgx_epc_page *epc_page;
        int ret;

-       epc_page = sgx_alloc_epc_page(NULL, reclaim);
+       epc_page = sgx_alloc_epc_page(encl, reclaim);
        if (IS_ERR(epc_page))
                return ERR_CAST(epc_page);

diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..831d63f80f5a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -116,7 +116,7 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
                                          unsigned long offset,
                                          u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ebe79d60619f..9a1bb3c3211a 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim)
                if (!va_page)
                        return ERR_PTR(-ENOMEM);

-               va_page->epc_page = sgx_alloc_va_page(reclaim);
+               va_page->epc_page = sgx_alloc_va_page(encl, reclaim);
                if (IS_ERR(va_page->epc_page)) {
                        err = ERR_CAST(va_page->epc_page);
                        kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index d16a8baa28d4..efb10eacd3aa 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,6 +39,8 @@ struct sgx_epc_page {
                struct sgx_encl_page *encl_owner;
                /* Use when SGX_EPC_PAGE_KVM_GUEST set in ->flags: */
                void __user *vepc_vaddr;
+
+               struct sgx_encl *encl;
        };
        struct list_head list;
 };
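For context, a rough sketch of the kind of OOM teardown this enables in
a later patch (hypothetical code, not part of this series;
example_free_va_pages() is an invented name and the real OOM path may
differ): once the owning enclave is known, its VA pages can be walked
and freed wholesale.

        static void example_free_va_pages(struct sgx_encl *encl)
        {
                struct sgx_va_page *va_page, *tmp;

                /* Free every Version Array page the enclave still owns. */
                list_for_each_entry_safe(va_page, tmp, &encl->va_pages, list) {
                        list_del(&va_page->list);
                        sgx_encl_free_epc_page(va_page->epc_page);
                        kfree(va_page);
                }
        }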
From patchwork Fri Nov 11 18:35:08 2022
From: Kristen Carlson Accardi
Subject: [PATCH 03/26] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s)
Date: Fri, 11 Nov 2022 10:35:08 -0800
Message-Id: <20221111183532.3676646-4-kristen@linux.intel.com>

Introduce a struct, sgx_epc_lru, that wraps the existing reclaimable
list and its spinlock. This minimizes the code changes needed to handle
multiple LRUs as well as reclaimable and non-reclaimable lists, both of
which will be introduced and used by SGX EPC cgroups.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/sgx.h | 45 +++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index efb10eacd3aa..aac7d4feb0fa 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -91,6 +91,51 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
        return section->virt_addr + index * PAGE_SIZE;
 }

+struct sgx_epc_lru {
+       spinlock_t lock;
+       struct list_head reclaimable;
+       struct list_head unreclaimable;
+};
+
+static inline void sgx_lru_init(struct sgx_epc_lru *lru)
+{
+       spin_lock_init(&lru->lock);
+       INIT_LIST_HEAD(&lru->reclaimable);
+       INIT_LIST_HEAD(&lru->unreclaimable);
+}
+
+/*
+ * Must be called with queue lock acquired
+ */
+static inline void __sgx_epc_page_list_push(struct list_head *list, struct sgx_epc_page *page)
+{
+       list_add_tail(&page->list, list);
+}
+
+/*
+ * Must be called with queue lock acquired
+ */
+static inline struct sgx_epc_page * __sgx_epc_page_list_pop(struct list_head *list)
+{
+       struct sgx_epc_page *epc_page;
+
+       if (list_empty(list))
+               return NULL;
+
+       epc_page = list_first_entry(list, struct sgx_epc_page, list);
+       list_del_init(&epc_page->list);
+       return epc_page;
+}
+
+#define sgx_epc_pop_reclaimable(lru) \
+       __sgx_epc_page_list_pop(&(lru)->reclaimable)
+#define sgx_epc_push_reclaimable(lru, page) \
+       __sgx_epc_page_list_push(&(lru)->reclaimable, page)
+#define sgx_epc_pop_unreclaimable(lru) \
+       __sgx_epc_page_list_pop(&(lru)->unreclaimable)
+#define sgx_epc_push_unreclaimable(lru, page) \
+       __sgx_epc_page_list_push(&(lru)->unreclaimable, page)
+
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
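A minimal usage sketch for the new helpers (illustrative only;
example_take_page() is a made-up caller). The push/pop primitives do no
locking themselves, so lru->lock must be held around them:

        static struct sgx_epc_page *example_take_page(struct sgx_epc_lru *lru)
        {
                struct sgx_epc_page *page;

                spin_lock(&lru->lock);
                page = sgx_epc_pop_reclaimable(lru);
                spin_unlock(&lru->lock);

                return page;    /* NULL if the reclaimable list was empty */
        }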
From patchwork Fri Nov 11 18:35:09 2022
From: Kristen Carlson Accardi
Subject: [PATCH 04/26] x86/sgx: Use sgx_epc_lru for existing active page list
Date: Fri, 11 Nov 2022 10:35:09 -0800
Message-Id: <20221111183532.3676646-5-kristen@linux.intel.com>

Replace the existing sgx_active_page_list and its spinlock with
a global sgx_epc_lru struct.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 39 +++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ffce6fc70a1f..aa938e4d4a73 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -26,10 +26,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);

 /*
  * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
+ * with sgx_global_lru.lock acquired.
  */
-static LIST_HEAD(sgx_active_page_list);
-static DEFINE_SPINLOCK(sgx_reclaimer_lock);
+static struct sgx_epc_lru sgx_global_lru;

 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);

@@ -298,14 +297,12 @@ static void __sgx_reclaim_pages(void)
        int ret;
        int i;

-       spin_lock(&sgx_reclaimer_lock);
+       spin_lock(&sgx_global_lru.lock);
        for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-               if (list_empty(&sgx_active_page_list))
+               epc_page = sgx_epc_pop_reclaimable(&sgx_global_lru);
+               if (!epc_page)
                        break;

-               epc_page = list_first_entry(&sgx_active_page_list,
-                                           struct sgx_epc_page, list);
-               list_del_init(&epc_page->list);
                encl_page = epc_page->encl_owner;

                if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
@@ -316,7 +313,7 @@ static void __sgx_reclaim_pages(void)
                         */
                        epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
        }
-       spin_unlock(&sgx_reclaimer_lock);
+       spin_unlock(&sgx_global_lru.lock);

        for (i = 0; i < cnt; i++) {
                epc_page = chunk[i];
@@ -339,9 +336,9 @@ static void __sgx_reclaim_pages(void)
                continue;

 skip:
-               spin_lock(&sgx_reclaimer_lock);
-               list_add_tail(&epc_page->list, &sgx_active_page_list);
-               spin_unlock(&sgx_reclaimer_lock);
+               spin_lock(&sgx_global_lru.lock);
+               sgx_epc_push_reclaimable(&sgx_global_lru, epc_page);
+               spin_unlock(&sgx_global_lru.lock);

                kref_put(&encl_page->encl->refcount, sgx_encl_release);

@@ -378,7 +375,7 @@ static void sgx_reclaim_pages(void)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
        return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-              !list_empty(&sgx_active_page_list);
+              !list_empty(&sgx_global_lru.reclaimable);
 }

 /*
@@ -433,6 +430,8 @@ static bool __init sgx_page_reclaimer_init(void)

        ksgxd_tsk = tsk;

+       sgx_lru_init(&sgx_global_lru);
+
        return true;
 }

@@ -508,10 +507,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-       spin_lock(&sgx_reclaimer_lock);
+       spin_lock(&sgx_global_lru.lock);
        page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-       list_add_tail(&page->list, &sgx_active_page_list);
-       spin_unlock(&sgx_reclaimer_lock);
+       sgx_epc_push_reclaimable(&sgx_global_lru, page);
+       spin_unlock(&sgx_global_lru.lock);
 }

 /**
@@ -526,18 +525,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-       spin_lock(&sgx_reclaimer_lock);
+       spin_lock(&sgx_global_lru.lock);
        if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
                /* The page is being reclaimed. */
                if (list_empty(&page->list)) {
-                       spin_unlock(&sgx_reclaimer_lock);
+                       spin_unlock(&sgx_global_lru.lock);
                        return -EBUSY;
                }

                list_del(&page->list);
                page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
        }
-       spin_unlock(&sgx_reclaimer_lock);
+       spin_unlock(&sgx_global_lru.lock);

        return 0;
 }
@@ -574,7 +573,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
                        break;
                }

-               if (list_empty(&sgx_active_page_list))
+               if (list_empty(&sgx_global_lru.reclaimable))
                        return ERR_PTR(-ENOMEM);

                if (!reclaim) {
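The payoff of bundling the list and lock only shows up later in the
series; here is a hypothetical sketch of where this is headed
(sgx_lru_for_page() and epc_cg_lru() are invented names, and the
eventual cgroup patches may structure this differently):

        static struct sgx_epc_lru *sgx_lru_for_page(struct sgx_epc_page *page)
        {
                /* Hypothetical: pick a per-cgroup LRU when one exists. */
                struct sgx_epc_lru *lru = epc_cg_lru(page);

                return lru ? lru : &sgx_global_lru;
        }

Every lock/list pair then hangs off whichever sgx_epc_lru the caller
selects, instead of the file-scope globals.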
From patchwork Fri Nov 11 18:35:10 2022
From: Kristen Carlson Accardi
Subject: [PATCH 05/26] x86/sgx: Track epc pages on reclaimable or unreclaimable lists
Date: Fri, 11 Nov 2022 10:35:10 -0800
Message-Id: <20221111183532.3676646-6-kristen@linux.intel.com>

Replace functions sgx_mark_page_reclaimable() and
sgx_unmark_page_reclaimable() with sgx_record_epc_page() and
sgx_drop_epc_page(). sgx_record_epc_page() will add the epc_page to the
correct "reclaimable" or "unreclaimable" list in the sgx_epc_lru struct.
sgx_drop_epc_page() will delete the page from the sgx_epc_lru list.
Keeping pages that are not tracked by the reclaimer on the LRU's
"unreclaimable" list allows an OOM event to free all the pages in use
by an enclave, regardless of whether they were reclaimable pages or not.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/encl.c  | 10 +++++++---
 arch/x86/kernel/cpu/sgx/ioctl.c | 11 +++++++----
 arch/x86/kernel/cpu/sgx/main.c  | 26 +++++++++++++++-----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++--
 arch/x86/kernel/cpu/sgx/virt.c  | 28 ++++++++++++++++++++--------
 5 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 4eaf9d21e71b..4683da9ef4f1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -252,6 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
                epc_page = sgx_encl_eldu(&encl->secs, NULL);
                if (IS_ERR(epc_page))
                        return ERR_CAST(epc_page);
+               sgx_record_epc_page(epc_page, 0);
        }

        epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
@@ -259,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
                return ERR_CAST(epc_page);

        encl->secs_child_cnt++;
-       sgx_mark_page_reclaimable(entry->epc_page);
+       sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);

        return entry;
 }
@@ -375,7 +376,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
        encl_page->type = SGX_PAGE_TYPE_REG;
        encl->secs_child_cnt++;

-       sgx_mark_page_reclaimable(encl_page->epc_page);
+       sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);

        phys_addr = sgx_get_epc_phys_addr(epc_page);
        /*
@@ -687,7 +688,7 @@ void sgx_encl_release(struct kref *ref)
                         * The page and its radix tree entry cannot be freed
                         * if the page is being held by the reclaimer.
                         */
-                       if (sgx_unmark_page_reclaimable(entry->epc_page))
+                       if (sgx_drop_epc_page(entry->epc_page))
                                continue;

                        sgx_encl_free_epc_page(entry->epc_page);
@@ -703,6 +704,7 @@ void sgx_encl_release(struct kref *ref)
        xa_destroy(&encl->page_array);

        if (!encl->secs_child_cnt && encl->secs.epc_page) {
+               sgx_drop_epc_page(encl->secs.epc_page);
                sgx_encl_free_epc_page(encl->secs.epc_page);
                encl->secs.epc_page = NULL;
        }
@@ -711,6 +713,7 @@ void sgx_encl_release(struct kref *ref)
                va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
                                           list);
                list_del(&va_page->list);
+               sgx_drop_epc_page(va_page->epc_page);
                sgx_encl_free_epc_page(va_page->epc_page);
                kfree(va_page);
        }
@@ -1218,6 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
                sgx_encl_free_epc_page(epc_page);
                return ERR_PTR(-EFAULT);
        }
+       sgx_record_epc_page(epc_page, 0);

        return epc_page;
 }
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9a1bb3c3211a..aca80a3f38a1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -48,6 +48,7 @@ void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
        encl->page_cnt--;

        if (va_page) {
+               sgx_drop_epc_page(va_page->epc_page);
                sgx_encl_free_epc_page(va_page->epc_page);
                list_del(&va_page->list);
                kfree(va_page);
@@ -113,6 +114,8 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
        encl->attributes = secs->attributes;
        encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS;

+       sgx_record_epc_page(encl->secs.epc_page, 0);
+
        /* Set only after completion, as encl->lock has not been taken. */
        set_bit(SGX_ENCL_CREATED, &encl->flags);

@@ -322,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
                        goto err_out;
        }

-       sgx_mark_page_reclaimable(encl_page->epc_page);
+       sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
        mutex_unlock(&encl->lock);
        mmap_read_unlock(current->mm);
        return ret;
@@ -958,7 +961,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
                         * Prevent page from being reclaimed while mutex
                         * is released.
                         */
-                       if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+                       if (sgx_drop_epc_page(entry->epc_page)) {
                                ret = -EAGAIN;
                                goto out_entry_changed;
                        }
@@ -973,7 +976,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,

                        mutex_lock(&encl->lock);

-                       sgx_mark_page_reclaimable(entry->epc_page);
+                       sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
                }

                /* Change EPC type */
@@ -1130,7 +1133,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
                        goto out_unlock;
                }

-               if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+               if (sgx_drop_epc_page(entry->epc_page)) {
                        ret = -EBUSY;
                        goto out_unlock;
                }
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index aa938e4d4a73..3b09433ffd85 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -262,7 +262,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
                        goto out;

                sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
-
+               sgx_drop_epc_page(encl->secs.epc_page);
                sgx_encl_free_epc_page(encl->secs.epc_page);
                encl->secs.epc_page = NULL;

@@ -499,31 +499,35 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 }

 /**
- * sgx_mark_page_reclaimable() - Mark a page as reclaimable
+ * sgx_record_epc_page() - Add a page to the LRU tracking
  * @page:      EPC page
  *
- * Mark a page as reclaimable and add it to the active page list. Pages
- * are automatically removed from the active list when freed.
+ * Mark a page with the specified flags and add it to the appropriate
+ * (un)reclaimable list.
  */
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
        spin_lock(&sgx_global_lru.lock);
-       page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-       sgx_epc_push_reclaimable(&sgx_global_lru, page);
+       WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+       page->flags |= flags;
+       if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
+               sgx_epc_push_reclaimable(&sgx_global_lru, page);
+       else
+               sgx_epc_push_unreclaimable(&sgx_global_lru, page);
        spin_unlock(&sgx_global_lru.lock);
 }

 /**
- * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list
+ * sgx_drop_epc_page() - Remove a page from a LRU list
  * @page:      EPC page
  *
- * Clear the reclaimable flag and remove the page from the active page list.
+ * Clear the reclaimable flag if set and remove the page from its LRU.
  *
  * Return:
  *   0 on success,
  *   -EBUSY if the page is in the process of being reclaimed
  */
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
+int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
        spin_lock(&sgx_global_lru.lock);
        if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
@@ -533,9 +537,9 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
                        return -EBUSY;
                }

-               list_del(&page->list);
                page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
        }
+       list_del(&page->list);
        spin_unlock(&sgx_global_lru.lock);

        return 0;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index aac7d4feb0fa..969606615211 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -140,8 +140,8 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);

 void sgx_reclaim_direct(void);
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
+int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);

 void sgx_ipi_cb(void *info);
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 776ae5c1c032..0eabc4db91d0 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -64,6 +64,8 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
                goto err_delete;
        }

+       sgx_record_epc_page(epc_page, 0);
+
        return 0;

 err_delete:
@@ -148,6 +150,7 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
                return ret;
        }

+       sgx_drop_epc_page(epc_page);
        sgx_free_epc_page(epc_page);
        return 0;
 }
@@ -220,8 +223,15 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
                 * have been removed, the SECS page must have a child on
                 * another instance.
                 */
-               if (sgx_vepc_free_page(epc_page))
+               if (sgx_vepc_free_page(epc_page)) {
+                       /*
+                        * Drop the page before adding it to the list of SECS
+                        * pages.  Moving the page off the unreclaimable list
+                        * needs to be done under the LRU's spinlock.
+                        */
+                       sgx_drop_epc_page(epc_page);
                        list_add_tail(&epc_page->list, &secs_pages);
+               }

                xa_erase(&vepc->page_array, index);
        }
@@ -236,15 +246,17 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
        mutex_lock(&zombie_secs_pages_lock);
        list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
                /*
-                * Speculatively remove the page from the list of zombies,
-                * if the page is successfully EREMOVE'd it will be added to
-                * the list of free pages.  If EREMOVE fails, throw the page
-                * on the local list, which will be spliced on at the end.
+                * If EREMOVE fails, throw the page on the local list, which
+                * will be spliced on at the end.
+                *
+                * Note, this abuses sgx_drop_epc_page() to delete the page off
+                * the list of zombies, but this is a very rare path (probably
+                * never hit in production).  It's not worth special casing the
+                * free path for this super rare case just to avoid taking the
+                * LRU's spinlock.
                 */
-               list_del(&epc_page->list);
-
                if (sgx_vepc_free_page(epc_page))
-                       list_add_tail(&epc_page->list, &secs_pages);
+                       list_move_tail(&epc_page->list, &secs_pages);
        }

        if (!list_empty(&secs_pages))
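A condensed lifecycle of an EPC page under the new API, mirroring the
call sites above (illustrative sketch; example_page_lifecycle() is not
real code):

        static void example_page_lifecycle(struct sgx_epc_page *page,
                                           bool reclaimable)
        {
                /* Allocation: every page now lands on one of the two lists. */
                sgx_record_epc_page(page, reclaimable ?
                                          SGX_EPC_PAGE_RECLAIMER_TRACKED : 0);

                /* ... page in use; ksgxd may be reclaiming it ... */

                /* Teardown: the page must be off its LRU before freeing. */
                if (sgx_drop_epc_page(page))
                        return; /* -EBUSY: reclaimer holds it, retry later */

                sgx_free_epc_page(page);
        }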
From patchwork Fri Nov 11 18:35:11 2022
From: Kristen Carlson Accardi
Subject: [PATCH 06/26] x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages
Date: Fri, 11 Nov 2022 10:35:11 -0800
Message-Id: <20221111183532.3676646-7-kristen@linux.intel.com>

From: Sean Christopherson

Keep track of whether the EPC page is in the middle of being reclaimed,
and do not delete the page off its LRU if it has not yet finished being
reclaimed.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++-----
 arch/x86/kernel/cpu/sgx/sgx.h  |  4 ++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 3b09433ffd85..8c451071fa91 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -305,13 +305,15 @@ static void __sgx_reclaim_pages(void)

                encl_page = epc_page->encl_owner;

-               if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
+               if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+                       epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
                        chunk[cnt++] = epc_page;
-               else
+               } else {
                        /* The owner is freeing the page. No need to add the
                         * page back to the list of reclaimable pages.
                         */
                        epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+               }
        }
        spin_unlock(&sgx_global_lru.lock);

@@ -337,6 +339,7 @@ static void __sgx_reclaim_pages(void)

 skip:
                spin_lock(&sgx_global_lru.lock);
+               epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
                sgx_epc_push_reclaimable(&sgx_global_lru, epc_page);
                spin_unlock(&sgx_global_lru.lock);

@@ -360,7 +363,8 @@ static void __sgx_reclaim_pages(void)
                sgx_reclaimer_write(epc_page, &backing[i]);

                kref_put(&encl_page->encl->refcount, sgx_encl_release);
-               epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+               epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
+                                    SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);

                sgx_free_epc_page(epc_page);
        }
@@ -508,7 +512,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
        spin_lock(&sgx_global_lru.lock);
-       WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+       WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIM_FLAGS);
        page->flags |= flags;
        if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
                sgx_epc_push_reclaimable(&sgx_global_lru, page);
@@ -532,7 +536,7 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
        spin_lock(&sgx_global_lru.lock);
        if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
                /* The page is being reclaimed. */
-               if (list_empty(&page->list)) {
+               if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) {
                        spin_unlock(&sgx_global_lru.lock);
                        return -EBUSY;
                }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 969606615211..04ca644928a8 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -30,6 +30,10 @@
 #define SGX_EPC_PAGE_IS_FREE           BIT(1)
 /* Pages allocated for KVM guest */
 #define SGX_EPC_PAGE_KVM_GUEST         BIT(2)
+/* page flag to indicate reclaim is in progress */
+#define SGX_EPC_PAGE_RECLAIM_IN_PROGRESS BIT(3)
+#define SGX_EPC_PAGE_RECLAIM_FLAGS     (SGX_EPC_PAGE_RECLAIMER_TRACKED | \
+                                        SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)

 struct sgx_epc_page {
        unsigned int section;
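The race the flag closes, condensed (illustrative, simplified from
sgx_drop_epc_page() above): once the next patch starts isolating pages
onto a private list rather than popping them off the LRU,
list_empty(&page->list) can no longer identify in-flight pages, so the
drop path keys off the flag instead.

        static int example_try_drop(struct sgx_epc_page *page)
        {
                spin_lock(&sgx_global_lru.lock);
                if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) {
                        /* The reclaimer owns the page right now. */
                        spin_unlock(&sgx_global_lru.lock);
                        return -EBUSY;
                }
                list_del(&page->list);
                spin_unlock(&sgx_global_lru.lock);
                return 0;
        }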
From patchwork Fri Nov 11 18:35:12 2022
From: Kristen Carlson Accardi
Subject: [PATCH 07/26] x86/sgx: Use a list to track to-be-reclaimed pages during reclaim
Date: Fri, 11 Nov 2022 10:35:12 -0800
Message-Id: <20221111183532.3676646-8-kristen@linux.intel.com>

From: Sean Christopherson

Change sgx_reclaim_pages() to use a list rather than an array for
storing the epc_pages which will be reclaimed. This change is needed
to transition to the LRU implementation for EPC cgroup support.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/main.c | 44 ++++++++++++++++------------------
 arch/x86/kernel/cpu/sgx/sgx.h  | 28 ++++++++++++++++++++++
 2 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8c451071fa91..c76a53b63fa2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -288,18 +288,17 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  */
 static void __sgx_reclaim_pages(void)
 {
-       struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
        struct sgx_backing backing[SGX_NR_TO_SCAN];
+       struct sgx_epc_page *epc_page, *tmp;
        struct sgx_encl_page *encl_page;
-       struct sgx_epc_page *epc_page;
        pgoff_t page_index;
-       int cnt = 0;
+       LIST_HEAD(iso);
        int ret;
        int i;

        spin_lock(&sgx_global_lru.lock);
        for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-               epc_page = sgx_epc_pop_reclaimable(&sgx_global_lru);
+               epc_page = sgx_epc_peek_reclaimable(&sgx_global_lru);
                if (!epc_page)
                        break;

@@ -307,18 +306,22 @@ static void __sgx_reclaim_pages(void)

                if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
                        epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
-                       chunk[cnt++] = epc_page;
+                       list_move_tail(&epc_page->list, &iso);
                } else {
-                       /* The owner is freeing the page. No need to add the
-                        * page back to the list of reclaimable pages.
+                       /* The owner is freeing the page, remove it from the
+                        * LRU list
                         */
                        epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+                       list_del_init(&epc_page->list);
                }
        }
        spin_unlock(&sgx_global_lru.lock);

-       for (i = 0; i < cnt; i++) {
-               epc_page = chunk[i];
+       if (list_empty(&iso))
+               return;
+
+       i = 0;
+       list_for_each_entry_safe(epc_page, tmp, &iso, list) {
                encl_page = epc_page->encl_owner;

                if (!sgx_reclaimer_age(epc_page))
@@ -333,6 +336,7 @@ static void __sgx_reclaim_pages(void)
                        goto skip;
                }

+               i++;
                encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
                mutex_unlock(&encl_page->encl->lock);
                continue;
@@ -340,27 +344,19 @@ static void __sgx_reclaim_pages(void)
 skip:
                spin_lock(&sgx_global_lru.lock);
                epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
-               sgx_epc_push_reclaimable(&sgx_global_lru, epc_page);
+               sgx_epc_move_reclaimable(&sgx_global_lru, epc_page);
                spin_unlock(&sgx_global_lru.lock);

                kref_put(&encl_page->encl->refcount, sgx_encl_release);
-
-               chunk[i] = NULL;
-       }
-
-       for (i = 0; i < cnt; i++) {
-               epc_page = chunk[i];
-               if (epc_page)
-                       sgx_reclaimer_block(epc_page);
        }

-       for (i = 0; i < cnt; i++) {
-               epc_page = chunk[i];
-               if (!epc_page)
-                       continue;
-
+       list_for_each_entry(epc_page, &iso, list)
+               sgx_reclaimer_block(epc_page);
+
+       i = 0;
+       list_for_each_entry_safe(epc_page, tmp, &iso, list) {
                encl_page = epc_page->encl_owner;
-               sgx_reclaimer_write(epc_page, &backing[i]);
+               sgx_reclaimer_write(epc_page, &backing[i++]);

                kref_put(&encl_page->encl->refcount, sgx_encl_release);
                epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 04ca644928a8..29c0981d6310 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -116,6 +116,14 @@ static inline void __sgx_epc_page_list_push(struct list_head *list, struct sgx_epc_page *page)
        list_add_tail(&page->list, list);
 }

+/*
+ * Must be called with queue lock acquired
+ */
+static inline void __sgx_epc_page_list_move(struct list_head *list, struct sgx_epc_page *page)
+{
+       list_move_tail(&page->list, list);
+}
+
 /*
  * Must be called with queue lock acquired
  */
@@ -131,14 +139,34 @@ static inline struct sgx_epc_page * __sgx_epc_page_list_pop(struct list_head *list)
        return epc_page;
 }

+/*
+ * Must be called with queue lock acquired
+ */
+static inline struct sgx_epc_page * __sgx_epc_page_list_peek(struct list_head *list)
+{
+       struct sgx_epc_page *epc_page;
+
+       if (list_empty(list))
+               return NULL;
+
+       epc_page = list_first_entry(list, struct sgx_epc_page, list);
+       return epc_page;
+}
+
 #define sgx_epc_pop_reclaimable(lru) \
        __sgx_epc_page_list_pop(&(lru)->reclaimable)
 #define sgx_epc_push_reclaimable(lru, page) \
        __sgx_epc_page_list_push(&(lru)->reclaimable, page)
+#define sgx_epc_peek_reclaimable(lru) \
+       __sgx_epc_page_list_peek(&(lru)->reclaimable)
+#define sgx_epc_move_reclaimable(lru, page) \
+       __sgx_epc_page_list_move(&(lru)->reclaimable, page)
 #define sgx_epc_pop_unreclaimable(lru) \
        __sgx_epc_page_list_pop(&(lru)->unreclaimable)
 #define sgx_epc_push_unreclaimable(lru, page) \
        __sgx_epc_page_list_push(&(lru)->unreclaimable, page)
+#define sgx_epc_peek_unreclaimable(lru) \
+       __sgx_epc_page_list_peek(&(lru)->unreclaimable)

 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
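The isolate step this enables, in miniature (illustrative sketch;
example_isolate() is a made-up helper that ignores the refcount handling
in the real loop). The caller must hold lru->lock:

        static void example_isolate(struct sgx_epc_lru *lru,
                                    struct list_head *iso, int nr_to_scan)
        {
                struct sgx_epc_page *page;
                int i;

                for (i = 0; i < nr_to_scan; i++) {
                        page = sgx_epc_peek_reclaimable(lru);
                        if (!page)
                                break;
                        /* Keep list membership; just migrate to iso. */
                        list_move_tail(&page->list, iso);
                }
        }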
From patchwork Fri Nov 11 18:35:13 2022
From: Kristen Carlson Accardi
Subject: [PATCH 08/26] x86/sgx: Add EPC page flags to identify type of page
Date: Fri, 11 Nov 2022 10:35:13 -0800
Message-Id: <20221111183532.3676646-9-kristen@linux.intel.com>

From: Sean Christopherson

Create new flags to help identify whether a page is an enclave page or
a VA page, and save the page type when the page is recorded.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
---
 arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
 arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  8 +++++++-
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 4683da9ef4f1..653c9ee5bf57 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -252,7 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
                epc_page = sgx_encl_eldu(&encl->secs, NULL);
                if (IS_ERR(epc_page))
                        return ERR_CAST(epc_page);
-               sgx_record_epc_page(epc_page, 0);
+               sgx_record_epc_page(epc_page, SGX_EPC_PAGE_ENCLAVE);
        }

        epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
@@ -260,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
                return ERR_CAST(epc_page);

        encl->secs_child_cnt++;
-       sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
+       sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE);

        return entry;
 }
@@ -1221,7 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
                sgx_encl_free_epc_page(epc_page);
                return ERR_PTR(-EFAULT);
        }
-       sgx_record_epc_page(epc_page, 0);
+       sgx_record_epc_page(epc_page, SGX_EPC_PAGE_VERSION_ARRAY);

        return epc_page;
 }
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index aca80a3f38a1..c91cc6a01232 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -114,7 +114,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
        encl->attributes = secs->attributes;
        encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS;

-       sgx_record_epc_page(encl->secs.epc_page, 0);
+       sgx_record_epc_page(encl->secs.epc_page, SGX_EPC_PAGE_ENCLAVE);

        /* Set only after completion, as encl->lock has not been taken. */
        set_bit(SGX_ENCL_CREATED, &encl->flags);
@@ -325,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
                        goto err_out;
        }

-       sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
+       sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE);
        mutex_unlock(&encl->lock);
        mmap_read_unlock(current->mm);
        return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c76a53b63fa2..09cc83d7cb97 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -304,6 +304,9 @@ static void __sgx_reclaim_pages(void)

                encl_page = epc_page->encl_owner;

+               if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE)))
+                       continue;
+
                if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
                        epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
                        list_move_tail(&epc_page->list, &iso);
@@ -359,8 +362,7 @@ static void __sgx_reclaim_pages(void)
                sgx_reclaimer_write(epc_page, &backing[i++]);

                kref_put(&encl_page->encl->refcount, sgx_encl_release);
-               epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
-                                    SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
+               epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;

                sgx_free_epc_page(epc_page);
        }
@@ -501,6 +503,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 /**
  * sgx_record_epc_page() - Add a page to the LRU tracking
  * @page:      EPC page
+ * @flags:     Reclaim flags for the page.
  *
  * Mark a page with the specified flags and add it to the appropriate
  * (un)reclaimable list.
@@ -530,18 +533,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
        spin_lock(&sgx_global_lru.lock);
-       if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
-               /* The page is being reclaimed. */
-               if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) {
-                       spin_unlock(&sgx_global_lru.lock);
-                       return -EBUSY;
-               }
-
-               page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+       if ((page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) &&
+           (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)) {
+               spin_unlock(&sgx_global_lru.lock);
+               return -EBUSY;
        }
        list_del(&page->list);
        spin_unlock(&sgx_global_lru.lock);

+       page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
+
        return 0;
 }

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 29c0981d6310..f3fc027f7cd0 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -32,8 +32,14 @@
 #define SGX_EPC_PAGE_KVM_GUEST         BIT(2)
 /* page flag to indicate reclaim is in progress */
 #define SGX_EPC_PAGE_RECLAIM_IN_PROGRESS BIT(3)
+#define SGX_EPC_PAGE_ENCLAVE           BIT(4)
+#define SGX_EPC_PAGE_VERSION_ARRAY     BIT(5)
+#define SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE (SGX_EPC_PAGE_ENCLAVE | \
+                                         SGX_EPC_PAGE_RECLAIMER_TRACKED)
 #define SGX_EPC_PAGE_RECLAIM_FLAGS     (SGX_EPC_PAGE_RECLAIMER_TRACKED | \
-                                        SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)
+                                        SGX_EPC_PAGE_RECLAIM_IN_PROGRESS | \
+                                        SGX_EPC_PAGE_ENCLAVE | \
+                                        SGX_EPC_PAGE_VERSION_ARRAY)

 struct sgx_epc_page {
        unsigned int section;
Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:35:56 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 09/26] x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default Date: Fri, 11 Nov 2022 10:35:14 -0800 Message-Id: <20221111183532.3676646-10-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Modify sgx_reclaim_pages() to take a parameter that specifies the number of pages to scan for reclaiming. Specify a max value of 32, but scan 16 in the usual case. This allows the number of pages sgx_reclaim_pages() scans to be specified by the caller, and adjusted in future patches. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 09cc83d7cb97..02b9eafa90a2 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -18,6 +18,8 @@ #include "encl.h" #include "encls.h" +#define SGX_MAX_NR_TO_RECLAIM 32 + struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; static int sgx_nr_epc_sections; static struct task_struct *ksgxd_tsk; @@ -273,7 +275,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, mutex_unlock(&encl->lock); } -/* +/** + * sgx_reclaim_pages() - Reclaim EPC pages from the consumers + * @nr_to_scan: Number of EPC pages to scan for reclaim + * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files. Skip the pages, which have * been accessed since the last scan. Move those pages to the tail of active @@ -286,9 +291,9 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * problematic as it would increase the lock contention too much, which would * halt forward progress. 
*/ -static void __sgx_reclaim_pages(void) +static void __sgx_reclaim_pages(int nr_to_scan) { - struct sgx_backing backing[SGX_NR_TO_SCAN]; + struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_epc_page *epc_page, *tmp; struct sgx_encl_page *encl_page; pgoff_t page_index; @@ -297,7 +302,7 @@ static void __sgx_reclaim_pages(void) int i; spin_lock(&sgx_global_lru.lock); - for (i = 0; i < SGX_NR_TO_SCAN; i++) { + for (i = 0; i < nr_to_scan; i++) { epc_page = sgx_epc_peek_reclaimable(&sgx_global_lru); if (!epc_page) break; @@ -327,7 +332,7 @@ static void __sgx_reclaim_pages(void) list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->encl_owner; - if (!sgx_reclaimer_age(epc_page)) + if (i == SGX_MAX_NR_TO_RECLAIM || !sgx_reclaimer_age(epc_page)) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); @@ -368,9 +373,9 @@ static void __sgx_reclaim_pages(void) } } -static void sgx_reclaim_pages(void) +static void sgx_reclaim_pages(int nr_to_scan) { - __sgx_reclaim_pages(); + __sgx_reclaim_pages(nr_to_scan); cond_resched(); } @@ -390,7 +395,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - __sgx_reclaim_pages(); + __sgx_reclaim_pages(SGX_NR_TO_SCAN); } static int ksgxd(void *p) @@ -416,7 +421,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_pages(SGX_NR_TO_SCAN); } return 0; @@ -591,7 +596,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_pages(); + sgx_reclaim_pages(SGX_NR_TO_SCAN); } if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) From patchwork Fri Nov 11 18:35:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040701 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9CF6EC433FE for ; Fri, 11 Nov 2022 18:36:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234389AbiKKSga (ORCPT ); Fri, 11 Nov 2022 13:36:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234057AbiKKSgB (ORCPT ); Fri, 11 Nov 2022 13:36:01 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8DC567879A; Fri, 11 Nov 2022 10:36:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191760; x=1699727760; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=auEkniWHXFa2fE8QomwYSOXBkMl7XCSjIOySc76haUo=; b=moE5q5ze8HRfh2Io/10CiATX4QuLtmCZ9WUGz5eTpMe6kB/wOIsPq+WB U/UmBWrsSRMys87dGMDh24avstK8wJ5z6a4ZT079FDDoyyinPiX+Fe+0F sP98tgIkmDERihHKBVt/odKR2/Wz3h6rWmsjg4xQlGf1maJ7tQnAw/4lh r40zxA1+d0apprXFTYBRK7vFqL+CNrIbv2jWcmj74x4iS6RmOBWus4hJr dOVHHOhbVR5fj631MvmkeP8sCUWKzc7JHtlH9a1aWlD4ElJeg4Ciz1z4Z BpRXTP9Yi+0RoU9YqYu8jvGo/Oyi82Xp0EIoPZwomsjVu7aDPb9eacr8D A==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050354" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050354" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by 
fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:35:59 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089258" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089258" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:35:58 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 10/26] x86/sgx: Return the number of EPC pages that were successfully reclaimed Date: Fri, 11 Nov 2022 10:35:15 -0800 Message-Id: <20221111183532.3676646-11-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Return the number of reclaimed pages from sgx_reclaim_pages(); the EPC cgroup will use the result to track the success rate of its reclaim calls, e.g. to escalate to a more forceful reclaiming mode if necessary. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 02b9eafa90a2..dfd76c605ef2 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -291,7 +291,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * problematic as it would increase the lock contention too much, which would * halt forward progress.
*/ -static void __sgx_reclaim_pages(int nr_to_scan) +static int __sgx_reclaim_pages(int nr_to_scan) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_epc_page *epc_page, *tmp; @@ -326,7 +326,7 @@ static void __sgx_reclaim_pages(int nr_to_scan) spin_unlock(&sgx_global_lru.lock); if (list_empty(&iso)) - return; + return 0; i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { @@ -371,12 +371,16 @@ static void __sgx_reclaim_pages(int nr_to_scan) sgx_free_epc_page(epc_page); } + return i; } -static void sgx_reclaim_pages(int nr_to_scan) +static int sgx_reclaim_pages(int nr_to_scan) { - __sgx_reclaim_pages(nr_to_scan); + int ret; + + ret = __sgx_reclaim_pages(nr_to_scan); cond_resched(); + return ret; } static bool sgx_should_reclaim(unsigned long watermark) From patchwork Fri Nov 11 18:35:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040702 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7897BC4332F for ; Fri, 11 Nov 2022 18:36:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233938AbiKKSgb (ORCPT ); Fri, 11 Nov 2022 13:36:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233593AbiKKSgC (ORCPT ); Fri, 11 Nov 2022 13:36:02 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 95241814D1; Fri, 11 Nov 2022 10:36:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191761; x=1699727761; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=T3P7Q3xIri7oBiVTNsBMGR4O8OWA5OQYl9qvR3LdFVU=; b=XkUwzZ3aHIemRjo/noiKurMhPxOTF54mZDv4Cq1yMUGABab6eXeoUeeW e0QW9HCj8qQIdygxumVqKLqfoSsD5sbZDZuA1OvUcZlOv28HynRETJuKa UAZjLwQMa2VaZyYBIgw3N+rMtiyIDgPw4N8RsFaTo+KC3EDeQRhZ/Ur3S omM39GLEu/QErO96sWoScPvvQzzTxUVy8OlSEMkyTROVFZehPGES8HNg/ PoUxJTxTU//YUKWXiPF97MlKM6Hd+BUobHPEfqfiI1m1X1CDkz/3LqfFE HKBlM8AzJqbjuTMimnlZ8n6HTRr734pVSAVC89q/V6jzQr4j/bG8ACdFt Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050357" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050357" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:01 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089262" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089262" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:00 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. 
Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 11/26] x86/sgx: Add option to ignore age of page during EPC reclaim Date: Fri, 11 Nov 2022 10:35:16 -0800 Message-Id: <20221111183532.3676646-12-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add a flag to sgx_reclaim_pages() to instruct it to ignore the age of page, i.e. reclaim the page even if it's young. The EPC cgroup will use the flag to enforce its limits by draining the reclaimable lists before resorting to other measures, e.g. forcefully reclaimable "unreclaimable" pages by killing enclaves. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index dfd76c605ef2..b72b5868dd01 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -278,6 +278,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, /** * sgx_reclaim_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim + * @ignore_age: Reclaim a page even if it is young * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files. Skip the pages, which have @@ -291,7 +292,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * problematic as it would increase the lock contention too much, which would * halt forward progress. 
*/ -static int __sgx_reclaim_pages(int nr_to_scan) +static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age) { struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_epc_page *epc_page, *tmp; @@ -332,7 +333,8 @@ static int __sgx_reclaim_pages(int nr_to_scan) list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->encl_owner; - if (i == SGX_MAX_NR_TO_RECLAIM || !sgx_reclaimer_age(epc_page)) + if (i == SGX_MAX_NR_TO_RECLAIM || + (!ignore_age && !sgx_reclaimer_age(epc_page))) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); @@ -374,11 +376,11 @@ static int __sgx_reclaim_pages(int nr_to_scan) return i; } -static int sgx_reclaim_pages(int nr_to_scan) +static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) { int ret; - ret = __sgx_reclaim_pages(nr_to_scan); + ret = __sgx_reclaim_pages(nr_to_scan, ignore_age); cond_resched(); return ret; } @@ -399,7 +401,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - __sgx_reclaim_pages(SGX_NR_TO_SCAN); + __sgx_reclaim_pages(SGX_NR_TO_SCAN, false); } static int ksgxd(void *p) @@ -425,7 +427,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(SGX_NR_TO_SCAN); + sgx_reclaim_pages(SGX_NR_TO_SCAN, false); } return 0; @@ -600,7 +602,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_pages(SGX_NR_TO_SCAN); + sgx_reclaim_pages(SGX_NR_TO_SCAN, false); } if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) From patchwork Fri Nov 11 18:35:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040703 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B53A6C433FE for ; Fri, 11 Nov 2022 18:36:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234400AbiKKSgb (ORCPT ); Fri, 11 Nov 2022 13:36:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233368AbiKKSgE (ORCPT ); Fri, 11 Nov 2022 13:36:04 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 982F4787B6; Fri, 11 Nov 2022 10:36:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191763; x=1699727763; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xqEEntqYbLvI5IiAkoUnLqlFey5cK++4386ISoTRbDQ=; b=NyeS7rN3OaA/u3vut/55VUDCWVXYTm9hf7Ix0JkVZJI6BfW5/q65vi5I Mcx9yG0cyu4yFKXzdP+TstbbzQE3kpV/ThxJOQcqByhBh5MGp6r0KFXKY iYwO7ymRmhTkp0JEMdjYY/nVp68miCJi+ip68wdc5u4InuQk620Z6dJVV Wms1+CUebNkp4BM4PIUHTtLERBqeyL5VXCnR4/s2jbsaL+VFq7tkluWdo hy+MIDjZp7Dq68zx/qmQthDg+ftxAJEASR0NSSe6P2BaMdnqM0jKAt1mD 68pDWfzd2gdIBg3cumKo78YhxxyakP6z1jGCUzcZ4q3NjdWzUhOgrkyxo Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050361" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050361" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:03 -0800 X-IronPort-AV: 
E=McAfee;i="6500,9779,10528"; a="640089272" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089272" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:01 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 12/26] x86/sgx: Add helper to retrieve SGX EPC LRU given an EPC page Date: Fri, 11 Nov 2022 10:35:17 -0800 Message-Id: <20221111183532.3676646-13-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Introduce a function that will be used to retrieve an LRU from an EPC page. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index b72b5868dd01..c33966eafab6 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -31,6 +31,10 @@ static DEFINE_XARRAY(sgx_epc_address_space); * with sgx_global_lru.lock acquired. */ static struct sgx_epc_lru sgx_global_lru; +static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page) +{ + return &sgx_global_lru; +} static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0); @@ -297,6 +301,7 @@ static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age) struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM]; struct sgx_epc_page *epc_page, *tmp; struct sgx_encl_page *encl_page; + struct sgx_epc_lru *lru; pgoff_t page_index; LIST_HEAD(iso); int ret; @@ -352,10 +357,11 @@ static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age) continue; skip: - spin_lock(&sgx_global_lru.lock); + lru = sgx_lru(epc_page); + spin_lock(&lru->lock); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; - sgx_epc_move_reclaimable(&sgx_global_lru, epc_page); - spin_unlock(&sgx_global_lru.lock); + sgx_epc_move_reclaimable(lru, epc_page); + spin_unlock(&lru->lock); kref_put(&encl_page->encl->refcount, sgx_encl_release); } @@ -521,14 +527,16 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) */ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) { - spin_lock(&sgx_global_lru.lock); + struct sgx_epc_lru *lru = sgx_lru(page); + + spin_lock(&lru->lock); WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIM_FLAGS); page->flags |= flags; if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) - sgx_epc_push_reclaimable(&sgx_global_lru, page); + sgx_epc_push_reclaimable(lru, page); else - sgx_epc_push_unreclaimable(&sgx_global_lru, page); - spin_unlock(&sgx_global_lru.lock); + sgx_epc_push_unreclaimable(lru, page); + spin_unlock(&lru->lock); } /** @@ -543,14 +551,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) */ int sgx_drop_epc_page(struct sgx_epc_page *page) { - spin_lock(&sgx_global_lru.lock); + struct sgx_epc_lru *lru = sgx_lru(page); + + 
spin_lock(&lru->lock); if ((page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) && (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)) { - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); return -EBUSY; } list_del(&page->list); - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS; From patchwork Fri Nov 11 18:35:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040704 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF173C4332F for ; Fri, 11 Nov 2022 18:36:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234404AbiKKSgd (ORCPT ); Fri, 11 Nov 2022 13:36:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233828AbiKKSgF (ORCPT ); Fri, 11 Nov 2022 13:36:05 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22EB832061; Fri, 11 Nov 2022 10:36:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191765; x=1699727765; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tdinHlyyGqVYquaFqgcvC9rbRf2JmMok2HdQOnKfOO0=; b=kQNug6BumCzHV+Fz26lBnWpC6rlBPPrsDXL0pxOp5hB0JkA4jnJRgOPi kz5oxfnJZlMIuJDZ7wtPO3eJQe9LOWGnrrLi3DJxcUVMnyPrMuKh19BgB 55MrZUl7rjW065OKk+Lwcl/u0TivsHL7aNU6Em/pltOXyVl/MeMcJdB09 z7AmC0dEZier/ZCijcxdG5Fd37LF7ddD8lBuyosgD0h49oeqEWOuY66Pa 0tosQkA3HwIeb7Hgz4OSM8aqaNlmB2Fuwh4sXWn8AIkD3aj8sTIR4rSj5 dpWLe8XD1HH6rPfIw/1zn8ajBdYtFjtn4ZKYVs1e7sRgoEm/sNrzGtNJ+ g==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050364" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050364" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089302" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089302" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:03 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 13/26] x86/sgx: Prepare for multiple LRUs Date: Fri, 11 Nov 2022 10:35:18 -0800 Message-Id: <20221111183532.3676646-14-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Add sgx_can_reclaim() wrapper so that in a subsequent patch, multiple LRUs can be used cleanly. 
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index c33966eafab6..b2c050fcc989 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -391,10 +391,15 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) return ret; } +static bool sgx_can_reclaim(void) +{ + return !list_empty(&sgx_global_lru.reclaimable); +} + static bool sgx_should_reclaim(unsigned long watermark) { return atomic_long_read(&sgx_nr_free_pages) < watermark && - !list_empty(&sgx_global_lru.reclaimable); + sgx_can_reclaim(); } /* @@ -599,7 +604,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (list_empty(&sgx_global_lru.reclaimable)) + if (!sgx_can_reclaim()) return ERR_PTR(-ENOMEM); if (!reclaim) { From patchwork Fri Nov 11 18:35:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040705 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17815C43219 for ; Fri, 11 Nov 2022 18:36:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234408AbiKKSge (ORCPT ); Fri, 11 Nov 2022 13:36:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234096AbiKKSgH (ORCPT ); Fri, 11 Nov 2022 13:36:07 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C1D31814CF; Fri, 11 Nov 2022 10:36:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191766; x=1699727766; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9lDq2QGtdu+ipirjs6bxBHl2Vfzbdikl95QVLbAr5eU=; b=RZUE1KX7cUxej+1xwc5R2arogNp2qVojj/AJkjgbNgHM2knPfCFJWizI FsITFUyFWtiMwvkVWX+D9kyjH5h+TQUimvR7AEohr5Qy3ZNCA6GI+HZCY fRYIwKDuL3No5IKQqEmvCGmCsw1AfQyltFKx0BHTLFIZlCNpKLOIoG8E0 SaZqdr6z7GzVeVJeEHbsSG4RDiAQXkpnNkblNsi7GCdgGsWEo/d8y5WQT VgAU6RC4YLdBWhu1GLqDoVVW85cGNmyE+0ZRs/W5NroPDbHEptDD9cqWA aNsctRcAuopVWFjTExnM+VCrzymklAtKA5Xs6COMvFNashGEMd3ncJglH g==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050366" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050366" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:06 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089309" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089309" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:05 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. 
Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 14/26] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup Date: Fri, 11 Nov 2022 10:35:19 -0800 Message-Id: <20221111183532.3676646-15-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Expose the top-level reclaim function as sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will initiate reclaim to enforce changes to high/max limits. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 7 ++++--- arch/x86/kernel/cpu/sgx/sgx.h | 1 + 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index b2c050fcc989..cb6f57caf24c 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -281,6 +281,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, /** * sgx_reclaim_pages() - Reclaim EPC pages from the consumers + * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim * @ignore_age: Reclaim a page even if it is young * @@ -382,7 +383,7 @@ static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age) return i; } -static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age) +int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age) { int ret; @@ -438,7 +439,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } return 0; @@ -617,7 +618,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); } if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index f3fc027f7cd0..ca51b3c7d905 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -181,6 +181,7 @@ void sgx_reclaim_direct(void); void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); +int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); void sgx_ipi_cb(void *info); From patchwork Fri Nov 11 18:35:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040706 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75527C43217 for ; Fri, 11 Nov 2022 18:36:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234415AbiKKSgg (ORCPT ); Fri, 11 Nov 2022 13:36:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234118AbiKKSgJ (ORCPT ); Fri, 11 Nov 2022 13:36:09 -0500 Received: from mga17.intel.com (mga17.intel.com 
[192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE0288290B; Fri, 11 Nov 2022 10:36:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191768; x=1699727768; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=7ztjZ1lXO5N5qsaAMVTmsTfaoQ2t8XkFBGpOnZicwNo=; b=TgBBh/kYsih/e5ziEBSsW36bz0DDQDcAFsqjK0TGP3RghTwJ6R6QdhEG k307Vgma/TIbuq3XAAiJgyR7fP2javZPI/aQR0lNWyPt56Mgj5XdfsYu+ ojUrPBfhAjOo0lN6UxqswD1bXMSx/PuBhTbc5P3hnxYYISSCIIlvttapY o0y9av1Mckr1xoVKnJ3bUW+E/jyurI67UFdLYNGfwyb7qAFFKOOmbqcBf boaWNJmuuY2hVzr+E2xq1wNj7toco6j3HlxfM9Ut7Y+O8S+ZbuJOtlTHr pOpAWbkZ6Yj1EXKcMv89MPDgpgt63aHQona80SmpICKrAF0Oc9MrfVsFj A==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050373" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050373" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:08 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089313" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089313" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:07 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 15/26] x86/sgx: Add helper to grab pages from an arbitrary EPC LRU Date: Fri, 11 Nov 2022 10:35:20 -0800 Message-Id: <20221111183532.3676646-16-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Move the isolation loop into a standalone helper, sgx_isolate_epc_pages(), in preparation for the existence of multiple LRUs. Expose the helper to other SGX code so that it can be called from the EPC cgroup code, e.g. to isolate pages from a single cgroup LRU. Exposing the isolation loop allows the cgroup iteration logic to be wholly encapsulated within the cgroup code.
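The isolation loop itself is a classic two-phase pattern: detach up to a budgeted number of nodes onto a private list under the lock, then let the caller process them with the lock dropped. A stripped-down userspace rendering follows; the singly linked toy list and pthread mutex are stand-ins, and while the names echo the kernel's, none of this is the patch's code.

#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct node {
	struct node *next;
	int id;
};

struct lru {
	pthread_mutex_t lock;
	struct node *reclaimable;	/* toy LRU: singly linked stack */
};

/*
 * Phase 1: under the lock, detach up to *nr_to_scan nodes onto the
 * caller's private @dst list; *nr_to_scan is decremented in place so
 * one scan budget can be shared across several LRUs.
 */
static void isolate_pages(struct lru *lru, int *nr_to_scan, struct node **dst)
{
	pthread_mutex_lock(&lru->lock);
	for (; *nr_to_scan > 0 && lru->reclaimable; --(*nr_to_scan)) {
		struct node *n = lru->reclaimable;

		lru->reclaimable = n->next;	/* detach from the LRU */
		n->next = *dst;			/* push onto @dst */
		*dst = n;
	}
	pthread_mutex_unlock(&lru->lock);
}

int main(void)
{
	struct node pages[3] = { { &pages[1], 0 }, { &pages[2], 1 }, { NULL, 2 } };
	struct lru lru = { PTHREAD_MUTEX_INITIALIZER, &pages[0] };
	struct node *iso = NULL;
	int budget = 2;

	isolate_pages(&lru, &budget, &iso);

	/* Phase 2: process the private list with the lock dropped. */
	for (struct node *n = iso; n; n = n->next)
		printf("reclaiming page %d\n", n->id);
	return 0;
}

Decrementing the budget through the pointer mirrors the design choice the kernel helper makes, so the upcoming cgroup code can walk many LRUs against a single nr_to_scan.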
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/main.c | 68 +++++++++++++++++++++------------- arch/x86/kernel/cpu/sgx/sgx.h | 2 + 2 files changed, 44 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index cb6f57caf24c..f8f1451b0a11 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -280,7 +280,46 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, } /** - * sgx_reclaim_pages() - Reclaim EPC pages from the consumers + * sgx_isolate_epc_pages() - Isolate pages from an LRU for reclaim + * @lru: LRU from which to reclaim + * @nr_to_scan: Number of pages to scan for reclaim + * @dst: Destination list to hold the isolated pages + */ +void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, + struct list_head *dst) +{ + struct sgx_encl_page *encl_page; + struct sgx_epc_page *epc_page; + + spin_lock(&lru->lock); + for (; *nr_to_scan > 0; --(*nr_to_scan)) { + if (list_empty(&lru->reclaimable)) + break; + + epc_page = sgx_epc_peek_reclaimable(lru); + if (!epc_page) + break; + + encl_page = epc_page->encl_owner; + + if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE))) + continue; + + if (kref_get_unless_zero(&encl_page->encl->refcount)) { + epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; + list_move_tail(&epc_page->list, dst); + } else { + /* The owner is freeing the page, remove it from the + * LRU list + */ + epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + list_del_init(&epc_page->list); + } + } + spin_unlock(&lru->lock); +} + +/** * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim * @ignore_age: Reclaim a page even if it is young @@ -305,37 +344,14 @@ static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age) struct sgx_epc_lru *lru; pgoff_t page_index; LIST_HEAD(iso); + int i = 0; int ret; - int i; - - spin_lock(&sgx_global_lru.lock); - for (i = 0; i < nr_to_scan; i++) { - epc_page = sgx_epc_peek_reclaimable(&sgx_global_lru); - if (!epc_page) - break; - - encl_page = epc_page->encl_owner; - if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE))) - continue; - - if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { - epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS; - list_move_tail(&epc_page->list, &iso); - } else { - /* The owner is freeing the page, remove it from the - * LRU list - */ - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; - list_del_init(&epc_page->list); - } - } - spin_unlock(&sgx_global_lru.lock); + sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso); if (list_empty(&iso)) return 0; - i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { encl_page = epc_page->encl_owner; diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index ca51b3c7d905..29c37f20792c 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -182,6 +182,8 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); +void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, + struct list_head *dst); void sgx_ipi_cb(void *info); From patchwork Fri Nov 11 18:35:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040707 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C43D4C4332F for ; Fri, 11 Nov 2022 18:36:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234148AbiKKSgr (ORCPT ); Fri, 11 Nov 2022 13:36:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234159AbiKKSgM (ORCPT ); Fri, 11 Nov 2022 13:36:12 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9C7B7F54C; Fri, 11 Nov 2022 10:36:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191770; x=1699727770; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BhkDQba3waDzI7XsFxcqpI9nmgBLwwaPF8PxtWxCjOU=; b=dS/PkOSYAChvUMugt26i9e3ZRkUUHAqiJFY9RqD5xFiOdEycTsJQmIrp hMtwNLoeoO6Xm04sHYqPektTvYmiRwXMppiUbcIm4jco64SBwVJ/Mpquc s2FLPdAl4utlcmjQ8z5r+tMFU9iC1xtv6E1+njI5SfdkvCJFH9+LKuEtl SIKpkpvDLKnPybbdgu29CZlK986xT7qIfTOzoO1jxYf62bOKeUwcFYNce Ot3OqaL+rRXkv6X66DvLl3jMzNx9eJAI1tuHIbEX8hZ7ifxWuQppsvTZT kXJBN854DbA9Z9Qjc+OsMtEu+JS1Rnd7UUpDklo2xEUVu7SYD3xujKuW2 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="292050377" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="292050377" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089325" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089325" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:09 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Dave Hansen , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 16/26] x86/sgx: Add EPC OOM path to forcefully reclaim EPC Date: Fri, 11 Nov 2022 10:35:21 -0800 Message-Id: <20221111183532.3676646-17-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org From: Sean Christopherson Introduce the OOM path for killing an enclave when the reclaimer is no longer able to reclaim enough EPC pages. Find a victim enclave, which will be an enclave with EPC pages remaining that are not accessible to the reclaimer ("unreclaimable"). Once a victim is identified, mark the enclave as OOM and zap the enclave's entire page range. Release all the enclave's resources except for the struct sgx_encl memory itself.
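The pivotal step in victim selection is elevating the enclave's refcount from the unreclaimable list while skipping enclaves already being torn down (refcount at zero). Below is a compact userspace model of just that step, with C11 atomics standing in for kref_get_unless_zero(); all types and names here are illustrative stand-ins, not kernel structures.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct enclave {
	atomic_int refcount;		/* 0 means the owner is freeing it */
};

/* Userspace stand-in for kref_get_unless_zero(). */
static bool get_unless_zero(struct enclave *e)
{
	int old = atomic_load(&e->refcount);

	while (old != 0)
		if (atomic_compare_exchange_weak(&e->refcount, &old, old + 1))
			return true;	/* reference taken */
	return false;			/* already being torn down: skip */
}

/* Walk an "unreclaimable" array and return the first enclave we can pin. */
static struct enclave *pick_victim(struct enclave **unreclaimable, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (unreclaimable[i] && get_unless_zero(unreclaimable[i]))
			return unreclaimable[i];	/* caller holds a ref */
	return NULL;
}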
Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/encl.c | 74 +++++++++++++++--- arch/x86/kernel/cpu/sgx/encl.h | 2 + arch/x86/kernel/cpu/sgx/main.c | 135 +++++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/sgx.h | 1 + 4 files changed, 201 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 653c9ee5bf57..c1d772a11462 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -622,7 +622,8 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr, if (!encl) return -EFAULT; - if (!test_bit(SGX_ENCL_DEBUG, &encl->flags)) + if (!test_bit(SGX_ENCL_DEBUG, &encl->flags) || + test_bit(SGX_ENCL_OOM, &encl->flags)) return -EFAULT; for (i = 0; i < len; i += cnt) { @@ -668,16 +669,8 @@ const struct vm_operations_struct sgx_vm_ops = { .access = sgx_vma_access, }; -/** - * sgx_encl_release - Destroy an enclave instance - * @ref: address of a kref inside &sgx_encl - * - * Used together with kref_put(). Frees all the resources associated with the - * enclave and the instance itself. - */ -void sgx_encl_release(struct kref *ref) +static void __sgx_encl_release(struct sgx_encl *encl) { - struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); struct sgx_va_page *va_page; struct sgx_encl_page *entry; unsigned long index; @@ -712,7 +705,7 @@ void sgx_encl_release(struct kref *ref) while (!list_empty(&encl->va_pages)) { va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, list); - list_del(&va_page->list); + list_del_init(&va_page->list); sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); kfree(va_page); @@ -728,10 +721,66 @@ void sgx_encl_release(struct kref *ref) /* Detect EPC page leak's. */ WARN_ON_ONCE(encl->secs_child_cnt); WARN_ON_ONCE(encl->secs.epc_page); +} + +/** + * sgx_encl_release - Destroy an enclave instance + * @ref: address of a kref inside &sgx_encl + * + * Used together with kref_put(). Frees all the resources associated with the + * enclave and the instance itself. + */ +void sgx_encl_release(struct kref *ref) +{ + struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount); + + /* if the enclave was OOM killed previously, it just needs to be freed */ + if (!test_bit(SGX_ENCL_OOM, &encl->flags)) + __sgx_encl_release(encl); kfree(encl); } +/** + * sgx_encl_destroy - prepare the enclave for release + * @encl: address of the sgx_encl to drain + * + * Used during oom kill to empty the mm_list entries after they have + * been zapped. Release the remaining enclave resources without freeing + * struct sgx_encl. + */ +void sgx_encl_destroy(struct sgx_encl *encl) +{ + struct sgx_encl_mm *encl_mm; + + for ( ; ; ) { + spin_lock(&encl->mm_lock); + + if (list_empty(&encl->mm_list)) { + encl_mm = NULL; + } else { + encl_mm = list_first_entry(&encl->mm_list, + struct sgx_encl_mm, list); + list_del_rcu(&encl_mm->list); + } + + spin_unlock(&encl->mm_lock); + + /* The enclave is no longer mapped by any mm. */ + if (!encl_mm) + break; + + synchronize_srcu(&encl->srcu); + mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm); + kfree(encl_mm); + + /* 'encl_mm' is gone, put encl_mm->encl reference: */ + kref_put(&encl->refcount, sgx_encl_release); + } + + __sgx_encl_release(encl); +} + /* * 'mm' is exiting and no longer needs mmu notifications. 
*/ @@ -801,6 +850,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm) struct sgx_encl_mm *encl_mm; int ret; + if (test_bit(SGX_ENCL_OOM, &encl->flags)) + return -ENOMEM; + /* * Even though a single enclave may be mapped into an mm more than once, * each 'mm' only appears once on encl->mm_list. This is guaranteed by diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index 831d63f80f5a..f4935632e53a 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -39,6 +39,7 @@ enum sgx_encl_flags { SGX_ENCL_DEBUG = BIT(1), SGX_ENCL_CREATED = BIT(2), SGX_ENCL_INITIALIZED = BIT(3), + SGX_ENCL_OOM = BIT(4), }; struct sgx_encl_mm { @@ -125,5 +126,6 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl, unsigned long addr); struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim); void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page); +void sgx_encl_destroy(struct sgx_encl *encl); #endif /* _X86_ENCL_H */ diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index f8f1451b0a11..5a511046ad38 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -670,6 +670,141 @@ void sgx_free_epc_page(struct sgx_epc_page *page) atomic_long_inc(&sgx_nr_free_pages); } +static bool sgx_oom_get_ref(struct sgx_epc_page *epc_page) +{ + struct sgx_encl *encl; + + if (epc_page->flags & SGX_EPC_PAGE_ENCLAVE) + encl = ((struct sgx_encl_page *)epc_page->encl_owner)->encl; + else if (epc_page->flags & SGX_EPC_PAGE_VERSION_ARRAY) + encl = epc_page->encl; + else + return false; + + return kref_get_unless_zero(&encl->refcount); +} + +static struct sgx_epc_page *sgx_oom_get_victim(struct sgx_epc_lru *lru) +{ + struct sgx_epc_page *epc_page, *tmp; + + if (list_empty(&lru->unreclaimable)) + return NULL; + + list_for_each_entry_safe(epc_page, tmp, &lru->unreclaimable, list) { + list_del_init(&epc_page->list); + + if (sgx_oom_get_ref(epc_page)) + return epc_page; + } + return NULL; +} + +static void sgx_epc_oom_zap(void *owner, struct mm_struct *mm, unsigned long start, + unsigned long end, const struct vm_operations_struct *ops) +{ + struct vm_area_struct *vma, *tmp; + unsigned long vm_end; + + vma = find_vma(mm, start); + if (!vma || vma->vm_ops != ops || vma->vm_private_data != owner || + vma->vm_start >= end) + return; + + for (tmp = vma; tmp->vm_start < end; tmp = tmp->vm_next) { + do { + vm_end = tmp->vm_end; + tmp = tmp->vm_next; + } while (tmp && tmp->vm_ops == ops && + vma->vm_private_data == owner && tmp->vm_start < end); + + zap_page_range(vma, vma->vm_start, vm_end - vma->vm_start); + + if (!tmp) + break; + } +} + +static void sgx_oom_encl(struct sgx_encl *encl) +{ + unsigned long mm_list_version; + struct sgx_encl_mm *encl_mm; + int idx; + + set_bit(SGX_ENCL_OOM, &encl->flags); + + if (!test_bit(SGX_ENCL_CREATED, &encl->flags)) + goto out; + + do { + mm_list_version = encl->mm_list_version; + + /* Pairs with smp_rmb() in sgx_encl_mm_add(). 
*/ + smp_rmb(); + + idx = srcu_read_lock(&encl->srcu); + + list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) { + if (!mmget_not_zero(encl_mm->mm)) + continue; + + mmap_read_lock(encl_mm->mm); + + sgx_epc_oom_zap(encl, encl_mm->mm, encl->base, + encl->base + encl->size, &sgx_vm_ops); + + mmap_read_unlock(encl_mm->mm); + + mmput_async(encl_mm->mm); + } + + srcu_read_unlock(&encl->srcu, idx); + } while (WARN_ON_ONCE(encl->mm_list_version != mm_list_version)); + + mutex_lock(&encl->lock); + sgx_encl_destroy(encl); + mutex_unlock(&encl->lock); + +out: + /* + * This puts the refcount we took when we identified this enclave as + * an OOM victim. + */ + kref_put(&encl->refcount, sgx_encl_release); +} + +static inline void sgx_oom_encl_page(struct sgx_encl_page *encl_page) +{ + return sgx_oom_encl(encl_page->encl); +} + +/** + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU + * @lru: LRU that is low + * + * Return: %true if a victim was found and kicked. + */ +bool sgx_epc_oom(struct sgx_epc_lru *lru) +{ + struct sgx_epc_page *victim; + + spin_lock(&lru->lock); + victim = sgx_oom_get_victim(lru); + spin_unlock(&lru->lock); + + if (!victim) + return false; + + if (victim->flags & SGX_EPC_PAGE_ENCLAVE) + sgx_oom_encl_page(victim->encl_owner); + else if (victim->flags & SGX_EPC_PAGE_VERSION_ARRAY) + sgx_oom_encl(victim->encl); + else + WARN_ON_ONCE(1); + + return true; +} + static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, unsigned long index, struct sgx_epc_section *section) diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 29c37f20792c..db09a8a0ea6e 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -184,6 +184,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age); void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan, struct list_head *dst); +bool sgx_epc_oom(struct sgx_epc_lru *lru); void sgx_ipi_cb(void *info); From patchwork Fri Nov 11 18:35:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96565C4332F for ; Fri, 11 Nov 2022 18:37:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234271AbiKKShH (ORCPT ); Fri, 11 Nov 2022 13:37:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234228AbiKKSgN (ORCPT ); Fri, 11 Nov 2022 13:36:13 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0A678292C; Fri, 11 Nov 2022 10:36:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191772; x=1699727772; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dM/C9NKoWZQns8V2ubS9Pw5AwFS8wR+Q2pt2pk3v+2I=; b=Tcy66Lxgsa8Sb74Bg1bLbC4NvZ2VNy3fJZkpayckrIRMWN1g5baOxzZ0 /1SkTRIHou9mLTxfnb71M4SHerUTWz6r5cmIzgkMfyh/lobt8C8C6Z8N5 EBo5/nc8hHucOAeDYHgoC8HBTmKYLjXVrMO43rU8jA+oylNsoyECXYuis UO5pAhhb0k3PZkmJTNcdTIqyyfixuzY9H9LmqpA7Atep7uoc+tHPJJSU7 
qGwnREeVnL0BMikzWjYp4HgU2WkCHqxkcJc034TQ1nso+NUoFvo+8HNiv y7t3T+61elwgIMkNuONh25XmB3VPma/79TOwlv0LWZfPzkJ+tRx052Qat A==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447702" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447702" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:12 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089347" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089347" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:11 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 17/26] cgroup/misc: Add notifier block list support for css events Date: Fri, 11 Nov 2022 10:35:22 -0800 Message-Id: <20221111183532.3676646-18-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org Consumers of the misc cgroup controller might need to perform separate actions in the event of a cgroup alloc, free or release call. In addition, writes to the max value may also need separate action. Add the ability to allow code to register for these notifications, and call the notifier block chain list when appropriate. This code will be utilized by the SGX driver in a future patch. Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 17 ++++++++++++ kernel/cgroup/misc.c | 52 +++++++++++++++++++++++++++++++++++-- 2 files changed, 67 insertions(+), 2 deletions(-) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index c238207d1615..8f1b7b6cb81d 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -21,6 +21,12 @@ enum misc_res_type { MISC_CG_RES_TYPES }; +enum misc_cg_events { + MISC_CG_ALLOC, /* a misc_cg was allocated */ + MISC_CG_FREE, /* a misc_cg was freed */ + MISC_CG_RELEASED, /* a misc_cg is being freed */ + MISC_CG_CHANGE, /* the misc_cg max value was changed */ +}; struct misc_cg; #ifdef CONFIG_CGROUP_MISC @@ -59,6 +65,8 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, unsigned long amount); void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, unsigned long amount); +int register_misc_cg_notifier(struct notifier_block *nb); +int unregister_misc_cg_notifier(struct notifier_block *nb); /** * css_misc() - Get misc cgroup from the css. 
@@ -132,5 +140,14 @@ static inline void put_misc_cg(struct misc_cg *cg) { } +static inline int register_misc_cg_notifier(struct notifier_block *nb) +{ + return 0; +} + +static inline int unregister_misc_cg_notifier(struct notifier_block *nb) +{ + return 0; +} #endif /* CONFIG_CGROUP_MISC */ #endif /* _MISC_CGROUP_H_ */ diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index fe3e8a0eb7ed..1e93e1d20347 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -11,6 +11,7 @@ #include #include #include +#include <linux/notifier.h> #include #define MAX_STR "max" @@ -39,6 +40,11 @@ static struct misc_cg root_cg; */ static unsigned long misc_res_capacity[MISC_CG_RES_TYPES]; +/* + * Notifier list for misc_cg cgroup callback events. + */ +static BLOCKING_NOTIFIER_HEAD(misc_cg_notify_list); + /** * parent_misc() - Get the parent of the passed misc cgroup. * @cgroup: cgroup whose parent needs to be fetched. @@ -278,10 +284,12 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf, cg = css_misc(of_css(of)); - if (READ_ONCE(misc_res_capacity[type])) + if (READ_ONCE(misc_res_capacity[type])) { WRITE_ONCE(cg->res[type].max, max); - else + blocking_notifier_call_chain(&misc_cg_notify_list, MISC_CG_CHANGE, cg); + } else { ret = -EINVAL; + } return ret ? ret : nbytes; } @@ -400,6 +408,7 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css) WRITE_ONCE(cg->res[i].max, MAX_NUM); atomic_long_set(&cg->res[i].usage, 0); } + blocking_notifier_call_chain(&misc_cg_notify_list, MISC_CG_ALLOC, cg); return &cg->css; } @@ -412,13 +421,52 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css) */ static void misc_cg_free(struct cgroup_subsys_state *css) { + blocking_notifier_call_chain(&misc_cg_notify_list, MISC_CG_FREE, css_misc(css)); kfree(css_misc(css)); } +/** + * misc_cg_released() - Notify that the misc cgroup is being released + * @css: cgroup subsys object. + * + * Call the notifier chain to notify about the event. + * + * Context: Any context. + */ +static void misc_cg_released(struct cgroup_subsys_state *css) +{ + blocking_notifier_call_chain(&misc_cg_notify_list, MISC_CG_RELEASED, css_misc(css)); +} + /* Cgroup controller callbacks */ struct cgroup_subsys misc_cgrp_subsys = { .css_alloc = misc_cg_alloc, .css_free = misc_cg_free, + .css_released = misc_cg_released, .legacy_cftypes = misc_cg_files, .dfl_cftypes = misc_cg_files, }; + +/** + * register_misc_cg_notifier() - Register for css callback events + * @nb: notifier_block to register + * + * Context: Any context. + */ +int register_misc_cg_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_register(&misc_cg_notify_list, nb); +} +EXPORT_SYMBOL_GPL(register_misc_cg_notifier); + +/** + * unregister_misc_cg_notifier() - Unregister for css callback events + * @nb: notifier_block to unregister + * + * Context: Any context.
+ */ +int unregister_misc_cg_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_unregister(&misc_cg_notify_list, nb); +} +EXPORT_SYMBOL_GPL(unregister_misc_cg_notifier); From patchwork Fri Nov 11 18:35:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040708 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12C4CC433FE for ; Fri, 11 Nov 2022 18:37:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233900AbiKKShA (ORCPT ); Fri, 11 Nov 2022 13:37:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234234AbiKKSgO (ORCPT ); Fri, 11 Nov 2022 13:36:14 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2775083381; Fri, 11 Nov 2022 10:36:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191774; x=1699727774; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wY8Ia7AhZOsSBmUDim+2xlwCX7VM8EjOrhb9ezVOttc=; b=CP7moDwxZdBbz+ai2ISJn8ohwUFqPbFTnZc97wcRrphBU0tPUK4mn+NE nyfu8SRGvGBGTFkAeEn0CC2GPcfgqadZFqgvf9K409D+kr+YJ4DrYLu8M n+P9994qJ5tAXS9OvBSOILxZ3HifnVKcw7tX/01ArQLXeMee4eeyd1kxk SW4XREmyq3/IOEZC+N/rQ+pbzFL9JBFip+CYyJQfShUWfYa+48tNcTf4k oM3osAoLXgCL/iXLyU/HRtkxM8V4kriPg2lYt5WiLWTdoFrMbguyMNmh8 PcJjgbjMOBum5JAG3l3/lNKgb6BWJFdOjIcBh7hEUztb5tFaBDbME77nY Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447706" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447706" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:13 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089360" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089360" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:12 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 18/26] cgroup/misc: Expose root_misc Date: Fri, 11 Nov 2022 10:35:23 -0800 Message-Id: <20221111183532.3676646-19-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org The SGX driver will need to get access to the root misc_cg object to do iterative walks and also determine if a charge will be towards the root cgroup or not. 
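For example, a consumer could use root_misc() both as the anchor of a pre-order walk over every misc cgroup and to test whether a charge lands on the root. A minimal sketch (sgx_visit_all_misc_cgs() is a hypothetical consumer function, not part of this patch):

	/* Visit every misc cgroup in the hierarchy, root included. */
	static void sgx_visit_all_misc_cgs(void)
	{
		struct cgroup_subsys_state *pos;
		struct misc_cg *root = root_misc();

		rcu_read_lock();
		css_for_each_descendant_pre(pos, &root->css) {
			/* css_misc() converts the css back to a misc_cg */
			pr_debug("visiting misc cgroup %p (is root: %d)\n",
				 css_misc(pos), css_misc(pos) == root);
		}
		rcu_read_unlock();
	}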
Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 5 +++++ kernel/cgroup/misc.c | 9 +++++++++ 2 files changed, 14 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index 8f1b7b6cb81d..b79c78378f17 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -59,6 +59,7 @@ struct misc_cg { struct misc_res res[MISC_CG_RES_TYPES]; }; +struct misc_cg *root_misc(void); unsigned long misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, @@ -106,6 +107,10 @@ static inline void put_misc_cg(struct misc_cg *cg) } #else /* !CONFIG_CGROUP_MISC */ +static inline struct misc_cg *root_misc(void) +{ + return NULL; +} static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type) { diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index 1e93e1d20347..8aa994d9cd02 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -45,6 +45,15 @@ static unsigned long misc_res_capacity[MISC_CG_RES_TYPES]; */ static BLOCKING_NOTIFIER_HEAD(misc_cg_notify_list); +/** + * root_misc() - Return the root misc cgroup. + */ +struct misc_cg *root_misc(void) +{ + return &root_cg; +} +EXPORT_SYMBOL_GPL(root_misc); + /** * parent_misc() - Get the parent of the passed misc cgroup. * @cgroup: cgroup whose parent needs to be fetched. From patchwork Fri Nov 11 18:35:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040713 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C31CC4167B for ; Fri, 11 Nov 2022 18:37:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234199AbiKKShD (ORCPT ); Fri, 11 Nov 2022 13:37:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233768AbiKKSgP (ORCPT ); Fri, 11 Nov 2022 13:36:15 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4727382937; Fri, 11 Nov 2022 10:36:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191775; x=1699727775; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dQ7ZYI4WFVi0TVSU0cIX9E3CVIqNB+6yLjwlI7zqKjc=; b=CsngimnXo57fCXP7N4YGZ/TcdiElF/D8Igbx28tiEIMmCWwsYAvnQZ4D 92Fcq/tEC8GYa4IzRXjSI6KFosPa13OrKdWeeCVp5MvSyRs9ziGxhvVJu tSvIGGTGjLUSkvK1XpFrP/3aWI7Usgz644lmBdHslrJXAc6mpJtk7IcD2 d0mWnxocIwUW/23M/dd/umPgzyGbbp3hmtshCL4l5o4CFU5JlUD8SYpd1 qmVFkw2naOrAEcM8CuO/S1qKVY5tY9GFA3RPfhvHd4eqeySvc0oeSxYph zAOjMWHfFcJmpra4q0RLM2lOl59TEwhiOZRM5VL8NSApGqM0iskFE1OS9 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447709" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447709" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:15 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089369" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089369" Received: from hermesli-mobl.amr.corp.intel.com 
(HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:13 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 19/26] cgroup/misc: Expose parent_misc() Date: Fri, 11 Nov 2022 10:35:24 -0800 Message-Id: <20221111183532.3676646-20-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org To manage the SGX EPC memory via the misc controller, the SGX driver will need to be able to iterate over the misc cgroup hierarchy. Make parent_misc() available for a future patch that will utilize it. Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 6 ++++++ kernel/cgroup/misc.c | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index b79c78378f17..d1aeb85f2ed6 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -60,6 +60,7 @@ struct misc_cg { }; struct misc_cg *root_misc(void); +struct misc_cg *parent_misc(struct misc_cg *cg); unsigned long misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, @@ -112,6 +113,11 @@ static inline struct misc_cg *root_misc(void) return NULL; } +static inline struct misc_cg *parent_misc(struct misc_cg *cg) +{ + return NULL; +} + static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type) { return 0; diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index 8aa994d9cd02..b22a055af9ad 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -63,10 +63,11 @@ EXPORT_SYMBOL_GPL(root_misc); * * struct misc_cg* - Parent of the @cgroup. * * %NULL - If @cgroup is null or the passed cgroup does not have a parent. */ -static struct misc_cg *parent_misc(struct misc_cg *cgroup) +struct misc_cg *parent_misc(struct misc_cg *cgroup) { return cgroup ? css_misc(cgroup->css.parent) : NULL; } +EXPORT_SYMBOL_GPL(parent_misc); /** * valid_type() - Check if @type is valid or not. 
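With parent_misc() exported, a consumer can walk from any misc cgroup up toward the root. An illustrative sketch (sgx_misc_cg_is_ancestor() is a hypothetical helper, not part of this series):

	/* Return true if @ancestor is @cg itself or one of its ancestors. */
	static bool sgx_misc_cg_is_ancestor(struct misc_cg *cg,
					    struct misc_cg *ancestor)
	{
		while (cg) {
			if (cg == ancestor)
				return true;
			cg = parent_misc(cg);
		}
		return false;
	}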
From patchwork Fri Nov 11 18:35:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040715 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D9ACC4167E for ; Fri, 11 Nov 2022 18:37:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234237AbiKKShF (ORCPT ); Fri, 11 Nov 2022 13:37:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37840 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234273AbiKKSgS (ORCPT ); Fri, 11 Nov 2022 13:36:18 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0225833B2; Fri, 11 Nov 2022 10:36:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191776; x=1699727776; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YoGb4j/ZQ61R2iQhhmxSzGX3p7smajRdGWvm5NNdTBs=; b=AucB0PwBRMGpEItGXZUTWsmONUkG/FmfD9juI6gfUor9xA9+Ob2NTkFF L4B8WH1aia4/s3+82BeFJSj6aGiwK3REC5R9EgYxI5sA2aBq3zE0U1a85 k6tbUxOeztfpFY2iCr5AbjT4VX4nHMrQe+xO9lXXyJWVUAfnR5ZTZlaCA GgGYmNqEIumkHYiGU3Q1RTsPVj8AVy/yaRFb9KYt1Z/g1jrIJ/gYcLIrg NXbE6p2g2ovZuV42r9PEAd4VAaM8sZzyOgLI6k1YxTsCYWvs0WmgbTgd1 elqSIreDZevioZG43Ny0wU9B8UJi/nOwqh8t3PZtVRl6UAKBPCGGv8KOY Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447711" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447711" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:16 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089385" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089385" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:15 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 20/26] cgroup/misc: allow users of misc cgroup to read specific cgroup usage Date: Fri, 11 Nov 2022 10:35:25 -0800 Message-Id: <20221111183532.3676646-21-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org Add a function to return the current usage of the specified cgroup. The SGX driver will need this information to decide whether to reclaim EPC pages from a cgroup. 
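For example, an SGX consumer could convert the byte count returned by misc_cg_read() into pages. A sketch of that usage (MISC_CG_RES_SGX_EPC is the resource type a later patch in this series adds):

	/* Current EPC usage of @cg in pages; usage is accounted in bytes. */
	static unsigned long sgx_epc_cg_pages_in_use(struct misc_cg *cg)
	{
		return misc_cg_read(MISC_CG_RES_SGX_EPC, cg) / PAGE_SIZE;
	}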
Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 6 ++++++ kernel/cgroup/misc.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index d1aeb85f2ed6..a9dd087132dc 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -61,6 +61,7 @@ struct misc_cg { struct misc_cg *root_misc(void); struct misc_cg *parent_misc(struct misc_cg *cg); +unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg *cg); unsigned long misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, @@ -118,6 +119,11 @@ static inline struct misc_cg *parent_misc(struct misc_cg *cg) return NULL; } +static inline unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg *cg) +{ + return 0; +} + static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type) { return 0; diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index b22a055af9ad..e2c99fdc1d40 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -213,6 +213,25 @@ void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, } EXPORT_SYMBOL_GPL(misc_cg_uncharge); +/** + * misc_cg_read() - Return the current usage of the misc cgroup res. + * @type: Type of the misc res. + * @cg: Misc cgroup whose usage will be read + * + * Context: Any context. + * Return: + * The current total usage of the specified misc cgroup. + * If an invalid misc_res_type or a NULL @cg is given, zero will be returned. + */ +unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg *cg) +{ + if (!(valid_type(type) && cg)) + return 0; + + return atomic_long_read(&cg->res[type].usage); +} +EXPORT_SYMBOL_GPL(misc_cg_read); + /** * misc_cg_max_show() - Show the misc cgroup max limit.
* @sf: Interface file From patchwork Fri Nov 11 18:35:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040714 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A308CC43219 for ; Fri, 11 Nov 2022 18:37:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234225AbiKKShE (ORCPT ); Fri, 11 Nov 2022 13:37:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234293AbiKKSgT (ORCPT ); Fri, 11 Nov 2022 13:36:19 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CD40833BE; Fri, 11 Nov 2022 10:36:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191778; x=1699727778; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bRGewpsYybIXP9Kh4EzoNOkrDN/D0MwqytH8St6QCMQ=; b=kYL2gj81ydUbyzi95Wd8M68ps5XL0o4vg+qgNxzUxtOgTaTT/qrmBUdq wDcMIth16dZId3UeKLYnymTWho2sjcbRbCqLna6FMC9ZlzOhVirN7M5Yy kpUXHYiAPLIlLj6U/vz0zf7fzHDfmBKiFNyWvGEiPCK22PMdIGen7l3Gw 38DIKkkm9hdM9EP2Q311XH9+AzVIALjZsPnsrMN7AtSjUaw64tpxj40Yb LrqoiYrzxt3g0fqYOsGksRB6R8rcMIJLEBX/+TjCDCPIsbTVyZHt/YTYD uAJxnTXH9cnsJLQveoXuKgXao8Vb+XctLy2ceSGvkeZO+nqxpgfY5DYLE A==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447719" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447719" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:18 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089411" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089411" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:16 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 21/26] cgroup/misc: allow misc cgroup consumers to read the max value Date: Fri, 11 Nov 2022 10:35:26 -0800 Message-Id: <20221111183532.3676646-22-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org The SGX driver will need to be able to read the max value per cgroup to determine how far usage is from max. Add an api to return the max value of the given cgroup. 
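Together with misc_cg_read() from the previous patch, this lets a consumer compute how much headroom remains below the limit, e.g. (sketch only; mirrors the cur vs. max comparisons the EPC controller performs later in this series):

	/* Pages that may still be charged to @cg before it hits its limit. */
	static unsigned long sgx_epc_cg_headroom(struct misc_cg *cg)
	{
		unsigned long cur = misc_cg_read(MISC_CG_RES_SGX_EPC, cg);
		unsigned long max = misc_cg_max(MISC_CG_RES_SGX_EPC, cg);

		return cur < max ? (max - cur) / PAGE_SIZE : 0;
	}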
Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 6 ++++++ kernel/cgroup/misc.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index a9dd087132dc..c00deae4d2df 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -62,6 +62,7 @@ struct misc_cg { struct misc_cg *root_misc(void); struct misc_cg *parent_misc(struct misc_cg *cg); unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg *cg); +unsigned long misc_cg_max(enum misc_res_type type, struct misc_cg *cg); unsigned long misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, @@ -124,6 +125,11 @@ static inline unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg return 0; } +static inline unsigned long misc_cg_max(enum misc_res_type type, struct misc_cg *cg) +{ + return 0; +} + static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type) { return 0; diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index e2c99fdc1d40..18d0bec7d609 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -232,6 +232,25 @@ unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg *cg) } EXPORT_SYMBOL_GPL(misc_cg_read); +/** + * misc_cg_max() - Return the max value of the misc cgroup res. + * @type: Type of the misc res. + * @cg: Misc cgroup whose max will be read + * + * Context: Any context. + * Return: + * The max value of the specified misc cgroup. + * If an invalid misc_res_type or a NULL @cg is given, zero will be returned. + */ +unsigned long misc_cg_max(enum misc_res_type type, struct misc_cg *cg) +{ + if (!(valid_type(type) && cg)) + return 0; + + return READ_ONCE(cg->res[type].max); +} +EXPORT_SYMBOL_GPL(misc_cg_max); + /** * misc_cg_max_show() - Show the misc cgroup max limit.
* @sf: Interface file From patchwork Fri Nov 11 18:35:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA3EAC4332F for ; Fri, 11 Nov 2022 18:37:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233958AbiKKShB (ORCPT ); Fri, 11 Nov 2022 13:37:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37292 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234334AbiKKSg1 (ORCPT ); Fri, 11 Nov 2022 13:36:27 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE2A6178A6; Fri, 11 Nov 2022 10:36:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191779; x=1699727779; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qDw1+KlOA5sa/9TNbGFSCosxMRaFIDw8SgzYI5H+AWg=; b=isfnaWRjwuVMTz7qupQcJpeA5mJyp0cM8SxoRmcmL8y1s2UNL6r95fdQ 2vtLWDWTi8nJZhMEJJkOl3t1zhh1FyJU2rMQOhybxVAG0HMmtt36yHERl hjtWKXfOTe6yl7q2pTrgKY6uzYVjwqGkW4Ml/iP9BXkbgPrdxIEWTcq0M gNhrNL2ZnopOBHXZKsL5YTIZ8dnqoiXHzawEapwzfm0zpyD1y+f9BhQjb lTWktPfsblzcPsiyYZfELSXzH0Ai6vAtouYXNgYesjaPWx687yJCoVkYm nQXIcfNcxUpwkI4ls4W/SdAZjgyIKafaT6E0wS9Vo+2dPbRcEFljd+knK g==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447727" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447727" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089422" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089422" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:18 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 22/26] cgroup/misc: Add private per cgroup data to struct misc_cg Date: Fri, 11 Nov 2022 10:35:27 -0800 Message-Id: <20221111183532.3676646-23-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org The SGX driver needs to be able to store additional per cgroup data specific to SGX along with the misc_cg struct. Add the ability to get and set this data in struct misc_cg. 
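Usage pairs the setter and getter around a consumer-owned structure, e.g. (sketch; struct sgx_epc_cgroup is the private structure a later patch in this series actually stores here):

	/* Bind SGX-private state to @cg and read it back. */
	static int sgx_epc_cg_bind(struct misc_cg *cg)
	{
		struct sgx_epc_cgroup *epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL);

		if (!epc_cg)
			return -ENOMEM;

		misc_cg_set_priv(MISC_CG_RES_SGX_EPC, cg, epc_cg);
		WARN_ON(misc_cg_get_priv(MISC_CG_RES_SGX_EPC, cg) != epc_cg);
		return 0;
	}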
Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 12 ++++++++++++ kernel/cgroup/misc.c | 39 +++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index c00deae4d2df..7fbf3efb0f62 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -43,6 +43,7 @@ struct misc_res { unsigned long max; atomic_long_t usage; atomic_long_t events; + void *priv; }; /** @@ -63,6 +64,8 @@ struct misc_cg *root_misc(void); struct misc_cg *parent_misc(struct misc_cg *cg); unsigned long misc_cg_read(enum misc_res_type type, struct misc_cg *cg); unsigned long misc_cg_max(enum misc_res_type type, struct misc_cg *cg); +void *misc_cg_get_priv(enum misc_res_type type, struct misc_cg *cg); +void misc_cg_set_priv(enum misc_res_type type, struct misc_cg *cg, void *priv); unsigned long misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, @@ -130,6 +133,15 @@ static inline unsigned long misc_cg_max(enum misc_res_type type, struct misc_cg return 0; } +static inline void *misc_cg_get_priv(enum misc_res_type type, struct misc_cg *cg) +{ + return NULL; +} + +static inline void misc_cg_set_priv(enum misc_res_type type, struct misc_cg *cg, void *priv) +{ +} + static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type) { return 0; diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index 18d0bec7d609..642879ad136f 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -251,6 +251,45 @@ unsigned long misc_cg_max(enum misc_res_type type, struct misc_cg *cg) } EXPORT_SYMBOL_GPL(misc_cg_max); +/** + * misc_cg_get_priv() - Return the priv value of the misc cgroup res. + * @type: Type of the misc res. + * @cg: Misc cgroup whose priv will be read + * + * Context: Any context. + * Return: + * The value of the priv field for the specified misc cgroup. + * If an invalid misc_res_type or a NULL @cg is given, NULL will be returned. + */ +void *misc_cg_get_priv(enum misc_res_type type, struct misc_cg *cg) +{ + if (!(valid_type(type) && cg)) + return NULL; + + return cg->res[type].priv; +} +EXPORT_SYMBOL_GPL(misc_cg_get_priv); + +/** + * misc_cg_set_priv() - Set the priv value of the misc cgroup res. + * @type: Type of the misc res. + * @cg: Misc cgroup whose priv will be written + * @priv: Value to store in the priv field of the struct misc_cg + * + * If an invalid misc_res_type or a NULL @cg is given, the priv data will + * not be stored. + * + * Context: Any context. + */ +void misc_cg_set_priv(enum misc_res_type type, struct misc_cg *cg, void *priv) +{ + if (!(valid_type(type) && cg)) + return; + + cg->res[type].priv = priv; +} +EXPORT_SYMBOL_GPL(misc_cg_set_priv); + /** * misc_cg_max_show() - Show the misc cgroup max limit.
* @sf: Interface file From patchwork Fri Nov 11 18:35:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040716 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41D09C433FE for ; Fri, 11 Nov 2022 18:37:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234255AbiKKShG (ORCPT ); Fri, 11 Nov 2022 13:37:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233899AbiKKSg2 (ORCPT ); Fri, 11 Nov 2022 13:36:28 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52FB283691; Fri, 11 Nov 2022 10:36:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191781; x=1699727781; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eSkn/vztgv3b1H9AuqHP1f8+st1SIUnPEpjHioF/AGg=; b=aNX7gCXNB77Zoq0LIpI74k/wVWVCbS7k02cNdzM8lHUCMjrk3M49xxs4 9Hy6XxA3OtiyyF4pwFMjhyeXj5D2geooZ18TZm3ALUbxGWrIfr8//X5j9 F3+RLD9n7WUAQ1Dn3HZCjko1MdEVCUpsdwvLz25WGV2r/St+bukPRShkZ 1Dl7NEzvT8yy/3s3FykGwa/9mIdUhLSOC+LtQ1TbQ4MxCEW9G+3wLr6Pk eMQIR1DbKSmsxgSEPg+vCLdXZeYT3nc9Egx7QElFfUAl8qdVNzoOsGZUo QAwr/154NpdKnULBX+np9F1mmJRsiEdUdbFLONgT5VaBexGUVtkQIthhB w==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447732" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447732" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:20 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089434" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089434" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:19 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 23/26] cgroup/misc: Add tryget functionality for misc controller Date: Fri, 11 Nov 2022 10:35:28 -0800 Message-Id: <20221111183532.3676646-24-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org Allow callers to do a css_tryget() on a specific misc cgroup. Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index 7fbf3efb0f62..cee848205715 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -112,6 +112,16 @@ static inline void put_misc_cg(struct misc_cg *cg) css_put(&cg->css); } +/* + * misc_cg_tryget() - Try to increment this misc cgroup ref count. + * @cg: cgroup to get.
+ */ +static inline bool misc_cg_tryget(struct misc_cg *cg) +{ + if (cg) + return css_tryget(&cg->css); + return false; +} #else /* !CONFIG_CGROUP_MISC */ static inline struct misc_cg *root_misc(void) { @@ -175,6 +185,11 @@ static inline void put_misc_cg(struct misc_cg *cg) { } +static inline bool misc_cg_tryget(struct misc_cg *cg) +{ + return true; +} + static inline int register_misc_cg_notifier(struct notifier_block *nb) { return 0; From patchwork Fri Nov 11 18:35:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040718 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57775C43217 for ; Fri, 11 Nov 2022 18:37:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234279AbiKKShI (ORCPT ); Fri, 11 Nov 2022 13:37:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234387AbiKKSga (ORCPT ); Fri, 11 Nov 2022 13:36:30 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87F4F836A8; Fri, 11 Nov 2022 10:36:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191782; x=1699727782; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EBcLsiw30eqLl7g1ih7HHqPXoxhQ0Yyb7AgcJd3NM6M=; b=DcdzJGxlzqeMI/lVJgpZqqrS2R/BtbGEaER+UPPX/2sRCSlBUD7UsRBQ iinw8ZfNwf/pMHccT7UNwlg5FXGzc5JjIS2JvaGOiQ9/ZocAEF8cbrArC 1uDU6p+EwBNwHuIhpW5jragpop1al97Xa+tUg/o6A1S7DxsRIYwYNnjsB psk0qIH9OPaj8FXtVonWN2kw1Fk++umdmg5ctHUgPPWhcyx0q3wFX3TQm tHjlLM7lyA9zJ+Vt3DtyzfY2W+aaBhL78/zIFq4m5u3ziAwsiRc2t2Kym FXOKldQlIV+Vj6/TBZwu5YVxa/l/hde1crqPRw2Ey7oyWNiXsWGYDupsf A==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447734" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447734" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:22 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089447" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089447" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:21 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Zefan Li , Johannes Weiner Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi Subject: [PATCH 24/26] cgroup/misc: Add SGX EPC resource type Date: Fri, 11 Nov 2022 10:35:29 -0800 Message-Id: <20221111183532.3676646-25-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org Allow SGX EPC memory to be a valid resource type for the misc controller. 
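With the resource type defined, charging EPC becomes a regular misc controller charge, e.g. (sketch; this is essentially what the EPC controller in the next patch does, accounting in bytes):

	/* Charge one EPC page to @cg; returns 0 on success, -errno on failure. */
	static int sgx_epc_cg_charge_one(struct misc_cg *cg)
	{
		return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, cg, PAGE_SIZE);
	}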
Signed-off-by: Kristen Carlson Accardi --- include/linux/misc_cgroup.h | 4 ++++ kernel/cgroup/misc.c | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index cee848205715..aeaf4acf22af 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -17,6 +17,10 @@ enum misc_res_type { MISC_CG_RES_SEV, /* AMD SEV-ES ASIDs resource */ MISC_CG_RES_SEV_ES, +#endif +#ifdef CONFIG_CGROUP_SGX_EPC + /* SGX EPC memory resource */ + MISC_CG_RES_SGX_EPC, #endif MISC_CG_RES_TYPES }; diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index 642879ad136f..e73a034adca3 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -25,6 +25,10 @@ static const char *const misc_res_name[] = { /* AMD SEV-ES ASIDs resource */ "sev_es", #endif +#ifdef CONFIG_CGROUP_SGX_EPC + /* Intel SGX EPC memory bytes */ + "sgx_epc", +#endif }; /* Root misc cgroup */ From patchwork Fri Nov 11 18:35:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 13040719 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DEDE2C4332F for ; Fri, 11 Nov 2022 18:37:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234595AbiKKSho (ORCPT ); Fri, 11 Nov 2022 13:37:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234457AbiKKSgu (ORCPT ); Fri, 11 Nov 2022 13:36:50 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B41C845D4; Fri, 11 Nov 2022 10:36:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668191785; x=1699727785; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JOo1STAY5mXNE7WRHhgICM+niYz0qz8X60t0f/Hdtn0=; b=TbTprBpAKVjwbpTVEdYV7EGKkeetIT2Si7K6w6Skc+Eb+4aDfu2gKLgw PtpNEpHWo2DLFU0ElCSm9GWkvie5/EFV0zr7fT+3XugwwHbXozFCNkWcy pyj79EvhMV3+j0olb0xvAwnevfoYUjg/VKOKKMmqSlRKZDTZGfyZaz0Lz vTkqnSJRCcv44mWcEMQtiRU039pzNn7a/c6Y27BBI3yzWR8VUaHWUrclH /lO4M/hFbkdQ52Ipm5Rr6gab/S4sv/emBxa+Sv7/5DgFOTwfsYupWVoNm LVXXXAD+kTV4dOSLdhWZKvYhxei+tJSq/OkaYKFKicRJo6c7JrKqvDyQu w==; X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="313447743" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="313447743" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:25 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10528"; a="640089465" X-IronPort-AV: E=Sophos;i="5.96,157,1665471600"; d="scan'208";a="640089465" Received: from hermesli-mobl.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.212.218.5]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2022 10:36:23 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH 25/26] x86/sgx: Add support for misc cgroup controller Date: Fri, 11 Nov 2022 10:35:30 -0800 Message-Id: <20221111183532.3676646-26-kristen@linux.intel.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com> References: <20221111183532.3676646-1-kristen@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-sgx@vger.kernel.org Implement support for cgroup control of SGX Enclave Page Cache (EPC) memory using the misc cgroup controller. EPC memory is independent of normal system memory, e.g. must be reserved at boot from RAM and cannot be converted between EPC and normal memory while the system is running. EPC is managed by the SGX subsystem and is not accounted by the memory controller. Much like normal system memory, EPC memory can be overcommitted via virtual memory techniques and pages can be swapped out of the EPC to their backing store (normal system memory, e.g. shmem). The SGX EPC subsystem is analogous to the memory subsystem and the SGX EPC controller is in turn analogous to the memory controller; it implements limit and protection models for EPC memory. The misc controller provides a mechanism to set a hard limit of EPC usage via the "sgx_epc" resource in "misc.max". The total EPC memory available on the system is reported via the "sgx_epc" resource in "misc.capacity". This patch was modified from its original version to use the misc cgroup controller instead of a custom controller. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/Kconfig | 13 + arch/x86/kernel/cpu/sgx/Makefile | 1 + arch/x86/kernel/cpu/sgx/epc_cgroup.c | 561 +++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/epc_cgroup.h | 59 +++ arch/x86/kernel/cpu/sgx/main.c | 86 +++- arch/x86/kernel/cpu/sgx/sgx.h | 5 +- 6 files changed, 709 insertions(+), 16 deletions(-) create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f9920f1341c8..0eeae4ebe1c3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1936,6 +1936,19 @@ config X86_SGX If unsure, say N. +config CGROUP_SGX_EPC + bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX" + depends on X86_SGX && CGROUP_MISC + help + Provides control over the EPC footprint of tasks in a cgroup via + the Miscellaneous cgroup controller. + + EPC is a subset of regular memory that is usable only by SGX + enclaves and is very limited in quantity, e.g. less than 1% + of total DRAM. + + Say N if unsure. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 9c1656779b2a..12901a488da7 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -4,3 +4,4 @@ obj-y += \ ioctl.o \ main.o obj-$(CONFIG_X86_SGX_KVM) += virt.o +obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c new file mode 100644 index 000000000000..03c0fa42880c --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -0,0 +1,561 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2022 Intel Corporation.
+ +#include +#include +#include +#include +#include +#include + +#include "epc_cgroup.h" + +#define SGX_EPC_RECLAIM_MIN_PAGES 16UL +#define SGX_EPC_RECLAIM_MAX_PAGES 64UL +#define SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD 5 +#define SGX_EPC_RECLAIM_OOM_THRESHOLD 5 + +static struct workqueue_struct *sgx_epc_cg_wq; + +struct sgx_epc_reclaim_control { + struct sgx_epc_cgroup *epc_cg; + int nr_fails; + bool ignore_age; +}; + +static inline unsigned long sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg) +{ + return misc_cg_read(MISC_CG_RES_SGX_EPC, epc_cg->cg) / PAGE_SIZE; +} + +static inline unsigned long sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg) +{ + return misc_cg_max(MISC_CG_RES_SGX_EPC, epc_cg->cg) / PAGE_SIZE; +} + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg) +{ + return (struct sgx_epc_cgroup *)misc_cg_get_priv(MISC_CG_RES_SGX_EPC, cg); +} + +static inline struct sgx_epc_cgroup *parent_epc_cgroup(struct sgx_epc_cgroup *epc_cg) +{ + return sgx_epc_cgroup_from_misc_cg(parent_misc(epc_cg->cg)); +} + +static inline bool sgx_epc_cgroup_disabled(void) +{ + return !cgroup_subsys_enabled(misc_cgrp_subsys); +} + +/** + * sgx_epc_cgroup_iter - iterate over the EPC cgroup hierarchy + * @root: hierarchy root + * @prev: previously returned epc_cg, NULL on first invocation + * @reclaim_epoch: epoch for shared reclaim walks, NULL for full walks + * + * Return: references to children of the hierarchy below @root, or + * @root itself, or %NULL after a full round-trip. + * + * Caller must pass the return value in @prev on subsequent invocations + * for reference counting, or use sgx_epc_cgroup_iter_break() to cancel + * a hierarchy walk before the round-trip is complete. + */ +static struct sgx_epc_cgroup *sgx_epc_cgroup_iter(struct sgx_epc_cgroup *prev, + struct sgx_epc_cgroup *root, + unsigned long *reclaim_epoch) +{ + struct cgroup_subsys_state *css = NULL; + struct sgx_epc_cgroup *epc_cg = NULL; + struct sgx_epc_cgroup *pos = NULL; + bool inc_epoch = false; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + if (!root) + root = sgx_epc_cgroup_from_misc_cg(root_misc()); + + if (prev && !reclaim_epoch) + pos = prev; + + rcu_read_lock(); + +start: + if (reclaim_epoch) { + /* + * Abort the walk if a reclaimer working from the same root has + * started a new walk after this reclaimer has already scanned + * at least one cgroup. + */ + if (prev && *reclaim_epoch != root->epoch) + goto out; + + while (1) { + pos = READ_ONCE(root->reclaim_iter); + if (!pos || misc_cg_tryget(pos->cg)) + break; + + /* + * The css is dying, clear the reclaim_iter immediately + * instead of waiting for ->css_released to be called. + * Busy waiting serves no purpose and attempting to wait + * for ->css_released may actually block it from being + * called. + */ + (void)cmpxchg(&root->reclaim_iter, pos, NULL); + } + } + + if (pos) + css = &pos->cg->css; + + while (!epc_cg) { + struct misc_cg *cg; + + css = css_next_descendant_pre(css, &root->cg->css); + if (!css) { + /* + * Increment the epoch as we've reached the end of the + * tree and the next call to css_next_descendant_pre + * will restart at root. Do not update root->epoch + * directly as we should only do so if we update the + * reclaim_iter, i.e. a different thread may win the + * race and update the epoch for us. + */ + inc_epoch = true; + + /* + * Reclaimers share the hierarchy walk, and a new one + * might jump in at the end of the hierarchy. 
Restart + * at root so that we don't return NULL on a thread's + * initial call. + */ + if (!prev) + continue; + break; + } + + cg = css_misc(css); + /* + * Verify the css and acquire a reference. Don't take an + * extra reference to root as it's either the global root + * or is provided by the caller and so is guaranteed to be + * alive. Keep walking if this css is dying. + */ + if (cg != root->cg && !misc_cg_tryget(cg)) + continue; + + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); + } + + if (reclaim_epoch) { + /* + * reclaim_iter could have already been updated by a competing + * thread; check that the value hasn't changed since we read + * it to avoid reclaiming from the same cgroup twice. If the + * value did change, put all of our references and restart the + * entire process, for all intents and purposes we're making a + * new call. + */ + if (cmpxchg(&root->reclaim_iter, pos, epc_cg) != pos) { + if (epc_cg && epc_cg != root) + put_misc_cg(epc_cg->cg); + if (pos) + put_misc_cg(pos->cg); + css = NULL; + epc_cg = NULL; + inc_epoch = false; + goto start; + } + + if (inc_epoch) + root->epoch++; + if (!prev) + *reclaim_epoch = root->epoch; + + if (pos) + put_misc_cg(pos->cg); + } + +out: + rcu_read_unlock(); + if (prev && prev != root) + put_misc_cg(prev->cg); + + return epc_cg; +} + +/** + * sgx_epc_cgroup_iter_break - abort a hierarchy walk prematurely + * @prev: last visited cgroup as returned by sgx_epc_cgroup_iter() + * @root: hierarchy root + */ +static void sgx_epc_cgroup_iter_break(struct sgx_epc_cgroup *prev, + struct sgx_epc_cgroup *root) +{ + if (!root) + root = sgx_epc_cgroup_from_misc_cg(root_misc()); + if (prev && prev != root) + put_misc_cg(prev->cg); +} + +/** + * sgx_epc_cgroup_lru_empty - check if a cgroup tree has no pages on its lrus + * @root: root of the tree to check + * + * Return: %true if all cgroups under the specified root have empty LRU lists. + * Used to avoid livelocks due to a cgroup having a non-zero charge count but + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or + * because all pages in the cgroup are unreclaimable. + */ +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) +{ + struct sgx_epc_cgroup *epc_cg; + + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) { + if (!list_empty(&epc_cg->lru.reclaimable)) { + sgx_epc_cgroup_iter_break(epc_cg, root); + return false; + } + } + return true; +} + +/** + * sgx_epc_cgroup_isolate_pages - walk a cgroup tree and separate pages + * @root: root of the tree to start walking + * @nr_to_scan: The number of pages that need to be isolated + * @dst: Destination list to hold the isolated pages + * + * Walk the cgroup tree and isolate the pages in the hierarchy + * for reclaiming. + */ +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + int *nr_to_scan, struct list_head *dst) +{ + struct sgx_epc_cgroup *epc_cg; + unsigned long epoch; + + if (!*nr_to_scan) + return; + + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, &epoch); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, &epoch)) { + sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst); + if (!*nr_to_scan) { + sgx_epc_cgroup_iter_break(epc_cg, root); + break; + } + } +} + +static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages, + struct sgx_epc_reclaim_control *rc) +{ + /* + * Ensure sgx_reclaim_pages is called with a minimum and maximum + * number of pages. 
Attempting to reclaim only a few pages will + * often fail and is inefficient, while reclaiming a huge number + * of pages can result in soft lockups due to holding various + * locks for an extended duration. + */ + nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES); + nr_pages = min(nr_pages, SGX_EPC_RECLAIM_MAX_PAGES); + + return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age, rc->epc_cg); +} + +static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc) +{ + if (sgx_epc_cgroup_lru_empty(rc->epc_cg)) + return -ENOMEM; + + ++rc->nr_fails; + if (rc->nr_fails > SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD) + rc->ignore_age = true; + + return 0; +} + +static inline +void sgx_epc_reclaim_control_init(struct sgx_epc_reclaim_control *rc, + struct sgx_epc_cgroup *epc_cg) +{ + rc->epc_cg = epc_cg; + rc->nr_fails = 0; + rc->ignore_age = false; +} + +/* + * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the + * cgroup when the cgroup is at/near its maximum capacity. + */ +static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) +{ + struct sgx_epc_reclaim_control rc; + struct sgx_epc_cgroup *epc_cg; + unsigned long cur, max; + + epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work); + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + max = sgx_epc_cgroup_max_pages(epc_cg); + + /* + * Adjust the limit down by one page; the goal is to free up + * pages for fault allocations, not to simply obey the limit. + * Conditionally decrementing max also means the cur vs. max + * check will correctly handle the case where both are zero. + */ + if (max) + max--; + + /* + * Unless the limit is extremely low, in which case forcing + * reclaim will likely cause thrashing, force the cgroup to + * reclaim at least once if it's operating *near* its maximum + * limit by adjusting @max down by half the min reclaim size. + * This work func is scheduled by sgx_epc_cgroup_try_charge + * when it cannot directly reclaim due to being in an atomic + * context, e.g. EPC allocation in a fault handler. Waiting + * to reclaim until the cgroup is actually at its limit is less + * performant as it means the faulting task is effectively + * blocked until a worker makes its way through the global work + * queue. + */ + if (max > SGX_EPC_RECLAIM_MAX_PAGES) + max -= (SGX_EPC_RECLAIM_MIN_PAGES/2); + + cur = sgx_epc_cgroup_page_counter_read(epc_cg); + if (cur <= max) + break; + + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) + break; + } + } +} + +static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, + unsigned long nr_pages, bool reclaim) +{ + struct sgx_epc_reclaim_control rc; + unsigned long cur, max, over; + unsigned int nr_empty = 0; + + if (epc_cg == sgx_epc_cgroup_from_misc_cg(root_misc())) { + misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, + nr_pages * PAGE_SIZE); + return 0; + } + + sgx_epc_reclaim_control_init(&rc, NULL); + + for (;;) { + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, + nr_pages * PAGE_SIZE)) + break; + + rc.epc_cg = epc_cg; + max = sgx_epc_cgroup_max_pages(rc.epc_cg); + if (nr_pages > max) + return -ENOMEM; + + if (signal_pending(current)) + return -ERESTARTSYS; + + if (!reclaim) { + queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work); + return -EBUSY; + } + + cur = sgx_epc_cgroup_page_counter_read(rc.epc_cg); + over = ((cur + nr_pages) > max) ?
+ (cur + nr_pages) - max : SGX_EPC_RECLAIM_MIN_PAGES; + + if (!sgx_epc_cgroup_reclaim_pages(over, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) { + if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) + return -ENOMEM; + schedule(); + } + } + } + + css_get_many(&epc_cg->cg->css, nr_pages); + + return 0; +} + + +/** + * sgx_epc_cgroup_try_charge - hierarchically try to charge a single EPC page + * @mm: the mm_struct of the process to charge + * @reclaim: whether or not synchronous reclaim is allowed + * + * Return: the EPC cgroup on success, %NULL if the EPC cgroup is disabled, + * or an ERR_PTR() encoded -errno on failure. + */ +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, + bool reclaim) +{ + struct sgx_epc_cgroup *epc_cg; + int ret; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); + ret = __sgx_epc_cgroup_try_charge(epc_cg, 1, reclaim); + put_misc_cg(epc_cg->cg); + + if (ret) + return ERR_PTR(ret); + + return epc_cg; +} + +/** + * sgx_epc_cgroup_uncharge - hierarchically uncharge EPC pages + * @epc_cg: the charged epc cgroup + */ +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) +{ + if (sgx_epc_cgroup_disabled()) + return; + + misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); + + if (epc_cg->cg != root_misc()) + put_misc_cg(epc_cg->cg); +} + +static void sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root) +{ + struct sgx_epc_cgroup *epc_cg; + + for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL); + epc_cg; + epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) { + if (sgx_epc_oom(&epc_cg->lru)) { + sgx_epc_cgroup_iter_break(epc_cg, root); + return; + } + } +} + +static void sgx_epc_cgroup_release(struct sgx_epc_cgroup *epc_cg) +{ + struct sgx_epc_cgroup *dead_cg = epc_cg; + + while ((epc_cg = parent_epc_cgroup(epc_cg))) + cmpxchg(&epc_cg->reclaim_iter, dead_cg, NULL); +} + +static void sgx_epc_cgroup_free(struct sgx_epc_cgroup *epc_cg) +{ + cancel_work_sync(&epc_cg->reclaim_work); + kfree(epc_cg); +} + +static struct sgx_epc_cgroup *sgx_epc_cgroup_alloc(struct misc_cg *cg) +{ + struct sgx_epc_cgroup *epc_cg; + + epc_cg = kzalloc(sizeof(struct sgx_epc_cgroup), GFP_KERNEL); + if (!epc_cg) + return ERR_PTR(-ENOMEM); + + sgx_lru_init(&epc_cg->lru); + INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func); + epc_cg->cg = cg; + misc_cg_set_priv(MISC_CG_RES_SGX_EPC, cg, epc_cg); + + return epc_cg; +} + +static void sgx_epc_cgroup_max_write(struct sgx_epc_cgroup *epc_cg) +{ + struct sgx_epc_reclaim_control rc; + unsigned int nr_empty = 0; + unsigned long cur, max; + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + max = sgx_epc_cgroup_max_pages(epc_cg); + + for (;;) { + cur = sgx_epc_cgroup_page_counter_read(epc_cg); + if (cur <= max) + break; + + if (signal_pending(current)) + break; + + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) { + if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) + sgx_epc_cgroup_oom(epc_cg); + schedule(); + } + } + } +} + +static int sgx_epc_cgroup_callback(struct notifier_block *nb, + unsigned long val, void *data) +{ + struct misc_cg *cg = data; + struct sgx_epc_cgroup *epc_cg; + + if (val == MISC_CG_ALLOC) { + epc_cg = sgx_epc_cgroup_alloc(cg); + if (IS_ERR(epc_cg)) + return NOTIFY_BAD; + + return NOTIFY_OK; + } + + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); + + if (val == MISC_CG_FREE) { + sgx_epc_cgroup_free(epc_cg); + return NOTIFY_OK; + } else if (val == MISC_CG_CHANGE) { + sgx_epc_cgroup_max_write(epc_cg); + return NOTIFY_OK; + } else if (val == MISC_CG_RELEASED) { +
sgx_epc_cgroup_release(epc_cg); + return NOTIFY_OK; + } + return NOTIFY_DONE; +} + +static struct notifier_block sgx_epc_cg_nb = { + .notifier_call = sgx_epc_cgroup_callback, + .priority = 0, +}; + +static int __init sgx_epc_cgroup_init(void) +{ + if (!boot_cpu_has(X86_FEATURE_SGX)) + return 0; + + sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq", + WQ_UNBOUND | WQ_FREEZABLE, + WQ_UNBOUND_MAX_ACTIVE); + BUG_ON(!sgx_epc_cg_wq); + + sgx_epc_cgroup_alloc(root_misc()); + + register_misc_cg_notifier(&sgx_epc_cg_nb); + + return 0; +} +subsys_initcall(sgx_epc_cgroup_init); diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h new file mode 100644 index 000000000000..a8c631ee6fac --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2022 Intel Corporation. */ +#ifndef _INTEL_SGX_EPC_CGROUP_H_ +#define _INTEL_SGX_EPC_CGROUP_H_ + +#include +#include +#include +#include +#include +#include + +#include "sgx.h" + +#ifndef CONFIG_CGROUP_SGX_EPC +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES +struct sgx_epc_cgroup; + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, + bool reclaim) +{ + return NULL; +} +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { } +static inline void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + int *nr_to_scan, + struct list_head *dst) { } +static inline struct sgx_epc_lru *epc_cg_lru(struct sgx_epc_cgroup *epc_cg) +{ + return NULL; +} +static inline bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) +{ + return true; +} +#else +struct sgx_epc_cgroup { + struct misc_cg *cg; + struct sgx_epc_lru lru; + struct sgx_epc_cgroup *reclaim_iter; + struct work_struct reclaim_work; + unsigned int epoch; +}; + +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm, + bool reclaim); +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root); +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + int *nr_to_scan, struct list_head *dst); +static inline struct sgx_epc_lru *epc_cg_lru(struct sgx_epc_cgroup *epc_cg) +{ + if (epc_cg) + return &epc_cg->lru; + return NULL; +} +#endif + +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */ diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 5a511046ad38..b9b55068f87f 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -6,6 +6,7 @@ #include #include #include +#include <linux/misc_cgroup.h> #include #include #include @@ -17,6 +18,7 @@ #include "driver.h" #include "encl.h" #include "encls.h" +#include "epc_cgroup.h" #define SGX_MAX_NR_TO_RECLAIM 32 @@ -33,9 +35,20 @@ static DEFINE_XARRAY(sgx_epc_address_space); static struct sgx_epc_lru sgx_global_lru; static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page) { + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + return epc_cg_lru(epc_page->epc_cg); + return &sgx_global_lru; } +static inline bool sgx_can_reclaim(void) +{ + if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + return !list_empty(&sgx_global_lru.reclaimable); + + return !sgx_epc_cgroup_lru_empty(NULL); +} + static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0); /* Nodes with one or more EPC sections.
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
new file mode 100644
index 000000000000..a8c631ee6fac
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Intel Corporation. */
+#ifndef _INTEL_SGX_EPC_CGROUP_H_
+#define _INTEL_SGX_EPC_CGROUP_H_
+
+#include <asm/sgx.h>
+#include <linux/cgroup.h>
+#include <linux/list.h>
+#include <linux/misc_cgroup.h>
+#include <linux/page_counter.h>
+#include <linux/workqueue.h>
+
+#include "sgx.h"
+
+#ifndef CONFIG_CGROUP_SGX_EPC
+#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES
+struct sgx_epc_cgroup;
+
+static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
+							       bool reclaim)
+{
+	return NULL;
+}
+static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { }
+static inline void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
+						int *nr_to_scan,
+						struct list_head *dst) { }
+static inline struct sgx_epc_lru *epc_cg_lru(struct sgx_epc_cgroup *epc_cg)
+{
+	return NULL;
+}
+static inline bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root)
+{
+	return true;
+}
+#else
+struct sgx_epc_cgroup {
+	struct misc_cg *cg;
+	struct sgx_epc_lru lru;
+	struct sgx_epc_cgroup *reclaim_iter;
+	struct work_struct reclaim_work;
+	unsigned int epoch;
+};
+
+struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
+						 bool reclaim);
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
+bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root);
+void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
+				  int *nr_to_scan, struct list_head *dst);
+static inline struct sgx_epc_lru *epc_cg_lru(struct sgx_epc_cgroup *epc_cg)
+{
+	if (epc_cg)
+		return &epc_cg->lru;
+	return NULL;
+}
+#endif
+
+#endif /* _INTEL_SGX_EPC_CGROUP_H_ */
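A brief note on the CONFIG_CGROUP_SGX_EPC=n stubs above: they let call sites
compile unchanged in both configurations, with the compiler folding the dead
branch away. A sketch of the pattern, modeled on the sgx_can_reclaim() change
in main.c below (the function name here is illustrative, and it assumes
access to main.c's sgx_global_lru):

	/* Sketch of a stub-friendly call site; compare sgx_can_reclaim() below. */
	static bool example_can_reclaim(void)
	{
		/*
		 * With CONFIG_CGROUP_SGX_EPC=n, sgx_epc_cgroup_lru_empty() is
		 * the stub above, so this branch is discarded at compile time
		 * and only the global LRU is consulted.
		 */
		if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
			return !sgx_epc_cgroup_lru_empty(NULL);

		return !list_empty(&sgx_global_lru.reclaimable);
	}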
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5a511046ad38..b9b55068f87f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -6,6 +6,7 @@
 #include <linux/highmem.h>
 #include <linux/kthread.h>
 #include <linux/miscdevice.h>
+#include <linux/misc_cgroup.h>
 #include <linux/node.h>
 #include <linux/pagemap.h>
 #include <linux/ratelimit.h>
@@ -17,6 +18,7 @@
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
+#include "epc_cgroup.h"
 
 #define SGX_MAX_NR_TO_RECLAIM	32
 
@@ -33,9 +35,20 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 static struct sgx_epc_lru sgx_global_lru;
 
 static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page)
 {
+	if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
+		return epc_cg_lru(epc_page->epc_cg);
+
 	return &sgx_global_lru;
 }
 
+static inline bool sgx_can_reclaim(void)
+{
+	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
+		return !list_empty(&sgx_global_lru.reclaimable);
+
+	return !sgx_epc_cgroup_lru_empty(NULL);
+}
+
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
 /* Nodes with one or more EPC sections. */
@@ -320,9 +333,10 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
 }
 
 /**
- * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
+ * __sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
  * @nr_to_scan:	Number of EPC pages to scan for reclaim
  * @ignore_age:	Reclaim a page even if it is young
+ * @epc_cg:	EPC cgroup from which to reclaim
  *
  * Take a fixed number of pages from the head of the active page pool and
  * reclaim them to the enclave's private shmem files. Skip the pages, which have
@@ -336,7 +350,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
 * problematic as it would increase the lock contention too much, which would
 * halt forward progress.
 */
-static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
+static int __sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age,
+				   struct sgx_epc_cgroup *epc_cg)
 {
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_epc_page *epc_page, *tmp;
@@ -347,7 +362,15 @@ static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
 	int i = 0;
 	int ret;
 
-	sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
+	/*
+	 * If a specific cgroup is not being targeted, take from the global
+	 * list first, even when cgroups are enabled. If there are
+	 * pages on the global LRU then they should get reclaimed asap.
+	 */
+	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg)
+		sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
+
+	sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso);
 
 	if (list_empty(&iso))
 		return 0;
@@ -394,25 +417,33 @@ static int __sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
 
+		if (epc_page->epc_cg) {
+			sgx_epc_cgroup_uncharge(epc_page->epc_cg);
+			epc_page->epc_cg = NULL;
+		}
+
 		sgx_free_epc_page(epc_page);
 	}
 
 	return i;
 }
 
-int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
+/**
+ * sgx_reclaim_epc_pages() - Wrapper for __sgx_reclaim_epc_pages() that
+ * calls cond_resched() upon completion
+ * @nr_to_scan:	Number of EPC pages to scan for reclaim
+ * @ignore_age:	Reclaim a page even if it is young
+ * @epc_cg:	EPC cgroup from which to reclaim
+ */
+int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age,
+			  struct sgx_epc_cgroup *epc_cg)
 {
 	int ret;
 
-	ret = __sgx_reclaim_pages(nr_to_scan, ignore_age);
+	ret = __sgx_reclaim_epc_pages(nr_to_scan, ignore_age, epc_cg);
 	cond_resched();
 	return ret;
 }
 
-static bool sgx_can_reclaim(void)
-{
-	return !list_empty(&sgx_global_lru.reclaimable);
-}
-
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
@@ -429,7 +460,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		__sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
+		__sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
 }
 
 static int ksgxd(void *p)
@@ -455,7 +486,7 @@ static int ksgxd(void *p)
 					     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
+			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
 	}
 
 	return 0;
@@ -613,6 +644,11 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 {
 	struct sgx_epc_page *page;
+	struct sgx_epc_cgroup *epc_cg;
+
+	epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim);
+	if (IS_ERR(epc_cg))
+		return ERR_CAST(epc_cg);
 
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
@@ -621,8 +657,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (!sgx_can_reclaim())
-			return ERR_PTR(-ENOMEM);
+		if (!sgx_can_reclaim()) {
+			page = ERR_PTR(-ENOMEM);
+			break;
+		}
 
 		if (!reclaim) {
 			page = ERR_PTR(-EBUSY);
@@ -634,7 +672,14 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
+	}
+
+	if (!IS_ERR(page)) {
+		WARN_ON(page->epc_cg);
+		page->epc_cg = epc_cg;
+	} else {
+		sgx_epc_cgroup_uncharge(epc_cg);
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
@@ -667,6 +712,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
+
+	if (page->epc_cg) {
+		sgx_epc_cgroup_uncharge(page->epc_cg);
+		page->epc_cg = NULL;
+	}
+
 	atomic_long_inc(&sgx_nr_free_pages);
 }
 
@@ -831,6 +882,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].flags = 0;
 		section->pages[i].encl_owner = NULL;
 		section->pages[i].poison = 0;
+		section->pages[i].epc_cg = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -995,6 +1047,7 @@ static void __init arch_update_sysfs_visibility(int nid) {}
 static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
+	u64 capacity = 0;
 	u64 pa, size;
 	int nid;
 	int i;
@@ -1045,6 +1098,7 @@ static bool __init sgx_page_cache_init(void)
 		sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
 		sgx_numa_nodes[nid].size += size;
 
+		capacity += size;
 		sgx_nr_epc_sections++;
 	}
 
@@ -1054,6 +1108,8 @@ static bool __init sgx_page_cache_init(void)
 		return false;
 	}
 
+	misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity);
+
 	return true;
 }
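The main.c changes above give sgx_reclaim_epc_pages() a cgroup parameter with
two distinct modes, per the comment in __sgx_reclaim_epc_pages(). A short
sketch of the calling conventions (hypothetical caller; constants and NULL
semantics as defined in this series):

	/* Sketch of the two reclaim modes after this patch; not in the series. */
	static void example_reclaim(struct sgx_epc_cgroup *epc_cg)
	{
		/* NULL cgroup: drain the global LRU first, then all cgroups. */
		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);

		/* Non-NULL cgroup: isolate only from that cgroup's hierarchy. */
		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, epc_cg);
	}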
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index db09a8a0ea6e..4059dd74b0d4 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -40,6 +40,7 @@
 	 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS | \
 	 SGX_EPC_PAGE_ENCLAVE | \
 	 SGX_EPC_PAGE_VERSION_ARRAY)
+struct sgx_epc_cgroup;
 
 struct sgx_epc_page {
 	unsigned int section;
@@ -53,6 +54,7 @@ struct sgx_epc_page {
 		struct sgx_encl *encl;
 	};
 	struct list_head list;
+	struct sgx_epc_cgroup *epc_cg;
 };
 
 /*
@@ -181,7 +183,8 @@ void sgx_reclaim_direct(void);
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
 int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
-int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age);
+int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age,
+			  struct sgx_epc_cgroup *epc_cg);
 void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
 			   struct list_head *dst);
 bool sgx_epc_oom(struct sgx_epc_lru *lru);
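For orientation before the final patch: the epc_cgroup.c code earlier plugs
into the misc controller through a notifier. A rough sketch of that contract,
assuming the register_misc_cg_notifier() API and MISC_CG_* events introduced
earlier in this series; the "example" names are hypothetical and the event
comments reflect how the SGX callback above uses them:

	/* Sketch of a misc-cgroup notifier consumer; not part of the series. */
	static int example_misc_cg_callback(struct notifier_block *nb,
					    unsigned long val, void *data)
	{
		struct misc_cg *cg = data;

		switch (val) {
		case MISC_CG_ALLOC:	/* a new cgroup was created */
		case MISC_CG_CHANGE:	/* misc.max was written */
		case MISC_CG_RELEASED:	/* the cgroup went offline */
		case MISC_CG_FREE:	/* the cgroup is being destroyed */
			pr_debug("misc cg event %lu on %p\n", val, cg);
			return NOTIFY_OK;
		}

		return NOTIFY_DONE;
	}

	static struct notifier_block example_nb = {
		.notifier_call = example_misc_cg_callback,
	};

	static int __init example_init(void)
	{
		register_misc_cg_notifier(&example_nb);
		return 0;
	}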
From patchwork Fri Nov 11 18:35:31 2022
X-Patchwork-Submitter: Kristen Carlson Accardi
X-Patchwork-Id: 13040720
From: Kristen Carlson Accardi
To: jarkko@kernel.org, dave.hansen@linux.kernel.org, tj@kernel.org,
    linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
    cgroups@vger.kernel.org, Dave Hansen, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Jonathan Corbet
Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi, Sean Christopherson,
    linux-doc@vger.kernel.org
Subject: [PATCH 26/26] Docs/x86/sgx: Add description for cgroup support
Date: Fri, 11 Nov 2022 10:35:31 -0800
Message-Id: <20221111183532.3676646-27-kristen@linux.intel.com>
In-Reply-To: <20221111183532.3676646-1-kristen@linux.intel.com>
References: <20221111183532.3676646-1-kristen@linux.intel.com>
X-Mailing-List: linux-sgx@vger.kernel.org

Add initial documentation of how to regulate the distribution of
SGX Enclave Page Cache (EPC) memory via the Miscellaneous cgroup
controller.

Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Cc: Sean Christopherson
Reviewed-by: Bagas Sanjaya
---
 Documentation/x86/sgx.rst | 77 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 2bcbffacbed5..f6ca5594dcf2 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -300,3 +300,80 @@ to expected failures and handle them as follows:
 first call. It indicates a bug in the kernel or the userspace client
 if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
 a return code other than 0.
+
+
+Cgroup Support
+==============
+
+The "sgx_epc" resource within the Miscellaneous cgroup controller regulates
+distribution of SGX EPC memory, which is a subset of system RAM that
+is used to provide SGX-enabled applications with protected memory,
+and is otherwise inaccessible, i.e. shows up as reserved in
+/proc/iomem and cannot be read/written outside of an SGX enclave.
+
+Although current systems implement EPC by stealing memory from RAM,
+for all intents and purposes the EPC is independent from normal system
+memory, e.g. it must be reserved at boot from RAM and cannot be
+converted between EPC and normal memory while the system is running.
+The EPC is managed by the SGX subsystem and is not accounted by the
+memory controller. Note that this is true only for EPC memory itself,
+i.e. normal memory allocations related to SGX and EPC memory, e.g. the
+backing memory for evicted EPC pages, are accounted, limited and
+protected by the memory controller.
+
+Much like normal system memory, EPC memory can be overcommitted via
+virtual memory techniques and pages can be swapped out of the EPC
+to their backing store (normal system memory allocated via shmem).
+The SGX EPC subsystem is analogous to the memory subsystem, and
+it implements limit and protection models for EPC memory.
+
+SGX EPC Interface Files
+-----------------------
+
+For a generic description of the Miscellaneous controller interface
+files, please see Documentation/admin-guide/cgroup-v2.rst
+
+All SGX EPC memory amounts are in bytes unless explicitly stated
+otherwise. If a value which is not PAGE_SIZE aligned is written,
+the actual value used by the controller will be rounded down to
+the closest PAGE_SIZE multiple.
+
+  misc.capacity
+        A read-only flat-keyed file shown only in the root cgroup.
+        The sgx_epc resource will show the total amount of EPC
+        memory available on the platform.
+
+  misc.current
+        A read-only flat-keyed file shown in the non-root cgroups.
+        The sgx_epc resource will show the current active EPC memory
+        usage of the cgroup and its descendants. EPC pages that are
+        swapped out to backing RAM are not included in the current count.
+
+  misc.max
+        A read-write single value file which exists on non-root
+        cgroups. The sgx_epc resource will show the EPC usage
+        hard limit. The default is "max".
+
+        If a cgroup's EPC usage reaches this limit, EPC allocations,
+        e.g. for page fault handling, will be blocked until EPC can
+        be reclaimed from the cgroup. If EPC cannot be reclaimed in
+        a timely manner, reclaim will be forced, e.g. by ignoring LRU.
+
+  misc.events
+        A read-write flat-keyed file which exists on non-root cgroups.
+        Writes to the file reset the event counters to zero. A value
+        change in this file generates a file modified event.
+
+          max
+                The number of times the cgroup has triggered a reclaim
+                due to its EPC usage approaching (or exceeding) its max
+                EPC boundary.
+
+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed. Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.
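To make the interface described above concrete, an illustrative shell session
follows. The cgroup name and byte values are examples only, and the sgx_epc
resource key appears in the misc files only with this series applied; the
misc controller itself and its file formats are as documented in
Documentation/admin-guide/cgroup-v2.rst:

  # Enable the misc controller for child cgroups and create one.
  echo "+misc" > /sys/fs/cgroup/cgroup.subtree_control
  mkdir /sys/fs/cgroup/enclave-app

  # Total EPC on the platform (root cgroup only), in bytes.
  cat /sys/fs/cgroup/misc.capacity
  sgx_epc 268435456

  # Cap the cgroup at 128 MiB of EPC; the value is rounded down
  # to a PAGE_SIZE multiple if needed.
  echo "sgx_epc 134217728" > /sys/fs/cgroup/enclave-app/misc.max

  # Current active EPC usage of the cgroup and its descendants.
  cat /sys/fs/cgroup/enclave-app/misc.current
  sgx_epc 0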