From patchwork Mon May 9 21:47:59 2022
X-Patchwork-Submitter: Reinette Chatre
X-Patchwork-Id: 12844088
From: Reinette Chatre <reinette.chatre@intel.com>
To: dave.hansen@linux.intel.com, jarkko@kernel.org, linux-sgx@vger.kernel.org
Cc: haitao.huang@intel.com
Subject: [PATCH V2 1/5] x86/sgx: Disconnect backing page references from dirty status
Date: Mon, 9 May 2022 14:47:59 -0700
Message-Id: <4cda80a0049de6bc47480b4ad119091027c11d45.1652131695.git.reinette.chatre@intel.com>

sgx_encl_put_backing() is used to release references to the backing
storage and, optionally, to mark both backing store pages as dirty.

Managing references and dirty status together has two consequences:

1) Both backing store pages are marked as dirty, even if only one of
   them was changed.

2) Backing store pages are marked dirty after the important data has
   been written to them. This is against the custom of marking pages
   as dirty before important data is written to them.

Remove sgx_encl_put_backing()'s option to mark the backing pages as
dirty and instead mark the needed pages dirty right before they receive
important data. This aligns the usage with the existing custom and
prepares the code for a following change where only one of the backing
pages needs to be marked as dirty.
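To illustrate, a minimal sketch of the resulting calling convention,
paraphrasing the diff below (backing is a struct sgx_backing * as in
__sgx_encl_ewb()):

	/* Old: dirty status was a side effect of dropping the reference. */
	sgx_encl_put_backing(backing, true);

	/*
	 * New: mark the pages dirty before writing to them via
	 * ENCLS[EWB], then drop the reference with no side effects.
	 */
	set_page_dirty(backing->pcmd);
	set_page_dirty(backing->contents);
	/* ... __ewb() writes the page contents and PCMD ... */
	sgx_encl_put_backing(backing);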
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 10 ++--------
 arch/x86/kernel/cpu/sgx/encl.h |  2 +-
 arch/x86/kernel/cpu/sgx/main.c |  6 ++++--
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7c63a1911fae..398695a20605 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -94,7 +94,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
-	sgx_encl_put_backing(&b, false);
+	sgx_encl_put_backing(&b);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
@@ -645,15 +645,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 /**
  * sgx_encl_put_backing() - Unpin the backing storage
  * @backing:	data for accessing backing storage for the page
- * @do_write:	mark pages dirty
  */
-void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
+void sgx_encl_put_backing(struct sgx_backing *backing)
 {
-	if (do_write) {
-		set_page_dirty(backing->pcmd);
-		set_page_dirty(backing->contents);
-	}
-
 	put_page(backing->pcmd);
 	put_page(backing->contents);
 }
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..d44e7372151f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -107,7 +107,7 @@ void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing);
-void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
+void sgx_encl_put_backing(struct sgx_backing *backing);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..fad3d6c4756e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -190,6 +190,8 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
 	pginfo.metadata = (unsigned long)kmap_atomic(backing->pcmd) +
 			  backing->pcmd_offset;
 
+	set_page_dirty(backing->pcmd);
+	set_page_dirty(backing->contents);
 	ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot);
 
 	kunmap_atomic((void *)(unsigned long)(pginfo.metadata -
@@ -320,7 +322,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
-		sgx_encl_put_backing(&secs_backing, true);
+		sgx_encl_put_backing(&secs_backing);
 	}
 
 out:
@@ -411,7 +413,7 @@ static void sgx_reclaim_pages(void)
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
 
-		sgx_encl_put_backing(&backing[i], true);
+		sgx_encl_put_backing(&backing[i]);
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;

From patchwork Mon May 9 21:48:00 2022
X-Patchwork-Submitter: Reinette Chatre
X-Patchwork-Id: 12844089
From: Reinette Chatre <reinette.chatre@intel.com>
To: dave.hansen@linux.intel.com, jarkko@kernel.org, linux-sgx@vger.kernel.org
Cc: haitao.huang@intel.com
Subject: [PATCH V2 2/5] x86/sgx: Mark PCMD page as dirty when modifying contents
Date: Mon, 9 May 2022 14:48:00 -0700
Message-Id: <217608112793e76b335540edde75dfda290de16c.1652131695.git.reinette.chatre@intel.com>

Recent commit 08999b2489b4 ("x86/sgx: Free backing memory after
faulting the enclave page") expanded __sgx_encl_eldu() to clear an
enclave page's PCMD (Paging Crypto MetaData) from the PCMD page in the
backing store after the enclave page is restored to the enclave.

Since the PCMD page in the backing store is modified, the page should
be marked as dirty to ensure the modified data is retained.

Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
---
Changes since RFC v1:
- Do not set dirty bit on enclave page since it is not being written
  to and always will be discarded.
  (Dave)

 arch/x86/kernel/cpu/sgx/encl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 398695a20605..2521d64e8bcf 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -83,6 +83,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 		ret = -EFAULT;
 	}
 
+	set_page_dirty(b.pcmd);
 	memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
 
 	/*

From patchwork Mon May 9 21:48:01 2022
X-Patchwork-Submitter: Reinette Chatre
X-Patchwork-Id: 12844092
From: Reinette Chatre <reinette.chatre@intel.com>
To: dave.hansen@linux.intel.com, jarkko@kernel.org, linux-sgx@vger.kernel.org
Cc: haitao.huang@intel.com
Subject: [PATCH V2 3/5] x86/sgx: Obtain backing storage page with enclave mutex held
Date: Mon, 9 May 2022 14:48:01 -0700

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP. The WARN is encountered when the
reclaimer evicts a range of pages from the enclave while the same
pages are faulted back right away.

The SGX backing storage is accessed on two paths: when there are
insufficient free pages in the EPC the reclaimer works to move enclave
pages to the backing storage, and as enclaves access pages that have
been moved to the backing storage they are retrieved from there as
part of page fault handling.
An oversubscribed SGX system will often run the reclaimer and page
fault handler concurrently and needs to ensure that the backing store
is accessed safely between the reclaimer and the page fault handler.
This is not the case: the reclaimer accesses the backing store without
the enclave mutex while the page fault handler accesses the backing
store with the enclave mutex held.

Two scenarios are considered to describe the consequences of the
unsafe access:

(a) Scenario: Fault a page right after it was reclaimed.
    Consequence: The page is faulted by loading outdated data into the
    enclave using ENCLS[ELDU], which faults when it checks the MAC and
    PCMD data.

(b) Scenario: Fault a page while reclaiming another page that shares
    a PCMD page.
    Consequence: A race between the reclaimer and page fault handler,
    the reclaimer attempting to access a PCMD page at the same time it
    is truncated by the page fault handler. This could result in lost
    PCMD data. Data may still be lost if the reclaimer wins the race;
    this is addressed in the following patch.

The reclaimer accesses pages from the backing storage without holding
the enclave mutex and runs the risk of concurrently accessing the
backing storage with the page fault handler, which does access the
backing storage with the enclave mutex held.

The two scenarios ((a) and (b)) are shown below.

In scenario (a), a page is written to the backing store by the
reclaimer and then immediately faulted back, before the reclaimer is
able to set the dirty bit of the page:

sgx_reclaim_pages() {
                                        sgx_vma_fault() {
  ...
                                          ...
  sgx_reclaimer_write() {
    mutex_lock(&encl->lock);
    /* Write data to backing store */
    mutex_unlock(&encl->lock);
  }
                                          mutex_lock(&encl->lock);
                                          __sgx_encl_eldu() {
                                            ...
                                            /*
                                             * Enclave backing store
                                             * page not released
                                             * nor marked dirty -
                                             * contents may not be
                                             * up to date.
                                             */
                                            sgx_encl_get_backing();
                                            ...
                                            /*
                                             * Enclave data restored
                                             * from backing store
                                             * and PCMD pages that
                                             * are not up to date.
                                             * ENCLS[ELDU] faults
                                             * because of MAC or PCMD
                                             * checking failure.
                                             */
                                            sgx_encl_put_backing();
                                          }
                                          ...
  /* set page dirty */
  sgx_encl_put_backing();
  ...
                                          mutex_unlock(&encl->lock);
}                                       }

In scenario (b) below, a PCMD page is truncated from the backing store
after all its pages have been loaded into the enclave, at the same
time that the PCMD page is loaded from the backing store because one
of the enclave pages it covers is being reclaimed:

sgx_reclaim_pages() {
                                        sgx_vma_fault() {
  ...
                                          mutex_lock(&encl->lock);
                                          ...
                                          __sgx_encl_eldu() {
                                            ...
                                            if (pcmd_page_empty) {
  /*                                            /*
   * EPC page being reclaimed                    * PCMD page truncated
   * shares a PCMD page with an                  * while requested from
   * enclave page that is being                  * reclaimer.
   * faulted in.                                 */
   */
  sgx_encl_get_backing()  <------------->  sgx_encl_truncate_backing_page()
                                            }
                                          mutex_unlock(&encl->lock);
}                                       }

In scenario (b) there is a race between the reclaimer and the page
fault handler when the reclaimer attempts to get access to the same
PCMD page that is being truncated. This could result in the reclaimer
writing to the PCMD page that is then truncated, causing the PCMD data
to be lost, or in a new PCMD page being allocated. The lost PCMD data
may still occur after protecting the backing store access with the
mutex - this is fixed in the next patch. By ensuring the backing store
is accessed with the mutex held the enclave page state can be made
accurate, with the SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately
reflecting that a page is in the process of being reclaimed.
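For illustration, a condensed sketch of the reclaimer's backing store
access once it is done with the enclave mutex held (extracted from the
full change in the diff below):

	mutex_lock(&encl_page->encl->lock);
	/* Backing store reference is now obtained under the mutex ... */
	ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
	if (ret) {
		mutex_unlock(&encl_page->encl->lock);
		goto skip;
	}
	/* ... so the flag and the reference are set atomically. */
	encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
	mutex_unlock(&encl_page->encl->lock);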
Consistently protect the reclaimer's backing store access with the
enclave's mutex to ensure that it can safely run concurrently with the
page fault handler.

Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index fad3d6c4756e..a60f8b2780fb 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	sgx_encl_ewb(epc_page, backing);
 	encl_page->epc_page = NULL;
 	encl->secs_child_cnt--;
+	sgx_encl_put_backing(backing);
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
 		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
@@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
+
+		mutex_lock(&encl_page->encl->lock);
 		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
+		}
 
-		mutex_lock(&encl_page->encl->lock);
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void)
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
 
-		sgx_encl_put_backing(&backing[i]);
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;

From patchwork Mon May 9 21:48:02 2022
X-Patchwork-Submitter: Reinette Chatre
X-Patchwork-Id: 12844091
From: Reinette Chatre <reinette.chatre@intel.com>
To: dave.hansen@linux.intel.com, jarkko@kernel.org, linux-sgx@vger.kernel.org
Cc: haitao.huang@intel.com
Subject: [PATCH V2 4/5] x86/sgx: Fix race between reclaimer and page fault handler
Date: Mon, 9 May 2022 14:48:02 -0700

Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP. The WARN is encountered when the
reclaimer evicts a range of pages from the enclave while the same
pages are faulted back right away.

Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD
page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in
the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B.
The scenario proceeds with ENCLAVE_A being evicted from the enclave
while ENCLAVE_B is faulted in.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_AB.
   * Release mutex before writing
   * any data to the shmem pages.
   */
  sgx_encl_get_backing(...);
  encl_page->desc |=
    SGX_ENCL_PAGE_BEING_RECLAIMED;
  mutex_unlock(&encl->lock);

                                        /*
                                         * Fault ENCLAVE_B
                                         */

                                        sgx_vma_fault() {

                                          mutex_lock(&encl->lock);
                                          /*
                                           * Get reference to
                                           * ENCLAVE_B's shmem page
                                           * as well as PCMD_AB.
                                           */
                                          sgx_encl_get_backing(...)
                                          /*
                                           * Load page back into
                                           * enclave via ELDU.
                                           */
                                          /*
                                           * Release reference to
                                           * ENCLAVE_B' shmem page and
                                           * PCMD_AB.
                                           */
                                          sgx_encl_put_backing(...);
                                          /*
                                           * PCMD_AB is found empty so
                                           * it and ENCLAVE_B's shmem
                                           * page are truncated.
                                           */
                                          /* Truncate ENCLAVE_B backing page */
                                          sgx_encl_truncate_backing_page();
                                          /* Truncate PCMD_AB */
                                          sgx_encl_truncate_backing_page();

                                          mutex_unlock(&encl->lock);

                                          ...
                                        }

  mutex_lock(&encl->lock);
  encl_page->desc &=
    ~SGX_ENCL_PAGE_BEING_RECLAIMED;
  /*
   * Write encrypted contents of
   * ENCLAVE_A to ENCLAVE_A shmem
   * page and its PCMD data to
   * PCMD_AB.
   */
  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_AB is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}

What happens next depends on whether it is ENCLAVE_A being faulted in
or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
with a #GP.

If ENCLAVE_A is faulted, then at the time sgx_encl_get_backing() is
called a new PCMD page is allocated, and providing the empty PCMD data
for ENCLAVE_A would cause ENCLS[ELDU] to #GP.

If ENCLAVE_B is evicted first, then a new PCMD_AB would be allocated by
the reclaimer, but later when ENCLAVE_A is faulted the ENCLS[ELDU]
instruction would #GP during its checks of the PCMD value and the WARN
would be encountered.

Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the
time it obtains a reference to the backing store pages of an enclave
page it is in the process of reclaiming, fix the race by only
truncating the PCMD page after ensuring that no page sharing the PCMD
page is in the process of being reclaimed.
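A worked example of the index arithmetic introduced by the fix, with
hypothetical values, assuming 4 KiB pages and the 128-byte
struct sgx_pcmd so that 32 PCMD entries share one PCMD page:

	/*
	 * PCMDS_PER_PAGE == 32, so PCMD_FIRST_MASK == GENMASK(4, 0) ==
	 * 0x1f. Clearing the low five bits of a page index yields the
	 * index of the enclave page that uses the first entry of the
	 * shared PCMD page.
	 */
	pgoff_t page_index = 0x47;	/* hypothetical: enclave page 71 */

	/* 0x47 & ~0x1f == 0x40, i.e. enclave page 64 */
	unsigned long pcmd_first_page =
		PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;

	/*
	 * pcmd_page_in_use() then walks enclave pages 64..95 - the 32
	 * pages whose PCMD entries share this PCMD page - and reports
	 * the page as in use if any of them has
	 * SGX_ENCL_PAGE_BEING_RECLAIMED set.
	 */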
Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 90 +++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 2521d64e8bcf..d1d4e8572702 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,87 @@
 #include "encls.h"
 #include "sgx.h"
 
+#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
+/*
+ * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
+ * determine the page index associated with the first PCMD entry
+ * within a PCMD page.
+ */
+#define PCMD_FIRST_MASK GENMASK(4, 0)
+
+/**
+ * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
+ *                      page is in process of being reclaimed.
+ * @encl:        Enclave to which PCMD page belongs
+ * @start_addr:  Address of enclave page using first entry within the PCMD page
+ *
+ * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
+ * stored. The PCMD data of a reclaimed enclave page contains enough
+ * information for the processor to verify the page at the time
+ * it is loaded back into the Enclave Page Cache (EPC).
+ *
+ * The backing storage to which enclave pages are reclaimed is laid out as
+ * follows:
+ * All enclave pages:SECS page:PCMD pages
+ *
+ * Each PCMD page contains the PCMD metadata of
+ * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
+ *
+ * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
+ * page sharing the PCMD page is in the process of being reclaimed.
+ *
+ * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
+ * intends to reclaim that enclave page - it means that the PCMD data
+ * associated with that enclave page is about to get some data and thus
+ * even if the PCMD page is empty, it should not be truncated.
+ *
+ * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
+ */
+static int pcmd_page_in_use(struct sgx_encl *encl,
+			    unsigned long start_addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned long addr;
+	int reclaimed = 0;
+	int i;
+
+	/*
+	 * PCMD_FIRST_MASK is based on number of PCMD entries within
+	 * PCMD page being 32.
+	 */
+	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
+
+	for (i = 0; i < PCMDS_PER_PAGE; i++) {
+		addr = start_addr + i * PAGE_SIZE;
+
+		/*
+		 * Stop when reaching the SECS page - it does not
+		 * have a page_array entry and its reclaim is
+		 * started and completed with enclave mutex held so
+		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
+		 * flag.
+		 */
+		if (addr == encl->base + encl->size)
+			break;
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry)
+			return -EFAULT;
+
+		/*
+		 * VA page slot ID uses same bit as the flag so it is important
+		 * to ensure that the page is not already in backing store.
+		 */
+		if (entry->epc_page &&
+		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+			reclaimed = 1;
+			break;
+		}
+	}
+
+	return reclaimed;
+}
+
 /*
  * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
  * follow right after the EPC data in the backing storage. In addition to the
@@ -47,6 +128,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
 	struct sgx_encl *encl = encl_page->encl;
 	pgoff_t page_index, page_pcmd_off;
+	unsigned long pcmd_first_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_backing b;
 	bool pcmd_page_empty;
@@ -58,6 +140,12 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
+	/*
+	 * Address of enclave page using the first entry within the
+	 * PCMD page.
+	 */
+	pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
+
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
 	ret = sgx_encl_get_backing(encl, page_index, &b);
@@ -99,7 +187,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty)
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 
 	return ret;

From patchwork Mon May 9 21:48:03 2022
X-Patchwork-Submitter: Reinette Chatre
X-Patchwork-Id: 12844090
From: Reinette Chatre <reinette.chatre@intel.com>
To: dave.hansen@linux.intel.com, jarkko@kernel.org, linux-sgx@vger.kernel.org
Cc: haitao.huang@intel.com
Subject: [PATCH V2 5/5] x86/sgx: Ensure no data in PCMD page after truncate
Date: Mon, 9 May 2022 14:48:03 -0700
A PCMD (Paging Crypto MetaData) page contains the PCMD structures of
enclave pages that have been encrypted and moved to the shmem backing
store. When all enclave pages sharing a PCMD page are loaded in the
enclave, there is no need for the PCMD page and it can be truncated
from the backing store.

A few issues appeared around the truncation of PCMD pages. The known
issues have been addressed, but the PCMD handling code could be made
more robust by loudly complaining if any new issue appears in this
area.

Add a check that will complain once with a WARN if the PCMD page is not
actually empty after it has been truncated. There should never be data
in the PCMD page at this point since it is always updated with the
enclave mutex held.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index d1d4e8572702..af972dbad965 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -183,12 +183,19 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	kunmap_atomic(pcmd_page);
 	kunmap_atomic((void *)(unsigned long)pginfo.contents);
 
+	get_page(b.pcmd);
 	sgx_encl_put_backing(&b);
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
+		pcmd_page = kmap_atomic(b.pcmd);
+		WARN_ON_ONCE(memchr_inv(pcmd_page, 0, PAGE_SIZE));
+		kunmap_atomic(pcmd_page);
+	}
+
+	put_page(b.pcmd);
 
 	return ret;
 }
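For reference, a minimal annotated sketch of the check added above -
the same calls as in the diff, with the reasoning spelled out in the
comments:

	get_page(b.pcmd);		/* extra pin keeps the PCMD page mappable ... */
	sgx_encl_put_backing(&b);	/* ... even after the backing reference is dropped */

	sgx_encl_truncate_backing_page(encl, page_index);

	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page)) {
		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));

		/*
		 * memchr_inv() returns NULL only when every byte in the
		 * range equals the given value, so a non-NULL result
		 * means the "empty" PCMD page unexpectedly still
		 * contains data and the WARN fires once.
		 */
		pcmd_page = kmap_atomic(b.pcmd);
		WARN_ON_ONCE(memchr_inv(pcmd_page, 0, PAGE_SIZE));
		kunmap_atomic(pcmd_page);
	}

	put_page(b.pcmd);		/* drop the extra pin */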