[v19,RESEND,18/27] x86/sgx: Add swapping code to the core and SGX driver

Because the kernel is untrusted, swapping pages in/out of the Enclave
Page Cache (EPC) has specialized requirements:

* The kernel cannot directly access EPC memory, i.e. cannot copy data
  to/from the EPC.
* To evict a page from the EPC, the kernel must "prove" to hardware that
  there are no valid TLB entries for said page, since a stale TLB entry
  would allow an attacker to bypass SGX access controls.
* When loading a page back into the EPC, hardware must be able to verify
  the integrity and freshness of the data.
* When loading an enclave page, e.g. regular pages and Thread Control
  Structures (TCS), hardware must be able to associate the page with a
  Secure Enclave Control Structure (SECS).

To satisfy the above requirements, the CPU provides dedicated ENCLS
functions to support paging data in/out of the EPC:

* EBLOCK:   Mark a page as blocked in the EPC Map (EPCM).  Attempting
            to access a blocked page that misses the TLB will fault.
* ETRACK:   Activate blocking tracking.  Hardware verifies that all
            translations for pages marked as "blocked" have been flushed
            from the TLB.
* EPA:      Add version array page to the EPC.  As the name suggests, a
            VA page is a 512-entry array of version numbers that are
            used to uniquely identify pages evicted from the EPC.
* EWB:      Write back a page from EPC to memory, e.g. RAM.  Software
            must supply a VA slot, memory to hold the Paging Crypto
            Metadata (PCMD) of the page and, obviously, backing storage
            for the evicted page.
* ELD{B,U}: Load a page in {un}blocked state from memory to EPC.  The
            driver only uses the ELDU variant as there is no use case
            for loading a page as "blocked" in a bare metal environment.
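
The VA slot bookkeeping implied by EPA and EWB can be sketched in plain
userspace C.  This is only an illustrative model, not the driver's code:
struct va_page_model and the va_*() helpers are hypothetical names, and
the 8-byte slot size is an assumption derived from a 4 KiB page holding
512 entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VA_SLOT_COUNT 512               /* from the text: 512 entries per VA page */
#define VA_SLOT_SIZE  sizeof(uint64_t)  /* assumption: 4096 bytes / 512 entries */
#define VA_WORDS      (VA_SLOT_COUNT / 64)

/* Hypothetical bookkeeping for one VA page; not the driver's structure. */
struct va_page_model {
	uint64_t used[VA_WORDS];        /* one bit per slot, 1 = in use */
};

static void va_init(struct va_page_model *va)
{
	memset(va, 0, sizeof(*va));
}

/* Return the byte offset of a free slot and mark it used, or -1 if full. */
static int va_alloc_slot(struct va_page_model *va)
{
	for (int w = 0; w < VA_WORDS; w++) {
		if (va->used[w] == UINT64_MAX)
			continue;       /* all 64 slots in this word are taken */
		for (int b = 0; b < 64; b++) {
			if (!(va->used[w] & (UINT64_C(1) << b))) {
				va->used[w] |= UINT64_C(1) << b;
				return (int)((w * 64 + b) * VA_SLOT_SIZE);
			}
		}
	}
	return -1;
}

/* Release a slot previously returned by va_alloc_slot(). */
static void va_free_slot(struct va_page_model *va, int offset)
{
	int slot = offset / (int)VA_SLOT_SIZE;

	assert(offset >= 0 && slot < VA_SLOT_COUNT);
	va->used[slot / 64] &= ~(UINT64_C(1) << (slot % 64));
}
```

In this model a slot would be taken before an EWB and released again
once the page has been loaded back with ELDU, so one VA page can track
at most 512 evicted pages at a time.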

To top things off, all of the above ENCLS functions are subject to
strict concurrency rules, e.g. many operations will #GP fault if two
or more operations attempt to access common pages/structures.

To put it succinctly, paging in/out of the EPC requires coordinating
with the SGX driver where all of an enclave's tracking resides.  But,
simply shoving all reclaim logic into the driver is not desirable as
doing so has unwanted long term implications:

* Oversubscribing EPC to KVM guests, i.e. virtualizing SGX in KVM and
  swapping a guest's EPC pages (without the guest's cooperation) needs
  the same high level flows for reclaim but has painfully different
  semantics in the details.
* Accounting EPC, i.e. adding an EPC cgroup controller, is desirable
  as EPC is effectively a specialized memory type and even more scarce
  than system memory.  Providing a single touchpoint for EPC accounting
  regardless of end consumer greatly simplifies the EPC controller.
* Allowing the userspace-facing driver to be built as a loadable module
  is desirable, e.g. for debug, testing and development.  The cgroup
  infrastructure does not support dependencies on loadable modules.
* Separating EPC swapping from the driver once it has been tightly
  coupled to the driver is non-trivial (speaking from experience).

So, although the SGX driver is currently the sole consumer of EPC,
encapsulate EPC swapping in the driver to minimize the dependencies
between the core SGX code and driver, and do so in a way that can be
extended to an abstracted interface with minimal effort.

To that end, add functions to the driver for swapping EPC pages.  The
user of these functions will be the core SGX subsystem, which will be
enabled in a future patch.

* sgx_encl_page_{get,put}() - Attempt to pin/unpin (the owner of) an EPC
  page so that it can be operated on by a reclaimer.
* sgx_encl_page_reclaim()   - Mark a page as being reclaimed. The
  page is considered reclaimable if it hasn't been accessed recently and
  it isn't reserved by the driver for other use.
* sgx_encl_page_block()     - EBLOCK an EPC page.
* sgx_encl_page_write()     - Evict an EPC page to regular memory via
  EWB.  Activates ETRACK (via sgx_encl_track()) if necessary.
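
The order in which a reclaimer would call the functions above can be
sketched with a toy state machine.  Everything here is hypothetical and
runs in userspace: the model_*() names only mirror the roles of the
sgx_encl_page_*() functions, the flags stand in for hardware and driver
state, and the test-and-clear handling of the "accessed" flag is an
assumption about how "accessed recently" might be evaluated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; every type and function here is hypothetical, not driver code. */
enum model_err { MODEL_OK = 0, MODEL_NOT_BLOCKED, MODEL_NOT_TRACKED };

struct model_encl {
	int  refcount;  /* pins held by reclaimers */
	bool dead;      /* owner is being torn down */
	bool tracked;   /* tracking activated since the last EBLOCK */
};

struct model_page {
	struct model_encl *encl;
	bool accessed;  /* touched since the last scan */
	bool reserved;  /* held by the driver for other use */
	bool reclaimed;
	bool blocked;
	bool in_epc;
};

/* Role of sgx_encl_page_get(): pin the owner, fail if it is going away. */
static bool model_page_get(struct model_page *p)
{
	if (p->encl->dead)
		return false;
	p->encl->refcount++;
	return true;
}

/* Role of sgx_encl_page_put(): drop the pin. */
static void model_page_put(struct model_page *p)
{
	assert(p->encl->refcount > 0);
	p->encl->refcount--;
}

/* Role of sgx_encl_page_reclaim(): skip reserved or recently used pages. */
static bool model_page_reclaim(struct model_page *p)
{
	if (p->reserved)
		return false;
	if (p->accessed) {
		p->accessed = false;    /* assumption: test-and-clear */
		return false;
	}
	p->reclaimed = true;
	return true;
}

/* Role of sgx_encl_page_block(): a newly blocked page needs fresh tracking. */
static void model_page_block(struct model_page *p)
{
	p->blocked = true;
	p->encl->tracked = false;
}

/* Stand-in for EWB: refuses pages that are not blocked and tracked. */
static enum model_err model_ewb(struct model_page *p)
{
	if (!p->blocked)
		return MODEL_NOT_BLOCKED;
	if (!p->encl->tracked)
		return MODEL_NOT_TRACKED;
	p->in_epc = false;
	return MODEL_OK;
}

/* Role of sgx_encl_page_write(): EWB, activating tracking if necessary. */
static enum model_err model_page_write(struct model_page *p)
{
	enum model_err ret = model_ewb(p);

	if (ret == MODEL_NOT_TRACKED) {
		p->encl->tracked = true;  /* stands in for ETRACK and the TLB flush */
		ret = model_ewb(p);
	}
	return ret;
}

/* One full pass: get -> reclaim -> block -> write -> put. */
static bool model_evict(struct model_page *p)
{
	bool evicted = false;

	if (!model_page_get(p))
		return false;
	if (model_page_reclaim(p)) {
		model_page_block(p);
		evicted = model_page_write(p) == MODEL_OK;
	}
	model_page_put(p);
	return evicted;
}
```

With this model, a page whose "accessed" flag is set survives the first
model_evict() call and is evicted by the second, while a reserved page
is never evicted.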

Since we also need to be able to fault pages back into the EPC, add a
page fault handler to allocate an EPC page and ELDU a previously evicted
page.

Wire up the EPC manager's reclaim flow to the SGX driver's swapping
functionality.  In the long term there will be multiple users of the
EPC manager, e.g. SGX driver and KVM, thus the interface between the
EPC manager and the driver is fairly genericized and decoupled.  But
to avoid adding unused infrastructure, do not add any indirection
between the EPC manager and the SGX driver.  This has the unfortunate
and odd side effect of preventing the SGX driver from being compiled
as a loadable module.  However, this should be a temporary situation
that is remedied when a second user of EPC is added, i.e. KVM.

The swapper thread, ksgxswapd, starts reclaiming pages when the number
of free EPC pages drops below %SGX_NR_LOW_PAGES and keeps reclaiming
until the number reaches %SGX_NR_HIGH_PAGES.
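
The low/high watermark behaviour can be illustrated with a small
userspace sketch.  The threshold values and the model_*() names below
are assumptions made for the example; the real thresholds are whatever
the patch defines for %SGX_NR_LOW_PAGES and %SGX_NR_HIGH_PAGES.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only; the patch defines the real ones. */
#define MODEL_NR_LOW_PAGES   32
#define MODEL_NR_HIGH_PAGES  64

/* The swapper only starts working once free pages drop below the low mark. */
static bool model_should_wake(unsigned long nr_free)
{
	return nr_free < MODEL_NR_LOW_PAGES;
}

/*
 * Reclaim one page at a time until the high mark is reached or nothing
 * reclaimable is left.  Returns the resulting number of free pages.
 */
static unsigned long model_swapd_run(unsigned long nr_free,
				     unsigned long nr_reclaimable)
{
	assert(MODEL_NR_LOW_PAGES < MODEL_NR_HIGH_PAGES);

	if (!model_should_wake(nr_free))
		return nr_free;

	while (nr_free < MODEL_NR_HIGH_PAGES && nr_reclaimable > 0) {
		nr_reclaimable--;
		nr_free++;
	}
	return nr_free;
}
```

The gap between the two marks gives hysteresis: once woken, the thread
frees a batch of pages instead of being woken again for every single
allocation that dips below the low mark.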

Pages are reclaimed in LRU fashion from a global list.  The consumers
take care of calling EBLOCK (block page from new accesses), ETRACK
(restart counting the entering hardware threads) and EWB (write page to
regular memory) because executing these operations usually (if not
always) requires some subsystem-internal locking.
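
A minimal userspace sketch of scanning a global LRU list follows.  It is
not the code in reclaim.c: the lru_*() names, the "young" flag and
consumer_evict() are hypothetical, and consumer_evict() merely stands in
for the work (EBLOCK, ETRACK, EWB under its own locks) that the text
leaves to the consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; the head of the list is the coldest page. */
struct lru_page {
	struct lru_page *next;
	int id;
	bool young;     /* stand-in for "accessed recently" */
};

struct lru_list {
	struct lru_page *head, *tail;
};

static void lru_add_tail(struct lru_list *l, struct lru_page *p)
{
	p->next = NULL;
	if (l->tail)
		l->tail->next = p;
	else
		l->head = p;
	l->tail = p;
}

static struct lru_page *lru_pop_head(struct lru_list *l)
{
	struct lru_page *p = l->head;

	if (!p)
		return NULL;
	l->head = p->next;
	if (!l->head)
		l->tail = NULL;
	p->next = NULL;
	return p;
}

/* Stand-in for the consumer's eviction work; declines recently used pages. */
static bool consumer_evict(struct lru_page *p)
{
	if (p->young) {
		p->young = false;
		return false;
	}
	return true;
}

/*
 * Scan at most nr_to_scan pages from the cold end of the list.  Evicted
 * page ids are stored in evicted_ids; the return value is their count.
 */
static size_t lru_reclaim(struct lru_list *l, size_t nr_to_scan,
			  int *evicted_ids)
{
	size_t n = 0;

	assert(l && evicted_ids);
	while (nr_to_scan-- > 0) {
		struct lru_page *p = lru_pop_head(l);

		if (!p)
			break;
		if (consumer_evict(p))
			evicted_ids[n++] = p->id;
		else
			lru_add_tail(l, p);  /* recently used: rotate to the hot end */
	}
	return n;
}
```

The scanner only decides which page to try next; whether a page can
actually be evicted is decided by the consumer, which matches the
division of labour described above.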

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Serge Ayoun <serge.ayoun@intel.com>
Co-developed-by: Shay Katz-zamir <shay.katz-zamir@intel.com>
Signed-off-by: Shay Katz-zamir <shay.katz-zamir@intel.com>
---
 arch/x86/Kconfig                       |   3 +
 arch/x86/kernel/cpu/sgx/Makefile       |   1 +
 arch/x86/kernel/cpu/sgx/driver/ioctl.c |  59 +++-
 arch/x86/kernel/cpu/sgx/encl.c         | 267 +++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h         |  38 +++
 arch/x86/kernel/cpu/sgx/main.c         |  96 +++++-
 arch/x86/kernel/cpu/sgx/reclaim.c      | 410 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h          |  34 +-
 8 files changed, 887 insertions(+), 21 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c

Message ID	20190320162119.4469-19-jarkko.sakkinen@linux.intel.com (mailing list archive)
State	New, archived
From	Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Date	Wed, 20 Mar 2019 18:21:10 +0200
Series	Intel SGX1 support
