[v29,00/20] Intel SGX foundations

Message ID	20200421215316.56503-1-jarkko.sakkinen@linux.intel.com (mailing list archive)
Headers	show Return-Path: <SRS0=X7Hr=6F=vger.kernel.org=linux-sgx-owner@kernel.org> IronPort-SDR: VKHt7fbKsYLFKn4H3ZFRB97UcGr3BDeGjYDusH/t4qlAdUinSDb2uTOe19iPDmf6Dez3U/GfTJ i1iZogPciPlA== IronPort-SDR: c13wrYRa9/nO+w4n2/iJ1NVXF3KHAcJK+1rO0+T8h7iHmHEreEOET+crS6bAEjGTpkPjI1R/Zj GtU+pNin7bIw== From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> To: linux-kernel@vger.kernel.org, x86@kernel.org, linux-sgx@vger.kernel.org Cc: akpm@linux-foundation.org, dave.hansen@intel.com, sean.j.christopherson@intel.com, nhorman@redhat.com, npmccallum@redhat.com, haitao.huang@intel.com, andriy.shevchenko@linux.intel.com, tglx@linutronix.de, kai.svahn@intel.com, bp@alien8.de, josh@joshtriplett.org, luto@kernel.org, kai.huang@intel.com, rientjes@google.com, cedric.xing@intel.com, puiterwijk@redhat.com, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Subject: [PATCH v29 00/20] Intel SGX foundations Date: Wed, 22 Apr 2020 00:52:56 +0300 Message-Id: <20200421215316.56503-1-jarkko.sakkinen@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-sgx-owner@vger.kernel.org Precedence: bulk
Series	Intel SGX foundations \| expand [v29,00/20] Intel SGX foundations [v29,01/20] x86/sgx: Update MAINTAINERS [v29,02/20] x86/cpufeatures: x86/msr: Add Intel SGX hardware bits [v29,03/20] x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits [v29,04/20] x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX [v29,05/20] x86/sgx: Add SGX microarchitectural data structures [v29,06/20] x86/sgx: Add wrappers for ENCLS leaf functions [v29,07/20] x86/cpu/intel: Detect SGX support [v29,08/20] x86/sgx: Enumerate and track EPC sections [v29,09/20] x86/sgx: Add functions to allocate and free EPC pages [v29,10/20] mm: Introduce vm_ops->may_mprotect() [v29,11/20] x86/sgx: Linux Enclave Driver [v29,12/20] x86/sgx: Add provisioning [v29,13/20] x86/sgx: Add a page reclaimer [v29,14/20] x86/sgx: ptrace() support for the SGX driver [v29,15/20] x86/vdso: Add support for exception fixup in vDSO functions [v29,16/20] x86/fault: Add helper function to sanitize error code [v29,17/20] x86/traps: Attempt to fixup exceptions in vDSO before signaling [v29,18/20] x86/vdso: Implement a vDSO for Intel SGX enclave call [v29,19/20] selftests/x86: Add a selftest for SGX [v29,20/20] docs: x86/sgx: Document SGX micro architecture and kernel internals

Jarkko Sakkinen April 21, 2020, 9:52 p.m. UTC

Intel(R) SGX is a set of CPU instructions that can be used by applications
to set aside private regions of code and data. The code outside the enclave
is disallowed to access the memory inside the enclave by the CPU access
control.

There is a new hardware unit in the processor called Memory Encryption
Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
one or many MEE regions that can hold enclave data by configuring them with
PRMRR registers.

The MEE automatically encrypts the data leaving the processor package to
the MEE regions. The data is encrypted using a random key whose life-time
is exactly one power cycle.

The current implementation requires that the firmware sets
IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
decide what enclaves it wants run. The implementation does not create
any bottlenecks to support read-only MSRs later on.

You can tell if your CPU supports SGX by looking into /proc/cpuinfo:

	cat /proc/cpuinfo  | grep sgx

v29:
* The selftest has been moved to selftests/sgx. Because SGX is an execution
  environment of its own, it really isn't a great fit with more "standard"
  x86 tests.

  The RSA key is now generated on fly and the whole signing process has
  been made as part of the enclave loader instead of signing the enclave
  during the compilation time.

  Finally, the enclave loader loads now the test enclave directly from its
  ELF file, which means that ELF file does not need to be coverted as raw
  binary during the build process.
* Version the mm_list instead of using on synchronize_mm() when adding new
  entries. We hold the write lock for the mm_struct, and dup_mm() can thus
  deadlock with the page reclaimer, which could hold the lock for the old
  mm_struct.
* Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
  be used to reserve the address range. Now /dev/sgx supports only opaque
  mappings to the (initialized) enclave data.
* Make the vDSO callable directly from C by preserving RBX and taking leaf
  from RCX.

v28:
* Documented to Documentation/x86/sgx.rst how the kernel manages the
  enclave ownership.
* Removed non-LC flow from sgx_einit().
* Removed struct sgx_einittoken since only the size of the corresponding
  microarchitectural structure is used in the series ATM.

v27:
* Disallow RIE processes to use enclaves as there could a permission
  conflict between VMA and enclave permissions.
* In the documentation, replace "grep /proc/cpuinfo" with
  "grep sgx /proc/cpuinfo".

v26:
* Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was
  changed in v25 by mistake.
* Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once
  again for such a detailed feedback).
* Added back the MAINTAINERS update commit, which was mistakenly removed
  in v25.
* EREMOVE's for SECS cannot be done while sanitizing an EPC section. The
  CPU does not allow to remove a SECS page before all of its children have
  been removed, and a child page can be in some other section than the one
  currently being processed. Thus, removed special SECS processing from
  sgx_sanitize_page() and instead put sections through it twice. In the
  2nd round the lists should only contain SECS pages.

v25:
* Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES
  fails on executing ENCLS[EADD]. The rollback path executed
  radix_tree_delete() on the same address twice when this happened.
* Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when
  a signal is pending.
* As requested by Borislav, move the CPUID 0x12 features to their own word
  in cpufeatures.
* Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing()
  was called with an uninitialized pointer when sgx_encl_get_backing()
  fails.
* Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice
  has 'groups' for setting up sysfs attributes for the device.
* Use device_initcall instead of subsys_initcall so that misc_class is
  initialized before SGX is initialized.
* Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select
  enclave attributes that we the kernel does not allow it to set instead
  of -EINVAL.
* Unless SGX public key MSRs are writable always deny the feature from
  Linux. Previously this was only denied from driver. How VMs should be
  supported is not really part of initial patch set, which makes this
  an obvious choice.
* Cleaned up and refined documentation to be more approachable.

v24:
* Reclaim unmeasured and TCS pages (regression in v23).
* Replace usages of GFP_HIGHUSER with GFP_KERNEL.
* Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES
  and use the same rollback (destroy enclave). This can happen when host
  suspends itself unknowingly to a VM running enclaves. From -EIO the user
  space can deduce what happened.
* Have a separate @count in struct sgx_enclave_add_pages to output number
  of bytes processed instead of overwriting the input parameters for
  clarity and more importantly that the API provides means for partial
  processing (@count could be less than @length in success case).

v23:
* Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask
  with %SGX_PAGE_MEASURE flag.
* Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
* In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
  (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
  flow can be only reverted by killing the whole enclave.
* Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
  it should have been bit 6 (Table 37-3 in the SDM).
* Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
  error code.

v22:
* Refined bunch commit messages and added associated SDM references as
  many of them were too exhausting and some outdated.
* Alignment checks have been removed from mmap() because it does not define the
  ELRANGE. VMAs only act as windows to the enclave. The semantics compare
  somewhat how mmap() works with regular files.
* We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be
  page aligned so that we can pass the page directly to EADD and do not have
  to do an extra copy. This was made effectively possible by removing the
  worker thread for adding pages.
* The selftest build files have been refined throughout of various glitches
  and work properly in a cross compilation environment such as BuildRoot.
  In addition, libcalls fail the build with an assertion in the linker
  script, if they end up to the enclave binary.
* CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core
  for anything without having the driver. This could change when KVM support
  is added.
* We require zero permissions in SECINFO for TCS pages because the CPU
  overwrites SECINFO flags with zero permissions and measures the page
  only after that. Allowing to pass TCS with non-zero permissions would
  cause mismatching measurement between the one provided in SIGSTRUCT and
  the one computed by the CPU.
* Obviously lots of small fixes and clean ups (does make sense to
  document them all).

v21:
* Check on mmap() that the VMA does cover an area that does not have
  enclave pages. Only mapping with PROT_NONE can do that to reserve
  initial address space for an enclave.
* Check om mmap() and mprotect() that the VMA permissions do not
  surpass the enclave permissions.
* Remove two refcounts from vma_close(): mm_list and encl->refcount.
  Enclave refcount is only need for swapper/enclave sync and we can
  remove mm_list refcount by destroying mm_struct when the process
  is closed. By not having vm_close() the Linux MM can merge VMAs.
* Do not naturally align MAP_FIXED address.
* Numerous small fixes and clean ups.
* Use SRCU for synchronizing the list of mm_struct's.
* Move to stack based call convention in the vDSO.

v20:
* Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER
  dependency as MMU notifiers are no longer used in the driver.
* Use mm_users instead of mm_count as refcount for mm_struct as mm_count
  only protects from deleting mm_struct, not removing its contents.
* Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all
  of them. They could be in initialized state when the kernel starts
  because it might be spawned by kexec().
* Documentation overhaul.
* Use a device /dev/sgx/provision for delivering the provision token
  instead of securityfs.
* Create a reference to the enclave when already when opening
  /dev/sgx/enclave.  The file is then associated with this enclave only.
  mmap() can be done at free at any point and always get a reference to
  the enclave. To summarize the file now represents the enclave.

v19:
* Took 3-4 months but in some sense this was more like a rewrite of most
  of the corners of the source code. If I've forgotten to deal with some
  feedback, please don't shout me. Make a remark and I will fix it for
  the next version. Hopefully there won't be this big turnovers anymore.
* Validate SECS attributes properly against CPUID given attributes and
  against allowed attributes. SECS attributes are the ones that are
  enforced whereas SIGSTRUCT attributes tell what is required to run
  the enclave.
* Add KSS (Key Sharing Support) to the enclave attributes.
* Deny MAP_PRIVATE as an enclave is always a shared memory entity.
* Revert back to shmem backing storage so that it can be easily shared
  by multiple processes.
* Split the recognization of an ENCLS leaf failure by using three
  functions to detect it: encsl_faulted(), encls_returned_code() and
  sgx_failed(). encls_failed() is only caused by a spurious expections that
  should never happen. Thus, it is not defined as an inline function in
  order to easily insert a kprobe to it.
* Move low-level enclave management routines, page fault handler and page
  reclaiming routines from driver to the core. These cannot be separated
  from each other as they are heavily interdependent. The rationale is that
  the core does not call any code from the driver.
* Allow the driver to be compiled as a module now that it no code is using
  its routines and it only uses exported symbols. Now the driver is
  essentially just a thin ioctl layer.
* Reworked the driver to maintain a list of mm_struct's. The VMA callbacks
  add new entries to this list as the process is forked. Each entry has
  its own refcount because they have a different life-cycle as the enclave
  does. In effect @tgid and @mm have been removed from struct sgx_encl
  and we allow forking by removing VM_DONTCOPY from vm flags.
* Generate a cpu mask in the reclaimer from the cpu mask's of all
  mm_struct's. This will kick out the hardware threads out of the enclave
  from multiple processes. It is not a local variable because it would
  eat too much of the stack space but instead a field in struct
  sgx_encl.
* Allow forking i.e. remove VM_DONTCOPY. I did not change the API
  because the old API scaled to the workload that Andy described. The
  codebase is now mostly API independent i.e. changing the API is a
  small task. For me the proper trigger to chanage it is a as concrete
  as possible workload that cannot be fulfilled. I hope you understand
  my thinking here. I don't want to change anything w/o proper basis
  but I'm ready to change anything if there is a proper basis. I do
  not have any kind of attachment to any particular type of API.
* Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the
  new vDSO.

v18:
* Update the ioctl-number.txt.
* Move the driver under arch/x86.
* Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h.
* Rename the selftest as test_sgx (previously sgx-selftest).
* In order to enable process accounting, swap EPC pages and PCMD's to a VMA
  instead of shmem.
* Allow only to initialize and run enclaves with a subset of
  {DEBUG, MODE64BIT} set.
* Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged
  attributes e.g. PROVISIONKEY.

v17:
* Add a simple selftest.
* Fix a null pointer dereference to section->pages when its
  allocation fails.
* Add Sean's description of the exception handling to the documentation.

v16:
* Fixed SOB's in the commits that were a bit corrupted in v15.
* Implemented exceptio handling properly to detect_sgx().
* Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK.
* Updated the documentation to use rst definition lists.
* Added the missing Documentation/x86/index.rst, which has a link to
  intel_sgx.rst. Now the SGX and uapi documentation is properly generated
  with 'make htmldocs'.
* While enumerating EPC sections, if an undefined section is found, fail
  the driver initialization instead of continuing the initialization.
* Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS.
* Remove copyright notice from arch/x86/include/asm/sgx.h.
* Migrated from ioremap_cache() to memremap().

v15:
* Split into more digestable size patches.
* Lots of small fixes and clean ups.
* Signal a "plain" SIGSEGV on an EPCM violation.

v14:
* Change the comment about X86_FEATURE_SGX_LC from “SGX launch
  configuration” to “SGX launch control”.
* Move the SGX-related CPU feature flags as part of the Linux defined
  virtual leaf 8.
* Add SGX_ prefix to the constants defining the ENCLS leaf functions.
* Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers.
* Refine the long description for CONFIG_INTEL_SGX_CORE.
* Do not use pr_*_ratelimited()  in the driver. The use of the rate limited
  versions is legacy cruft from the prototyping phase.
* Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power
  cycles.
* Manually prefix with “sgx:” in the core SGX code instead of redefining
  pr_fmt.
* Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver
  instead of core because it is a driver requirement.
* Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the
  default is ‘n’.
* Rename struct sgx_epc_bank as struct sgx_epc_section in order to match
  the SDM.
* Allocate struct sgx_epc_page instances one at a time.
* Use “__iomem void *” pointers for the mapped EPC memory consistently.
* Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power
  cycles.
* Call enclave swapping operations directly from the driver instead of
  calling them .indirectly through struct sgx_epc_page_ops because indirect
  calls are not required yet as the patch set does not contain the KVM
  support.
* Added special signal SEGV_SGXERR to notify about SGX EPCM violation
  errors.

v13:
* Always use SGX_CPUID constant instead of a hardcoded value.
* Simplified and documented the macros and functions for ENCLS leaves.
* Enable sgx_free_page() to free active enclave pages on demand
  in order to allow sgx_invalidate() to delete enclave pages.
  It no longer performs EREMOVE if a page is in the process of
  being reclaimed.
* Use PM notifier per enclave so that we don't have to traverse
  the global list of active EPC pages to find enclaves.
* Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h
* Always use ioremap() to map EPC banks as we only support 64-bit kernel.
* Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going
  to sleep.

v12:
* Split to more narrow scoped commits in order to ease the review process and
  use co-developed-by tag for co-authors of commits instead of listing them in
  the source files.
* Removed cruft EXPORT_SYMBOL() declarations and converted to static variables.
* Removed in-kernel LE i.e. this version of the SGX software stack only
  supports unlocked IA32_SGXLEPUBKEYHASHx MSRs.
* Refined documentation on launching enclaves, swapping and enclave
  construction.
* Refined sgx_arch.h to include alignment information for every struct that
  requires it and removed structs that are not needed without an LE.
* Got rid of SGX_CPUID.
* SGX detection now prints log messages about firmware configuration issues.

v11:
* Polished ENCLS wrappers with refined exception handling.
* ksgxswapd was not stopped (regression in v5) in
  sgx_page_cache_teardown(), which causes a leaked kthread after driver
  deinitialization.
* Shutdown sgx_le_proxy when going to suspend because its EPC pages will be
  invalidated when resuming, which will cause it not function properly
  anymore.
* Set EINITTOKEN.VALID to zero for a token that is passed when
  SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero
  page.
* Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets
  around the flags field, which causes a #GP. Only flags read is readable.
* On read access memcpy() call inside sgx_vma_access() had src and dest
  parameters in wrong order.
* The build issue with CONFIG_KASAN is now fixed. Added undefined symbols
  to LE even if “KASAN_SANITIZE := false” was set in the makefile.
* Fixed a regression in the #PF handler. If a page has
  SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail.
  It did not, which caused weird races when trying to change other parts of
  swapping code.
* EPC management has been refactored to a flat LRU cache and moved to
  arch/x86. The swapper thread reads a cluster of EPC pages and swaps all
  of them. It can now swap from multiple enclaves in the same round.
* For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL
  when an enclave is already initialized or dead instead of zero.

v10:
* Cleaned up anon inode based IPC between the ring-0 and ring-3 parts
  of the driver.
* Unset the reserved flag from an enclave page if EDBGRD/WR fails
  (regression in v6).
* Close the anon inode when LE is stopped (regression in v9).
* Update the documentation with a more detailed description of SGX.

v9:
* Replaced kernel-LE IPC based on pipes with an anonymous inode.
  The driver does not require anymore new exports.

v8:
* Check that public key MSRs match the LE public key hash in the
  driver initialization when the MSRs are read-only.
* Fix the race in VA slot allocation by checking the fullness
  immediately after succeesful allocation.
* Fix the race in hash mrsigner calculation between the launch
  enclave and user enclaves by having a separate lock for hash
  calculation.

v7:
* Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK
  when it should have been masked with ~PAGE_MASK.
* Fixed a memory leak in sgx_ioc_enclave_create().
* Simplified swapping code by using a pointer array for a cluster
  instead of a linked list.
* Squeezed struct sgx_encl_page to 32 bytes.
* Fixed deferencing of an RSA key on OpenSSL 1.1.0.
* Modified TC's CMAC to use kernel AES-NI. Restructured the code
  a bit in order to better align with kernel conventions.

v6:
* Fixed semaphore underrun when accessing /dev/sgx from the launch enclave.
* In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/.
* Removed virtualization chapter from the documentation.
* Changed the default filename for the signing key as signing_key.pem.
* Reworked EPC management in a way that instead of a linked list of
  struct sgx_epc_page instances there is an array of integers that
  encodes address and bank of an EPC page (the same data as 'pa' field
  earlier). The locking has been moved to the EPC bank level instead
  of a global lock.
* Relaxed locking requirements for EPC management. EPC pages can be
  released back to the EPC bank concurrently.
* Cleaned up ptrace() code.
* Refined commit messages for new architectural constants.
* Sorted includes in every source file.
* Sorted local variable declarations according to the line length in
  every function.
* Style fixes based on Darren's comments to sgx_le.c.

v5:
* Described IPC between the Launch Enclave and kernel in the commit messages.
* Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier
  versions except those that exist in the imported TinyCrypt code.
* Fixed spelling mistakes in the documentation.
* Forgot to check the return value of sgx_drv_subsys_init().
* Encapsulated properly page cache init and teardown.
* Collect epc pages to a temp list in sgx_add_epc_bank
* Removed SGX_ENCLAVE_INIT_ARCH constant.

v4:
* Tied life-cycle of the sgx_le_proxy process to /dev/sgx.
* Removed __exit annotation from sgx_drv_subsys_exit().
* Fixed a leak of a backing page in sgx_process_add_page_req() in the
  case when vm_insert_pfn() fails.
* Removed unused symbol exports for sgx_page_cache.c.
* Updated sgx_alloc_page() to require encl parameter and documented the
  behavior (Sean Christopherson).
* Refactored a more lean API for sgx_encl_find() and documented the behavior.
* Moved #PF handler to sgx_fault.c.
* Replaced subsys_system_register() with plain bus_register().
* Retry EINIT 2nd time only if MSRs are not locked.

v3:
* Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set.
* Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails.
* Use unused bits in epc_page->pa to store the bank number.
* Removed #ifdef for WQ_NONREENTRANT.
* If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS.
* Added --remove-section=.got.plt to objcopy flags in order to prevent a
  dummy .got.plt, which will cause an inconsistent size for the LE.
* Documented sgx_encl_* functions.
* Added remark about AES implementation used inside the LE.
* Removed redundant sgx_sys_exit() from le/main.c.
* Fixed struct sgx_secinfo alignment from 128 to 64 bytes.
* Validate miscselect in sgx_encl_create().
* Fixed SSA frame size calculation to take the misc region into account.
* Implemented consistent exception handling to __encls() and __encls_ret().
* Implemented a proper device model in order to allow sysfs attributes
  and in-kernel API.
* Cleaned up various "find enclave" implementations to the unified
  sgx_encl_find().
* Validate that vm_pgoff is zero.
* Discard backing pages with shmem_truncate_range() after EADD.
* Added missing EEXTEND operations to LE signing and launch.
* Fixed SSA size for GPRS region from 168 to 184 bytes.
* Fixed the checks for TCS flags. Now DBGOPTIN is allowed.
* Check that TCS addresses are in ELRANGE and not just page aligned.
* Require kernel to be compiled with X64_64 and CPU_SUP_INTEL.
* Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02.

v2:
* get_rand_uint32() changed the value of the pointer instead of value
  where it is pointing at.
* Launch enclave incorrectly used sigstruct attributes-field instead of
  enclave attributes-field.
* Removed unused struct sgx_add_page_req from sgx_ioctl.c
* Removed unused sgx_has_sgx2.
* Updated arch/x86/include/asm/sgx.h so that it provides stub
  implementations when sgx in not enabled.
* Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs().
* return -ENOMEM in sgx_alloc_page() when VA pages consume too much space
* removed unused global sgx_nr_pids
* moved sgx_encl_release to sgx_encl.c
* return -ERESTARTSYS instead of -EINTR in sgx_encl_init()

Jarkko Sakkinen (10):
  x86/sgx: Update MAINTAINERS
  x86/sgx: Add SGX microarchitectural data structures
  x86/sgx: Add wrappers for ENCLS leaf functions
  x86/sgx: Add functions to allocate and free EPC pages
  x86/sgx: Linux Enclave Driver
  x86/sgx: Add provisioning
  x86/sgx: Add a page reclaimer
  x86/sgx: ptrace() support for the SGX driver
  selftests/x86: Add a selftest for SGX
  docs: x86/sgx: Document SGX micro architecture and kernel internals

Sean Christopherson (10):
  x86/cpufeatures: x86/msr: Add Intel SGX hardware bits
  x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits
  x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX
  x86/cpu/intel: Detect SGX support
  x86/sgx: Enumerate and track EPC sections
  mm: Introduce vm_ops->may_mprotect()
  x86/vdso: Add support for exception fixup in vDSO functions
  x86/fault: Add helper function to sanitize error code
  x86/traps: Attempt to fixup exceptions in vDSO before signaling
  x86/vdso: Implement a vDSO for Intel SGX enclave call

 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/sgx.rst                     | 206 +++++
 MAINTAINERS                                   |  11 +
 arch/x86/Kconfig                              |  14 +
 arch/x86/entry/vdso/Makefile                  |   8 +-
 arch/x86/entry/vdso/extable.c                 |  46 +
 arch/x86/entry/vdso/extable.h                 |  29 +
 arch/x86/entry/vdso/vdso-layout.lds.S         |   9 +-
 arch/x86/entry/vdso/vdso.lds.S                |   1 +
 arch/x86/entry/vdso/vdso2c.h                  |  58 +-
 arch/x86/entry/vdso/vsgx_enter_enclave.S      | 131 +++
 arch/x86/include/asm/cpufeature.h             |   5 +-
 arch/x86/include/asm/cpufeatures.h            |   8 +-
 arch/x86/include/asm/disabled-features.h      |  18 +-
 arch/x86/include/asm/enclu.h                  |   8 +
 arch/x86/include/asm/msr-index.h              |   8 +
 arch/x86/include/asm/required-features.h      |   2 +-
 arch/x86/include/asm/traps.h                  |   1 +
 arch/x86/include/asm/vdso.h                   |   5 +
 arch/x86/include/uapi/asm/sgx.h               | 175 ++++
 arch/x86/kernel/cpu/Makefile                  |   1 +
 arch/x86/kernel/cpu/common.c                  |   4 +
 arch/x86/kernel/cpu/feat_ctl.c                |  32 +-
 arch/x86/kernel/cpu/sgx/Makefile              |   6 +
 arch/x86/kernel/cpu/sgx/arch.h                | 343 ++++++++
 arch/x86/kernel/cpu/sgx/driver.c              | 209 +++++
 arch/x86/kernel/cpu/sgx/driver.h              |  32 +
 arch/x86/kernel/cpu/sgx/encl.c                | 756 +++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h                | 128 +++
 arch/x86/kernel/cpu/sgx/encls.h               | 238 ++++++
 arch/x86/kernel/cpu/sgx/ioctl.c               | 800 ++++++++++++++++++
 arch/x86/kernel/cpu/sgx/main.c                | 280 ++++++
 arch/x86/kernel/cpu/sgx/reclaim.c             | 471 +++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h                 | 108 +++
 arch/x86/kernel/traps.c                       |  14 +
 arch/x86/mm/fault.c                           |  45 +-
 include/linux/mm.h                            |   2 +
 mm/mprotect.c                                 |  14 +-
 tools/arch/x86/include/asm/cpufeatures.h      |   7 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/sgx/.gitignore        |   2 +
 tools/testing/selftests/sgx/Makefile          |  53 ++
 tools/testing/selftests/sgx/call.S            |  54 ++
 tools/testing/selftests/sgx/defines.h         |  21 +
 tools/testing/selftests/sgx/load.c            | 282 ++++++
 tools/testing/selftests/sgx/main.c            | 199 +++++
 tools/testing/selftests/sgx/main.h            |  38 +
 tools/testing/selftests/sgx/sigstruct.c       | 395 +++++++++
 tools/testing/selftests/sgx/test_encl.c       |  20 +
 tools/testing/selftests/sgx/test_encl.lds     |  40 +
 .../selftests/sgx/test_encl_bootstrap.S       |  89 ++
 52 files changed, 5398 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/x86/sgx.rst
 create mode 100644 arch/x86/entry/vdso/extable.c
 create mode 100644 arch/x86/entry/vdso/extable.h
 create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S
 create mode 100644 arch/x86/include/asm/enclu.h
 create mode 100644 arch/x86/include/uapi/asm/sgx.h
 create mode 100644 arch/x86/kernel/cpu/sgx/Makefile
 create mode 100644 arch/x86/kernel/cpu/sgx/arch.h
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encls.h
 create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/main.c
 create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c
 create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h
 create mode 100644 tools/testing/selftests/sgx/.gitignore
 create mode 100644 tools/testing/selftests/sgx/Makefile
 create mode 100644 tools/testing/selftests/sgx/call.S
 create mode 100644 tools/testing/selftests/sgx/defines.h
 create mode 100644 tools/testing/selftests/sgx/load.c
 create mode 100644 tools/testing/selftests/sgx/main.c
 create mode 100644 tools/testing/selftests/sgx/main.h
 create mode 100644 tools/testing/selftests/sgx/sigstruct.c
 create mode 100644 tools/testing/selftests/sgx/test_encl.c
 create mode 100644 tools/testing/selftests/sgx/test_encl.lds
 create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S

Connor Kuehl April 22, 2020, 4:48 p.m. UTC | #1

On 4/21/20 2:52 PM, Jarkko Sakkinen wrote:
> v29:
> * The selftest has been moved to selftests/sgx. Because SGX is an execution
>    environment of its own, it really isn't a great fit with more "standard"
>    x86 tests.
> 
>    The RSA key is now generated on fly and the whole signing process has
>    been made as part of the enclave loader instead of signing the enclave
>    during the compilation time.
> 
>    Finally, the enclave loader loads now the test enclave directly from its
>    ELF file, which means that ELF file does not need to be coverted as raw
>    binary during the build process.
> * Version the mm_list instead of using on synchronize_mm() when adding new
>    entries. We hold the write lock for the mm_struct, and dup_mm() can thus
>    deadlock with the page reclaimer, which could hold the lock for the old
>    mm_struct.
> * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
>    be used to reserve the address range. Now /dev/sgx supports only opaque
>    mappings to the (initialized) enclave data.
> * Make the vDSO callable directly from C by preserving RBX and taking leaf
>    from RCX.

Hi all,

I've been producing Fedora 32 kernel builds with the SGX patches applied 
for a few of weeks and I've just produced a build with this latest 
revision[1]. We've been using these kernels to enable SGX for some of 
our development/test machines.

We wanted to offer them here in the hopes that others might find them 
useful for testing the SGX patchsets on their own machines to send 
feedback to this list. Please note that these are *not* meant to replace 
your distro kernel and these are for testing purposes only.

I'll continue to upload builds to a Fedora Copr[2] as long as the 
patches continue to apply cleanly to the Fedora kernels.

Best,

Connor

[1] 
https://download.copr.fedorainfracloud.org/results/npmccallum/enarx/fedora-32-x86_64/01344404-kernel/

[2] https://copr.fedorainfracloud.org/coprs/npmccallum/enarx/

[3] This is the packaging branch that I work from and rebase on top of 
the f32 kernels: 
https://github.com/connorkuehl/fedora-kernel-enarx-pkg/commits/f32-enarx

Dr. Greg April 26, 2020, 4:57 p.m. UTC | #2

On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:

Good day, I hope the weekend is going well for everyone.

> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control.
>
> ... [ elided ] ..
> 
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.

It seems highly unlikely that a driver implementation with any type of
support for read-only launch control registers would ever get into the
kernel.  All one needs to do is review the conversations that Matthew
Garrett's lockdown patches engender to get a sense of that, ie:

https://lwn.net/Articles/818277/

As a result, the proposed SGX driver needs support for cryptographic
policy management before it goes into the kernel.  Either the patch
that we have offered or something equivalent.

Absent that, the driver won't address the full needs of the
development community implementing runtimes.  In addition it also
poses security and privacy issues that are well documented in the
literature.

As an aside, for those who haven't spent the last 5+ years of their
life working with this technology.  SGX2 hardware platforms have the
ability to allow unrestricted code execution in enclave context.  No
amount of LSM or IMA interventions can provide any control over that.

In fact, the Confidential Computing Consortium, sponsored by none
other then the Linux Foundation, has at its fundamental tenant, the
notion of developing an eco-system that allows the execution of code
and processing of data, over which, the owner or administrator of the
platform has no visibility or control.  It would seem only logical
that adversarial interests would indulge themseleves in those
capabilities as well.

With respect to SGX and these issues, cryptographic policy management
is important for the same reason that 2-factor authentication is now
considered standard practice in the security industry.

We appreciate, Jarkko, that you feel that such infrastructure is
optional, like virtualization support, and is something that can go in
after the driver is mainlined.  As the diffstat for our patch points
out, the proposed support has virtually no impact on the driver, the
same cannot be said for virtualization capabilities.

Moreover, adding support for key based policy management later would
require the addition of another ioctl in order to avoid ABI
compatibility issues.  The current initialization ioctl is best
suited, from an engineering perspective, to support this type of
infrastructure.  In fact, the necessary support was removed from the
ioctl for political reasons rather then for any valid engineering
rationale on flexible launch control platforms, particularly with our
patch or an equivalent approach.

For the benefit of the kernel community at large, I will follow up this
e-mail with a copy of our patch for review.  In case anyone misses it,
or it is corrupted, the patch can be pulled from the following URL:

ftp://ftp.enjellic.com/pub/sgx/kernel/SFLC-current.patch

We believe the patch or an equivalent approach deserves consideration
for the following reasons:

1.) It does not modify the default behavior of the driver. ie. any
enclave will be initialized that is presented.

2.) It enables needed functionality only at the discretion and control
of the platform owner/administrator.

3.) The impact on the architecture of the driver is negligible.

In closing, it is important to note that the proposed SGX driver is
not available as a module.  This effectively excludes any alternative
implementations of the driver without replacement of the kernel at
large.  It also means that any platform, with SGX hardware support,
running a kernel with this driver, has the potential for the
security/privacy issues noted above.

If key based policy management is not allowed, then the driver needs
to be re-architected to have modular support so that alternative
implementations or the absence of any driver support are at least
tenable.

Hopefully this is a reasoned technical approach to what has been a
long standing issue surrounding this technology.

Best wishes for a productive week.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      SGX secured infrastructure and
Enjellic Systems Development, LLC     autonomously self-defensive
4206 N. 19th Ave.                     platforms.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Opportunity is missed by most people because it is dressed in overalls
 and looks like work."
                                -- Thomas Edison

Dr. Greg April 26, 2020, 5:03 p.m. UTC | #3

The patch I discussed in my previous mail.

--
    Implement cryptographic initialization control.
    
    This patch introduces the ability of the platform owner to
    implement cryptographically controlled enclave initialization
    policy.  This functionality provides the platform owner with the
    ability to use the identity modulus signature of an enclave
    signer (SHA256 hash of the modulus of the signing key) to gate
    access to enclave initialization, rather then simply relying
    on discretionary access controls that are applied to the SGX
    relevant device driver nodes.
    
    The following policy functionality is introduced in this commit:
    
    1.) Control over which keys are allowed to initialize an
    enclave.
    
    2.) Control over which keys are allowed to implement launch
    enclaves.
    
    3.) Control over which keys are allowed to initialize enclaves
    that have access to the PROVISION_KEY attribute.
    
    For each policy type a plurality of key signatures are allowed.
    
    Absent an intent by the platform owner/administrator to use
    cryptographic initialization policies, this functionality does
    not change the standard behavior of the driver, which is to
    allow any enclave presented to the driver to be initialized.
    
    Cryptographic initialization policy is accessed through the
    following three pseudo-files that are implemented by this patch:
    
    /sys/kernel/security/signing_keys
    
    /sys/kernel/security/launch_keys
    
    /sys/kernel/security/provisioning_keys
    
    Policy keys are registered with the driver by writing the identity
    modulus signature to these files in simple hexadecimal format, ie:
    
    0000000000000000000000000000000000000000000000000000000000000000
    
    The current list of policy keys can be displayed by reading the
    contents of the pseudo-files.
    
    In addition to a key signature, the following keywords are
    accepted as valid entries for a policy file:
    
    clear
    
    lock
    
    The 'clear' keyword causes all existing entries in a policy list
    to be deleted.
    
    The 'lock' keyword causes any further modifications or access to
    a policy list to be denied.
    
    All of the policy code is implemented in a single file, policy.c,
    with minimal impact to the driver at large.  Since the calculation
    of the identity modulus signature needed to program a launch control
    register is effectively a policy decision, the code to compute the
    signature was moved from the ioctl.c file to the policy.c file.
    
    In order to support a plurality of launch keys the code that
    loops over initialization attempts was pushed downward into a new
    function that is named as follows:
    
    sgx_try_init()
    
    Primarily to avoid excessive indentation that would otherwise be
    needed in the sgx_encl_init() function.
    
    Signed-off-by: Dr. Greg <greg@enjellic.com>
---
 arch/x86/Kconfig                 |   1 +
 arch/x86/include/uapi/asm/sgx.h  |   1 +
 arch/x86/kernel/cpu/sgx/Makefile |   3 +-
 arch/x86/kernel/cpu/sgx/driver.c |   8 +
 arch/x86/kernel/cpu/sgx/driver.h |   2 +
 arch/x86/kernel/cpu/sgx/encl.h   |   2 +
 arch/x86/kernel/cpu/sgx/ioctl.c  |  82 ++++---
 arch/x86/kernel/cpu/sgx/policy.c | 513 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 573 insertions(+), 39 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8bbb4313fae3..5cfde1d36dc9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1955,6 +1955,7 @@ config INTEL_SGX
 	depends on X86_64 && CPU_SUP_INTEL
 	select SRCU
 	select MMU_NOTIFIER
+	select SECURITYFS
 	help
 	  Intel(R) SGX is a set of CPU instructions that can be used by
 	  applications to set aside private regions of code and data, referred
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index e196cfd44b70..1c3cdbce533d 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -63,6 +63,7 @@ struct sgx_enclave_add_pages {
  */
 struct sgx_enclave_init {
 	__u64 sigstruct;
+	__u64 token;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index f8d32da3a67a..d8ee2a889ca1 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,4 +3,5 @@ obj-y += \
 	encl.o \
 	ioctl.o \
 	main.o \
-	reclaim.o
+	reclaim.o \
+	policy.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 997a7f4117c5..d4330b32c243 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -205,5 +205,13 @@ int __init sgx_drv_init(void)
 		return ret;
 	}
 
+	ret = sgx_policy_fs_init();
+	if (ret) {
+		pr_err("SGX policy fs creation failed with %d.\n", ret);
+		misc_deregister(&sgx_dev_provision);
+		misc_deregister(&sgx_dev_enclave);
+		return ret;
+	}
+
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
index 72747d01c046..8293f4d12e82 100644
--- a/arch/x86/kernel/cpu/sgx/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -29,4 +29,6 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
 
 int sgx_drv_init(void);
 
+int sgx_policy_fs_init(void);
+u64 *sgx_policy_get_launch_signer(u64 *signature);
 #endif /* __ARCH_X86_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 44b353aa8866..b5be1f2233ea 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -124,4 +124,6 @@ unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
 
+int sgx_policy_get_params(struct sgx_encl *encl, void *modulus, u64 *signer,
+			  int *signcnt);
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 12e1496f8a8b..e960be8f924c 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -556,31 +556,6 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
-static int __sgx_get_key_hash(struct crypto_shash *tfm, const void *modulus,
-			      void *hash)
-{
-	SHASH_DESC_ON_STACK(shash, tfm);
-
-	shash->tfm = tfm;
-
-	return crypto_shash_digest(shash, modulus, SGX_MODULUS_SIZE, hash);
-}
-
-static int sgx_get_key_hash(const void *modulus, void *hash)
-{
-	struct crypto_shash *tfm;
-	int ret;
-
-	tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC);
-	if (IS_ERR(tfm))
-		return PTR_ERR(tfm);
-
-	ret = __sgx_get_key_hash(tfm, modulus, hash);
-
-	crypto_free_shash(tfm);
-	return ret;
-}
-
 static void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce)
 {
 	u64 *cache;
@@ -611,22 +586,14 @@ static int sgx_einit(struct sgx_sigstruct *sigstruct, void *token,
 	return ret;
 }
 
-static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
-			 void *token)
+static int sgx_try_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
+			void *token, u64 *lepubkeyhash)
+
 {
-	u64 mrsigner[4];
 	int ret;
 	int i;
 	int j;
 
-	/* Check that the required attributes have been authorized. */
-	if (encl->secs_attributes & ~encl->allowed_attributes)
-		return -EACCES;
-
-	ret = sgx_get_key_hash(sigstruct->modulus, mrsigner);
-	if (ret)
-		return ret;
-
 	mutex_lock(&encl->lock);
 
 	if (atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) {
@@ -637,7 +604,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 	for (i = 0; i < SGX_EINIT_SLEEP_COUNT; i++) {
 		for (j = 0; j < SGX_EINIT_SPIN_COUNT; j++) {
 			ret = sgx_einit(sigstruct, token, encl->secs.epc_page,
-					mrsigner);
+					lepubkeyhash);
 			if (ret == SGX_UNMASKED_EVENT)
 				continue;
 			else
@@ -673,6 +640,36 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 	return ret;
 }
 
+static int sgx_encl_init(struct sgx_encl *encl,
+			 struct sgx_sigstruct *sigstruct, void *token)
+{
+	u64 mrsigner[4];
+	u64 *signer;
+	int ret;
+	int signcnt = 1;
+
+	/* Configure the launch policy. */
+	ret = sgx_policy_get_params(encl, sigstruct->modulus, mrsigner,
+				    &signcnt);
+	if (ret)
+		return ret;
+
+	/* Check that the required attributes have been authorized. */
+	if (encl->secs_attributes & ~encl->allowed_attributes)
+		return -EACCES;
+
+	signer = mrsigner;
+	while (signcnt--) {
+		ret = sgx_try_init(encl, sigstruct, token, signer);
+		if (!ret)
+			return ret;
+		if (signcnt)
+			signer = sgx_policy_get_launch_signer(signer);
+	}
+
+	return ret;
+}
+
 /**
  * sgx_ioc_enclave_init - handler for %SGX_IOC_ENCLAVE_INIT
  *
@@ -708,7 +705,16 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 
 	sigstruct = kmap(initp_page);
 	token = (void *)((unsigned long)sigstruct + PAGE_SIZE / 2);
-	memset(token, 0, SGX_LAUNCH_TOKEN_SIZE);
+	if (!einit.token)
+		memset(token, 0, SGX_LAUNCH_TOKEN_SIZE);
+	else {
+		if (copy_from_user((uint8_t *) token,
+				   (void __user *) einit.token,
+				   SGX_LAUNCH_TOKEN_SIZE)) {
+			ret = -EFAULT;
+			goto out;
+		}
+	}
 
 	if (copy_from_user(sigstruct, (void __user *)einit.sigstruct,
 			   sizeof(*sigstruct))) {
diff --git a/arch/x86/kernel/cpu/sgx/policy.c b/arch/x86/kernel/cpu/sgx/policy.c
new file mode 100644
index 000000000000..04ff13e4ce73
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/policy.c
@@ -0,0 +1,513 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) Enjellic Systems Development, LLC
+
+#define KEY_SIZE 32
+
+#include <linux/types.h>
+#include <linux/seq_file.h>
+#include <linux/atomic.h>
+#include <linux/security.h>
+#include "driver.h"
+#include "encl.h"
+
+static struct dentry *sgx_fs;
+
+struct list_key {
+	struct list_head list;
+	u64 key[KEY_SIZE / 8];
+};
+
+struct list_key_iterator {
+	char *type;
+	atomic_t *opencount;
+	unsigned int *count;
+	struct mutex *lock;
+	struct list_head *list;
+	bool *lockfile;
+};
+
+static struct dentry *launch_keys;
+static atomic_t launch_keys_opencount = ATOMIC_INIT(1);
+static unsigned int launch_keys_count;
+static bool launch_keys_locked;
+static DEFINE_MUTEX(launch_key_list_mutex);
+static LIST_HEAD(launch_key_list);
+
+static struct dentry *provision_keys;
+static atomic_t provision_keys_opencount = ATOMIC_INIT(1);
+static unsigned int provision_keys_count;
+static bool provision_keys_locked;
+static DEFINE_MUTEX(provision_key_list_mutex);
+static LIST_HEAD(provision_key_list);
+
+static struct dentry *signing_keys;
+static atomic_t signing_keys_opencount = ATOMIC_INIT(1);
+static unsigned int signing_keys_count;
+static bool signing_keys_locked;
+static DEFINE_MUTEX(signing_key_list_mutex);
+static LIST_HEAD(signing_key_list);
+
+/**
+ * have_signer - Verify the presence presence of a key signer.
+ *
+ * @signature:	Pointer to signature of signer.
+ *
+ * Return:
+ *   0 Signer signature was not found.
+ *   1 Signer signature was found.
+ */
+static bool have_signer(struct list_head *keylist, struct mutex *lock,
+			uint8_t *signature)
+{
+	bool retn = false;
+	struct list_key *kp;
+
+	mutex_lock(lock);
+	list_for_each_entry(kp, keylist, list) {
+		pr_debug("%s: Checking signer=%*phN, ks=%*phN\n", __func__,
+			 KEY_SIZE, signature, KEY_SIZE, kp->key);
+		if (memcmp(kp->key, signature, KEY_SIZE) == 0) {
+			retn = true;
+			goto done;
+		}
+	}
+
+ done:
+	mutex_unlock(lock);
+	return retn;
+}
+
+static int process_write_key(const char __user *buf, size_t datalen,
+			     unsigned int *keycnt, struct mutex *lock,
+			     struct list_head *keylist)
+{
+	ssize_t retn;
+
+	char *p, keybufr[KEY_SIZE*2 + 1], key[KEY_SIZE];
+
+	struct list_key *kp;
+
+	if (datalen != sizeof(keybufr)) {
+		retn = -EINVAL;
+		goto done;
+	}
+
+	memset(keybufr, '\0', sizeof(keybufr));
+	if (copy_from_user(keybufr, buf, datalen)) {
+		retn = -EFAULT;
+		goto done;
+	}
+
+	p = strchr(keybufr, '\n');
+	if (!p) {
+		retn = -EINVAL;
+		goto done;
+	}
+	*p = '\0';
+	if (hex2bin(key, keybufr, sizeof(key))) {
+		retn = -EINVAL;
+		goto done;
+	}
+
+	kp = kzalloc(sizeof(*kp), GFP_KERNEL);
+	if (!kp) {
+		retn = -ENOMEM;
+		goto done;
+	}
+	memcpy(kp->key, key, sizeof(kp->key));
+
+	mutex_lock(lock);
+	list_add_tail(&kp->list, keylist);
+	++*keycnt;
+	mutex_unlock(lock);
+
+	retn = datalen;
+	pr_debug("%s: Added key: %*phN\n", __func__, KEY_SIZE, key);
+
+ done:
+	return retn;
+}
+
+static int process_lock(const char __user *buf, size_t datalen, bool *lockfile)
+{
+	char bufr[5];
+
+	if (datalen != strlen("lock") + 1)
+		return 0;
+
+	memset(bufr, '\0', sizeof(bufr));
+	if (copy_from_user(bufr, buf, datalen-1))
+		return -EFAULT;
+	if (strcmp(bufr, "lock") != 0)
+		return 0;
+
+	*lockfile = true;
+	return datalen;
+}
+
+static int process_clear(const char __user *buf, size_t datalen, char *type,
+			 unsigned int *keycnt, struct mutex *lock,
+			 struct list_head *keylist)
+{
+	char bufr[6];
+	struct list_key *kp, *kp_tmp;
+
+	if (datalen != strlen("clear") + 1)
+		return 0;
+
+	memset(bufr, '\0', sizeof(bufr));
+	if (copy_from_user(bufr, buf, datalen-1))
+		return -EFAULT;
+	if (strcmp(bufr, "clear") != 0)
+		return 0;
+
+	mutex_lock(lock);
+	list_for_each_entry_safe(kp, kp_tmp, keylist, list) {
+		pr_debug("[%s]: Freeing signature: %*phN\n", __FILE__,
+			 KEY_SIZE, kp->key);
+		list_del(&kp->list);
+		kfree(kp);
+	}
+	*keycnt = 0;
+	mutex_unlock(lock);
+
+	pr_info("Cleared %s signatures.\n", type);
+	return datalen;
+}
+
+static void *key_start(struct seq_file *c, loff_t *pos)
+{
+	struct list_key_iterator *ki = (struct list_key_iterator *) c->private;
+
+	if (*pos >= *ki->count)
+		return NULL;
+
+	mutex_lock(ki->lock);
+	return seq_list_start(ki->list, *pos);
+}
+
+static void *key_next(struct seq_file *c, void *p, loff_t *pos)
+{
+	struct list_key_iterator *ki = (struct list_key_iterator *) c->private;
+
+	return seq_list_next(p, ki->list, pos);
+}
+
+static void key_stop(struct seq_file *c, void *p)
+{
+	struct list_key_iterator *ki = (struct list_key_iterator *) c->private;
+
+	mutex_unlock(ki->lock);
+}
+
+static int key_show(struct seq_file *c, void *key)
+{
+	struct list_key *kp;
+
+	kp = list_entry(key, struct list_key, list);
+	seq_printf(c, "%*phN\n", KEY_SIZE, kp->key);
+	return 0;
+}
+
+static const struct seq_operations keys_seqops = {
+	.start = key_start,
+	.next = key_next,
+	.stop = key_stop,
+	.show = key_show
+};
+
+static ssize_t write_keys(struct file *file, const char __user *buf,
+			  size_t datalen, loff_t *ppos)
+{
+	struct seq_file *s = file->private_data;
+	struct list_key_iterator *ki = (struct list_key_iterator *) s->private;
+	ssize_t retn;
+
+	if (*ppos != 0)
+		return -EINVAL;
+
+	retn = process_lock(buf, datalen, ki->lockfile);
+	if (retn != 0)
+		return retn;
+
+	retn = process_clear(buf, datalen, ki->type, ki->count, ki->lock,
+			     ki->list);
+	if (retn != 0)
+		return retn;
+
+	retn = process_write_key(buf, datalen, ki->count, ki->lock, ki->list);
+	return retn;
+}
+
+static int release_keys(struct inode *inode, struct file *file)
+{
+	struct seq_file *s = file->private_data;
+	struct list_key_iterator *ki = (struct list_key_iterator *) s->private;
+
+	atomic_set(ki->opencount, 1);
+	seq_release_private(inode, file);
+	return 0;
+}
+
+static int open_launch_keys(struct inode *inode, struct file *filp)
+{
+	struct list_key_iterator *ki;
+
+	if (launch_keys_locked)
+		return -EACCES;
+
+	if (!atomic_dec_and_test(&launch_keys_opencount))
+		return -EBUSY;
+
+	ki = __seq_open_private(filp, &keys_seqops, sizeof(*ki));
+	if (!ki)
+		return -ENOMEM;
+
+	ki->type = "launch control";
+	ki->opencount = &launch_keys_opencount;
+	ki->count = &launch_keys_count;
+	ki->lock = &launch_key_list_mutex;
+	ki->list = &launch_key_list;
+	ki->lockfile = &launch_keys_locked;
+
+	return 0;
+}
+
+static const struct file_operations launch_keys_ops = {
+	.open = open_launch_keys,
+	.write = write_keys,
+	.release = release_keys,
+	.read = seq_read,
+	.llseek = seq_lseek,
+};
+
+/* Provisioning control. */
+
+static int open_provision_keys(struct inode *inode, struct file *filp)
+{
+	struct list_key_iterator *ki;
+
+	if (provision_keys_locked)
+		return -EACCES;
+
+	if (!atomic_dec_and_test(&provision_keys_opencount))
+		return -EBUSY;
+
+	ki = __seq_open_private(filp, &keys_seqops, sizeof(*ki));
+	if (!ki)
+		return -ENOMEM;
+
+	ki->type = "provisioning control";
+	ki->opencount = &provision_keys_opencount;
+	ki->count = &provision_keys_count;
+	ki->lock = &provision_key_list_mutex;
+	ki->list = &provision_key_list;
+
+	return 0;
+}
+
+static const struct file_operations provision_keys_ops = {
+	.open = open_provision_keys,
+	.write = write_keys,
+	.release = release_keys,
+	.read = seq_read,
+	.llseek = seq_lseek,
+};
+
+/* Signing control. */
+
+static int open_signing_keys(struct inode *inode, struct file *filp)
+{
+	struct list_key_iterator *ki;
+
+	if (signing_keys_locked)
+		return -EACCES;
+
+	if (!atomic_dec_and_test(&signing_keys_opencount))
+		return -EBUSY;
+
+	ki = __seq_open_private(filp, &keys_seqops, sizeof(*ki));
+	if (!ki)
+		return -ENOMEM;
+
+	ki->type = "signing control";
+	ki->opencount = &signing_keys_opencount;
+	ki->count = &signing_keys_count;
+	ki->lock = &signing_key_list_mutex;
+	ki->list = &signing_key_list;
+
+	return 0;
+}
+
+static const struct file_operations signing_keys_ops = {
+	.open = open_signing_keys,
+	.write = write_keys,
+	.release = release_keys,
+	.read = seq_read,
+	.llseek = seq_lseek,
+};
+
+static int __sgx_get_key_hash(struct crypto_shash *tfm, const void *modulus,
+			      void *hash)
+{
+	SHASH_DESC_ON_STACK(shash, tfm);
+
+	shash->tfm = tfm;
+
+	return crypto_shash_digest(shash, modulus, SGX_MODULUS_SIZE, hash);
+}
+
+static int sgx_get_key_hash(const void *modulus, void *hash)
+{
+	struct crypto_shash *tfm;
+	int ret;
+
+	tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC);
+	if (IS_ERR(tfm))
+		return PTR_ERR(tfm);
+
+	ret = __sgx_get_key_hash(tfm, modulus, hash);
+
+	crypto_free_shash(tfm);
+	return ret;
+}
+
+/**
+ * sgx_policy_get_params
+ *
+ * This function sets the cryptographically configured initialization
+ * policy parameters.  These include the identity modulus signature to
+ * be used as well as the configuration of the allowed enclave
+ * attributes.
+ *
+ * Return:
+ *   0 on success.
+ *   -errno otherwise
+ */
+
+int sgx_policy_get_params(struct sgx_encl *encl, void *modulus, u64 *signer,
+			  int *signcnt)
+{
+	int retn = -EINVAL;
+	uint8_t mrsigner[KEY_SIZE];
+	struct list_key *kp;
+
+	retn = sgx_get_key_hash(modulus, mrsigner);
+	if (retn)
+		goto no_signer;
+
+	if (provision_keys_count > 0 &&
+	    have_signer(&provision_key_list, &provision_key_list_mutex,
+			mrsigner))
+		encl->allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+
+	if (signing_keys_count > 0 &&
+	    have_signer(&signing_key_list, &signing_key_list_mutex, mrsigner))
+		goto have_signer;
+
+	if (encl->secs_attributes & SGX_ATTR_EINITTOKENKEY &&
+	    launch_keys_count > 0) {
+		if (have_signer(&launch_key_list, &launch_key_list_mutex,
+				 mrsigner)) {
+			encl->allowed_attributes |= SGX_ATTR_EINITTOKENKEY;
+			goto have_signer;
+		} else
+			goto no_signer;
+	}
+
+	if (launch_keys_count > 0) {
+		*signcnt = launch_keys_count;
+		kp = list_first_entry(&launch_key_list, struct list_key, list);
+		memcpy(mrsigner, kp->key, KEY_SIZE);
+	}
+
+ have_signer:
+	memcpy(signer, mrsigner, KEY_SIZE);
+	pr_debug("%s: Using signer: %*phN\n", __func__, KEY_SIZE, signer);
+	return 0;
+ no_signer:
+	memset(signer, '\0', KEY_SIZE);
+	return retn;
+}
+
+/**
+ * sgx_policy_get_launch_signer - Iterate through list of enclave authorizers.
+ *
+ * @signer:	The last returned enclave signer.
+ *
+ * This function iterates through the list of enclave signers from the
+ * last signature.  Calling the function with a NULL  value
+ * resets the iteration to the beginning of the list.
+ *
+ * Return:
+ *   NULL indicates end of list
+ *   non-NULL the next enclave signature on the list.
+ */
+
+u64 *sgx_policy_get_launch_signer(u64 *signer)
+{
+	bool seeking = false;
+	u64 *retn = NULL;
+	struct list_key *kp;
+
+	if (!signer) {
+		kp = list_first_entry(&launch_key_list, struct list_key, list);
+		return kp->key;
+	}
+	kp = list_last_entry(&launch_key_list, struct list_key, list);
+	if (memcmp(kp->key, signer, sizeof(kp->key)) == 0)
+		return NULL;
+
+	mutex_lock(&launch_key_list_mutex);
+	list_for_each_entry(kp, &launch_key_list, list) {
+		if (seeking) {
+			retn = kp->key;
+			goto done;
+		}
+		pr_debug("%s: Skipping: %*phN\n", __func__, KEY_SIZE, kp->key);
+		if (memcmp(kp->key, signer, KEY_SIZE) == 0)
+			seeking = true;
+	}
+
+ done:
+	mutex_unlock(&launch_key_list_mutex);
+	return retn;
+}
+
+int __init sgx_policy_fs_init(void)
+{
+	int retn = -1;
+
+	sgx_fs = securityfs_create_dir("sgx", NULL);
+	if (IS_ERR(sgx_fs)) {
+		retn = PTR_ERR(sgx_fs);
+		goto err;
+	}
+
+	launch_keys = securityfs_create_file("launch_keys", 0600, sgx_fs,
+					     NULL, &launch_keys_ops);
+	if (IS_ERR(launch_keys)) {
+		retn = PTR_ERR(launch_keys);
+		goto err;
+	}
+
+	provision_keys = securityfs_create_file("provisioning_keys", 0600,
+						sgx_fs, NULL,
+						&provision_keys_ops);
+	if (IS_ERR(provision_keys)) {
+		retn = PTR_ERR(provision_keys);
+		goto err;
+	}
+
+	signing_keys = securityfs_create_file("signing_keys", 0600, sgx_fs,
+					      NULL, &signing_keys_ops);
+	if (IS_ERR(signing_keys)) {
+		retn = PTR_ERR(signing_keys);
+		goto err;
+	}
+
+	return 0;
+
+ err:
+	return retn;
+}

Jarkko Sakkinen April 29, 2020, 5:22 a.m. UTC | #4

On Wed, Apr 22, 2020 at 09:48:58AM -0700, Connor Kuehl wrote:
> On 4/21/20 2:52 PM, Jarkko Sakkinen wrote:
> > v29:
> > * The selftest has been moved to selftests/sgx. Because SGX is an execution
> >    environment of its own, it really isn't a great fit with more "standard"
> >    x86 tests.
> > 
> >    The RSA key is now generated on fly and the whole signing process has
> >    been made as part of the enclave loader instead of signing the enclave
> >    during the compilation time.
> > 
> >    Finally, the enclave loader loads now the test enclave directly from its
> >    ELF file, which means that ELF file does not need to be coverted as raw
> >    binary during the build process.
> > * Version the mm_list instead of using on synchronize_mm() when adding new
> >    entries. We hold the write lock for the mm_struct, and dup_mm() can thus
> >    deadlock with the page reclaimer, which could hold the lock for the old
> >    mm_struct.
> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
> >    be used to reserve the address range. Now /dev/sgx supports only opaque
> >    mappings to the (initialized) enclave data.
> > * Make the vDSO callable directly from C by preserving RBX and taking leaf
> >    from RCX.
> 
> Hi all,
> 
> I've been producing Fedora 32 kernel builds with the SGX patches applied for
> a few of weeks and I've just produced a build with this latest revision[1].
> We've been using these kernels to enable SGX for some of our
> development/test machines.

Thanks a lot!

/Jarkko

Jarkko Sakkinen April 29, 2020, 5:23 a.m. UTC | #5

On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
> On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> 
> Good day, I hope the weekend is going well for everyone.
> 
> > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > to set aside private regions of code and data. The code outside the enclave
> > is disallowed to access the memory inside the enclave by the CPU access
> > control.
> >
> > ... [ elided ] ..
> > 
> > The current implementation requires that the firmware sets
> > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > decide what enclaves it wants run. The implementation does not create
> > any bottlenecks to support read-only MSRs later on.
> 
> It seems highly unlikely that a driver implementation with any type of
> support for read-only launch control registers would ever get into the
> kernel.  All one needs to do is review the conversations that Matthew
> Garrett's lockdown patches engender to get a sense of that, ie:
> 
> https://lwn.net/Articles/818277/

We do not require read-only MSRs.

/Jarkko

Sean Christopherson April 29, 2020, 3:14 p.m. UTC | #6

On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote:
> On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
> > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> > 
> > Good day, I hope the weekend is going well for everyone.
> > 
> > > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > > to set aside private regions of code and data. The code outside the enclave
> > > is disallowed to access the memory inside the enclave by the CPU access
> > > control.
> > >
> > > ... [ elided ] ..
> > > 
> > > The current implementation requires that the firmware sets
> > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > > decide what enclaves it wants run. The implementation does not create
> > > any bottlenecks to support read-only MSRs later on.
> > 
> > It seems highly unlikely that a driver implementation with any type of
> > support for read-only launch control registers would ever get into the
> > kernel.  All one needs to do is review the conversations that Matthew
> > Garrett's lockdown patches engender to get a sense of that, ie:
> > 
> > https://lwn.net/Articles/818277/
> 
> We do not require read-only MSRs.

Greg is pointing out the opposite, that supporting read-only MSRs is highly
unlikely to ever be supported in the mainline kernel.

Jethro Beekman April 29, 2020, 3:27 p.m. UTC | #7

On 2020-04-21 23:52, Jarkko Sakkinen wrote:
> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control.
> 
> There is a new hardware unit in the processor called Memory Encryption
> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> one or many MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
> 
> The MEE automatically encrypts the data leaving the processor package to
> the MEE regions. The data is encrypted using a random key whose life-time
> is exactly one power cycle.
> 
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.
> 
> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> 
> 	cat /proc/cpuinfo  | grep sgx

Let's merge this.

--
Jethro Beekman | Fortanix

Sean Christopherson April 29, 2020, 3:30 p.m. UTC | #8

On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
> In closing, it is important to note that the proposed SGX driver is
> not available as a module.  This effectively excludes any alternative
> implementations of the driver without replacement of the kernel at
> large.

No it doesn't.  The SGX subsytem won't allocate EPC pages unless userspace
creates an enclave, i.e. preventing unprivileged userspace from accessing
/dev/sgx/enclave will allow loading an alternative out-of-tree SGX module.
Yes, SGX sanitizes the EPC on boot, but that's arguably a good thing for
out-of-tree modules.

And if you want to get crafty and squash in-kernel SGX altogether, boot
with "clearcpuid=<SGX_LC>" and/or "clearcpuid=<SGX>" to disable in-kernel
support entirely.  SGX won't be correctly enumerated in /proc/cpuinfo
relative to the existence of an out-of-tree module, but that seems like a
very minor issue if you're running with a completely different SGX driver.

> It also means that any platform, with SGX hardware support,
> running a kernel with this driver, has the potential for the
> security/privacy issues noted above.

Unless I'm mistaken, /dev/sgx is root-only by default.  There are far
scarier mechanisms available to root for hosing the system.

> If key based policy management is not allowed, then the driver needs
> to be re-architected to have modular support so that alternative
> implementations or the absence of any driver support are at least
> tenable.

As above, using an alternative implementation is teneble, albeit a bit
kludgy.

Jarkko Sakkinen April 30, 2020, 3:46 a.m. UTC | #9

On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote:
> On 2020-04-21 23:52, Jarkko Sakkinen wrote:
> > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > to set aside private regions of code and data. The code outside the enclave
> > is disallowed to access the memory inside the enclave by the CPU access
> > control.
> > 
> > There is a new hardware unit in the processor called Memory Encryption
> > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> > one or many MEE regions that can hold enclave data by configuring them with
> > PRMRR registers.
> > 
> > The MEE automatically encrypts the data leaving the processor package to
> > the MEE regions. The data is encrypted using a random key whose life-time
> > is exactly one power cycle.
> > 
> > The current implementation requires that the firmware sets
> > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > decide what enclaves it wants run. The implementation does not create
> > any bottlenecks to support read-only MSRs later on.
> > 
> > You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> > 
> > 	cat /proc/cpuinfo  | grep sgx
> 
> Let's merge this.

So can I tag reviewed-by's?

/Jarkko

Jarkko Sakkinen April 30, 2020, 3:59 a.m. UTC | #10

On Wed, Apr 29, 2020 at 08:14:59AM -0700, Sean Christopherson wrote:
> On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote:
> > On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
> > > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> > > 
> > > Good day, I hope the weekend is going well for everyone.
> > > 
> > > > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > > > to set aside private regions of code and data. The code outside the enclave
> > > > is disallowed to access the memory inside the enclave by the CPU access
> > > > control.
> > > >
> > > > ... [ elided ] ..
> > > > 
> > > > The current implementation requires that the firmware sets
> > > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > > > decide what enclaves it wants run. The implementation does not create
> > > > any bottlenecks to support read-only MSRs later on.
> > > 
> > > It seems highly unlikely that a driver implementation with any type of
> > > support for read-only launch control registers would ever get into the
> > > kernel.  All one needs to do is review the conversations that Matthew
> > > Garrett's lockdown patches engender to get a sense of that, ie:
> > > 
> > > https://lwn.net/Articles/818277/
> > 
> > We do not require read-only MSRs.
> 
> Greg is pointing out the opposite, that supporting read-only MSRs is highly
> unlikely to ever be supported in the mainline kernel.

In a nutshell, what is wrong in the current code changes and what
*exactly* should we change? This is way too high level at the moment at
least for my limited brain capacity.

/Jarkko

Jethro Beekman April 30, 2020, 7:19 a.m. UTC | #11

On 2020-04-30 05:46, Jarkko Sakkinen wrote:
> On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote:
>> On 2020-04-21 23:52, Jarkko Sakkinen wrote:
>>> Intel(R) SGX is a set of CPU instructions that can be used by applications
>>> to set aside private regions of code and data. The code outside the enclave
>>> is disallowed to access the memory inside the enclave by the CPU access
>>> control.
>>>
>>> There is a new hardware unit in the processor called Memory Encryption
>>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
>>> one or many MEE regions that can hold enclave data by configuring them with
>>> PRMRR registers.
>>>
>>> The MEE automatically encrypts the data leaving the processor package to
>>> the MEE regions. The data is encrypted using a random key whose life-time
>>> is exactly one power cycle.
>>>
>>> The current implementation requires that the firmware sets
>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
>>> decide what enclaves it wants run. The implementation does not create
>>> any bottlenecks to support read-only MSRs later on.
>>>
>>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
>>>
>>> 	cat /proc/cpuinfo  | grep sgx
>>
>> Let's merge this.
> 
> So can I tag reviewed-by's?
> 

No, but you already have my tested-by's.

If it helps I can try to review some patches, but 1) I know nothing about kernel coding guidelines and best practices and 2) I know little about most kernel internals, so I won't be able to review every patch.

--
Jethro Beekman | Fortanix

Jarkko Sakkinen April 30, 2020, 8:23 a.m. UTC | #12

On Thu, Apr 30, 2020 at 09:19:48AM +0200, Jethro Beekman wrote:
> On 2020-04-30 05:46, Jarkko Sakkinen wrote:
> > On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote:
> >> On 2020-04-21 23:52, Jarkko Sakkinen wrote:
> >>> Intel(R) SGX is a set of CPU instructions that can be used by applications
> >>> to set aside private regions of code and data. The code outside the enclave
> >>> is disallowed to access the memory inside the enclave by the CPU access
> >>> control.
> >>>
> >>> There is a new hardware unit in the processor called Memory Encryption
> >>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> >>> one or many MEE regions that can hold enclave data by configuring them with
> >>> PRMRR registers.
> >>>
> >>> The MEE automatically encrypts the data leaving the processor package to
> >>> the MEE regions. The data is encrypted using a random key whose life-time
> >>> is exactly one power cycle.
> >>>
> >>> The current implementation requires that the firmware sets
> >>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> >>> decide what enclaves it wants run. The implementation does not create
> >>> any bottlenecks to support read-only MSRs later on.
> >>>
> >>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> >>>
> >>> 	cat /proc/cpuinfo  | grep sgx
> >>
> >> Let's merge this.
> > 
> > So can I tag reviewed-by's?
> > 
> 
> No, but you already have my tested-by's.
> 
> If it helps I can try to review some patches, but 1) I know nothing
> about kernel coding guidelines and best practices and 2) I know little
> about most kernel internals, so I won't be able to review every patch.

Ackd-by *acknowledges* that the patches work for you. I think that would
be then the correct choice for the driver patch and patches before that.

Lets go with that if that is cool for you of course.

Did you run the selftest only or possibly also some internal Fortanix
tests?

/Jarkko

Jethro Beekman April 30, 2020, 2:12 p.m. UTC | #13

On 2020-04-30 10:23, Jarkko Sakkinen wrote:
> On Thu, Apr 30, 2020 at 09:19:48AM +0200, Jethro Beekman wrote:
>> On 2020-04-30 05:46, Jarkko Sakkinen wrote:
>>> On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote:
>>>> On 2020-04-21 23:52, Jarkko Sakkinen wrote:
>>>>> Intel(R) SGX is a set of CPU instructions that can be used by applications
>>>>> to set aside private regions of code and data. The code outside the enclave
>>>>> is disallowed to access the memory inside the enclave by the CPU access
>>>>> control.
>>>>>
>>>>> There is a new hardware unit in the processor called Memory Encryption
>>>>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
>>>>> one or many MEE regions that can hold enclave data by configuring them with
>>>>> PRMRR registers.
>>>>>
>>>>> The MEE automatically encrypts the data leaving the processor package to
>>>>> the MEE regions. The data is encrypted using a random key whose life-time
>>>>> is exactly one power cycle.
>>>>>
>>>>> The current implementation requires that the firmware sets
>>>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
>>>>> decide what enclaves it wants run. The implementation does not create
>>>>> any bottlenecks to support read-only MSRs later on.
>>>>>
>>>>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
>>>>>
>>>>> 	cat /proc/cpuinfo  | grep sgx
>>>>
>>>> Let's merge this.
>>>
>>> So can I tag reviewed-by's?
>>>
>>
>> No, but you already have my tested-by's.
>>
>> If it helps I can try to review some patches, but 1) I know nothing
>> about kernel coding guidelines and best practices and 2) I know little
>> about most kernel internals, so I won't be able to review every patch.
> 
> Ackd-by *acknowledges* that the patches work for you. I think that would
> be then the correct choice for the driver patch and patches before that.
> 
> Lets go with that if that is cool for you of course.
> 
> Did you run the selftest only or possibly also some internal Fortanix
> tests?
> 

v29 patches 2 through 18:

Acked-by: Jethro Beekman <jethro@fortanix.com>

I only ran production SGX software. I didn't run the self test.

--
Jethro Beekman | Fortanix

Dr. Greg May 2, 2020, 11:05 p.m. UTC | #14

On Thu, Apr 30, 2020 at 06:59:11AM +0300, Jarkko Sakkinen wrote:

Good afternoon, I hope the weekend is going well for everyone.

> On Wed, Apr 29, 2020 at 08:14:59AM -0700, Sean Christopherson wrote:
> > On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote:
> > > On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
> > > > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> > >
> > > > > The current implementation requires that the firmware sets
> > > > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately
> > > > > the kernel can decide what enclaves it wants run. The
> > > > > implementation does not create any bottlenecks to support
> > > > > read-only MSRs later on.
> > >
> > > > It seems highly unlikely that a driver implementation with any type of
> > > > support for read-only launch control registers would ever get into the
> > > > kernel.  All one needs to do is review the conversations that Matthew
> > > > Garrett's lockdown patches engender to get a sense of that, ie:
> > > > 
> > > > https://lwn.net/Articles/818277/
> > > 
> > > We do not require read-only MSRs.
> > 
> > Greg is pointing out the opposite, that supporting read-only MSRs is highly
> > unlikely to ever be supported in the mainline kernel.

> In a nutshell, what is wrong in the current code changes and what
> *exactly* should we change? This is way too high level at the moment
> at least for my limited brain capacity.

In a nutshell, the driver needs our patch that implements
cryptographic policy management.

A patch that:

1.) Does not change the default behavior of the driver.

2.) Implements capabilities that are consistent with what the hardware
was designed to do, but only at the discretion of the platform owner.

3.) Has no impact on the driver architecture.

The only consumer for this driver are SGX runtimes.  To our knowledge,
there exist only two complete runtime implementations, Intel's and
ours.  It us unclear why our approach to the use of the technology
should be discriminated against when it doesn't impact the other user
community.

The Linux kernel that I have worked on and supported since 1992 has
always focused on technical rationale and meritocracy rather then
politics.  We would be interested in why our proposal for the driver
fails on the former grounds rather then the latter.

> /Jarkko

Best wishes for a productive week.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Artisans in autonomously
Enjellic Systems Development, LLC     self-defensive IOT platforms
4206 N. 19th Ave.                     and edge devices.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"The best way to predict the future is to invent it."
                                -- Alan Kay

Andy Lutomirski May 3, 2020, 12:48 a.m. UTC | #15

> On May 2, 2020, at 4:05 PM, Dr. Greg <greg@enjellic.com> wrote:
> 
> On Thu, Apr 30, 2020 at 06:59:11AM +0300, Jarkko Sakkinen wrote:
> 
> Good afternoon, I hope the weekend is going well for everyone.
> 
>>> On Wed, Apr 29, 2020 at 08:14:59AM -0700, Sean Christopherson wrote:
>>> On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote:
>>>> On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
>>>>> On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
>>>> 
>>>>>> The current implementation requires that the firmware sets
>>>>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately
>>>>>> the kernel can decide what enclaves it wants run. The
>>>>>> implementation does not create any bottlenecks to support
>>>>>> read-only MSRs later on.
>>>> 
>>>>> It seems highly unlikely that a driver implementation with any type of
>>>>> support for read-only launch control registers would ever get into the
>>>>> kernel.  All one needs to do is review the conversations that Matthew
>>>>> Garrett's lockdown patches engender to get a sense of that, ie:
>>>>> 
>>>>> https://lwn.net/Articles/818277/
>>>> 
>>>> We do not require read-only MSRs.
>>> 
>>> Greg is pointing out the opposite, that supporting read-only MSRs is highly
>>> unlikely to ever be supported in the mainline kernel.
> 
>> In a nutshell, what is wrong in the current code changes and what
>> *exactly* should we change? This is way too high level at the moment
>> at least for my limited brain capacity.
> 
> In a nutshell, the driver needs our patch that implements
> cryptographic policy management.
> 
> A patch that:
> 
> 1.) Does not change the default behavior of the driver.
> 
> 2.) Implements capabilities that are consistent with what the hardware
> was designed to do, but only at the discretion of the platform owner.
> 
> 3.) Has no impact on the driver architecture.
> 
> The only consumer for this driver are SGX runtimes.  To our knowledge,
> there exist only two complete runtime implementations, Intel's and
> ours.  It us unclear why our approach to the use of the technology
> should be discriminated against when it doesn't impact the other user
> community.

Can you clarify how exactly this patch set discriminates against your stack?

> 
> The Linux kernel that I have worked on and supported since 1992 has
> always focused on technical rationale and meritocracy rather then
> politics.  We would be interested in why our proposal for the driver
> fails on the former grounds rather then the latter.
> 
>> /Jarkko
> 
> Best wishes for a productive week.
> 
> Dr. Greg
> 
> As always,
> Dr. Greg Wettstein, Ph.D, Worker      Artisans in autonomously
> Enjellic Systems Development, LLC     self-defensive IOT platforms
> 4206 N. 19th Ave.                     and edge devices.
> Fargo, ND  58102
> PH: 701-281-1686                      EMAIL: greg@enjellic.com
> ------------------------------------------------------------------------------
> "The best way to predict the future is to invent it."
>                                -- Alan Kay

Dr. Greg May 4, 2020, 9:34 a.m. UTC | #16

On Sat, May 02, 2020 at 05:48:30PM -0700, Andy Lutomirski wrote:

Good morning, I hope the week is starting well for everyone.

> > On May 2, 2020, at 4:05 PM, Dr. Greg <greg@enjellic.com> wrote:
> > In a nutshell, the driver needs our patch that implements
> > cryptographic policy management.
> > 
> > A patch that:
> > 
> > 1.) Does not change the default behavior of the driver.
> > 
> > 2.) Implements capabilities that are consistent with what the hardware
> > was designed to do, but only at the discretion of the platform owner.
> > 
> > 3.) Has no impact on the driver architecture.
> > 
> > The only consumer for this driver are SGX runtimes.  To our knowledge,
> > there exist only two complete runtime implementations, Intel's and
> > ours.  It us unclear why our approach to the use of the technology
> > should be discriminated against when it doesn't impact the other user
> > community.

> Can you clarify how exactly this patch set discriminates against
> your stack?

The driver has no provisions for implementing cryptographically based
SGX policy management of any type.

Our stack is extremely lightweight with no external dependencies and
is used in privacy and security sensitive applications, including
financial services of certain types.  There is a desire in this, and
other venues, to use cloud and edge resources with a strong guarantee
that the platforms have only had a known set of behaviors.  The
current DAC based controls in the driver are insufficient to provide
those guarantees.

I believe I have discussed our use of SGX previously.  In a nutshell,
we use SGX based modeling engines to supervise kernel based behavioral
namespaces, one enclave per namespace.  The closest equivalent work
that we have seen may be the IPE architecture advanced by Deven Bowers
at Microsoft but we address a number of issues that work does not,
including non-kernel based behavioral supervision.

We support the concern over hardware locked platforms and do not
disagree with the driver not supporting those platforms.  That being
said, there is no technical rationale for not providing cryptographic
policy management on FLC based platforms, as I believe our patch
demonstrates.

Best wishes for a productive week.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Artisans in autonomously
Enjellic Systems Development, LLC     self-defensive platforms.
4206 N. 19th Ave.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Can't they?

 A 64bit number incremented every millisecond can grow for half a
 billion years.  As far as I'm concerned, that is forever."
                                -- Neil Brown
                                   linux-raid

Jarkko Sakkinen May 6, 2020, 12:16 p.m. UTC | #17

On Thu, Apr 30, 2020 at 04:12:07PM +0200, Jethro Beekman wrote:
> On 2020-04-30 10:23, Jarkko Sakkinen wrote:
> > On Thu, Apr 30, 2020 at 09:19:48AM +0200, Jethro Beekman wrote:
> >> On 2020-04-30 05:46, Jarkko Sakkinen wrote:
> >>> On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote:
> >>>> On 2020-04-21 23:52, Jarkko Sakkinen wrote:
> >>>>> Intel(R) SGX is a set of CPU instructions that can be used by applications
> >>>>> to set aside private regions of code and data. The code outside the enclave
> >>>>> is disallowed to access the memory inside the enclave by the CPU access
> >>>>> control.
> >>>>>
> >>>>> There is a new hardware unit in the processor called Memory Encryption
> >>>>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> >>>>> one or many MEE regions that can hold enclave data by configuring them with
> >>>>> PRMRR registers.
> >>>>>
> >>>>> The MEE automatically encrypts the data leaving the processor package to
> >>>>> the MEE regions. The data is encrypted using a random key whose life-time
> >>>>> is exactly one power cycle.
> >>>>>
> >>>>> The current implementation requires that the firmware sets
> >>>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> >>>>> decide what enclaves it wants run. The implementation does not create
> >>>>> any bottlenecks to support read-only MSRs later on.
> >>>>>
> >>>>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> >>>>>
> >>>>> 	cat /proc/cpuinfo  | grep sgx
> >>>>
> >>>> Let's merge this.
> >>>
> >>> So can I tag reviewed-by's?
> >>>
> >>
> >> No, but you already have my tested-by's.
> >>
> >> If it helps I can try to review some patches, but 1) I know nothing
> >> about kernel coding guidelines and best practices and 2) I know little
> >> about most kernel internals, so I won't be able to review every patch.
> > 
> > Ackd-by *acknowledges* that the patches work for you. I think that would
> > be then the correct choice for the driver patch and patches before that.
> > 
> > Lets go with that if that is cool for you of course.
> > 
> > Did you run the selftest only or possibly also some internal Fortanix
> > tests?
> > 
> 
> v29 patches 2 through 18:
> 
> Acked-by: Jethro Beekman <jethro@fortanix.com>
> 
> I only ran production SGX software. I didn't run the self test.

That's great to hear thank you.

Updated my tree accordingly.

/Jarkko

Jordan Hand May 6, 2020, 4:39 p.m. UTC | #18

On 4/21/20 2:52 PM, Jarkko Sakkinen wrote:
> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control.
> 
> There is a new hardware unit in the processor called Memory Encryption
> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> one or many MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
> 
> The MEE automatically encrypts the data leaving the processor package to
> the MEE regions. The data is encrypted using a random key whose life-time
> is exactly one power cycle.
> 
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.
> 
> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> 
> 	cat /proc/cpuinfo  | grep sgx
> 
> v29:
> * The selftest has been moved to selftests/sgx. Because SGX is an execution
>    environment of its own, it really isn't a great fit with more "standard"
>    x86 tests.
> 
>    The RSA key is now generated on fly and the whole signing process has
>    been made as part of the enclave loader instead of signing the enclave
>    during the compilation time.
> 
>    Finally, the enclave loader loads now the test enclave directly from its
>    ELF file, which means that ELF file does not need to be coverted as raw
>    binary during the build process.
> * Version the mm_list instead of using on synchronize_mm() when adding new
>    entries. We hold the write lock for the mm_struct, and dup_mm() can thus
>    deadlock with the page reclaimer, which could hold the lock for the old
>    mm_struct.
> * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
>    be used to reserve the address range. Now /dev/sgx supports only opaque
>    mappings to the (initialized) enclave data.
> * Make the vDSO callable directly from C by preserving RBX and taking leaf
>    from RCX.
> 

Tested with the Open Enclave SDK on top of Intel PSW. Specifically built 
the Intel PSW with changes to support /dev/sgx mapping[1] new in v29.

Tested-by: Jordan Hand <jorhand@linux.microsoft.com>

[1] https://github.com/intel/linux-sgx/pull/530

Nathaniel McCallum May 6, 2020, 9:42 p.m. UTC | #19

Tested on Enarx. This requires a patch[0] for v29 support.

Tested-by: Nathaniel McCallum <npmccallum@redhat.com>

However, we did uncover a small usability issue. See below.

[0]: https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662

On Tue, Apr 21, 2020 at 5:53 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control.
>
> There is a new hardware unit in the processor called Memory Encryption
> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> one or many MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
>
> The MEE automatically encrypts the data leaving the processor package to
> the MEE regions. The data is encrypted using a random key whose life-time
> is exactly one power cycle.
>
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.
>
> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
>
>         cat /proc/cpuinfo  | grep sgx
>
> v29:
> * The selftest has been moved to selftests/sgx. Because SGX is an execution
>   environment of its own, it really isn't a great fit with more "standard"
>   x86 tests.
>
>   The RSA key is now generated on fly and the whole signing process has
>   been made as part of the enclave loader instead of signing the enclave
>   during the compilation time.
>
>   Finally, the enclave loader loads now the test enclave directly from its
>   ELF file, which means that ELF file does not need to be coverted as raw
>   binary during the build process.
> * Version the mm_list instead of using on synchronize_mm() when adding new
>   entries. We hold the write lock for the mm_struct, and dup_mm() can thus
>   deadlock with the page reclaimer, which could hold the lock for the old
>   mm_struct.
> * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
>   be used to reserve the address range. Now /dev/sgx supports only opaque
>   mappings to the (initialized) enclave data.

The statement "Any mapping..." isn't actually true.

Enarx creates a large enclave (currently 64GiB). This worked when we
created a file-backed mapping on /dev/sgx/enclave. However, switching
to an anonymous mapping fails with ENOMEM. We suspect this is because
the kernel attempts to allocate all the pages and zero them but there
is insufficient RAM available. We currently work around this by
creating a shared mapping on /dev/zero.

If we want to keep this mmap() strategy, we probably don't want to
advise mmap(ANON) if it allocates all the memory for the enclave ahead
of time, even if it won't be used. This would be wasteful.

OTOH, having to mmap("/dev/zero") seems a bit awkward.

Currently, the selftest uses mmap(ANON) and the patch message above
recommends it.

> * Make the vDSO callable directly from C by preserving RBX and taking leaf
>   from RCX.
>
> v28:
> * Documented to Documentation/x86/sgx.rst how the kernel manages the
>   enclave ownership.
> * Removed non-LC flow from sgx_einit().
> * Removed struct sgx_einittoken since only the size of the corresponding
>   microarchitectural structure is used in the series ATM.
>
> v27:
> * Disallow RIE processes to use enclaves as there could a permission
>   conflict between VMA and enclave permissions.
> * In the documentation, replace "grep /proc/cpuinfo" with
>   "grep sgx /proc/cpuinfo".
>
> v26:
> * Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was
>   changed in v25 by mistake.
> * Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once
>   again for such a detailed feedback).
> * Added back the MAINTAINERS update commit, which was mistakenly removed
>   in v25.
> * EREMOVE's for SECS cannot be done while sanitizing an EPC section. The
>   CPU does not allow to remove a SECS page before all of its children have
>   been removed, and a child page can be in some other section than the one
>   currently being processed. Thus, removed special SECS processing from
>   sgx_sanitize_page() and instead put sections through it twice. In the
>   2nd round the lists should only contain SECS pages.
>
> v25:
> * Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES
>   fails on executing ENCLS[EADD]. The rollback path executed
>   radix_tree_delete() on the same address twice when this happened.
> * Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when
>   a signal is pending.
> * As requested by Borislav, move the CPUID 0x12 features to their own word
>   in cpufeatures.
> * Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing()
>   was called with an uninitialized pointer when sgx_encl_get_backing()
>   fails.
> * Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice
>   has 'groups' for setting up sysfs attributes for the device.
> * Use device_initcall instead of subsys_initcall so that misc_class is
>   initialized before SGX is initialized.
> * Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select
>   enclave attributes that we the kernel does not allow it to set instead
>   of -EINVAL.
> * Unless SGX public key MSRs are writable always deny the feature from
>   Linux. Previously this was only denied from driver. How VMs should be
>   supported is not really part of initial patch set, which makes this
>   an obvious choice.
> * Cleaned up and refined documentation to be more approachable.
>
> v24:
> * Reclaim unmeasured and TCS pages (regression in v23).
> * Replace usages of GFP_HIGHUSER with GFP_KERNEL.
> * Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES
>   and use the same rollback (destroy enclave). This can happen when host
>   suspends itself unknowingly to a VM running enclaves. From -EIO the user
>   space can deduce what happened.
> * Have a separate @count in struct sgx_enclave_add_pages to output number
>   of bytes processed instead of overwriting the input parameters for
>   clarity and more importantly that the API provides means for partial
>   processing (@count could be less than @length in success case).
>
> v23:
> * Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask
>   with %SGX_PAGE_MEASURE flag.
> * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
>   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
>   flow can be only reverted by killing the whole enclave.
> * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
>   it should have been bit 6 (Table 37-3 in the SDM).
> * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
>   error code.
>
> v22:
> * Refined bunch commit messages and added associated SDM references as
>   many of them were too exhausting and some outdated.
> * Alignment checks have been removed from mmap() because it does not define the
>   ELRANGE. VMAs only act as windows to the enclave. The semantics compare
>   somewhat how mmap() works with regular files.
> * We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be
>   page aligned so that we can pass the page directly to EADD and do not have
>   to do an extra copy. This was made effectively possible by removing the
>   worker thread for adding pages.
> * The selftest build files have been refined throughout of various glitches
>   and work properly in a cross compilation environment such as BuildRoot.
>   In addition, libcalls fail the build with an assertion in the linker
>   script, if they end up to the enclave binary.
> * CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core
>   for anything without having the driver. This could change when KVM support
>   is added.
> * We require zero permissions in SECINFO for TCS pages because the CPU
>   overwrites SECINFO flags with zero permissions and measures the page
>   only after that. Allowing to pass TCS with non-zero permissions would
>   cause mismatching measurement between the one provided in SIGSTRUCT and
>   the one computed by the CPU.
> * Obviously lots of small fixes and clean ups (does make sense to
>   document them all).
>
> v21:
> * Check on mmap() that the VMA does cover an area that does not have
>   enclave pages. Only mapping with PROT_NONE can do that to reserve
>   initial address space for an enclave.
> * Check om mmap() and mprotect() that the VMA permissions do not
>   surpass the enclave permissions.
> * Remove two refcounts from vma_close(): mm_list and encl->refcount.
>   Enclave refcount is only need for swapper/enclave sync and we can
>   remove mm_list refcount by destroying mm_struct when the process
>   is closed. By not having vm_close() the Linux MM can merge VMAs.
> * Do not naturally align MAP_FIXED address.
> * Numerous small fixes and clean ups.
> * Use SRCU for synchronizing the list of mm_struct's.
> * Move to stack based call convention in the vDSO.
>
> v20:
> * Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER
>   dependency as MMU notifiers are no longer used in the driver.
> * Use mm_users instead of mm_count as refcount for mm_struct as mm_count
>   only protects from deleting mm_struct, not removing its contents.
> * Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all
>   of them. They could be in initialized state when the kernel starts
>   because it might be spawned by kexec().
> * Documentation overhaul.
> * Use a device /dev/sgx/provision for delivering the provision token
>   instead of securityfs.
> * Create a reference to the enclave when already when opening
>   /dev/sgx/enclave.  The file is then associated with this enclave only.
>   mmap() can be done at free at any point and always get a reference to
>   the enclave. To summarize the file now represents the enclave.
>
> v19:
> * Took 3-4 months but in some sense this was more like a rewrite of most
>   of the corners of the source code. If I've forgotten to deal with some
>   feedback, please don't shout me. Make a remark and I will fix it for
>   the next version. Hopefully there won't be this big turnovers anymore.
> * Validate SECS attributes properly against CPUID given attributes and
>   against allowed attributes. SECS attributes are the ones that are
>   enforced whereas SIGSTRUCT attributes tell what is required to run
>   the enclave.
> * Add KSS (Key Sharing Support) to the enclave attributes.
> * Deny MAP_PRIVATE as an enclave is always a shared memory entity.
> * Revert back to shmem backing storage so that it can be easily shared
>   by multiple processes.
> * Split the recognization of an ENCLS leaf failure by using three
>   functions to detect it: encsl_faulted(), encls_returned_code() and
>   sgx_failed(). encls_failed() is only caused by a spurious expections that
>   should never happen. Thus, it is not defined as an inline function in
>   order to easily insert a kprobe to it.
> * Move low-level enclave management routines, page fault handler and page
>   reclaiming routines from driver to the core. These cannot be separated
>   from each other as they are heavily interdependent. The rationale is that
>   the core does not call any code from the driver.
> * Allow the driver to be compiled as a module now that it no code is using
>   its routines and it only uses exported symbols. Now the driver is
>   essentially just a thin ioctl layer.
> * Reworked the driver to maintain a list of mm_struct's. The VMA callbacks
>   add new entries to this list as the process is forked. Each entry has
>   its own refcount because they have a different life-cycle as the enclave
>   does. In effect @tgid and @mm have been removed from struct sgx_encl
>   and we allow forking by removing VM_DONTCOPY from vm flags.
> * Generate a cpu mask in the reclaimer from the cpu mask's of all
>   mm_struct's. This will kick out the hardware threads out of the enclave
>   from multiple processes. It is not a local variable because it would
>   eat too much of the stack space but instead a field in struct
>   sgx_encl.
> * Allow forking i.e. remove VM_DONTCOPY. I did not change the API
>   because the old API scaled to the workload that Andy described. The
>   codebase is now mostly API independent i.e. changing the API is a
>   small task. For me the proper trigger to chanage it is a as concrete
>   as possible workload that cannot be fulfilled. I hope you understand
>   my thinking here. I don't want to change anything w/o proper basis
>   but I'm ready to change anything if there is a proper basis. I do
>   not have any kind of attachment to any particular type of API.
> * Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the
>   new vDSO.
>
> v18:
> * Update the ioctl-number.txt.
> * Move the driver under arch/x86.
> * Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h.
> * Rename the selftest as test_sgx (previously sgx-selftest).
> * In order to enable process accounting, swap EPC pages and PCMD's to a VMA
>   instead of shmem.
> * Allow only to initialize and run enclaves with a subset of
>   {DEBUG, MODE64BIT} set.
> * Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged
>   attributes e.g. PROVISIONKEY.
>
> v17:
> * Add a simple selftest.
> * Fix a null pointer dereference to section->pages when its
>   allocation fails.
> * Add Sean's description of the exception handling to the documentation.
>
> v16:
> * Fixed SOB's in the commits that were a bit corrupted in v15.
> * Implemented exceptio handling properly to detect_sgx().
> * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK.
> * Updated the documentation to use rst definition lists.
> * Added the missing Documentation/x86/index.rst, which has a link to
>   intel_sgx.rst. Now the SGX and uapi documentation is properly generated
>   with 'make htmldocs'.
> * While enumerating EPC sections, if an undefined section is found, fail
>   the driver initialization instead of continuing the initialization.
> * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS.
> * Remove copyright notice from arch/x86/include/asm/sgx.h.
> * Migrated from ioremap_cache() to memremap().
>
> v15:
> * Split into more digestable size patches.
> * Lots of small fixes and clean ups.
> * Signal a "plain" SIGSEGV on an EPCM violation.
>
> v14:
> * Change the comment about X86_FEATURE_SGX_LC from “SGX launch
>   configuration” to “SGX launch control”.
> * Move the SGX-related CPU feature flags as part of the Linux defined
>   virtual leaf 8.
> * Add SGX_ prefix to the constants defining the ENCLS leaf functions.
> * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers.
> * Refine the long description for CONFIG_INTEL_SGX_CORE.
> * Do not use pr_*_ratelimited()  in the driver. The use of the rate limited
>   versions is legacy cruft from the prototyping phase.
> * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power
>   cycles.
> * Manually prefix with “sgx:” in the core SGX code instead of redefining
>   pr_fmt.
> * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver
>   instead of core because it is a driver requirement.
> * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the
>   default is ‘n’.
> * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match
>   the SDM.
> * Allocate struct sgx_epc_page instances one at a time.
> * Use “__iomem void *” pointers for the mapped EPC memory consistently.
> * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power
>   cycles.
> * Call enclave swapping operations directly from the driver instead of
>   calling them .indirectly through struct sgx_epc_page_ops because indirect
>   calls are not required yet as the patch set does not contain the KVM
>   support.
> * Added special signal SEGV_SGXERR to notify about SGX EPCM violation
>   errors.
>
> v13:
> * Always use SGX_CPUID constant instead of a hardcoded value.
> * Simplified and documented the macros and functions for ENCLS leaves.
> * Enable sgx_free_page() to free active enclave pages on demand
>   in order to allow sgx_invalidate() to delete enclave pages.
>   It no longer performs EREMOVE if a page is in the process of
>   being reclaimed.
> * Use PM notifier per enclave so that we don't have to traverse
>   the global list of active EPC pages to find enclaves.
> * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h
> * Always use ioremap() to map EPC banks as we only support 64-bit kernel.
> * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going
>   to sleep.
>
> v12:
> * Split to more narrow scoped commits in order to ease the review process and
>   use co-developed-by tag for co-authors of commits instead of listing them in
>   the source files.
> * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables.
> * Removed in-kernel LE i.e. this version of the SGX software stack only
>   supports unlocked IA32_SGXLEPUBKEYHASHx MSRs.
> * Refined documentation on launching enclaves, swapping and enclave
>   construction.
> * Refined sgx_arch.h to include alignment information for every struct that
>   requires it and removed structs that are not needed without an LE.
> * Got rid of SGX_CPUID.
> * SGX detection now prints log messages about firmware configuration issues.
>
> v11:
> * Polished ENCLS wrappers with refined exception handling.
> * ksgxswapd was not stopped (regression in v5) in
>   sgx_page_cache_teardown(), which causes a leaked kthread after driver
>   deinitialization.
> * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be
>   invalidated when resuming, which will cause it not function properly
>   anymore.
> * Set EINITTOKEN.VALID to zero for a token that is passed when
>   SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero
>   page.
> * Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets
>   around the flags field, which causes a #GP. Only flags read is readable.
> * On read access memcpy() call inside sgx_vma_access() had src and dest
>   parameters in wrong order.
> * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols
>   to LE even if “KASAN_SANITIZE := false” was set in the makefile.
> * Fixed a regression in the #PF handler. If a page has
>   SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail.
>   It did not, which caused weird races when trying to change other parts of
>   swapping code.
> * EPC management has been refactored to a flat LRU cache and moved to
>   arch/x86. The swapper thread reads a cluster of EPC pages and swaps all
>   of them. It can now swap from multiple enclaves in the same round.
> * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL
>   when an enclave is already initialized or dead instead of zero.
>
> v10:
> * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts
>   of the driver.
> * Unset the reserved flag from an enclave page if EDBGRD/WR fails
>   (regression in v6).
> * Close the anon inode when LE is stopped (regression in v9).
> * Update the documentation with a more detailed description of SGX.
>
> v9:
> * Replaced kernel-LE IPC based on pipes with an anonymous inode.
>   The driver does not require anymore new exports.
>
> v8:
> * Check that public key MSRs match the LE public key hash in the
>   driver initialization when the MSRs are read-only.
> * Fix the race in VA slot allocation by checking the fullness
>   immediately after succeesful allocation.
> * Fix the race in hash mrsigner calculation between the launch
>   enclave and user enclaves by having a separate lock for hash
>   calculation.
>
> v7:
> * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK
>   when it should have been masked with ~PAGE_MASK.
> * Fixed a memory leak in sgx_ioc_enclave_create().
> * Simplified swapping code by using a pointer array for a cluster
>   instead of a linked list.
> * Squeezed struct sgx_encl_page to 32 bytes.
> * Fixed deferencing of an RSA key on OpenSSL 1.1.0.
> * Modified TC's CMAC to use kernel AES-NI. Restructured the code
>   a bit in order to better align with kernel conventions.
>
> v6:
> * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave.
> * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/.
> * Removed virtualization chapter from the documentation.
> * Changed the default filename for the signing key as signing_key.pem.
> * Reworked EPC management in a way that instead of a linked list of
>   struct sgx_epc_page instances there is an array of integers that
>   encodes address and bank of an EPC page (the same data as 'pa' field
>   earlier). The locking has been moved to the EPC bank level instead
>   of a global lock.
> * Relaxed locking requirements for EPC management. EPC pages can be
>   released back to the EPC bank concurrently.
> * Cleaned up ptrace() code.
> * Refined commit messages for new architectural constants.
> * Sorted includes in every source file.
> * Sorted local variable declarations according to the line length in
>   every function.
> * Style fixes based on Darren's comments to sgx_le.c.
>
> v5:
> * Described IPC between the Launch Enclave and kernel in the commit messages.
> * Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier
>   versions except those that exist in the imported TinyCrypt code.
> * Fixed spelling mistakes in the documentation.
> * Forgot to check the return value of sgx_drv_subsys_init().
> * Encapsulated properly page cache init and teardown.
> * Collect epc pages to a temp list in sgx_add_epc_bank
> * Removed SGX_ENCLAVE_INIT_ARCH constant.
>
> v4:
> * Tied life-cycle of the sgx_le_proxy process to /dev/sgx.
> * Removed __exit annotation from sgx_drv_subsys_exit().
> * Fixed a leak of a backing page in sgx_process_add_page_req() in the
>   case when vm_insert_pfn() fails.
> * Removed unused symbol exports for sgx_page_cache.c.
> * Updated sgx_alloc_page() to require encl parameter and documented the
>   behavior (Sean Christopherson).
> * Refactored a more lean API for sgx_encl_find() and documented the behavior.
> * Moved #PF handler to sgx_fault.c.
> * Replaced subsys_system_register() with plain bus_register().
> * Retry EINIT 2nd time only if MSRs are not locked.
>
> v3:
> * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set.
> * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails.
> * Use unused bits in epc_page->pa to store the bank number.
> * Removed #ifdef for WQ_NONREENTRANT.
> * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS.
> * Added --remove-section=.got.plt to objcopy flags in order to prevent a
>   dummy .got.plt, which will cause an inconsistent size for the LE.
> * Documented sgx_encl_* functions.
> * Added remark about AES implementation used inside the LE.
> * Removed redundant sgx_sys_exit() from le/main.c.
> * Fixed struct sgx_secinfo alignment from 128 to 64 bytes.
> * Validate miscselect in sgx_encl_create().
> * Fixed SSA frame size calculation to take the misc region into account.
> * Implemented consistent exception handling to __encls() and __encls_ret().
> * Implemented a proper device model in order to allow sysfs attributes
>   and in-kernel API.
> * Cleaned up various "find enclave" implementations to the unified
>   sgx_encl_find().
> * Validate that vm_pgoff is zero.
> * Discard backing pages with shmem_truncate_range() after EADD.
> * Added missing EEXTEND operations to LE signing and launch.
> * Fixed SSA size for GPRS region from 168 to 184 bytes.
> * Fixed the checks for TCS flags. Now DBGOPTIN is allowed.
> * Check that TCS addresses are in ELRANGE and not just page aligned.
> * Require kernel to be compiled with X64_64 and CPU_SUP_INTEL.
> * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02.
>
> v2:
> * get_rand_uint32() changed the value of the pointer instead of value
>   where it is pointing at.
> * Launch enclave incorrectly used sigstruct attributes-field instead of
>   enclave attributes-field.
> * Removed unused struct sgx_add_page_req from sgx_ioctl.c
> * Removed unused sgx_has_sgx2.
> * Updated arch/x86/include/asm/sgx.h so that it provides stub
>   implementations when sgx in not enabled.
> * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs().
> * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space
> * removed unused global sgx_nr_pids
> * moved sgx_encl_release to sgx_encl.c
> * return -ERESTARTSYS instead of -EINTR in sgx_encl_init()
>
> Jarkko Sakkinen (10):
>   x86/sgx: Update MAINTAINERS
>   x86/sgx: Add SGX microarchitectural data structures
>   x86/sgx: Add wrappers for ENCLS leaf functions
>   x86/sgx: Add functions to allocate and free EPC pages
>   x86/sgx: Linux Enclave Driver
>   x86/sgx: Add provisioning
>   x86/sgx: Add a page reclaimer
>   x86/sgx: ptrace() support for the SGX driver
>   selftests/x86: Add a selftest for SGX
>   docs: x86/sgx: Document SGX micro architecture and kernel internals
>
> Sean Christopherson (10):
>   x86/cpufeatures: x86/msr: Add Intel SGX hardware bits
>   x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits
>   x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX
>   x86/cpu/intel: Detect SGX support
>   x86/sgx: Enumerate and track EPC sections
>   mm: Introduce vm_ops->may_mprotect()
>   x86/vdso: Add support for exception fixup in vDSO functions
>   x86/fault: Add helper function to sanitize error code
>   x86/traps: Attempt to fixup exceptions in vDSO before signaling
>   x86/vdso: Implement a vDSO for Intel SGX enclave call
>
>  .../userspace-api/ioctl/ioctl-number.rst      |   1 +
>  Documentation/x86/index.rst                   |   1 +
>  Documentation/x86/sgx.rst                     | 206 +++++
>  MAINTAINERS                                   |  11 +
>  arch/x86/Kconfig                              |  14 +
>  arch/x86/entry/vdso/Makefile                  |   8 +-
>  arch/x86/entry/vdso/extable.c                 |  46 +
>  arch/x86/entry/vdso/extable.h                 |  29 +
>  arch/x86/entry/vdso/vdso-layout.lds.S         |   9 +-
>  arch/x86/entry/vdso/vdso.lds.S                |   1 +
>  arch/x86/entry/vdso/vdso2c.h                  |  58 +-
>  arch/x86/entry/vdso/vsgx_enter_enclave.S      | 131 +++
>  arch/x86/include/asm/cpufeature.h             |   5 +-
>  arch/x86/include/asm/cpufeatures.h            |   8 +-
>  arch/x86/include/asm/disabled-features.h      |  18 +-
>  arch/x86/include/asm/enclu.h                  |   8 +
>  arch/x86/include/asm/msr-index.h              |   8 +
>  arch/x86/include/asm/required-features.h      |   2 +-
>  arch/x86/include/asm/traps.h                  |   1 +
>  arch/x86/include/asm/vdso.h                   |   5 +
>  arch/x86/include/uapi/asm/sgx.h               | 175 ++++
>  arch/x86/kernel/cpu/Makefile                  |   1 +
>  arch/x86/kernel/cpu/common.c                  |   4 +
>  arch/x86/kernel/cpu/feat_ctl.c                |  32 +-
>  arch/x86/kernel/cpu/sgx/Makefile              |   6 +
>  arch/x86/kernel/cpu/sgx/arch.h                | 343 ++++++++
>  arch/x86/kernel/cpu/sgx/driver.c              | 209 +++++
>  arch/x86/kernel/cpu/sgx/driver.h              |  32 +
>  arch/x86/kernel/cpu/sgx/encl.c                | 756 +++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.h                | 128 +++
>  arch/x86/kernel/cpu/sgx/encls.h               | 238 ++++++
>  arch/x86/kernel/cpu/sgx/ioctl.c               | 800 ++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/main.c                | 280 ++++++
>  arch/x86/kernel/cpu/sgx/reclaim.c             | 471 +++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h                 | 108 +++
>  arch/x86/kernel/traps.c                       |  14 +
>  arch/x86/mm/fault.c                           |  45 +-
>  include/linux/mm.h                            |   2 +
>  mm/mprotect.c                                 |  14 +-
>  tools/arch/x86/include/asm/cpufeatures.h      |   7 +-
>  tools/testing/selftests/Makefile              |   1 +
>  tools/testing/selftests/sgx/.gitignore        |   2 +
>  tools/testing/selftests/sgx/Makefile          |  53 ++
>  tools/testing/selftests/sgx/call.S            |  54 ++
>  tools/testing/selftests/sgx/defines.h         |  21 +
>  tools/testing/selftests/sgx/load.c            | 282 ++++++
>  tools/testing/selftests/sgx/main.c            | 199 +++++
>  tools/testing/selftests/sgx/main.h            |  38 +
>  tools/testing/selftests/sgx/sigstruct.c       | 395 +++++++++
>  tools/testing/selftests/sgx/test_encl.c       |  20 +
>  tools/testing/selftests/sgx/test_encl.lds     |  40 +
>  .../selftests/sgx/test_encl_bootstrap.S       |  89 ++
>  52 files changed, 5398 insertions(+), 31 deletions(-)
>  create mode 100644 Documentation/x86/sgx.rst
>  create mode 100644 arch/x86/entry/vdso/extable.c
>  create mode 100644 arch/x86/entry/vdso/extable.h
>  create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S
>  create mode 100644 arch/x86/include/asm/enclu.h
>  create mode 100644 arch/x86/include/uapi/asm/sgx.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/Makefile
>  create mode 100644 arch/x86/kernel/cpu/sgx/arch.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/encl.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/encls.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/main.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h
>  create mode 100644 tools/testing/selftests/sgx/.gitignore
>  create mode 100644 tools/testing/selftests/sgx/Makefile
>  create mode 100644 tools/testing/selftests/sgx/call.S
>  create mode 100644 tools/testing/selftests/sgx/defines.h
>  create mode 100644 tools/testing/selftests/sgx/load.c
>  create mode 100644 tools/testing/selftests/sgx/main.c
>  create mode 100644 tools/testing/selftests/sgx/main.h
>  create mode 100644 tools/testing/selftests/sgx/sigstruct.c
>  create mode 100644 tools/testing/selftests/sgx/test_encl.c
>  create mode 100644 tools/testing/selftests/sgx/test_encl.lds
>  create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S
>
> --
> 2.25.1
>

Sean Christopherson May 6, 2020, 10:14 p.m. UTC | #20

On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
> Tested on Enarx. This requires a patch[0] for v29 support.
> 
> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
> 
> However, we did uncover a small usability issue. See below.
> 
> [0]: https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662

...

> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
> >   be used to reserve the address range. Now /dev/sgx supports only opaque
> >   mappings to the (initialized) enclave data.
> 
> The statement "Any mapping..." isn't actually true.
> 
> Enarx creates a large enclave (currently 64GiB). This worked when we
> created a file-backed mapping on /dev/sgx/enclave. However, switching
> to an anonymous mapping fails with ENOMEM. We suspect this is because
> the kernel attempts to allocate all the pages and zero them but there
> is insufficient RAM available. We currently work around this by
> creating a shared mapping on /dev/zero.

Hmm, the kernel shouldn't actually allocate physical pages unless they're
written.  I'll see if I can reproduce.
 
> If we want to keep this mmap() strategy, we probably don't want to
> advise mmap(ANON) if it allocates all the memory for the enclave ahead
> of time, even if it won't be used. This would be wasteful.
> 
> OTOH, having to mmap("/dev/zero") seems a bit awkward.

Thomas Gleixner May 7, 2020, 12:41 a.m. UTC | #21

Greg,

"Dr. Greg" <greg@enjellic.com> writes:
> As an aside, for those who haven't spent the last 5+ years of their
> life working with this technology.  SGX2 hardware platforms have the
> ability to allow unrestricted code execution in enclave context.

Unrestricted code execution even before loaded? Unrestricted by
priviledge levels?

> No amount of LSM or IMA interventions can provide any control over
> that.

They can restrict what is started and loaded before anything SGX
happens.

If you want to make real technical arguments then please be specific and
precise and spare us the handwaving marketing speak.

> In fact, the Confidential Computing Consortium, sponsored by none
> other then the Linux Foundation, has at its fundamental tenant,

And that's relevant to the technical question in which way?

> the notion of developing an eco-system that allows the execution of
> code and processing of data, over which, the owner or administrator of
> the platform has no visibility or control.  It would seem only logical
> that adversarial interests would indulge themseleves in those
> capabilities as well.
>
> With respect to SGX and these issues, cryptographic policy management
> is important for the same reason that 2-factor authentication is now
> considered standard practice in the security industry.
>
> We appreciate, Jarkko, that you feel that such infrastructure is
> optional, like virtualization support, and is something that can go in
> after the driver is mainlined.  As the diffstat for our patch points
> out, the proposed support has virtually no impact on the driver, the
> same cannot be said for virtualization capabilities.

The diffstat of your patch is irrelevant. What's relevant is the fact
that it is completely unreviewed and that it creates yet another user
space visible ABI with a noticable lack of documentation. 

> Moreover, adding support for key based policy management later would
> require the addition of another ioctl in order to avoid ABI
> compatibility issues.

And that's a problem because? 

> The current initialization ioctl is best suited, from an engineering
> perspective, to support this type of infrastructure.

What's wrong with having IOCTL_INIT_TYPE_A and IOCTL_INIT_TYPE_B?

Nothing at all. It's pretty straight forward and in fact a better
solution than a duct taped multiplexing all in one IOCTL_INIT_PONIES
approach.

> In fact, the necessary support was removed from the ioctl for
> political reasons rather then for any valid engineering rationale on
> flexible launch control platforms, particularly with our patch or an
> equivalent approach.

You're surely making a convincing technical argument by claiming that
this was a political decision. The amount of non-technical, i.e.
political arguments in your mail is definitely larger than the technical
content.

> Hopefully this is a reasoned technical approach to what has been a
> long standing issue surrounding this technology.

It's an approach which guarantees that the driver will miss the next
merge window. If that's your intention, then please let us know.

Merging the current set of patches does not prevent anything you want to
be added. It's an extension to the base implementation and not a
prerequisite.

> Best wishes for a productive week.
>
> Dr. Greg

Thanks a lot for the best wishes. Unfortunately reading this email was
not necessarily productive for me, but I surely wish that you can make
productive use of my reply.

Thanks,

	tglx

> ------------------------------------------------------------------------------
> "Opportunity is missed by most people because it is dressed in overalls
>  and looks like work."
>                                 -- Thomas Edison

------------------------------------------------------------------------------
 "Failure is simply the opportunity to begin again, this time more
  intelligently"

				--  Henry Ford

Haitao Huang May 7, 2020, 5:02 a.m. UTC | #22

On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson  
<sean.j.christopherson@intel.com> wrote:

> On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
>> Tested on Enarx. This requires a patch[0] for v29 support.
>>
>> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
>>
>> However, we did uncover a small usability issue. See below.
>>
>> [0]:  
>> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
>
> ...
>
>> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g.  
>> anonymous) can
>> >   be used to reserve the address range. Now /dev/sgx supports only  
>> opaque
>> >   mappings to the (initialized) enclave data.
>>
>> The statement "Any mapping..." isn't actually true.
>>
>> Enarx creates a large enclave (currently 64GiB). This worked when we
>> created a file-backed mapping on /dev/sgx/enclave. However, switching
>> to an anonymous mapping fails with ENOMEM. We suspect this is because
>> the kernel attempts to allocate all the pages and zero them but there
>> is insufficient RAM available. We currently work around this by
>> creating a shared mapping on /dev/zero.
>
> Hmm, the kernel shouldn't actually allocate physical pages unless they're
> written.  I'll see if I can reproduce.
>

For larger size mmap, I think it requires enabling vm overcommit mode 1:
echo 1 | sudo tee /proc/sys/vm/overcommit_memory


>> If we want to keep this mmap() strategy, we probably don't want to
>> advise mmap(ANON) if it allocates all the memory for the enclave ahead
>> of time, even if it won't be used. This would be wasteful.
>>
>> OTOH, having to mmap("/dev/zero") seems a bit awkward.

Nathaniel McCallum May 7, 2020, 4:49 p.m. UTC | #23

On Thu, May 7, 2020 at 1:03 AM Haitao Huang
<haitao.huang@linux.intel.com> wrote:
>
> On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
>
> > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
> >> Tested on Enarx. This requires a patch[0] for v29 support.
> >>
> >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
> >>
> >> However, we did uncover a small usability issue. See below.
> >>
> >> [0]:
> >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
> >
> > ...
> >
> >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g.
> >> anonymous) can
> >> >   be used to reserve the address range. Now /dev/sgx supports only
> >> opaque
> >> >   mappings to the (initialized) enclave data.
> >>
> >> The statement "Any mapping..." isn't actually true.
> >>
> >> Enarx creates a large enclave (currently 64GiB). This worked when we
> >> created a file-backed mapping on /dev/sgx/enclave. However, switching
> >> to an anonymous mapping fails with ENOMEM. We suspect this is because
> >> the kernel attempts to allocate all the pages and zero them but there
> >> is insufficient RAM available. We currently work around this by
> >> creating a shared mapping on /dev/zero.
> >
> > Hmm, the kernel shouldn't actually allocate physical pages unless they're
> > written.  I'll see if I can reproduce.
> >
>
> For larger size mmap, I think it requires enabling vm overcommit mode 1:
> echo 1 | sudo tee /proc/sys/vm/overcommit_memory

Which means the default experience isn't good.

Dr. Greg May 7, 2020, 6:06 p.m. UTC | #24

On Wed, May 06, 2020 at 09:39:55AM -0700, Jordan Hand wrote:

Good afternoon, I hope the week is going well for everyone.

> On 4/21/20 2:52 PM, Jarkko Sakkinen wrote:
> > Make the vDSO callable directly from C by preserving RBX and taking leaf
> >   from RCX.

> Tested with the Open Enclave SDK on top of Intel PSW. Specifically built 
> the Intel PSW with changes to support /dev/sgx mapping[1] new in v29.
> 
> Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
> 
> [1] https://github.com/intel/linux-sgx/pull/530

Did you re-wire your SDK to convert all your ECALL and exception
handling to the new VDSO architecture?

Failures in enclave loading and initialization demonstrate themselves
pretty clearly and are in the domain of the PSW being used.  If there
are going to be subtle SGX application operability issues that need to
be found they will be in the new ECALL and exception handling
mechanisms.

Have a good remainder of the day.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Artisans in autonomously
Enjellic Systems Development, LLC     self-defensive IOT platforms
4206 N. 19th Ave.                     and edge devices.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Davidsen's first rule of system administration:  He learns to swim fastest
 who is thrown in the deepest water."
                                -- Bill Davidsen

Sean Christopherson May 7, 2020, 7:34 p.m. UTC | #25

On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote:
> On Thu, May 7, 2020 at 1:03 AM Haitao Huang
> <haitao.huang@linux.intel.com> wrote:
> >
> > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> >
> > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
> > >> Tested on Enarx. This requires a patch[0] for v29 support.
> > >>
> > >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
> > >>
> > >> However, we did uncover a small usability issue. See below.
> > >>
> > >> [0]:
> > >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
> > >
> > > ...
> > >
> > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g.
> > >> anonymous) can
> > >> >   be used to reserve the address range. Now /dev/sgx supports only
> > >> opaque
> > >> >   mappings to the (initialized) enclave data.
> > >>
> > >> The statement "Any mapping..." isn't actually true.

Yeah, this definitely misleading.  I haven't looked at our most recent docs,
but I'm going to go out on a limb and assume we haven't documented the
preferred mechanism for carving out virtual memory for the enclave.  That
absolutely should be done.

> > >> Enarx creates a large enclave (currently 64GiB). This worked when we
> > >> created a file-backed mapping on /dev/sgx/enclave. However, switching
> > >> to an anonymous mapping fails with ENOMEM. We suspect this is because
> > >> the kernel attempts to allocate all the pages and zero them but there
> > >> is insufficient RAM available. We currently work around this by
> > >> creating a shared mapping on /dev/zero.
> > >
> > > Hmm, the kernel shouldn't actually allocate physical pages unless they're
> > > written.  I'll see if I can reproduce.
> > >
> >
> > For larger size mmap, I think it requires enabling vm overcommit mode 1:
> > echo 1 | sudo tee /proc/sys/vm/overcommit_memory

It shouldn't unless the initial mmap is "broken".  Not truly broken, but
broken in the sense that what Enarx is asking for is not actually what it
desires.
 
> Which means the default experience isn't good.

What PROT_* and MAP_* flags are passed to mmap()?  Overcommit only applies
to

	VM_WRITE (a.k.a. PROT_WRITE) && !VM_SHARED && !VM_NORESERVED

and, ignoring rlimits, VM expansion only applies to

	VM_WRITE && !VM_SHARED && !VM_STACK


So hopefully Enarx is doing something like

	base = mmap(NULL, 64gb, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

because that means this is effectively a userspace bug.  This goes back to
my comment about the mmap() being "broken".  Userspace is asking for a
writable, private mapping, in which case it absolutely should be accounted.

If using

	base = mmap(NULL, 64gb, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

works, then updating the SGX docs to better explain how to establish ELRANGE
is sufficient (we need to that in any case).  If the above still fails then
something else is in play.

Haitao Huang May 7, 2020, 10:31 p.m. UTC | #26

On Thu, 07 May 2020 11:49:15 -0500, Nathaniel McCallum  
<npmccallum@redhat.com> wrote:

> On Thu, May 7, 2020 at 1:03 AM Haitao Huang
> <haitao.huang@linux.intel.com> wrote:
>>
>> On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson
>> <sean.j.christopherson@intel.com> wrote:
>>
>> > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
>> >> Tested on Enarx. This requires a patch[0] for v29 support.
>> >>
>> >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
>> >>
>> >> However, we did uncover a small usability issue. See below.
>> >>
>> >> [0]:
>> >>  
>> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
>> >
>> > ...
>> >
>> >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g.
>> >> anonymous) can
>> >> >   be used to reserve the address range. Now /dev/sgx supports only
>> >> opaque
>> >> >   mappings to the (initialized) enclave data.
>> >>
>> >> The statement "Any mapping..." isn't actually true.
>> >>
>> >> Enarx creates a large enclave (currently 64GiB). This worked when we
>> >> created a file-backed mapping on /dev/sgx/enclave. However, switching
>> >> to an anonymous mapping fails with ENOMEM. We suspect this is because
>> >> the kernel attempts to allocate all the pages and zero them but there
>> >> is insufficient RAM available. We currently work around this by
>> >> creating a shared mapping on /dev/zero.
>> >
>> > Hmm, the kernel shouldn't actually allocate physical pages unless  
>> they're
>> > written.  I'll see if I can reproduce.
>> >
>>
>> For larger size mmap, I think it requires enabling vm overcommit mode 1:
>> echo 1 | sudo tee /proc/sys/vm/overcommit_memory
>
> Which means the default experience isn't good.
>


Yes, it is not good default. But this is not sgx specific IIUC. Normal  
applications would have the same issue if they ask for large mapping than  
whatever limit kernel enforces by default.

Haitao Huang May 7, 2020, 10:35 p.m. UTC | #27

On Thu, 07 May 2020 14:34:59 -0500, Sean Christopherson  
<sean.j.christopherson@intel.com> wrote:

> On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote:
>> On Thu, May 7, 2020 at 1:03 AM Haitao Huang
>> <haitao.huang@linux.intel.com> wrote:
>> >
>> > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson
>> > <sean.j.christopherson@intel.com> wrote:
>> >
>> > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
>> > >> Tested on Enarx. This requires a patch[0] for v29 support.
>> > >>
>> > >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
>> > >>
>> > >> However, we did uncover a small usability issue. See below.
>> > >>
>> > >> [0]:
>> > >>  
>> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
>> > >
>> > > ...
>> > >
>> > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g.
>> > >> anonymous) can
>> > >> >   be used to reserve the address range. Now /dev/sgx supports  
>> only
>> > >> opaque
>> > >> >   mappings to the (initialized) enclave data.
>> > >>
>> > >> The statement "Any mapping..." isn't actually true.
>
> Yeah, this definitely misleading.  I haven't looked at our most recent  
> docs,
> but I'm going to go out on a limb and assume we haven't documented the
> preferred mechanism for carving out virtual memory for the enclave.  That
> absolutely should be done.
>
>> > >> Enarx creates a large enclave (currently 64GiB). This worked when  
>> we
>> > >> created a file-backed mapping on /dev/sgx/enclave. However,  
>> switching
>> > >> to an anonymous mapping fails with ENOMEM. We suspect this is  
>> because
>> > >> the kernel attempts to allocate all the pages and zero them but  
>> there
>> > >> is insufficient RAM available. We currently work around this by
>> > >> creating a shared mapping on /dev/zero.
>> > >
>> > > Hmm, the kernel shouldn't actually allocate physical pages unless  
>> they're
>> > > written.  I'll see if I can reproduce.
>> > >
>> >
>> > For larger size mmap, I think it requires enabling vm overcommit mode  
>> 1:
>> > echo 1 | sudo tee /proc/sys/vm/overcommit_memory
>
> It shouldn't unless the initial mmap is "broken".  Not truly broken, but
> broken in the sense that what Enarx is asking for is not actually what it
> desires.
>
So I tried, this passes with mode 1 but fail with ENOMEM by default:

mmap(NULL, 0x100000000000UL, PROT_NONE, MAP_SHARED| MAP_ANONYMOUS, -1, 0);

Sean Christopherson May 8, 2020, 12:25 a.m. UTC | #28

On Thu, May 07, 2020 at 05:35:31PM -0500, Haitao Huang wrote:
> On Thu, 07 May 2020 14:34:59 -0500, Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> 
> >On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote:
> >>> For larger size mmap, I think it requires enabling vm overcommit mode
> >>1:
> >>> echo 1 | sudo tee /proc/sys/vm/overcommit_memory
> >
> >It shouldn't unless the initial mmap is "broken".  Not truly broken, but
> >broken in the sense that what Enarx is asking for is not actually what it
> >desires.
> >
> So I tried, this passes with mode 1 but fail with ENOMEM by default:
> 
> mmap(NULL, 0x100000000000UL, PROT_NONE, MAP_SHARED| MAP_ANONYMOUS, -1, 0);

Ah, fudge.  shmem_zero_setup() triggers shmem_acct_size() and thus
__vm_enough_memory().  Which I should have rememered because I've stared
at that code several times when dealing with the enclave's backing store.
I wasn't seeing the issue because I happened to use MAP_PRIVATE.

So, bad analysis, good conclusion, i.e. the kernel is still doing the
right thing, it's just not ideal for userspace.

Jarkko, we should update the docs and selftest to recommend and use

  PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS

or

  PROT_NONE, MAP_SHARED | MAP_NORESERVE | MAP_ANONYMOUS"

when carving out ELRANGE, with an explicit comment that all the normal
rules for mapping memory still apply.

Andy Lutomirski May 8, 2020, 12:40 a.m. UTC | #29

On Wed, Apr 29, 2020 at 8:30 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote:
> > In closing, it is important to note that the proposed SGX driver is
> > not available as a module.  This effectively excludes any alternative
> > implementations of the driver without replacement of the kernel at
> > large.
>
> No it doesn't.  The SGX subsytem won't allocate EPC pages unless userspace
> creates an enclave, i.e. preventing unprivileged userspace from accessing
> /dev/sgx/enclave will allow loading an alternative out-of-tree SGX module.
> Yes, SGX sanitizes the EPC on boot, but that's arguably a good thing for
> out-of-tree modules.
>
> And if you want to get crafty and squash in-kernel SGX altogether, boot
> with "clearcpuid=<SGX_LC>" and/or "clearcpuid=<SGX>" to disable in-kernel
> support entirely.  SGX won't be correctly enumerated in /proc/cpuinfo
> relative to the existence of an out-of-tree module, but that seems like a
> very minor issue if you're running with a completely different SGX driver.
>
> > It also means that any platform, with SGX hardware support,
> > running a kernel with this driver, has the potential for the
> > security/privacy issues noted above.
>
> Unless I'm mistaken, /dev/sgx is root-only by default.  There are far
> scarier mechanisms available to root for hosing the system.
>
> > If key based policy management is not allowed, then the driver needs
> > to be re-architected to have modular support so that alternative
> > implementations or the absence of any driver support are at least
> > tenable.
>
> As above, using an alternative implementation is teneble, albeit a bit
> kludgy.

It is worth noting that, if someone actualy does this, and a future
kernel patch breaks it, the upstream developers are unlikely to
apologize or even feel particularly bad.  See, for example, the
current situation with VirtualBox.

Jordan Hand May 8, 2020, 4:16 p.m. UTC | #30

On 5/7/20 11:06 AM, Dr. Greg wrote:
> On Wed, May 06, 2020 at 09:39:55AM -0700, Jordan Hand wrote:
> 
> Good afternoon, I hope the week is going well for everyone.
> 
>> On 4/21/20 2:52 PM, Jarkko Sakkinen wrote:
>>> Make the vDSO callable directly from C by preserving RBX and taking leaf
>>>    from RCX.
> 
>> Tested with the Open Enclave SDK on top of Intel PSW. Specifically built
>> the Intel PSW with changes to support /dev/sgx mapping[1] new in v29.
>>
>> Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
>>
>> [1] https://github.com/intel/linux-sgx/pull/530
> 
> Did you re-wire your SDK to convert all your ECALL and exception
> handling to the new VDSO architecture?
 >

No. We have many users of our SDK who rely on the out-of-tree driver and 
will for the foreseeable future. I aim to support both in-tree and 
out-of-tree with minimal code diff.

>
> Failures in enclave loading and initialization demonstrate themselves
> pretty clearly and are in the domain of the PSW being used.  If there
> are going to be subtle SGX application operability issues that need to
> be found they will be in the new ECALL and exception handling
> mechanisms.

Fair enough, no I have not tested those mechanisms. Apologies, I should 
have removed that line from the quoted text.

-Jordan

Dr. Greg May 8, 2020, 7:02 p.m. UTC | #31

On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote:

> Greg,

Good morning Thomas, I hope the week has gone well for you, the same
to everyone else reading this.

> "Dr. Greg" <greg@enjellic.com> writes:
> > As an aside, for those who haven't spent the last 5+ years of their
> > life working with this technology.  SGX2 hardware platforms have the
> > ability to allow unrestricted code execution in enclave context.

> Unrestricted code execution even before loaded? Unrestricted by
> priviledge levels?

The LSM/IMA infrastructure will have no visibility into code that will
be executed and data processed in enclave context, see immediately
below.

> > No amount of LSM or IMA interventions can provide any control over
> > that.

> They can restrict what is started and loaded before anything SGX
> happens.

At best, the visibility and inspection will be over a standard
bootstrap loader of some type is in no way related to the actual code
that will be executed or data processed in enclave context.
Furthermore, given what is becoming the excepted software architecture
for the SGX industry, that code will largely have full access to
system resources.

> If you want to make real technical arguments then please be specific
> and precise and spare us the handwaving marketing speak.

Thomas, you don't know me from boo, those that do know me and have
worked for me would tell you with absolute certainty that being a
'handwavy marketing type' is the complete antithesis of my character
and how I practice engineering.

Andy wanted to know why I felt the current driver disadvantaged our
work, I provided a technical summary, omitted completely in your
response, that indicated how we are using SGX technology in a manner
that is inherently different from the rest of the industry.

> > In fact, the Confidential Computing Consortium, sponsored by none
> > other then the Linux Foundation, has at its fundamental tenant,

> And that's relevant to the technical question in which way?

It speaks directly to the primary marketing objective that is driving
the economics of what this technology is going to turn into.  The
metaphor for objective, that now seems to be generally accepted, is
'Runtime Encryption'.

One of the most common threads on the SGX developer's list is how
developers can restrict the ability of other's to see what their
enclaves are doing or the code in them.  The standard response is
'nothing', a security context has to be established through remote
attestation and the confidential material then conveyed into the
enclave using appropriate confidentiality and integrity protections.

As further technical evidence for all of this, I would refer readers
to the following tag in the Intel Linux SGX SDK:

sgx_2.1.3

That tags the release that introduced the implementation of Protected
Code Loader (PCL) support.  This allows developers to create enclaves
that are encrypted at rest and then decrypted, loaded and executed by
a second bootstrap loader enclave that protects the confidentiality of
the first enclave.

SGX2 hardware support makes the concept of protected code loading, at
once, more practical, more efficient and closes a visibility
vulnerability that the current PCL model has.

The impact of this has to be viewed in the context of the economic
market forces that are driving 'runtime encryption'.

The original SGX programming model promoted by Intel was to partition
applications so that sensitive data, and the code that operated on it,
were inside of an enclave.  That approach was doomed by the economic
factors driving software development which are roughly as follows:

1.) Ease of development.

2.) Cost of development.

3.) Time to market.

4.) Return on investment.

5.) Security

In about that order, although there are probably a half dozen other
factors between 4 and 5.

As a result, the 'runtime encryption' industry moved to a library OS
model where unmodified applications can be run in an enclave along
along with full interpreter environments such as Java, Python,
Javascript and WASM.  In this architecture all of the requests for
operating system resources are shimmed through OCALL's to be serviced
by the OS.

If you combine all of these factors and influences, you end up with,
looking forward, to an architecture that will favor a small bootstrap
loader that downloads and executes code, over protected channels, that
has almost full access to operating system resources.  In this model
the only visibility that the platform owner has, and by extension the
LSM/IMA infrastructure, is the bootstrap loader itself.

This architecture will also be driven by how attractive the concept is
to the devops model and cloud orchestration in general.

Our technical arguement is that it does not seem unreasonable for the
Linux driver, at the discretion of the platform owner, to be able to
implement the equivalent of 2-factor authentication over the
initiation of such execution.

I apologize for the detail and hope it is at once both suitably
technical and 'non-hand-wavy'.

> > the notion of developing an eco-system that allows the execution of
> > code and processing of data, over which, the owner or administrator of
> > the platform has no visibility or control.  It would seem only logical
> > that adversarial interests would indulge themseleves in those
> > capabilities as well.
> >
> > With respect to SGX and these issues, cryptographic policy management
> > is important for the same reason that 2-factor authentication is now
> > considered standard practice in the security industry.
> >
> > We appreciate, Jarkko, that you feel that such infrastructure is
> > optional, like virtualization support, and is something that can go in
> > after the driver is mainlined.  As the diffstat for our patch points
> > out, the proposed support has virtually no impact on the driver, the
> > same cannot be said for virtualization capabilities.

> The diffstat of your patch is irrelevant. What's relevant is the
> fact that it is completely unreviewed and that it creates yet
> another user space visible ABI with a noticable lack of
> documentation.

A number of points on this issue.

If anyone cares to review the patch, it has a 73 line commit message
that describes how the architecture works.  That would obviously be
embellished and added to the general documentation.

We posted the initial concept implementation of this infrastructure 14
months ago.  Andy, rightly so, indicated the design was unclean.  The
simplification of the SGX driver at large over the last year allowed a
much more straight forward implementation of the patch.

This version of the patch has been posted twice in the last three
months, in response to the two major architectural revisions to the
driver that have occurred.

Jarkko indicated a year ago that our approach would 'bloat' the
driver.  A common criticism of patches in general on LKML is that they
complicate sub-systems.  I believe that diffstats are generally
recognized as cognitive indicators of the amount of bloat, complexity
and impact that a proposed patch introduces.

As to lack of review, we would certainly welcome technical and API
comments but we cannot force them.  Candidly, the only people capable
of working with the patch are groups that have full and complete
runtime implementations that they can modify and test and that group
is extremely limited.

> > Moreover, adding support for key based policy management later would
> > require the addition of another ioctl in order to avoid ABI
> > compatibility issues.

> And that's a problem because? 

See below.

> > The current initialization ioctl is best suited, from an engineering
> > perspective, to support this type of infrastructure.

> What's wrong with having IOCTL_INIT_TYPE_A and IOCTL_INIT_TYPE_B?
>
> Nothing at all. It's pretty straight forward and in fact a better
> solution than a duct taped multiplexing all in one IOCTL_INIT_PONIES
> approach.

I believe that a review of our patch would indicate that a 'duct taped
multplexing INIT_PONIES ioctl' is not a technically honest assessment
of what we are proposing.  There is no flag or multi-case variable
implementation, the patch simply re-adds a pointer to a structure that
was in the previously out of tree driver.

In fact, anyone who reviews the patch will see that the current driver
creates a pointer in the ioctl call to pass downward into the hardware
initialization routines.  Our code simply replaces that pointer with
one supplied by userspace.

That being said, we could certainly wire up a second ioctl and use
that.  Candidly, under normal circumstances, that would arguably raise
a bloat accusation, since why would a second ioctl be implemented when
there is an already fully functional and mono-purpose ioctl that
triggers enclave initialization.

> > In fact, the necessary support was removed from the ioctl for
> > political reasons rather then for any valid engineering rationale on
> > flexible launch control platforms, particularly with our patch or an
> > equivalent approach.

> You're surely making a convincing technical argument by claiming
> that this was a political decision. The amount of non-technical,
> i.e.  political arguments in your mail is definitely larger than the
> technical content.

There is the adage out here in the Upper Midwest, shared elsewhere,
that you shouldn't bring a knife to a gunfight, to date the issue of
cryptographic policy management has been exclusively political and
decidedly non-technical.

We have tried to respond by demonstrating, with code, that a minimum
impact technical approach is possible.  To date the only response has
been that we need to get this driver into the kernel as fast as we
can and then deal with other issues.

Candidly, this is equivalent to, 'lets hurry up and ship it and then
we can worry about bugs and security issues', that plagues the rest of
the software industry.

> > Hopefully this is a reasoned technical approach to what has been a
> > long standing issue surrounding this technology.

> It's an approach which guarantees that the driver will miss the next
> merge window. If that's your intention, then please let us know.

I believe that a dispassionate observer, reviewing the last 2-3 years
of LKML conversations surrounding SGX, would not conclude that our
actions have delayed the driver.

Candidly, the issue may be coming down to a question as to whether or
not the driver should go into the kernel.

I pride myself on extreme technical honesty, I assume everyone else
does, so it probably needs to be taken into consideration that it is
now common knowledge that Intel is dropping support for SGX, at least
on the client side:

https://software.intel.com/en-us/forums/intel-software-guard-extensions-intel-sgx/topic/850599

https://www.techspot.com/news/84506-leaked-intel-rocket-lake-s-diagram-highlights-platform.html

My understanding is that getting SGX enabled on Comet Lake platforms
requires a discussion with vendors regarding a customized
BIOS/firmware implementation for the platform that enables the SGX
technology that is now being left silent on the chip/firmware.

A pity really, the technology arguably had a lot of legs with respect
to the capabilities that it brought to edge security but that is
another topic in and of itself.

SGX takes a lot of expensive silicon real estate.  Whether or not that
price continues to get paid is going to depend on whether or not there
is sufficient market 'pull' from entities who want to push code and
data up into the cloud so that it can be executed without anyone
knowing what it is doing.

Hence our continued advocacy for a driver architecture that allows
stronger protections to be applied to that process, without affecting
the default behavior of the driver.

At the very least a modular form of the driver should be considered
that would allow alternate implementations.  Sean indicated that there
was a 'kludgy' approach that would allow an alternate modular
implementation alongside the in-kernel driver.  I believe that Andy
has already indicated that may not be an advisable approach.

> Merging the current set of patches does not prevent anything you want to
> be added. It's an extension to the base implementation and not a
> prerequisite.

My response to that is that a method for the driver to implement the
equivalent of 2-factor authentication over unrestricted code execution
and data manipulation, that does not affect the standard driver
behavior, needs to be part of the base implementation.

> > Best wishes for a productive week.
> >
> > Dr. Greg

> Thanks a lot for the best wishes. Unfortunately reading this email was
> not necessarily productive for me, but I surely wish that you can make
> productive use of my reply.

I'm sorry that you found reading my e-mail to be a waste of your time,
hopefully the time you took to respond has now allowed everyone
reading along at home to enjoy a thorough review of the issues at
hand.

In a precise and non-handwavy sort of fashion....

> Thanks,
> 
> 	tglx

At the risk of wasting more of everyone's time, best wishes for a
productive weekend.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Developers of autonomously
Enjellic Systems Development, LLC     self-defensive IOT platforms
4206 N. 19th Ave.                     and edge devices.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Don't worry about people stealing your ideas.  If your ideas are any
 good, you'll have to ram them down people's throats."
                                -- Howard Aiken

Sean Christopherson May 8, 2020, 7:08 p.m. UTC | #32

Adding some Google folks to the party.

On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control.
> 
> There is a new hardware unit in the processor called Memory Encryption
> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> one or many MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
> 
> The MEE automatically encrypts the data leaving the processor package to
> the MEE regions. The data is encrypted using a random key whose life-time
> is exactly one power cycle.
> 
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.
> 
> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> 
> 	cat /proc/cpuinfo  | grep sgx
> 
> v29:
> * The selftest has been moved to selftests/sgx. Because SGX is an execution
>   environment of its own, it really isn't a great fit with more "standard"
>   x86 tests.
> 
>   The RSA key is now generated on fly and the whole signing process has
>   been made as part of the enclave loader instead of signing the enclave
>   during the compilation time.
> 
>   Finally, the enclave loader loads now the test enclave directly from its
>   ELF file, which means that ELF file does not need to be coverted as raw
>   binary during the build process.
> * Version the mm_list instead of using on synchronize_mm() when adding new
>   entries. We hold the write lock for the mm_struct, and dup_mm() can thus
>   deadlock with the page reclaimer, which could hold the lock for the old
>   mm_struct.
> * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
>   be used to reserve the address range. Now /dev/sgx supports only opaque
>   mappings to the (initialized) enclave data.
> * Make the vDSO callable directly from C by preserving RBX and taking leaf
>   from RCX.
> 
> v28:
> * Documented to Documentation/x86/sgx.rst how the kernel manages the
>   enclave ownership.
> * Removed non-LC flow from sgx_einit().
> * Removed struct sgx_einittoken since only the size of the corresponding
>   microarchitectural structure is used in the series ATM.
> 
> v27:
> * Disallow RIE processes to use enclaves as there could a permission
>   conflict between VMA and enclave permissions.
> * In the documentation, replace "grep /proc/cpuinfo" with
>   "grep sgx /proc/cpuinfo".
> 
> v26:
> * Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was
>   changed in v25 by mistake.
> * Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once
>   again for such a detailed feedback).
> * Added back the MAINTAINERS update commit, which was mistakenly removed
>   in v25.
> * EREMOVE's for SECS cannot be done while sanitizing an EPC section. The
>   CPU does not allow to remove a SECS page before all of its children have
>   been removed, and a child page can be in some other section than the one
>   currently being processed. Thus, removed special SECS processing from
>   sgx_sanitize_page() and instead put sections through it twice. In the
>   2nd round the lists should only contain SECS pages.
> 
> v25:
> * Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES
>   fails on executing ENCLS[EADD]. The rollback path executed
>   radix_tree_delete() on the same address twice when this happened.
> * Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when
>   a signal is pending.
> * As requested by Borislav, move the CPUID 0x12 features to their own word
>   in cpufeatures.
> * Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing()
>   was called with an uninitialized pointer when sgx_encl_get_backing()
>   fails.
> * Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice
>   has 'groups' for setting up sysfs attributes for the device.
> * Use device_initcall instead of subsys_initcall so that misc_class is
>   initialized before SGX is initialized.
> * Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select
>   enclave attributes that we the kernel does not allow it to set instead
>   of -EINVAL.
> * Unless SGX public key MSRs are writable always deny the feature from
>   Linux. Previously this was only denied from driver. How VMs should be
>   supported is not really part of initial patch set, which makes this
>   an obvious choice.
> * Cleaned up and refined documentation to be more approachable.
> 
> v24:
> * Reclaim unmeasured and TCS pages (regression in v23).
> * Replace usages of GFP_HIGHUSER with GFP_KERNEL.
> * Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES
>   and use the same rollback (destroy enclave). This can happen when host
>   suspends itself unknowingly to a VM running enclaves. From -EIO the user
>   space can deduce what happened.
> * Have a separate @count in struct sgx_enclave_add_pages to output number
>   of bytes processed instead of overwriting the input parameters for
>   clarity and more importantly that the API provides means for partial
>   processing (@count could be less than @length in success case).
> 
> v23:
> * Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask
>   with %SGX_PAGE_MEASURE flag.
> * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
>   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
>   flow can be only reverted by killing the whole enclave.
> * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
>   it should have been bit 6 (Table 37-3 in the SDM).
> * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
>   error code.
> 
> v22:
> * Refined bunch commit messages and added associated SDM references as
>   many of them were too exhausting and some outdated.
> * Alignment checks have been removed from mmap() because it does not define the
>   ELRANGE. VMAs only act as windows to the enclave. The semantics compare
>   somewhat how mmap() works with regular files.
> * We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be
>   page aligned so that we can pass the page directly to EADD and do not have
>   to do an extra copy. This was made effectively possible by removing the
>   worker thread for adding pages.
> * The selftest build files have been refined throughout of various glitches
>   and work properly in a cross compilation environment such as BuildRoot.
>   In addition, libcalls fail the build with an assertion in the linker
>   script, if they end up to the enclave binary.
> * CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core
>   for anything without having the driver. This could change when KVM support
>   is added.
> * We require zero permissions in SECINFO for TCS pages because the CPU
>   overwrites SECINFO flags with zero permissions and measures the page
>   only after that. Allowing to pass TCS with non-zero permissions would
>   cause mismatching measurement between the one provided in SIGSTRUCT and
>   the one computed by the CPU.
> * Obviously lots of small fixes and clean ups (does make sense to
>   document them all).
> 
> v21:
> * Check on mmap() that the VMA does cover an area that does not have
>   enclave pages. Only mapping with PROT_NONE can do that to reserve
>   initial address space for an enclave.
> * Check om mmap() and mprotect() that the VMA permissions do not
>   surpass the enclave permissions.
> * Remove two refcounts from vma_close(): mm_list and encl->refcount.
>   Enclave refcount is only need for swapper/enclave sync and we can
>   remove mm_list refcount by destroying mm_struct when the process
>   is closed. By not having vm_close() the Linux MM can merge VMAs.
> * Do not naturally align MAP_FIXED address.
> * Numerous small fixes and clean ups.
> * Use SRCU for synchronizing the list of mm_struct's.
> * Move to stack based call convention in the vDSO.
> 
> v20:
> * Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER
>   dependency as MMU notifiers are no longer used in the driver.
> * Use mm_users instead of mm_count as refcount for mm_struct as mm_count
>   only protects from deleting mm_struct, not removing its contents.
> * Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all
>   of them. They could be in initialized state when the kernel starts
>   because it might be spawned by kexec().
> * Documentation overhaul.
> * Use a device /dev/sgx/provision for delivering the provision token
>   instead of securityfs.
> * Create a reference to the enclave when already when opening
>   /dev/sgx/enclave.  The file is then associated with this enclave only.
>   mmap() can be done at free at any point and always get a reference to
>   the enclave. To summarize the file now represents the enclave.
> 
> v19:
> * Took 3-4 months but in some sense this was more like a rewrite of most
>   of the corners of the source code. If I've forgotten to deal with some
>   feedback, please don't shout me. Make a remark and I will fix it for
>   the next version. Hopefully there won't be this big turnovers anymore.
> * Validate SECS attributes properly against CPUID given attributes and
>   against allowed attributes. SECS attributes are the ones that are
>   enforced whereas SIGSTRUCT attributes tell what is required to run
>   the enclave.
> * Add KSS (Key Sharing Support) to the enclave attributes.
> * Deny MAP_PRIVATE as an enclave is always a shared memory entity.
> * Revert back to shmem backing storage so that it can be easily shared
>   by multiple processes.
> * Split the recognization of an ENCLS leaf failure by using three
>   functions to detect it: encsl_faulted(), encls_returned_code() and
>   sgx_failed(). encls_failed() is only caused by a spurious expections that
>   should never happen. Thus, it is not defined as an inline function in
>   order to easily insert a kprobe to it.
> * Move low-level enclave management routines, page fault handler and page
>   reclaiming routines from driver to the core. These cannot be separated
>   from each other as they are heavily interdependent. The rationale is that
>   the core does not call any code from the driver.
> * Allow the driver to be compiled as a module now that it no code is using
>   its routines and it only uses exported symbols. Now the driver is
>   essentially just a thin ioctl layer.
> * Reworked the driver to maintain a list of mm_struct's. The VMA callbacks
>   add new entries to this list as the process is forked. Each entry has
>   its own refcount because they have a different life-cycle as the enclave
>   does. In effect @tgid and @mm have been removed from struct sgx_encl
>   and we allow forking by removing VM_DONTCOPY from vm flags.
> * Generate a cpu mask in the reclaimer from the cpu mask's of all
>   mm_struct's. This will kick out the hardware threads out of the enclave
>   from multiple processes. It is not a local variable because it would
>   eat too much of the stack space but instead a field in struct
>   sgx_encl.
> * Allow forking i.e. remove VM_DONTCOPY. I did not change the API
>   because the old API scaled to the workload that Andy described. The
>   codebase is now mostly API independent i.e. changing the API is a
>   small task. For me the proper trigger to chanage it is a as concrete
>   as possible workload that cannot be fulfilled. I hope you understand
>   my thinking here. I don't want to change anything w/o proper basis
>   but I'm ready to change anything if there is a proper basis. I do
>   not have any kind of attachment to any particular type of API.
> * Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the
>   new vDSO.
> 
> v18:
> * Update the ioctl-number.txt.
> * Move the driver under arch/x86.
> * Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h.
> * Rename the selftest as test_sgx (previously sgx-selftest).
> * In order to enable process accounting, swap EPC pages and PCMD's to a VMA
>   instead of shmem.
> * Allow only to initialize and run enclaves with a subset of
>   {DEBUG, MODE64BIT} set.
> * Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged
>   attributes e.g. PROVISIONKEY.
> 
> v17:
> * Add a simple selftest.
> * Fix a null pointer dereference to section->pages when its
>   allocation fails.
> * Add Sean's description of the exception handling to the documentation.
> 
> v16:
> * Fixed SOB's in the commits that were a bit corrupted in v15.
> * Implemented exceptio handling properly to detect_sgx().
> * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK.
> * Updated the documentation to use rst definition lists.
> * Added the missing Documentation/x86/index.rst, which has a link to
>   intel_sgx.rst. Now the SGX and uapi documentation is properly generated
>   with 'make htmldocs'.
> * While enumerating EPC sections, if an undefined section is found, fail
>   the driver initialization instead of continuing the initialization.
> * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS.
> * Remove copyright notice from arch/x86/include/asm/sgx.h.
> * Migrated from ioremap_cache() to memremap().
> 
> v15:
> * Split into more digestable size patches.
> * Lots of small fixes and clean ups.
> * Signal a "plain" SIGSEGV on an EPCM violation.
> 
> v14:
> * Change the comment about X86_FEATURE_SGX_LC from “SGX launch
>   configuration” to “SGX launch control”.
> * Move the SGX-related CPU feature flags as part of the Linux defined
>   virtual leaf 8.
> * Add SGX_ prefix to the constants defining the ENCLS leaf functions.
> * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers.
> * Refine the long description for CONFIG_INTEL_SGX_CORE.
> * Do not use pr_*_ratelimited()  in the driver. The use of the rate limited
>   versions is legacy cruft from the prototyping phase.
> * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power
>   cycles.
> * Manually prefix with “sgx:” in the core SGX code instead of redefining
>   pr_fmt.
> * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver
>   instead of core because it is a driver requirement.
> * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the
>   default is ‘n’.
> * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match
>   the SDM.
> * Allocate struct sgx_epc_page instances one at a time.
> * Use “__iomem void *” pointers for the mapped EPC memory consistently.
> * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power
>   cycles.
> * Call enclave swapping operations directly from the driver instead of
>   calling them .indirectly through struct sgx_epc_page_ops because indirect
>   calls are not required yet as the patch set does not contain the KVM
>   support.
> * Added special signal SEGV_SGXERR to notify about SGX EPCM violation
>   errors.
> 
> v13:
> * Always use SGX_CPUID constant instead of a hardcoded value.
> * Simplified and documented the macros and functions for ENCLS leaves.
> * Enable sgx_free_page() to free active enclave pages on demand
>   in order to allow sgx_invalidate() to delete enclave pages.
>   It no longer performs EREMOVE if a page is in the process of
>   being reclaimed.
> * Use PM notifier per enclave so that we don't have to traverse
>   the global list of active EPC pages to find enclaves.
> * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h
> * Always use ioremap() to map EPC banks as we only support 64-bit kernel.
> * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going
>   to sleep.
> 
> v12:
> * Split to more narrow scoped commits in order to ease the review process and
>   use co-developed-by tag for co-authors of commits instead of listing them in
>   the source files.
> * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables.
> * Removed in-kernel LE i.e. this version of the SGX software stack only
>   supports unlocked IA32_SGXLEPUBKEYHASHx MSRs.
> * Refined documentation on launching enclaves, swapping and enclave
>   construction.
> * Refined sgx_arch.h to include alignment information for every struct that
>   requires it and removed structs that are not needed without an LE.
> * Got rid of SGX_CPUID.
> * SGX detection now prints log messages about firmware configuration issues.
> 
> v11:
> * Polished ENCLS wrappers with refined exception handling.
> * ksgxswapd was not stopped (regression in v5) in
>   sgx_page_cache_teardown(), which causes a leaked kthread after driver
>   deinitialization.
> * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be
>   invalidated when resuming, which will cause it not function properly
>   anymore.
> * Set EINITTOKEN.VALID to zero for a token that is passed when
>   SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero
>   page.
> * Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets
>   around the flags field, which causes a #GP. Only flags read is readable.
> * On read access memcpy() call inside sgx_vma_access() had src and dest
>   parameters in wrong order.
> * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols
>   to LE even if “KASAN_SANITIZE := false” was set in the makefile.
> * Fixed a regression in the #PF handler. If a page has
>   SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail.
>   It did not, which caused weird races when trying to change other parts of
>   swapping code.
> * EPC management has been refactored to a flat LRU cache and moved to
>   arch/x86. The swapper thread reads a cluster of EPC pages and swaps all
>   of them. It can now swap from multiple enclaves in the same round.
> * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL
>   when an enclave is already initialized or dead instead of zero.
> 
> v10:
> * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts
>   of the driver.
> * Unset the reserved flag from an enclave page if EDBGRD/WR fails
>   (regression in v6).
> * Close the anon inode when LE is stopped (regression in v9).
> * Update the documentation with a more detailed description of SGX.
> 
> v9:
> * Replaced kernel-LE IPC based on pipes with an anonymous inode.
>   The driver does not require anymore new exports.
> 
> v8:
> * Check that public key MSRs match the LE public key hash in the
>   driver initialization when the MSRs are read-only.
> * Fix the race in VA slot allocation by checking the fullness
>   immediately after succeesful allocation.
> * Fix the race in hash mrsigner calculation between the launch
>   enclave and user enclaves by having a separate lock for hash
>   calculation.
> 
> v7:
> * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK
>   when it should have been masked with ~PAGE_MASK.
> * Fixed a memory leak in sgx_ioc_enclave_create().
> * Simplified swapping code by using a pointer array for a cluster
>   instead of a linked list.
> * Squeezed struct sgx_encl_page to 32 bytes.
> * Fixed deferencing of an RSA key on OpenSSL 1.1.0.
> * Modified TC's CMAC to use kernel AES-NI. Restructured the code
>   a bit in order to better align with kernel conventions.
> 
> v6:
> * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave.
> * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/.
> * Removed virtualization chapter from the documentation.
> * Changed the default filename for the signing key as signing_key.pem.
> * Reworked EPC management in a way that instead of a linked list of
>   struct sgx_epc_page instances there is an array of integers that
>   encodes address and bank of an EPC page (the same data as 'pa' field
>   earlier). The locking has been moved to the EPC bank level instead
>   of a global lock.
> * Relaxed locking requirements for EPC management. EPC pages can be
>   released back to the EPC bank concurrently.
> * Cleaned up ptrace() code.
> * Refined commit messages for new architectural constants.
> * Sorted includes in every source file.
> * Sorted local variable declarations according to the line length in
>   every function.
> * Style fixes based on Darren's comments to sgx_le.c.
> 
> v5:
> * Described IPC between the Launch Enclave and kernel in the commit messages.
> * Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier
>   versions except those that exist in the imported TinyCrypt code.
> * Fixed spelling mistakes in the documentation.
> * Forgot to check the return value of sgx_drv_subsys_init().
> * Encapsulated properly page cache init and teardown.
> * Collect epc pages to a temp list in sgx_add_epc_bank
> * Removed SGX_ENCLAVE_INIT_ARCH constant.
> 
> v4:
> * Tied life-cycle of the sgx_le_proxy process to /dev/sgx.
> * Removed __exit annotation from sgx_drv_subsys_exit().
> * Fixed a leak of a backing page in sgx_process_add_page_req() in the
>   case when vm_insert_pfn() fails.
> * Removed unused symbol exports for sgx_page_cache.c.
> * Updated sgx_alloc_page() to require encl parameter and documented the
>   behavior (Sean Christopherson).
> * Refactored a more lean API for sgx_encl_find() and documented the behavior.
> * Moved #PF handler to sgx_fault.c.
> * Replaced subsys_system_register() with plain bus_register().
> * Retry EINIT 2nd time only if MSRs are not locked.
> 
> v3:
> * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set.
> * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails.
> * Use unused bits in epc_page->pa to store the bank number.
> * Removed #ifdef for WQ_NONREENTRANT.
> * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS.
> * Added --remove-section=.got.plt to objcopy flags in order to prevent a
>   dummy .got.plt, which will cause an inconsistent size for the LE.
> * Documented sgx_encl_* functions.
> * Added remark about AES implementation used inside the LE.
> * Removed redundant sgx_sys_exit() from le/main.c.
> * Fixed struct sgx_secinfo alignment from 128 to 64 bytes.
> * Validate miscselect in sgx_encl_create().
> * Fixed SSA frame size calculation to take the misc region into account.
> * Implemented consistent exception handling to __encls() and __encls_ret().
> * Implemented a proper device model in order to allow sysfs attributes
>   and in-kernel API.
> * Cleaned up various "find enclave" implementations to the unified
>   sgx_encl_find().
> * Validate that vm_pgoff is zero.
> * Discard backing pages with shmem_truncate_range() after EADD.
> * Added missing EEXTEND operations to LE signing and launch.
> * Fixed SSA size for GPRS region from 168 to 184 bytes.
> * Fixed the checks for TCS flags. Now DBGOPTIN is allowed.
> * Check that TCS addresses are in ELRANGE and not just page aligned.
> * Require kernel to be compiled with X64_64 and CPU_SUP_INTEL.
> * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02.
> 
> v2:
> * get_rand_uint32() changed the value of the pointer instead of value
>   where it is pointing at.
> * Launch enclave incorrectly used sigstruct attributes-field instead of
>   enclave attributes-field.
> * Removed unused struct sgx_add_page_req from sgx_ioctl.c
> * Removed unused sgx_has_sgx2.
> * Updated arch/x86/include/asm/sgx.h so that it provides stub
>   implementations when sgx in not enabled.
> * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs().
> * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space
> * removed unused global sgx_nr_pids
> * moved sgx_encl_release to sgx_encl.c
> * return -ERESTARTSYS instead of -EINTR in sgx_encl_init()
> 
> Jarkko Sakkinen (10):
>   x86/sgx: Update MAINTAINERS
>   x86/sgx: Add SGX microarchitectural data structures
>   x86/sgx: Add wrappers for ENCLS leaf functions
>   x86/sgx: Add functions to allocate and free EPC pages
>   x86/sgx: Linux Enclave Driver
>   x86/sgx: Add provisioning
>   x86/sgx: Add a page reclaimer
>   x86/sgx: ptrace() support for the SGX driver
>   selftests/x86: Add a selftest for SGX
>   docs: x86/sgx: Document SGX micro architecture and kernel internals
> 
> Sean Christopherson (10):
>   x86/cpufeatures: x86/msr: Add Intel SGX hardware bits
>   x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits
>   x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX
>   x86/cpu/intel: Detect SGX support
>   x86/sgx: Enumerate and track EPC sections
>   mm: Introduce vm_ops->may_mprotect()
>   x86/vdso: Add support for exception fixup in vDSO functions
>   x86/fault: Add helper function to sanitize error code
>   x86/traps: Attempt to fixup exceptions in vDSO before signaling
>   x86/vdso: Implement a vDSO for Intel SGX enclave call
> 
>  .../userspace-api/ioctl/ioctl-number.rst      |   1 +
>  Documentation/x86/index.rst                   |   1 +
>  Documentation/x86/sgx.rst                     | 206 +++++
>  MAINTAINERS                                   |  11 +
>  arch/x86/Kconfig                              |  14 +
>  arch/x86/entry/vdso/Makefile                  |   8 +-
>  arch/x86/entry/vdso/extable.c                 |  46 +
>  arch/x86/entry/vdso/extable.h                 |  29 +
>  arch/x86/entry/vdso/vdso-layout.lds.S         |   9 +-
>  arch/x86/entry/vdso/vdso.lds.S                |   1 +
>  arch/x86/entry/vdso/vdso2c.h                  |  58 +-
>  arch/x86/entry/vdso/vsgx_enter_enclave.S      | 131 +++
>  arch/x86/include/asm/cpufeature.h             |   5 +-
>  arch/x86/include/asm/cpufeatures.h            |   8 +-
>  arch/x86/include/asm/disabled-features.h      |  18 +-
>  arch/x86/include/asm/enclu.h                  |   8 +
>  arch/x86/include/asm/msr-index.h              |   8 +
>  arch/x86/include/asm/required-features.h      |   2 +-
>  arch/x86/include/asm/traps.h                  |   1 +
>  arch/x86/include/asm/vdso.h                   |   5 +
>  arch/x86/include/uapi/asm/sgx.h               | 175 ++++
>  arch/x86/kernel/cpu/Makefile                  |   1 +
>  arch/x86/kernel/cpu/common.c                  |   4 +
>  arch/x86/kernel/cpu/feat_ctl.c                |  32 +-
>  arch/x86/kernel/cpu/sgx/Makefile              |   6 +
>  arch/x86/kernel/cpu/sgx/arch.h                | 343 ++++++++
>  arch/x86/kernel/cpu/sgx/driver.c              | 209 +++++
>  arch/x86/kernel/cpu/sgx/driver.h              |  32 +
>  arch/x86/kernel/cpu/sgx/encl.c                | 756 +++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.h                | 128 +++
>  arch/x86/kernel/cpu/sgx/encls.h               | 238 ++++++
>  arch/x86/kernel/cpu/sgx/ioctl.c               | 800 ++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/main.c                | 280 ++++++
>  arch/x86/kernel/cpu/sgx/reclaim.c             | 471 +++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h                 | 108 +++
>  arch/x86/kernel/traps.c                       |  14 +
>  arch/x86/mm/fault.c                           |  45 +-
>  include/linux/mm.h                            |   2 +
>  mm/mprotect.c                                 |  14 +-
>  tools/arch/x86/include/asm/cpufeatures.h      |   7 +-
>  tools/testing/selftests/Makefile              |   1 +
>  tools/testing/selftests/sgx/.gitignore        |   2 +
>  tools/testing/selftests/sgx/Makefile          |  53 ++
>  tools/testing/selftests/sgx/call.S            |  54 ++
>  tools/testing/selftests/sgx/defines.h         |  21 +
>  tools/testing/selftests/sgx/load.c            | 282 ++++++
>  tools/testing/selftests/sgx/main.c            | 199 +++++
>  tools/testing/selftests/sgx/main.h            |  38 +
>  tools/testing/selftests/sgx/sigstruct.c       | 395 +++++++++
>  tools/testing/selftests/sgx/test_encl.c       |  20 +
>  tools/testing/selftests/sgx/test_encl.lds     |  40 +
>  .../selftests/sgx/test_encl_bootstrap.S       |  89 ++
>  52 files changed, 5398 insertions(+), 31 deletions(-)
>  create mode 100644 Documentation/x86/sgx.rst
>  create mode 100644 arch/x86/entry/vdso/extable.c
>  create mode 100644 arch/x86/entry/vdso/extable.h
>  create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S
>  create mode 100644 arch/x86/include/asm/enclu.h
>  create mode 100644 arch/x86/include/uapi/asm/sgx.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/Makefile
>  create mode 100644 arch/x86/kernel/cpu/sgx/arch.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/encl.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/encls.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/main.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h
>  create mode 100644 tools/testing/selftests/sgx/.gitignore
>  create mode 100644 tools/testing/selftests/sgx/Makefile
>  create mode 100644 tools/testing/selftests/sgx/call.S
>  create mode 100644 tools/testing/selftests/sgx/defines.h
>  create mode 100644 tools/testing/selftests/sgx/load.c
>  create mode 100644 tools/testing/selftests/sgx/main.c
>  create mode 100644 tools/testing/selftests/sgx/main.h
>  create mode 100644 tools/testing/selftests/sgx/sigstruct.c
>  create mode 100644 tools/testing/selftests/sgx/test_encl.c
>  create mode 100644 tools/testing/selftests/sgx/test_encl.lds
>  create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S
> 
> -- 
> 2.25.1
>

Sean Christopherson May 8, 2020, 7:56 p.m. UTC | #33

On Fri, May 08, 2020 at 02:02:26PM -0500, Dr. Greg wrote:
> On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote:
> > The diffstat of your patch is irrelevant. What's relevant is the
> > fact that it is completely unreviewed and that it creates yet
> > another user space visible ABI with a noticable lack of
> > documentation.

...

> As to lack of review, we would certainly welcome technical and API
> comments but we cannot force them.

Thomas' point isn't that your code needs to be reviewed, it's that trying
to squeeze it into the initial merge will, not might, _will_ push out the
acceptance of SGX.  The same is true for other features we have lined up,
e.g. KVM and cgroup support.  It's not a slight on your code, it's just
reality at this point.

> In fact, anyone who reviews the patch will see that the current driver
> creates a pointer in the ioctl call to pass downward into the hardware
> initialization routines.  Our code simply replaces that pointer with
> one supplied by userspace.

Personally, I'm in favor of adding a reserved placeholder for a token so
as to avoid adding a second ioctl() in the event that something gets
upstreamed that wants the token.  Ditto for a file descriptor for the
backing store in sgx_enclave_create.

But, I have contributed exactly zero ioctls to the kernel, so take that
with a big block of salt :-)

> At the very least a modular form of the driver should be considered
> that would allow alternate implementations.  Sean indicated that there
> was a 'kludgy' approach that would allow an alternate modular
> implementation alongside the in-kernel driver.  I believe that Andy
> has already indicated that may not be an advisable approach.

Modularizing the the driver does nothing for your use case.  The "driver"
is a relatively lightweight wrapper, removing that is akin to making
/dev/sgx/enclave inaccessible, i.e. it doesn't make the EPC tracking and
core code go away.  Modularizing the whole thing is flat out not an option,
as doing so is not compatible with an EPC cgroup and adds unnecessary
complexity to KVM enabling, both of which are highly desired features by
pretty much everyone that plans on using SGX.

Andy is right to caution against playing games to squish in-kernel things,
but the virtualization snafu is a completely different beast.  E.g. SGX
doesn't require fiddling with CR4, won't corrupt random memory across
kexec(), doesn't block INIT, etc...  And I'm not advocating long-term
shenanigans, I just wanted to point out that there are options for running
out-of-tree drivers in the short term, e.g. until proper policy controls
land upstream.

Hui, Chunyang May 12, 2020, 11:55 a.m. UTC | #34

On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control.
> 
> There is a new hardware unit in the processor called Memory Encryption
> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> one or many MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
> 
> The MEE automatically encrypts the data leaving the processor package to
> the MEE regions. The data is encrypted using a random key whose life-time
> is exactly one power cycle.
> 
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.
> 
> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> 
> 	cat /proc/cpuinfo  | grep sgx

Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>

Occlum project (https://github.com/occlum/occlum) is a libOS built on top of
Intel SGX feature. We ran Occlum tests using patch v29 on SGX hardware with
the Flexible Launch Control (FLC) feature and didn't find any problems.
As Occlum core developers, we would like these patches to be merged soon.

> v29:
> * The selftest has been moved to selftests/sgx. Because SGX is an execution
>   environment of its own, it really isn't a great fit with more "standard"
>   x86 tests.
> 
>   The RSA key is now generated on fly and the whole signing process has
>   been made as part of the enclave loader instead of signing the enclave
>   during the compilation time.
> 
>   Finally, the enclave loader loads now the test enclave directly from its
>   ELF file, which means that ELF file does not need to be coverted as raw
>   binary during the build process.
> * Version the mm_list instead of using on synchronize_mm() when adding new
>   entries. We hold the write lock for the mm_struct, and dup_mm() can thus
>   deadlock with the page reclaimer, which could hold the lock for the old
>   mm_struct.
> * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
>   be used to reserve the address range. Now /dev/sgx supports only opaque
>   mappings to the (initialized) enclave data.
> * Make the vDSO callable directly from C by preserving RBX and taking leaf
>   from RCX.
> 
> v28:
> * Documented to Documentation/x86/sgx.rst how the kernel manages the
>   enclave ownership.
> * Removed non-LC flow from sgx_einit().
> * Removed struct sgx_einittoken since only the size of the corresponding
>   microarchitectural structure is used in the series ATM.
> 
> v27:
> * Disallow RIE processes to use enclaves as there could a permission
>   conflict between VMA and enclave permissions.
> * In the documentation, replace "grep /proc/cpuinfo" with
>   "grep sgx /proc/cpuinfo".
> 
> v26:
> * Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was
>   changed in v25 by mistake.
> * Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once
>   again for such a detailed feedback).
> * Added back the MAINTAINERS update commit, which was mistakenly removed
>   in v25.
> * EREMOVE's for SECS cannot be done while sanitizing an EPC section. The
>   CPU does not allow to remove a SECS page before all of its children have
>   been removed, and a child page can be in some other section than the one
>   currently being processed. Thus, removed special SECS processing from
>   sgx_sanitize_page() and instead put sections through it twice. In the
>   2nd round the lists should only contain SECS pages.
> 
> v25:
> * Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES
>   fails on executing ENCLS[EADD]. The rollback path executed
>   radix_tree_delete() on the same address twice when this happened.
> * Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when
>   a signal is pending.
> * As requested by Borislav, move the CPUID 0x12 features to their own word
>   in cpufeatures.
> * Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing()
>   was called with an uninitialized pointer when sgx_encl_get_backing()
>   fails.
> * Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice
>   has 'groups' for setting up sysfs attributes for the device.
> * Use device_initcall instead of subsys_initcall so that misc_class is
>   initialized before SGX is initialized.
> * Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select
>   enclave attributes that we the kernel does not allow it to set instead
>   of -EINVAL.
> * Unless SGX public key MSRs are writable always deny the feature from
>   Linux. Previously this was only denied from driver. How VMs should be
>   supported is not really part of initial patch set, which makes this
>   an obvious choice.
> * Cleaned up and refined documentation to be more approachable.
> 
> v24:
> * Reclaim unmeasured and TCS pages (regression in v23).
> * Replace usages of GFP_HIGHUSER with GFP_KERNEL.
> * Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES
>   and use the same rollback (destroy enclave). This can happen when host
>   suspends itself unknowingly to a VM running enclaves. From -EIO the user
>   space can deduce what happened.
> * Have a separate @count in struct sgx_enclave_add_pages to output number
>   of bytes processed instead of overwriting the input parameters for
>   clarity and more importantly that the API provides means for partial
>   processing (@count could be less than @length in success case).
> 
> v23:
> * Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask
>   with %SGX_PAGE_MEASURE flag.
> * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
>   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
>   flow can be only reverted by killing the whole enclave.
> * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
>   it should have been bit 6 (Table 37-3 in the SDM).
> * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
>   error code.
> 
> v22:
> * Refined bunch commit messages and added associated SDM references as
>   many of them were too exhausting and some outdated.
> * Alignment checks have been removed from mmap() because it does not define the
>   ELRANGE. VMAs only act as windows to the enclave. The semantics compare
>   somewhat how mmap() works with regular files.
> * We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be
>   page aligned so that we can pass the page directly to EADD and do not have
>   to do an extra copy. This was made effectively possible by removing the
>   worker thread for adding pages.
> * The selftest build files have been refined throughout of various glitches
>   and work properly in a cross compilation environment such as BuildRoot.
>   In addition, libcalls fail the build with an assertion in the linker
>   script, if they end up to the enclave binary.
> * CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core
>   for anything without having the driver. This could change when KVM support
>   is added.
> * We require zero permissions in SECINFO for TCS pages because the CPU
>   overwrites SECINFO flags with zero permissions and measures the page
>   only after that. Allowing to pass TCS with non-zero permissions would
>   cause mismatching measurement between the one provided in SIGSTRUCT and
>   the one computed by the CPU.
> * Obviously lots of small fixes and clean ups (does make sense to
>   document them all).
> 
> v21:
> * Check on mmap() that the VMA does cover an area that does not have
>   enclave pages. Only mapping with PROT_NONE can do that to reserve
>   initial address space for an enclave.
> * Check om mmap() and mprotect() that the VMA permissions do not
>   surpass the enclave permissions.
> * Remove two refcounts from vma_close(): mm_list and encl->refcount.
>   Enclave refcount is only need for swapper/enclave sync and we can
>   remove mm_list refcount by destroying mm_struct when the process
>   is closed. By not having vm_close() the Linux MM can merge VMAs.
> * Do not naturally align MAP_FIXED address.
> * Numerous small fixes and clean ups.
> * Use SRCU for synchronizing the list of mm_struct's.
> * Move to stack based call convention in the vDSO.
> 
> v20:
> * Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER
>   dependency as MMU notifiers are no longer used in the driver.
> * Use mm_users instead of mm_count as refcount for mm_struct as mm_count
>   only protects from deleting mm_struct, not removing its contents.
> * Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all
>   of them. They could be in initialized state when the kernel starts
>   because it might be spawned by kexec().
> * Documentation overhaul.
> * Use a device /dev/sgx/provision for delivering the provision token
>   instead of securityfs.
> * Create a reference to the enclave when already when opening
>   /dev/sgx/enclave.  The file is then associated with this enclave only.
>   mmap() can be done at free at any point and always get a reference to
>   the enclave. To summarize the file now represents the enclave.
> 
> v19:
> * Took 3-4 months but in some sense this was more like a rewrite of most
>   of the corners of the source code. If I've forgotten to deal with some
>   feedback, please don't shout me. Make a remark and I will fix it for
>   the next version. Hopefully there won't be this big turnovers anymore.
> * Validate SECS attributes properly against CPUID given attributes and
>   against allowed attributes. SECS attributes are the ones that are
>   enforced whereas SIGSTRUCT attributes tell what is required to run
>   the enclave.
> * Add KSS (Key Sharing Support) to the enclave attributes.
> * Deny MAP_PRIVATE as an enclave is always a shared memory entity.
> * Revert back to shmem backing storage so that it can be easily shared
>   by multiple processes.
> * Split the recognization of an ENCLS leaf failure by using three
>   functions to detect it: encsl_faulted(), encls_returned_code() and
>   sgx_failed(). encls_failed() is only caused by a spurious expections that
>   should never happen. Thus, it is not defined as an inline function in
>   order to easily insert a kprobe to it.
> * Move low-level enclave management routines, page fault handler and page
>   reclaiming routines from driver to the core. These cannot be separated
>   from each other as they are heavily interdependent. The rationale is that
>   the core does not call any code from the driver.
> * Allow the driver to be compiled as a module now that it no code is using
>   its routines and it only uses exported symbols. Now the driver is
>   essentially just a thin ioctl layer.
> * Reworked the driver to maintain a list of mm_struct's. The VMA callbacks
>   add new entries to this list as the process is forked. Each entry has
>   its own refcount because they have a different life-cycle as the enclave
>   does. In effect @tgid and @mm have been removed from struct sgx_encl
>   and we allow forking by removing VM_DONTCOPY from vm flags.
> * Generate a cpu mask in the reclaimer from the cpu mask's of all
>   mm_struct's. This will kick out the hardware threads out of the enclave
>   from multiple processes. It is not a local variable because it would
>   eat too much of the stack space but instead a field in struct
>   sgx_encl.
> * Allow forking i.e. remove VM_DONTCOPY. I did not change the API
>   because the old API scaled to the workload that Andy described. The
>   codebase is now mostly API independent i.e. changing the API is a
>   small task. For me the proper trigger to chanage it is a as concrete
>   as possible workload that cannot be fulfilled. I hope you understand
>   my thinking here. I don't want to change anything w/o proper basis
>   but I'm ready to change anything if there is a proper basis. I do
>   not have any kind of attachment to any particular type of API.
> * Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the
>   new vDSO.
> 
> v18:
> * Update the ioctl-number.txt.
> * Move the driver under arch/x86.
> * Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h.
> * Rename the selftest as test_sgx (previously sgx-selftest).
> * In order to enable process accounting, swap EPC pages and PCMD's to a VMA
>   instead of shmem.
> * Allow only to initialize and run enclaves with a subset of
>   {DEBUG, MODE64BIT} set.
> * Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged
>   attributes e.g. PROVISIONKEY.
> 
> v17:
> * Add a simple selftest.
> * Fix a null pointer dereference to section->pages when its
>   allocation fails.
> * Add Sean's description of the exception handling to the documentation.
> 
> v16:
> * Fixed SOB's in the commits that were a bit corrupted in v15.
> * Implemented exceptio handling properly to detect_sgx().
> * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK.
> * Updated the documentation to use rst definition lists.
> * Added the missing Documentation/x86/index.rst, which has a link to
>   intel_sgx.rst. Now the SGX and uapi documentation is properly generated
>   with 'make htmldocs'.
> * While enumerating EPC sections, if an undefined section is found, fail
>   the driver initialization instead of continuing the initialization.
> * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS.
> * Remove copyright notice from arch/x86/include/asm/sgx.h.
> * Migrated from ioremap_cache() to memremap().
> 
> v15:
> * Split into more digestable size patches.
> * Lots of small fixes and clean ups.
> * Signal a "plain" SIGSEGV on an EPCM violation.
> 
> v14:
> * Change the comment about X86_FEATURE_SGX_LC from ???SGX launch
>   configuration??? to ???SGX launch control???.
> * Move the SGX-related CPU feature flags as part of the Linux defined
>   virtual leaf 8.
> * Add SGX_ prefix to the constants defining the ENCLS leaf functions.
> * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers.
> * Refine the long description for CONFIG_INTEL_SGX_CORE.
> * Do not use pr_*_ratelimited()  in the driver. The use of the rate limited
>   versions is legacy cruft from the prototyping phase.
> * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power
>   cycles.
> * Manually prefix with ???sgx:??? in the core SGX code instead of redefining
>   pr_fmt.
> * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver
>   instead of core because it is a driver requirement.
> * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the
>   default is ???n???.
> * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match
>   the SDM.
> * Allocate struct sgx_epc_page instances one at a time.
> * Use ???__iomem void *??? pointers for the mapped EPC memory consistently.
> * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power
>   cycles.
> * Call enclave swapping operations directly from the driver instead of
>   calling them .indirectly through struct sgx_epc_page_ops because indirect
>   calls are not required yet as the patch set does not contain the KVM
>   support.
> * Added special signal SEGV_SGXERR to notify about SGX EPCM violation
>   errors.
> 
> v13:
> * Always use SGX_CPUID constant instead of a hardcoded value.
> * Simplified and documented the macros and functions for ENCLS leaves.
> * Enable sgx_free_page() to free active enclave pages on demand
>   in order to allow sgx_invalidate() to delete enclave pages.
>   It no longer performs EREMOVE if a page is in the process of
>   being reclaimed.
> * Use PM notifier per enclave so that we don't have to traverse
>   the global list of active EPC pages to find enclaves.
> * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h
> * Always use ioremap() to map EPC banks as we only support 64-bit kernel.
> * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going
>   to sleep.
> 
> v12:
> * Split to more narrow scoped commits in order to ease the review process and
>   use co-developed-by tag for co-authors of commits instead of listing them in
>   the source files.
> * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables.
> * Removed in-kernel LE i.e. this version of the SGX software stack only
>   supports unlocked IA32_SGXLEPUBKEYHASHx MSRs.
> * Refined documentation on launching enclaves, swapping and enclave
>   construction.
> * Refined sgx_arch.h to include alignment information for every struct that
>   requires it and removed structs that are not needed without an LE.
> * Got rid of SGX_CPUID.
> * SGX detection now prints log messages about firmware configuration issues.
> 
> v11:
> * Polished ENCLS wrappers with refined exception handling.
> * ksgxswapd was not stopped (regression in v5) in
>   sgx_page_cache_teardown(), which causes a leaked kthread after driver
>   deinitialization.
> * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be
>   invalidated when resuming, which will cause it not function properly
>   anymore.
> * Set EINITTOKEN.VALID to zero for a token that is passed when
>   SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero
>   page.
> * Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets
>   around the flags field, which causes a #GP. Only flags read is readable.
> * On read access memcpy() call inside sgx_vma_access() had src and dest
>   parameters in wrong order.
> * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols
>   to LE even if ???KASAN_SANITIZE := false??? was set in the makefile.
> * Fixed a regression in the #PF handler. If a page has
>   SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail.
>   It did not, which caused weird races when trying to change other parts of
>   swapping code.
> * EPC management has been refactored to a flat LRU cache and moved to
>   arch/x86. The swapper thread reads a cluster of EPC pages and swaps all
>   of them. It can now swap from multiple enclaves in the same round.
> * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL
>   when an enclave is already initialized or dead instead of zero.
> 
> v10:
> * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts
>   of the driver.
> * Unset the reserved flag from an enclave page if EDBGRD/WR fails
>   (regression in v6).
> * Close the anon inode when LE is stopped (regression in v9).
> * Update the documentation with a more detailed description of SGX.
> 
> v9:
> * Replaced kernel-LE IPC based on pipes with an anonymous inode.
>   The driver does not require anymore new exports.
> 
> v8:
> * Check that public key MSRs match the LE public key hash in the
>   driver initialization when the MSRs are read-only.
> * Fix the race in VA slot allocation by checking the fullness
>   immediately after succeesful allocation.
> * Fix the race in hash mrsigner calculation between the launch
>   enclave and user enclaves by having a separate lock for hash
>   calculation.
> 
> v7:
> * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK
>   when it should have been masked with ~PAGE_MASK.
> * Fixed a memory leak in sgx_ioc_enclave_create().
> * Simplified swapping code by using a pointer array for a cluster
>   instead of a linked list.
> * Squeezed struct sgx_encl_page to 32 bytes.
> * Fixed deferencing of an RSA key on OpenSSL 1.1.0.
> * Modified TC's CMAC to use kernel AES-NI. Restructured the code
>   a bit in order to better align with kernel conventions.
> 
> v6:
> * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave.
> * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/.
> * Removed virtualization chapter from the documentation.
> * Changed the default filename for the signing key as signing_key.pem.
> * Reworked EPC management in a way that instead of a linked list of
>   struct sgx_epc_page instances there is an array of integers that
>   encodes address and bank of an EPC page (the same data as 'pa' field
>   earlier). The locking has been moved to the EPC bank level instead
>   of a global lock.
> * Relaxed locking requirements for EPC management. EPC pages can be
>   released back to the EPC bank concurrently.
> * Cleaned up ptrace() code.
> * Refined commit messages for new architectural constants.
> * Sorted includes in every source file.
> * Sorted local variable declarations according to the line length in
>   every function.
> * Style fixes based on Darren's comments to sgx_le.c.
> 
> v5:
> * Described IPC between the Launch Enclave and kernel in the commit messages.
> * Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier
>   versions except those that exist in the imported TinyCrypt code.
> * Fixed spelling mistakes in the documentation.
> * Forgot to check the return value of sgx_drv_subsys_init().
> * Encapsulated properly page cache init and teardown.
> * Collect epc pages to a temp list in sgx_add_epc_bank
> * Removed SGX_ENCLAVE_INIT_ARCH constant.
> 
> v4:
> * Tied life-cycle of the sgx_le_proxy process to /dev/sgx.
> * Removed __exit annotation from sgx_drv_subsys_exit().
> * Fixed a leak of a backing page in sgx_process_add_page_req() in the
>   case when vm_insert_pfn() fails.
> * Removed unused symbol exports for sgx_page_cache.c.
> * Updated sgx_alloc_page() to require encl parameter and documented the
>   behavior (Sean Christopherson).
> * Refactored a more lean API for sgx_encl_find() and documented the behavior.
> * Moved #PF handler to sgx_fault.c.
> * Replaced subsys_system_register() with plain bus_register().
> * Retry EINIT 2nd time only if MSRs are not locked.
> 
> v3:
> * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set.
> * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails.
> * Use unused bits in epc_page->pa to store the bank number.
> * Removed #ifdef for WQ_NONREENTRANT.
> * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS.
> * Added --remove-section=.got.plt to objcopy flags in order to prevent a
>   dummy .got.plt, which will cause an inconsistent size for the LE.
> * Documented sgx_encl_* functions.
> * Added remark about AES implementation used inside the LE.
> * Removed redundant sgx_sys_exit() from le/main.c.
> * Fixed struct sgx_secinfo alignment from 128 to 64 bytes.
> * Validate miscselect in sgx_encl_create().
> * Fixed SSA frame size calculation to take the misc region into account.
> * Implemented consistent exception handling to __encls() and __encls_ret().
> * Implemented a proper device model in order to allow sysfs attributes
>   and in-kernel API.
> * Cleaned up various "find enclave" implementations to the unified
>   sgx_encl_find().
> * Validate that vm_pgoff is zero.
> * Discard backing pages with shmem_truncate_range() after EADD.
> * Added missing EEXTEND operations to LE signing and launch.
> * Fixed SSA size for GPRS region from 168 to 184 bytes.
> * Fixed the checks for TCS flags. Now DBGOPTIN is allowed.
> * Check that TCS addresses are in ELRANGE and not just page aligned.
> * Require kernel to be compiled with X64_64 and CPU_SUP_INTEL.
> * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02.
> 
> v2:
> * get_rand_uint32() changed the value of the pointer instead of value
>   where it is pointing at.
> * Launch enclave incorrectly used sigstruct attributes-field instead of
>   enclave attributes-field.
> * Removed unused struct sgx_add_page_req from sgx_ioctl.c
> * Removed unused sgx_has_sgx2.
> * Updated arch/x86/include/asm/sgx.h so that it provides stub
>   implementations when sgx in not enabled.
> * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs().
> * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space
> * removed unused global sgx_nr_pids
> * moved sgx_encl_release to sgx_encl.c
> * return -ERESTARTSYS instead of -EINTR in sgx_encl_init()
> 
> Jarkko Sakkinen (10):
>   x86/sgx: Update MAINTAINERS
>   x86/sgx: Add SGX microarchitectural data structures
>   x86/sgx: Add wrappers for ENCLS leaf functions
>   x86/sgx: Add functions to allocate and free EPC pages
>   x86/sgx: Linux Enclave Driver
>   x86/sgx: Add provisioning
>   x86/sgx: Add a page reclaimer
>   x86/sgx: ptrace() support for the SGX driver
>   selftests/x86: Add a selftest for SGX
>   docs: x86/sgx: Document SGX micro architecture and kernel internals
> 
> Sean Christopherson (10):
>   x86/cpufeatures: x86/msr: Add Intel SGX hardware bits
>   x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits
>   x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX
>   x86/cpu/intel: Detect SGX support
>   x86/sgx: Enumerate and track EPC sections
>   mm: Introduce vm_ops->may_mprotect()
>   x86/vdso: Add support for exception fixup in vDSO functions
>   x86/fault: Add helper function to sanitize error code
>   x86/traps: Attempt to fixup exceptions in vDSO before signaling
>   x86/vdso: Implement a vDSO for Intel SGX enclave call
> 
>  .../userspace-api/ioctl/ioctl-number.rst      |   1 +
>  Documentation/x86/index.rst                   |   1 +
>  Documentation/x86/sgx.rst                     | 206 +++++
>  MAINTAINERS                                   |  11 +
>  arch/x86/Kconfig                              |  14 +
>  arch/x86/entry/vdso/Makefile                  |   8 +-
>  arch/x86/entry/vdso/extable.c                 |  46 +
>  arch/x86/entry/vdso/extable.h                 |  29 +
>  arch/x86/entry/vdso/vdso-layout.lds.S         |   9 +-
>  arch/x86/entry/vdso/vdso.lds.S                |   1 +
>  arch/x86/entry/vdso/vdso2c.h                  |  58 +-
>  arch/x86/entry/vdso/vsgx_enter_enclave.S      | 131 +++
>  arch/x86/include/asm/cpufeature.h             |   5 +-
>  arch/x86/include/asm/cpufeatures.h            |   8 +-
>  arch/x86/include/asm/disabled-features.h      |  18 +-
>  arch/x86/include/asm/enclu.h                  |   8 +
>  arch/x86/include/asm/msr-index.h              |   8 +
>  arch/x86/include/asm/required-features.h      |   2 +-
>  arch/x86/include/asm/traps.h                  |   1 +
>  arch/x86/include/asm/vdso.h                   |   5 +
>  arch/x86/include/uapi/asm/sgx.h               | 175 ++++
>  arch/x86/kernel/cpu/Makefile                  |   1 +
>  arch/x86/kernel/cpu/common.c                  |   4 +
>  arch/x86/kernel/cpu/feat_ctl.c                |  32 +-
>  arch/x86/kernel/cpu/sgx/Makefile              |   6 +
>  arch/x86/kernel/cpu/sgx/arch.h                | 343 ++++++++
>  arch/x86/kernel/cpu/sgx/driver.c              | 209 +++++
>  arch/x86/kernel/cpu/sgx/driver.h              |  32 +
>  arch/x86/kernel/cpu/sgx/encl.c                | 756 +++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.h                | 128 +++
>  arch/x86/kernel/cpu/sgx/encls.h               | 238 ++++++
>  arch/x86/kernel/cpu/sgx/ioctl.c               | 800 ++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/main.c                | 280 ++++++
>  arch/x86/kernel/cpu/sgx/reclaim.c             | 471 +++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h                 | 108 +++
>  arch/x86/kernel/traps.c                       |  14 +
>  arch/x86/mm/fault.c                           |  45 +-
>  include/linux/mm.h                            |   2 +
>  mm/mprotect.c                                 |  14 +-
>  tools/arch/x86/include/asm/cpufeatures.h      |   7 +-
>  tools/testing/selftests/Makefile              |   1 +
>  tools/testing/selftests/sgx/.gitignore        |   2 +
>  tools/testing/selftests/sgx/Makefile          |  53 ++
>  tools/testing/selftests/sgx/call.S            |  54 ++
>  tools/testing/selftests/sgx/defines.h         |  21 +
>  tools/testing/selftests/sgx/load.c            | 282 ++++++
>  tools/testing/selftests/sgx/main.c            | 199 +++++
>  tools/testing/selftests/sgx/main.h            |  38 +
>  tools/testing/selftests/sgx/sigstruct.c       | 395 +++++++++
>  tools/testing/selftests/sgx/test_encl.c       |  20 +
>  tools/testing/selftests/sgx/test_encl.lds     |  40 +
>  .../selftests/sgx/test_encl_bootstrap.S       |  89 ++
>  52 files changed, 5398 insertions(+), 31 deletions(-)
>  create mode 100644 Documentation/x86/sgx.rst
>  create mode 100644 arch/x86/entry/vdso/extable.c
>  create mode 100644 arch/x86/entry/vdso/extable.h
>  create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S
>  create mode 100644 arch/x86/include/asm/enclu.h
>  create mode 100644 arch/x86/include/uapi/asm/sgx.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/Makefile
>  create mode 100644 arch/x86/kernel/cpu/sgx/arch.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/encl.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/encls.h
>  create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/main.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c
>  create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h
>  create mode 100644 tools/testing/selftests/sgx/.gitignore
>  create mode 100644 tools/testing/selftests/sgx/Makefile
>  create mode 100644 tools/testing/selftests/sgx/call.S
>  create mode 100644 tools/testing/selftests/sgx/defines.h
>  create mode 100644 tools/testing/selftests/sgx/load.c
>  create mode 100644 tools/testing/selftests/sgx/main.c
>  create mode 100644 tools/testing/selftests/sgx/main.h
>  create mode 100644 tools/testing/selftests/sgx/sigstruct.c
>  create mode 100644 tools/testing/selftests/sgx/test_encl.c
>  create mode 100644 tools/testing/selftests/sgx/test_encl.lds
>  create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S
> 
> -- 
> 2.25.1
>

Dr. Greg May 12, 2020, 4:51 p.m. UTC | #35

On Tue, May 12, 2020 at 07:55:58PM +0800, Hui, Chunyang wrote:

> > You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> > 
> > 	cat /proc/cpuinfo  | grep sgx
> 
> Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>

> Occlum project (https://github.com/occlum/occlum) is a libOS built
> on top of Intel SGX feature. We ran Occlum tests using patch v29 on
> SGX hardware with the Flexible Launch Control (FLC) feature and
> didn't find any problems.  As Occlum core developers, we would like
> these patches to be merged soon.

Do you use the Intel PSW or your own?

Are you using the standard ECALL interface or did the tests run with
the new VDSO entry and exception handler?

Have a good day.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Autonomously self-defensive
Enjellic Systems Development, LLC     IOT platforms and edge devices.
4206 N. 19th Ave.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"I had far rather walk, as I do, in daily terror of eternity, than feel
 that this was only a children's game in which all of the contestants
 would get equally worthless prizes in the end."
                                -- T. S. Elliot

Jarkko Sakkinen May 13, 2020, 10:14 p.m. UTC | #36

On Wed, 2020-05-06 at 17:42 -0400, Nathaniel McCallum wrote:
> Tested on Enarx. This requires a patch[0] for v29 support.
> 
> Tested-by: Nathaniel McCallum <npmccallum@redhat.com>

Thank you. Update in my tree.

Sean, I'll fixed that whitespace issue too in my tree.

General question: maybe it would be easiest that I issue a pull request
once everyone feels that the series is ready to be pulled and stop sending
new versions of the series?

/Jarkko

Jarkko Sakkinen May 13, 2020, 10:18 p.m. UTC | #37

On Thu, 2020-05-14 at 01:14 +0300, Jarkko Sakkinen wrote:
> On Wed, 2020-05-06 at 17:42 -0400, Nathaniel McCallum wrote:
> > Tested on Enarx. This requires a patch[0] for v29 support.
> > 
> > Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
> 
> Thank you. Update in my tree.
> 
> Sean, I'll fixed that whitespace issue too in my tree.
> 
> General question: maybe it would be easiest that I issue a pull request
> once everyone feels that the series is ready to be pulled and stop sending
> new versions of the series?

My honest feelings about the series ATM are:

1. It is not perfect like no code never is and there are always issues.
2. Some things are very well matured, even more so than in a lot of mainline
   code I've seen. I'm particularly happy how the locking code has been
   converged.
3. Not worried to maintain the code in its current state. It is manageable.

/Jarkko

Jarkko Sakkinen May 13, 2020, 11:09 p.m. UTC | #38

On Wed, 2020-05-06 at 09:39 -0700, Jordan Hand wrote:
> On 4/21/20 2:52 PM, Jarkko Sakkinen wrote:
> > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > to set aside private regions of code and data. The code outside the enclave
> > is disallowed to access the memory inside the enclave by the CPU access
> > control.
> > 
> > There is a new hardware unit in the processor called Memory Encryption
> > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> > one or many MEE regions that can hold enclave data by configuring them with
> > PRMRR registers.
> > 
> > The MEE automatically encrypts the data leaving the processor package to
> > the MEE regions. The data is encrypted using a random key whose life-time
> > is exactly one power cycle.
> > 
> > The current implementation requires that the firmware sets
> > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > decide what enclaves it wants run. The implementation does not create
> > any bottlenecks to support read-only MSRs later on.
> > 
> > You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> > 
> > 	cat /proc/cpuinfo  | grep sgx
> > 
> > v29:
> > * The selftest has been moved to selftests/sgx. Because SGX is an execution
> >    environment of its own, it really isn't a great fit with more "standard"
> >    x86 tests.
> > 
> >    The RSA key is now generated on fly and the whole signing process has
> >    been made as part of the enclave loader instead of signing the enclave
> >    during the compilation time.
> > 
> >    Finally, the enclave loader loads now the test enclave directly from its
> >    ELF file, which means that ELF file does not need to be coverted as raw
> >    binary during the build process.
> > * Version the mm_list instead of using on synchronize_mm() when adding new
> >    entries. We hold the write lock for the mm_struct, and dup_mm() can thus
> >    deadlock with the page reclaimer, which could hold the lock for the old
> >    mm_struct.
> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can
> >    be used to reserve the address range. Now /dev/sgx supports only opaque
> >    mappings to the (initialized) enclave data.
> > * Make the vDSO callable directly from C by preserving RBX and taking leaf
> >    from RCX.
> > 
> 
> Tested with the Open Enclave SDK on top of Intel PSW. Specifically built 
> the Intel PSW with changes to support /dev/sgx mapping[1] new in v29.
> 
> Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
> 
> [1] https://github.com/intel/linux-sgx/pull/530

Thank you!

/Jarkko

Jethro Beekman May 14, 2020, 6:31 a.m. UTC | #39

On 2020-05-14 00:14, Jarkko Sakkinen wrote:
> General question: maybe it would be easiest that I issue a pull request
> once everyone feels that the series is ready to be pulled and stop sending
> new versions of the series?

Sounds good

--
Jethro Beekman | Fortanix

Dr. Greg May 14, 2020, 9:16 a.m. UTC | #40

On Fri, May 08, 2020 at 12:56:35PM -0700, Sean Christopherson wrote:

Good morning, I hope the week is proceeding well for everyone.

> On Fri, May 08, 2020 at 02:02:26PM -0500, Dr. Greg wrote:
> > On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote:
> > > The diffstat of your patch is irrelevant. What's relevant is the
> > > fact that it is completely unreviewed and that it creates yet
> > > another user space visible ABI with a noticable lack of
> > > documentation.
> 
> ...
> 
> > As to lack of review, we would certainly welcome technical and API
> > comments but we cannot force them.

> Thomas' point isn't that your code needs to be reviewed, it's that
> trying to squeeze it into the initial merge will, not might, _will_
> push out the acceptance of SGX.  The same is true for other features
> we have lined up, e.g. KVM and cgroup support.  It's not a slight on
> your code, it's just reality at this point.

For the record and for whatever it is worth at this point.  The patch
has been available in its present form and function for around 14
months.

So there was plenty of time for its consideration and integration into
what is being prepared as the final merge candidate.

> > In fact, anyone who reviews the patch will see that the current driver
> > creates a pointer in the ioctl call to pass downward into the hardware
> > initialization routines.  Our code simply replaces that pointer with
> > one supplied by userspace.

> Personally, I'm in favor of adding a reserved placeholder for a
> token so as to avoid adding a second ioctl() in the event that
> something gets upstreamed that wants the token.  Ditto for a file
> descriptor for the backing store in sgx_enclave_create.

That would be a very low cost and forward looking addition.

> > At the very least a modular form of the driver should be
> > considered that would allow alternate implementations.  Sean
> > indicated that there was a 'kludgy' approach that would allow an
> > alternate modular implementation alongside the in-kernel driver.
> > I believe that Andy has already indicated that may not be an
> > advisable approach.

> Modularizing the the driver does nothing for your use case.  The
> "driver" is a relatively lightweight wrapper, removing that is akin
> to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the
> EPC tracking and core code go away.  Modularizing the whole thing is
> flat out not an option, as doing so is not compatible with an EPC
> cgroup and adds unnecessary complexity to KVM enabling, both of
> which are highly desired features by pretty much everyone that plans
> on using SGX.

All well taken technical points, but they don't speak directly to the
issue at hand.  The potential security issue in all of this is access
to /dev/sgx/enclave, not the EPC tracking and core code.

Here in a nutshell is the paradox the kernel faces:

No one seems to be disputing the fact that the primary focus of this
driver will be to support the notion of 'runtime encryption' and
Confidential Computing.  The whole premise of the concept and economic
predicate of the initiative is that the owner/manager of the platform
has no visibility into what is being done on the platform.

If the Linux community believes that standard platform security
controls can make qualitatively good judgements on what enclave based
execution is doing, it is effectively making the statement that the
very concept of runtime encryption and by extension Confidential
Computing is invalid.

If we accept the concept that Confidential Computing is valid then we
admit the technology provides the opportunity for unsupervised code
execution and data manipulation.

Our premise in all of this is that one page of code implementing three
linked lists is a small price to pay to provide platform owners with
the opportunity to optionally implement what is effectively 2-factor
authentication over this process.

Going forward, if we are intellectually honest, all of this suggests
that adding complexity to the driver with LSM controls makes little
sense technically.  Since, by definition and economic intent, the
technology provides tools and infrastructure that allows these
controls to be evaded.

The notion that entities with adversarial intent would not try and
take advantage of this flies in the face of everything we know about
security.

> Andy is right to caution against playing games to squish in-kernel
> things, but the virtualization snafu is a completely different
> beast.  E.g. SGX doesn't require fiddling with CR4, won't corrupt
> random memory across kexec(), doesn't block INIT, etc...  And I'm
> not advocating long-term shenanigans, I just wanted to point out
> that there are options for running out-of-tree drivers in the short
> term, e.g. until proper policy controls land upstream.

It appears that the distributions, at least IBM/RHEL, are going to
compile this driver in and backport it as well.

What we would recommend at this point is that you and Jarkko do the
Linux community and beyond a favor and wire up a simple kernel
command-line parameter that controls the ability of the driver to be
used, ie. enables/disables access to /dev/sgx/enclave.

Distributions or others can set this command-line parameter by default
to 'disable' and avoid any possibility of the technology being used
for nefarious purposes.  Since the technology now appears to be
focused only on the cloud providers they will certainly be capable of
configuring their implementations to change the default.

In essence, make the kernel's behavior secure by default.

Best wishes for a pleasant weekend to everyone.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Artisans in autonomously
Enjellic Systems Development, LLC     self-defensive IOT platforms
4206 N. 19th Ave.                     and edge devices.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Intellectuals solve problems; geniuses prevent them."
                                -- Albert Einstein

Jarkko Sakkinen May 14, 2020, 10:49 a.m. UTC | #41

On Tue, 2020-05-12 at 19:55 +0800, Hui, Chunyang wrote:
> On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > to set aside private regions of code and data. The code outside the enclave
> > is disallowed to access the memory inside the enclave by the CPU access
> > control.
> > 
> > There is a new hardware unit in the processor called Memory Encryption
> > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> > one or many MEE regions that can hold enclave data by configuring them with
> > PRMRR registers.
> > 
> > The MEE automatically encrypts the data leaving the processor package to
> > the MEE regions. The data is encrypted using a random key whose life-time
> > is exactly one power cycle.
> > 
> > The current implementation requires that the firmware sets
> > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > decide what enclaves it wants run. The implementation does not create
> > any bottlenecks to support read-only MSRs later on.
> > 
> > You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> > 
> > 	cat /proc/cpuinfo  | grep sgx
> 
> Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>
> 
> Occlum project (https://github.com/occlum/occlum) is a libOS built on top of
> Intel SGX feature. We ran Occlum tests using patch v29 on SGX hardware with
> the Flexible Launch Control (FLC) feature and didn't find any problems.
> As Occlum core developers, we would like these patches to be merged soon.

Great, thanks adding tested by to the driver and reclaimer patch.

/Jarkko

Sean Christopherson May 14, 2020, 4:15 p.m. UTC | #42

On Thu, May 14, 2020 at 04:16:37AM -0500, Dr. Greg wrote:
> What we would recommend at this point is that you and Jarkko do the
> Linux community and beyond a favor and wire up a simple kernel
> command-line parameter that controls the ability of the driver to be
> used, ie. enables/disables access to /dev/sgx/enclave.

I'm not opposed to adding a kernel param to disable SGX.  At one point
there was a proposal to extend clearcpuid to allow disabling multiple
feature bits, but it looks like that went the way of the dodo.

Note, such a param would disable SGX entirely, e.g. clear the feature bit
in /proc/cpuinfo and prevent any in-kernel SGX code from running.

Borislav Petkov May 14, 2020, 4:20 p.m. UTC | #43

On Thu, May 14, 2020 at 09:15:59AM -0700, Sean Christopherson wrote:
> I'm not opposed to adding a kernel param to disable SGX.  At one point
> there was a proposal to extend clearcpuid to allow disabling multiple
> feature bits, but it looks like that went the way of the dodo.
> 
> Note, such a param would disable SGX entirely, e.g. clear the feature bit
> in /proc/cpuinfo and prevent any in-kernel SGX code from running.

It is a usual practice for big features like SGX to add a
"nosgx" cmdline param to disable it in case something goes
south. We do this for all features - see all "no*" switches in
Documentation/admin-guide/kernel-parameters.txt

Thx.

Seth Moore May 14, 2020, 7:05 p.m. UTC | #44

On Fri, May 8, 2020 at 12:08 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Adding some Google folks to the party.

Thanks, Sean.

> On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > to set aside private regions of code and data. The code outside the enclave
> > is disallowed to access the memory inside the enclave by the CPU access
> > control.
> >
> > There is a new hardware unit in the processor called Memory Encryption
> > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> > one or many MEE regions that can hold enclave data by configuring them with
> > PRMRR registers.
> >
> > The MEE automatically encrypts the data leaving the processor package to
> > the MEE regions. The data is encrypted using a random key whose life-time
> > is exactly one power cycle.
> >
> > The current implementation requires that the firmware sets
> > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > decide what enclaves it wants run. The implementation does not create
> > any bottlenecks to support read-only MSRs later on.
> >
> > You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> >
> >       cat /proc/cpuinfo  | grep sgx

We applied the v29 patches to Linux 5.6.0, then tested on Xeon(R) E-2186G
with Asylo (http://asylo.dev).

Looks good. All Asylo tests pass.

Tested-by: Seth Moore <sethmo@google.com>

Thomas Gleixner May 14, 2020, 7:29 p.m. UTC | #45

Borislav Petkov <bp@alien8.de> writes:

> On Thu, May 14, 2020 at 09:15:59AM -0700, Sean Christopherson wrote:
>> I'm not opposed to adding a kernel param to disable SGX.  At one point
>> there was a proposal to extend clearcpuid to allow disabling multiple
>> feature bits, but it looks like that went the way of the dodo.

clearcpuid is a trainwreck which should have never happened.

>> Note, such a param would disable SGX entirely, e.g. clear the feature bit
>> in /proc/cpuinfo and prevent any in-kernel SGX code from running.
>
> It is a usual practice for big features like SGX to add a
> "nosgx" cmdline param to disable it in case something goes
> south. We do this for all features - see all "no*" switches in
> Documentation/admin-guide/kernel-parameters.txt

Correct.

Thanks,

        tglx

Thomas Gleixner May 14, 2020, 7:30 p.m. UTC | #46

Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> writes:
>
> General question: maybe it would be easiest that I issue a pull request
> once everyone feels that the series is ready to be pulled and stop sending
> new versions of the series?

Might be the easiest for you, but I prefer a final series in email.

Thanks,

        tglx

Jarkko Sakkinen May 15, 2020, 12:09 a.m. UTC | #47

On Thu, 2020-05-14 at 09:15 -0700, Sean Christopherson wrote:
> On Thu, May 14, 2020 at 04:16:37AM -0500, Dr. Greg wrote:
> > What we would recommend at this point is that you and Jarkko do the
> > Linux community and beyond a favor and wire up a simple kernel
> > command-line parameter that controls the ability of the driver to be
> > used, ie. enables/disables access to /dev/sgx/enclave.
> 
> I'm not opposed to adding a kernel param to disable SGX.  At one point
> there was a proposal to extend clearcpuid to allow disabling multiple
> feature bits, but it looks like that went the way of the dodo.
> 
> Note, such a param would disable SGX entirely, e.g. clear the feature bit
> in /proc/cpuinfo and prevent any in-kernel SGX code from running.

Greg, you are free to submit a patch for review that adds any possible
kernel command line parameter SGX and beyond. SGX support does not "wire
up" anything that would prevent reviewing such patches.

/Jarkko

Jarkko Sakkinen May 15, 2020, 12:22 a.m. UTC | #48

On Thu, 2020-05-14 at 21:30 +0200, Thomas Gleixner wrote:
> Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> writes:
> > General question: maybe it would be easiest that I issue a pull request
> > once everyone feels that the series is ready to be pulled and stop sending
> > new versions of the series?
> 
> Might be the easiest for you, but I prefer a final series in email.
> 
> Thanks,
> 
>         tglx

For me both are just as easy or as hard :-)

Just wanted to query the preference.

/Jarkko

Jarkko Sakkinen May 15, 2020, 12:23 a.m. UTC | #49

On Thu, 2020-05-14 at 12:05 -0700, Seth Moore wrote:
> On Fri, May 8, 2020 at 12:08 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > Adding some Google folks to the party.
> 
> Thanks, Sean.
> 
> > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote:
> > > Intel(R) SGX is a set of CPU instructions that can be used by applications
> > > to set aside private regions of code and data. The code outside the enclave
> > > is disallowed to access the memory inside the enclave by the CPU access
> > > control.
> > > 
> > > There is a new hardware unit in the processor called Memory Encryption
> > > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define
> > > one or many MEE regions that can hold enclave data by configuring them with
> > > PRMRR registers.
> > > 
> > > The MEE automatically encrypts the data leaving the processor package to
> > > the MEE regions. The data is encrypted using a random key whose life-time
> > > is exactly one power cycle.
> > > 
> > > The current implementation requires that the firmware sets
> > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> > > decide what enclaves it wants run. The implementation does not create
> > > any bottlenecks to support read-only MSRs later on.
> > > 
> > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
> > > 
> > >       cat /proc/cpuinfo  | grep sgx
> 
> We applied the v29 patches to Linux 5.6.0, then tested on Xeon(R) E-2186G
> with Asylo (http://asylo.dev).
> 
> Looks good. All Asylo tests pass.
> 
> Tested-by: Seth Moore <sethmo@google.com>

Thanks.

/Jarkko

Jarkko Sakkinen May 15, 2020, 9:28 a.m. UTC | #50

On Thu, 2020-05-14 at 18:20 +0200, Borislav Petkov wrote:
> On Thu, May 14, 2020 at 09:15:59AM -0700, Sean Christopherson wrote:
> > I'm not opposed to adding a kernel param to disable SGX.  At one point
> > there was a proposal to extend clearcpuid to allow disabling multiple
> > feature bits, but it looks like that went the way of the dodo.
> > 
> > Note, such a param would disable SGX entirely, e.g. clear the feature bit
> > in /proc/cpuinfo and prevent any in-kernel SGX code from running.
> 
> It is a usual practice for big features like SGX to add a
> "nosgx" cmdline param to disable it in case something goes
> south. We do this for all features - see all "no*" switches in
> Documentation/admin-guide/kernel-parameters.txt

Uh oh, should probably address this. Should I send v31 today with a "nosgx"
patch added? Sorry for missing this one :-/

/Jarkko

Borislav Petkov May 15, 2020, 9:42 a.m. UTC | #51

On Fri, May 15, 2020 at 12:28:54PM +0300, Jarkko Sakkinen wrote:
> Uh oh, should probably address this. Should I send v31 today with a "nosgx"
> patch added? Sorry for missing this one :-/

Not the whole thing - just the one patch as a reply to your thread pls.

Thx.

Jarkko Sakkinen May 15, 2020, 4:24 p.m. UTC | #52

On Fri, 2020-05-15 at 11:42 +0200, Borislav Petkov wrote:
> On Fri, May 15, 2020 at 12:28:54PM +0300, Jarkko Sakkinen wrote:
> > Uh oh, should probably address this. Should I send v31 today with a "nosgx"
> > patch added? Sorry for missing this one :-/
> 
> Not the whole thing - just the one patch as a reply to your thread pls.
> 
> Thx.

OK, cool, thank you.

/Jarkko

Nathaniel McCallum May 15, 2020, 7:54 p.m. UTC | #53

On Thu, May 14, 2020 at 5:17 AM Dr. Greg <greg@enjellic.com> wrote:
>
> On Fri, May 08, 2020 at 12:56:35PM -0700, Sean Christopherson wrote:
>
> Good morning, I hope the week is proceeding well for everyone.
>
> > On Fri, May 08, 2020 at 02:02:26PM -0500, Dr. Greg wrote:
> > > On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote:
> > > > The diffstat of your patch is irrelevant. What's relevant is the
> > > > fact that it is completely unreviewed and that it creates yet
> > > > another user space visible ABI with a noticable lack of
> > > > documentation.
> >
> > ...
> >
> > > As to lack of review, we would certainly welcome technical and API
> > > comments but we cannot force them.
>
> > Thomas' point isn't that your code needs to be reviewed, it's that
> > trying to squeeze it into the initial merge will, not might, _will_
> > push out the acceptance of SGX.  The same is true for other features
> > we have lined up, e.g. KVM and cgroup support.  It's not a slight on
> > your code, it's just reality at this point.
>
> For the record and for whatever it is worth at this point.  The patch
> has been available in its present form and function for around 14
> months.
>
> So there was plenty of time for its consideration and integration into
> what is being prepared as the final merge candidate.
>
> > > In fact, anyone who reviews the patch will see that the current driver
> > > creates a pointer in the ioctl call to pass downward into the hardware
> > > initialization routines.  Our code simply replaces that pointer with
> > > one supplied by userspace.
>
> > Personally, I'm in favor of adding a reserved placeholder for a
> > token so as to avoid adding a second ioctl() in the event that
> > something gets upstreamed that wants the token.  Ditto for a file
> > descriptor for the backing store in sgx_enclave_create.
>
> That would be a very low cost and forward looking addition.
>
> > > At the very least a modular form of the driver should be
> > > considered that would allow alternate implementations.  Sean
> > > indicated that there was a 'kludgy' approach that would allow an
> > > alternate modular implementation alongside the in-kernel driver.
> > > I believe that Andy has already indicated that may not be an
> > > advisable approach.
>
> > Modularizing the the driver does nothing for your use case.  The
> > "driver" is a relatively lightweight wrapper, removing that is akin
> > to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the
> > EPC tracking and core code go away.  Modularizing the whole thing is
> > flat out not an option, as doing so is not compatible with an EPC
> > cgroup and adds unnecessary complexity to KVM enabling, both of
> > which are highly desired features by pretty much everyone that plans
> > on using SGX.
>
> All well taken technical points, but they don't speak directly to the
> issue at hand.  The potential security issue in all of this is access
> to /dev/sgx/enclave, not the EPC tracking and core code.
>
> Here in a nutshell is the paradox the kernel faces:
>
> No one seems to be disputing the fact that the primary focus of this
> driver will be to support the notion of 'runtime encryption' and
> Confidential Computing.  The whole premise of the concept and economic
> predicate of the initiative is that the owner/manager of the platform
> has no visibility into what is being done on the platform.
>
> If the Linux community believes that standard platform security
> controls can make qualitatively good judgements on what enclave based
> execution is doing, it is effectively making the statement that the
> very concept of runtime encryption and by extension Confidential
> Computing is invalid.
>
> If we accept the concept that Confidential Computing is valid then we
> admit the technology provides the opportunity for unsupervised code
> execution and data manipulation.
>
> Our premise in all of this is that one page of code implementing three
> linked lists is a small price to pay to provide platform owners with
> the opportunity to optionally implement what is effectively 2-factor
> authentication over this process.
>
> Going forward, if we are intellectually honest, all of this suggests
> that adding complexity to the driver with LSM controls makes little
> sense technically.  Since, by definition and economic intent, the
> technology provides tools and infrastructure that allows these
> controls to be evaded.
>
> The notion that entities with adversarial intent would not try and
> take advantage of this flies in the face of everything we know about
> security.
>
> > Andy is right to caution against playing games to squish in-kernel
> > things, but the virtualization snafu is a completely different
> > beast.  E.g. SGX doesn't require fiddling with CR4, won't corrupt
> > random memory across kexec(), doesn't block INIT, etc...  And I'm
> > not advocating long-term shenanigans, I just wanted to point out
> > that there are options for running out-of-tree drivers in the short
> > term, e.g. until proper policy controls land upstream.
>
> It appears that the distributions, at least IBM/RHEL, are going to
> compile this driver in and backport it as well.

The (Red Hat sponsored) Enarx project will continue building an
unofficial, unsupported version of the Fedora kernel with the SGX
patches[0] until such time as the patches are upstream. Once upstream,
I intend to propose that the feature be enabled in the stock Fedora
kernel.

Enarx requires EDMM support as a prerequisite to being production
ready. Therefore, we are likely to continue building this custom
Fedora kernel with the latest patches until such point as EDMM support
lands upstream. This also implies that I have no current plan to
request an SGX backport to a RHEL kernel until such time as it
supports our full needs.

Disclaimer: I do not control RHEL or Fedora kernel features. None of
the above constitutes a commitment to deliver anything.

[0]: https://copr.fedorainfracloud.org/coprs/npmccallum/enarx/package/kernel/

> What we would recommend at this point is that you and Jarkko do the
> Linux community and beyond a favor and wire up a simple kernel
> command-line parameter that controls the ability of the driver to be
> used, ie. enables/disables access to /dev/sgx/enclave.
>
> Distributions or others can set this command-line parameter by default
> to 'disable' and avoid any possibility of the technology being used
> for nefarious purposes.  Since the technology now appears to be
> focused only on the cloud providers they will certainly be capable of
> configuring their implementations to change the default.
>
> In essence, make the kernel's behavior secure by default.
>
> Best wishes for a pleasant weekend to everyone.
>
> Dr. Greg
>
> As always,
> Dr. Greg Wettstein, Ph.D, Worker      Artisans in autonomously
> Enjellic Systems Development, LLC     self-defensive IOT platforms
> 4206 N. 19th Ave.                     and edge devices.
> Fargo, ND  58102
> PH: 701-281-1686                      EMAIL: greg@enjellic.com
> ------------------------------------------------------------------------------
> "Intellectuals solve problems; geniuses prevent them."
>                                 -- Albert Einstein
>

Jarkko Sakkinen May 16, 2020, 9:58 a.m. UTC | #54

On Fri, 2020-05-15 at 15:54 -0400, Nathaniel McCallum wrote:
> The (Red Hat sponsored) Enarx project will continue building an
> unofficial, unsupported version of the Fedora kernel with the SGX
> patches[0] until such time as the patches are upstream. Once upstream,
> I intend to propose that the feature be enabled in the stock Fedora
> kernel.
> 
> Enarx requires EDMM support as a prerequisite to being production
> ready. Therefore, we are likely to continue building this custom
> Fedora kernel with the latest patches until such point as EDMM support
> lands upstream. This also implies that I have no current plan to
> request an SGX backport to a RHEL kernel until such time as it
> supports our full needs.
> 
> Disclaimer: I do not control RHEL or Fedora kernel features. None of
> the above constitutes a commitment to deliver anything.
> 
> [0]: https://copr.fedorainfracloud.org/coprs/npmccallum/enarx/package/kernel/

SGX is somewhat self-contained feature, i.e. should be easy to backport
for any recent kernel. Only the vDSO is outside of arch/x86/kernel/cpu/sgx.

/Jarkko

Pavel Machek May 24, 2020, 9:27 p.m. UTC | #55

Hi!

> > > At the very least a modular form of the driver should be
> > > considered that would allow alternate implementations.  Sean
> > > indicated that there was a 'kludgy' approach that would allow an
> > > alternate modular implementation alongside the in-kernel driver.
> > > I believe that Andy has already indicated that may not be an
> > > advisable approach.
> 
> > Modularizing the the driver does nothing for your use case.  The
> > "driver" is a relatively lightweight wrapper, removing that is akin
> > to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the

Well... SGX is proprietary feature of Intel. I don't see any effort for standartization
so that other architectures could use it. Yet it provides userspace interface...

You clearly want distros to enable it, but that will waste memory on non-Intel systems.

That is not good.

> Here in a nutshell is the paradox the kernel faces:
> 
> No one seems to be disputing the fact that the primary focus of this
> driver will be to support the notion of 'runtime encryption' and
> Confidential Computing.  The whole premise of the concept and economic
> predicate of the initiative is that the owner/manager of the platform
> has no visibility into what is being done on the platform.

Well, in my eyes preventing owner of the machine from accessing all its parts is
pretty questionable.

Physics says it is impossible, many tried, and many failed. Why it should be
different this time?

> If the Linux community believes that standard platform security
> controls can make qualitatively good judgements on what enclave based
> execution is doing, it is effectively making the statement that the
> very concept of runtime encryption and by extension Confidential
> Computing is invalid.

And yes, I believe that concept of Confidential Computing is invalid.. and we
should simply not merge this support.

It provides false sense of security, and I'm afraid big companies will try to force
people to use it. "DRM, now with hardware support". "Finally advertisments you can
not skip". "Just run this piece of code on your machine to access your bank account.
Trust us!"

:-(.

									Pavel

David Laight May 26, 2020, 8:16 a.m. UTC | #56

From: Pavel Machek
> Sent: 24 May 2020 22:27
..
> It provides false sense of security, and I'm afraid big companies will try to force
> people to use it. "DRM, now with hardware support". "Finally advertisments you can
> not skip". "Just run this piece of code on your machine to access your bank account.
> Trust us!"

Hey malware guys, here is somewhere you can hide your code
making it very difficult to find.
You can then use the hardware disk encryption that people think
gives them security to encrypt all the files.

Job done...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Jarkko Sakkinen May 28, 2020, 11:15 a.m. UTC | #57

On Thu, May 07, 2020 at 05:25:55PM -0700, Sean Christopherson wrote:
> Ah, fudge.  shmem_zero_setup() triggers shmem_acct_size() and thus
> __vm_enough_memory().  Which I should have rememered because I've stared
> at that code several times when dealing with the enclave's backing store.
> I wasn't seeing the issue because I happened to use MAP_PRIVATE.
> 
> So, bad analysis, good conclusion, i.e. the kernel is still doing the
> right thing, it's just not ideal for userspace.
> 
> 
> Jarkko, we should update the docs and selftest to recommend and use
> 
>   PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS
> 
> or
> 
>   PROT_NONE, MAP_SHARED | MAP_NORESERVE | MAP_ANONYMOUS"
> 
> when carving out ELRANGE, with an explicit comment that all the normal
> rules for mapping memory still apply.

Ugh, had forgotten this.

OK, I guess this comment explains it all:

"
/*
 * shmem_file_setup pre-accounts the whole fixed size of a VM object,
 * for shared memory and for shared anonymous (/dev/zero) mappings
 * (unless MAP_NORESERVE and sysctl_overcommit_memory <= 1),
 * consistent with the pre-accounting of private mappings ...
 */
static inline int shmem_acct_size(unsigned long flags, loff_t size)
"

/Jarkko

Jarkko Sakkinen May 28, 2020, 11:19 a.m. UTC | #58

On Thu, May 28, 2020 at 02:15:18PM +0300, Jarkko Sakkinen wrote:
> On Thu, May 07, 2020 at 05:25:55PM -0700, Sean Christopherson wrote:
> > Ah, fudge.  shmem_zero_setup() triggers shmem_acct_size() and thus
> > __vm_enough_memory().  Which I should have rememered because I've stared
> > at that code several times when dealing with the enclave's backing store.
> > I wasn't seeing the issue because I happened to use MAP_PRIVATE.
> > 
> > So, bad analysis, good conclusion, i.e. the kernel is still doing the
> > right thing, it's just not ideal for userspace.
> > 
> > 
> > Jarkko, we should update the docs and selftest to recommend and use
> > 
> >   PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS
> > 
> > or
> > 
> >   PROT_NONE, MAP_SHARED | MAP_NORESERVE | MAP_ANONYMOUS"
> > 
> > when carving out ELRANGE, with an explicit comment that all the normal
> > rules for mapping memory still apply.
> 
> Ugh, had forgotten this.
> 
> OK, I guess this comment explains it all:
> 
> "
> /*
>  * shmem_file_setup pre-accounts the whole fixed size of a VM object,
>  * for shared memory and for shared anonymous (/dev/zero) mappings
>  * (unless MAP_NORESERVE and sysctl_overcommit_memory <= 1),
>  * consistent with the pre-accounting of private mappings ...
>  */
> static inline int shmem_acct_size(unsigned long flags, loff_t size)
> "

Do not agree though that any documentation should be produced but the
selftest should have correct parameters, yes. Instructions on how to
reserve a range of addresses simply does not belong to SGX documentation
because it is not SGX related in the first place. The patterns you
showed are universal.

I'll fix just the selftest for v31.

/Jarkko

[v29,00/20] Intel SGX foundations

Message

Comments