From patchwork Fri Sep 4 17:37:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758303 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8C263138E for ; Fri, 4 Sep 2020 18:02:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5AF5423D58 for ; Fri, 4 Sep 2020 18:02:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="REyWZldh" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728045AbgIDRhv (ORCPT ); Fri, 4 Sep 2020 13:37:51 -0400 Received: from smtp-fw-4101.amazon.com ([72.21.198.25]:49812 "EHLO smtp-fw-4101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726800AbgIDRhu (ORCPT ); Fri, 4 Sep 2020 13:37:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241068; x=1630777068; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rZmxzJa++9MgO5sbtwcclbk5uyV6IgYtmIUjDOWHdJQ=; b=REyWZldhnq1t1h6iA91O5bZq2JMp3/BpqhHAHMkAk5Rzbpii72N+KjsG Ww1AAzpq7+8ceDs0kcQM0F5TeRRHC+1J0hu2ZNiPEGr3q1QIAasHE7ILS NMWhuBreQeCMXhslbyhqqSl4Z6lHa6pu3gmGHy65EfNZX5mxbAgz5gZUU 0=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52178755" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 04 Sep 2020 17:37:46 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com (Postfix) with ESMTPS id C8ED8A20F8; Fri, 4 Sep 2020 17:37:45 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:37:35 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 01/18] nitro_enclaves: Add ioctl interface definition Date: Fri, 4 Sep 2020 20:37:01 +0300 Message-ID: <20200904173718.64857-2-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D13UWB001.ant.amazon.com (10.43.161.156) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The Nitro Enclaves driver handles the enclave lifetime management. This includes enclave creation, termination and setting up its resources such as memory and CPU. An enclave runs alongside the VM that spawned it. It is abstracted as a process running in the VM that launched it. 
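As a rough sketch of the expected user space flow over this interface (not part of the patch; error handling is omitted, the function and variable names are made up for illustration, the region size is arbitrary since a real enclave needs at least 64 MiB of memory in total, and MAP_HUGETLB is just one possible way to get 2 MiB backed memory):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <linux/nitro_enclaves.h>

	static int ne_enclave_launch_sketch(void)
	{
		int ne_fd = open("/dev/nitro_enclaves", O_RDWR | O_CLOEXEC);

		/* Create the enclave slot; the ioctl returns the enclave fd. */
		__u64 slot_uid = 0;
		int enclave_fd = ioctl(ne_fd, NE_CREATE_VM, &slot_uid);

		/* 0 lets the driver pick a vCPU from the NE CPU pool. */
		__u32 vcpu_id = 0;
		ioctl(enclave_fd, NE_ADD_VCPU, &vcpu_id);

		/* Query where the enclave image should be placed in enclave memory. */
		struct ne_image_load_info load_info = { .flags = NE_EIF_IMAGE };
		ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &load_info);

		/* One 2 MiB huge page backed region; the image goes at memory_offset. */
		size_t mem_size = 2 * 1024 * 1024;
		void *addr = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		struct ne_user_memory_region mem_region = {
			.flags = NE_DEFAULT_MEMORY_REGION,
			.memory_size = mem_size,
			.userspace_addr = (__u64)(unsigned long)addr,
		};
		ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &mem_region);

		/* Start the enclave; an enclave_cid of 0 is autogenerated and returned. */
		struct ne_enclave_start_info start_info = {
			.flags = NE_ENCLAVE_DEBUG_MODE,
			.enclave_cid = 0,
		};
		return ioctl(enclave_fd, NE_START_ENCLAVE, &start_info);
	}

On failure, each ioctl returns -1 and sets errno, possibly to one of the NE specific error codes defined below.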
The process interacts with the NE driver, that exposes an ioctl interface for creating an enclave and setting up its resources. Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf Reviewed-by: Stefan Hajnoczi --- Changelog v7 -> v8 * Add NE custom error codes for user space memory regions not backed by pages multiple of 2 MiB, invalid flags and enclave CID. * Add max flag value for enclave image load info. v6 -> v7 * Clarify in the ioctls documentation that the return value is -1 and errno is set on failure. * Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it gets in user space as value 25 (ENOTTY) instead of 515. Update the NE custom error codes values range to not be the same as the ones defined in include/linux/errno.h, although these are not propagated to user space. v5 -> v6 * Fix typo in the description about the NE CPU pool. * Update documentation to kernel-doc format. * Remove the ioctl to query API version. v4 -> v5 * Add more details about the ioctl calls usage e.g. error codes, file descriptors used. * Update the ioctl to set an enclave vCPU to not return a file descriptor. * Add specific NE error codes. v3 -> v4 * Decouple NE ioctl interface from KVM API. * Add NE API version and the corresponding ioctl call. * Add enclave / image load flags options. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Add ioctl for getting enclave image load metadata. * Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE. * Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE ioctls. * Update NE ioctls definition based on the updated ioctl range for major and minor. --- .../userspace-api/ioctl/ioctl-number.rst | 5 +- include/linux/nitro_enclaves.h | 11 + include/uapi/linux/nitro_enclaves.h | 359 ++++++++++++++++++ 3 files changed, 374 insertions(+), 1 deletion(-) create mode 100644 include/linux/nitro_enclaves.h create mode 100644 include/uapi/linux/nitro_enclaves.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 2a198838fca9..5f7ff00f394e 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -328,8 +328,11 @@ Code Seq# Include File Comments 0xAC 00-1F linux/raw.h 0xAD 00 Netfilter device in development: -0xAE all linux/kvm.h Kernel-based Virtual Machine +0xAE 00-1F linux/kvm.h Kernel-based Virtual Machine +0xAE 40-FF linux/kvm.h Kernel-based Virtual Machine + +0xAE 20-3F linux/nitro_enclaves.h Nitro Enclaves 0xAF 00-1F linux/fsl_hypervisor.h Freescale hypervisor 0xB0 all RATIO devices in development: diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h new file mode 100644 index 000000000000..d91ef2bfdf47 --- /dev/null +++ b/include/linux/nitro_enclaves.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + */ + +#ifndef _LINUX_NITRO_ENCLAVES_H_ +#define _LINUX_NITRO_ENCLAVES_H_ + +#include + +#endif /* _LINUX_NITRO_ENCLAVES_H_ */ diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h new file mode 100644 index 000000000000..b945073fe544 --- /dev/null +++ b/include/uapi/linux/nitro_enclaves.h @@ -0,0 +1,359 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+ */ + +#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_ +#define _UAPI_LINUX_NITRO_ENCLAVES_H_ + +#include <linux/types.h> + +/** + * DOC: Nitro Enclaves (NE) Kernel Driver Interface + */ + +/** + * NE_CREATE_VM - The command is used to create a slot that is associated with + * an enclave VM. + * The generated unique slot id is an output parameter. + * The ioctl can be invoked on the /dev/nitro_enclaves fd, before + * setting any resources, such as memory and vCPUs, for an + * enclave. Memory and vCPUs are set for the slot mapped to an enclave. + * An NE CPU pool has to be set before calling this function. The + * pool can be set after the NE driver is loaded, using + * /sys/module/nitro_enclaves/parameters/ne_cpus. + * Its format is detailed in the cpu-lists section: + * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html + * CPU 0 and its siblings have to remain available for the + * primary / parent VM, so they cannot be set for enclaves. Full + * CPU core(s), from the same NUMA node, need(s) to be included + * in the CPU pool. + * + * Context: Process context. + * Return: + * * Enclave file descriptor - Enclave file descriptor used with + * ioctl calls to set vCPUs and memory + * regions, then start the enclave. + * * -1 - There was a failure in the ioctl logic. + * On failure, errno is set to: + * * EFAULT - copy_to_user() failure. + * * ENOMEM - Memory allocation failure for internal + * bookkeeping variables. + * * NE_ERR_NO_CPUS_AVAIL_IN_POOL - No NE CPU pool set / no CPUs available + * in the pool. + * * Error codes from get_unused_fd_flags() and anon_inode_getfile(). + * * Error codes from the NE PCI device request. + */ +#define NE_CREATE_VM _IOR(0xAE, 0x20, __u64) + +/** + * NE_ADD_VCPU - The command is used to set a vCPU for an enclave. The vCPU can + * be auto-chosen from the NE CPU pool or it can be set by the + * caller, with the note that it needs to be available in the NE + * CPU pool. Full CPU core(s), from the same NUMA node, need(s) to + * be associated with an enclave. + * The vCPU id is an input / output parameter. If its value is 0, + * then a CPU is chosen from the enclave CPU pool and returned via + * this parameter. + * The ioctl can be invoked on the enclave fd, before an enclave + * is started. + * + * Context: Process context. + * Return: + * * 0 - Logic successfully completed. + * * -1 - There was a failure in the ioctl logic. + * On failure, errno is set to: + * * EFAULT - copy_from_user() / copy_to_user() failure. + * * ENOMEM - Memory allocation failure for internal + * bookkeeping variables. + * * EIO - Current task mm is not the same as the one + * that created the enclave. + * * NE_ERR_NO_CPUS_AVAIL_IN_POOL - No CPUs available in the NE CPU pool. + * * NE_ERR_VCPU_ALREADY_USED - The provided vCPU is already used. + * * NE_ERR_VCPU_NOT_IN_CPU_POOL - The provided vCPU is not available in the + * NE CPU pool. + * * NE_ERR_VCPU_INVALID_CPU_CORE - The core id of the provided vCPU is invalid + * or out of range. + * * NE_ERR_NOT_IN_INIT_STATE - The enclave is not in init state + * (init = before being started). + * * NE_ERR_INVALID_VCPU - The provided vCPU is not in the available + * CPUs range. + * * Error codes from the NE PCI device request. + */ +#define NE_ADD_VCPU _IOWR(0xAE, 0x21, __u32) + +/** + * NE_GET_IMAGE_LOAD_INFO - The command is used to get information needed for + * in-memory enclave image loading e.g. offset in + * enclave memory to start placing the enclave image. + * The image load info is an input / output parameter.
+ * It includes info provided by the caller - flags - + * and returns the offset in enclave memory where to + * start placing the enclave image. + * The ioctl can be invoked on the enclave fd, before + * an enclave is started. + * + * Context: Process context. + * Return: + * * 0 - Logic successfully completed. + * * -1 - There was a failure in the ioctl logic. + * On failure, errno is set to: + * * EFAULT - copy_from_user() / copy_to_user() failure. + * * NE_ERR_NOT_IN_INIT_STATE - The enclave is not in init state (init = + * before being started). + * * NE_ERR_INVALID_FLAG_VALUE - The value of the provided flag is invalid. + */ +#define NE_GET_IMAGE_LOAD_INFO _IOWR(0xAE, 0x22, struct ne_image_load_info) + +/** + * NE_SET_USER_MEMORY_REGION - The command is used to set a memory region for an + * enclave, given the memory allocated from + * user space. Enclave memory needs to be from the + * same NUMA node as the enclave CPUs. + * The user memory region is an input parameter. It + * includes info provided by the caller - flags, + * memory size and userspace address. + * The ioctl can be invoked on the enclave fd, + * before an enclave is started. + * + * Context: Process context. + * Return: + * * 0 - Logic successfully completed. + * * -1 - There was a failure in the ioctl logic. + * On failure, errno is set to: + * * EFAULT - copy_from_user() failure. + * * EINVAL - Invalid physical memory region(s) e.g. + * unaligned address. + * * EIO - Current task mm is not the same as + * the one that created the enclave. + * * ENOMEM - Memory allocation failure for internal + * bookkeeping variables. + * * NE_ERR_NOT_IN_INIT_STATE - The enclave is not in init state + * (init = before being started). + * * NE_ERR_INVALID_MEM_REGION_SIZE - The memory size of the region is not a + * multiple of 2 MiB. + * * NE_ERR_INVALID_MEM_REGION_ADDR - Invalid user space address given. + * * NE_ERR_UNALIGNED_MEM_REGION_ADDR - Unaligned user space address given. + * * NE_ERR_MEM_REGION_ALREADY_USED - The memory region is already used. + * * NE_ERR_MEM_NOT_HUGE_PAGE - The memory region is not backed by + * huge pages. + * * NE_ERR_MEM_DIFFERENT_NUMA_NODE - The memory region is not from the same + * NUMA node as the CPUs. + * * NE_ERR_MEM_MAX_REGIONS - The number of memory regions set for + * the enclave reached the maximum. + * * NE_ERR_INVALID_PAGE_SIZE - The memory region is not backed by + * pages that are a multiple of 2 MiB. + * * NE_ERR_INVALID_FLAG_VALUE - The value of the provided flag is invalid. + * * Error codes from get_user_pages(). + * * Error codes from the NE PCI device request. + */ +#define NE_SET_USER_MEMORY_REGION _IOW(0xAE, 0x23, struct ne_user_memory_region) + +/** + * NE_START_ENCLAVE - The command is used to trigger enclave start after the + * enclave resources, such as memory and CPU, have been set. + * The enclave start info is an input / output parameter. It + * includes info provided by the caller - enclave cid and + * flags - and returns the cid (if input cid is 0). + * The ioctl can be invoked on the enclave fd, after an + * enclave slot is created and resources, such as memory and + * vCPUs, are set for an enclave. + * + * Context: Process context. + * Return: + * * 0 - Logic successfully completed. + * * -1 - There was a failure in the ioctl logic. + * On failure, errno is set to: + * * EFAULT - copy_from_user() / copy_to_user() failure. + * * NE_ERR_NOT_IN_INIT_STATE - The enclave is not in init state + * (init = before being started). + * * NE_ERR_NO_MEM_REGIONS_ADDED - No memory regions are set.
+ * * NE_ERR_NO_VCPUS_ADDED - No vCPUs are set. + * * NE_ERR_FULL_CORES_NOT_USED - Full core(s) not set for the enclave. + * * NE_ERR_ENCLAVE_MEM_MIN_SIZE - Enclave memory is less than minimum + * memory size (64 MiB). + * * NE_ERR_INVALID_FLAG_VALUE - The value of the provided flag is invalid. + * * NE_ERR_INVALID_ENCLAVE_CID - The provided enclave CID is invalid. + * * Error codes from the NE PCI device request. + */ +#define NE_START_ENCLAVE _IOWR(0xAE, 0x24, struct ne_enclave_start_info) + +/** + * DOC: NE specific error codes + */ + +/** + * NE_ERR_VCPU_ALREADY_USED - The provided vCPU is already used. + */ +#define NE_ERR_VCPU_ALREADY_USED (256) +/** + * NE_ERR_VCPU_NOT_IN_CPU_POOL - The provided vCPU is not available in the + * NE CPU pool. + */ +#define NE_ERR_VCPU_NOT_IN_CPU_POOL (257) +/** + * NE_ERR_VCPU_INVALID_CPU_CORE - The core id of the provided vCPU is invalid + * or out of range of the NE CPU pool. + */ +#define NE_ERR_VCPU_INVALID_CPU_CORE (258) +/** + * NE_ERR_INVALID_MEM_REGION_SIZE - The user space memory region size is not + * multiple of 2 MiB. + */ +#define NE_ERR_INVALID_MEM_REGION_SIZE (259) +/** + * NE_ERR_INVALID_MEM_REGION_ADDR - The user space memory region address range + * is invalid. + */ +#define NE_ERR_INVALID_MEM_REGION_ADDR (260) +/** + * NE_ERR_UNALIGNED_MEM_REGION_ADDR - The user space memory region address is + * not aligned. + */ +#define NE_ERR_UNALIGNED_MEM_REGION_ADDR (261) +/** + * NE_ERR_MEM_REGION_ALREADY_USED - The user space memory region is already used. + */ +#define NE_ERR_MEM_REGION_ALREADY_USED (262) +/** + * NE_ERR_MEM_NOT_HUGE_PAGE - The user space memory region is not backed by + * contiguous physical huge page(s). + */ +#define NE_ERR_MEM_NOT_HUGE_PAGE (263) +/** + * NE_ERR_MEM_DIFFERENT_NUMA_NODE - The user space memory region is backed by + * pages from different NUMA nodes than the CPUs. + */ +#define NE_ERR_MEM_DIFFERENT_NUMA_NODE (264) +/** + * NE_ERR_MEM_MAX_REGIONS - The supported max memory regions per enclaves has + * been reached. + */ +#define NE_ERR_MEM_MAX_REGIONS (265) +/** + * NE_ERR_NO_MEM_REGIONS_ADDED - The command to start an enclave is triggered + * and no memory regions are added. + */ +#define NE_ERR_NO_MEM_REGIONS_ADDED (266) +/** + * NE_ERR_NO_VCPUS_ADDED - The command to start an enclave is triggered and no + * vCPUs are added. + */ +#define NE_ERR_NO_VCPUS_ADDED (267) +/** + * NE_ERR_ENCLAVE_MEM_MIN_SIZE - The enclave memory size is lower than the + * minimum supported. + */ +#define NE_ERR_ENCLAVE_MEM_MIN_SIZE (268) +/** + * NE_ERR_FULL_CORES_NOT_USED - The command to start an enclave is triggered and + * full CPU cores are not set. + */ +#define NE_ERR_FULL_CORES_NOT_USED (269) +/** + * NE_ERR_NOT_IN_INIT_STATE - The enclave is not in init state when setting + * resources or triggering start. + */ +#define NE_ERR_NOT_IN_INIT_STATE (270) +/** + * NE_ERR_INVALID_VCPU - The provided vCPU is out of range of the available CPUs. + */ +#define NE_ERR_INVALID_VCPU (271) +/** + * NE_ERR_NO_CPUS_AVAIL_IN_POOL - The command to create an enclave is triggered + * and no CPUs are available in the pool. + */ +#define NE_ERR_NO_CPUS_AVAIL_IN_POOL (272) +/** + * NE_ERR_INVALID_PAGE_SIZE - The user space memory region is not backed by pages + * multiple of 2 MiB. + */ +#define NE_ERR_INVALID_PAGE_SIZE (273) +/** + * NE_ERR_INVALID_FLAG_VALUE - The provided flag value is invalid. 
+ */ +#define NE_ERR_INVALID_FLAG_VALUE (274) +/** + * NE_ERR_INVALID_ENCLAVE_CID - The provided enclave CID is invalid, either + * being a well-known value or the CID of the + * parent / primary VM. + */ +#define NE_ERR_INVALID_ENCLAVE_CID (275) + +/** + * DOC: Image load info flags + */ + +/** + * NE_EIF_IMAGE - Enclave Image Format (EIF) + */ +#define NE_EIF_IMAGE (0x01) + +#define NE_IMAGE_LOAD_MAX_FLAG_VAL (0x02) + +/** + * struct ne_image_load_info - Info necessary for in-memory enclave image + * loading (in / out). + * @flags: Flags to determine the enclave image type + * (e.g. Enclave Image Format - EIF) (in). + * @memory_offset: Offset in enclave memory where to start placing the + * enclave image (out). + */ +struct ne_image_load_info { + __u64 flags; + __u64 memory_offset; +}; + +/** + * DOC: User memory region flags + */ + +/** + * NE_DEFAULT_MEMORY_REGION - Memory region for enclave general usage. + */ +#define NE_DEFAULT_MEMORY_REGION (0x00) + +#define NE_MEMORY_REGION_MAX_FLAG_VAL (0x01) + +/** + * struct ne_user_memory_region - Memory region to be set for an enclave (in). + * @flags: Flags to determine the usage for the memory region (in). + * @memory_size: The size, in bytes, of the memory region to be set for + * an enclave (in). + * @userspace_addr: The start address of the userspace allocated memory of + * the memory region to set for an enclave (in). + */ +struct ne_user_memory_region { + __u64 flags; + __u64 memory_size; + __u64 userspace_addr; +}; + +/** + * DOC: Enclave start info flags + */ + +/** + * NE_ENCLAVE_PRODUCTION_MODE - Start enclave in production mode. + */ +#define NE_ENCLAVE_PRODUCTION_MODE (0x00) +/** + * NE_ENCLAVE_DEBUG_MODE - Start enclave in debug mode. + */ +#define NE_ENCLAVE_DEBUG_MODE (0x01) + +#define NE_ENCLAVE_START_MAX_FLAG_VAL (0x02) + +/** + * struct ne_enclave_start_info - Setup info necessary for enclave start (in / out). + * @flags: Flags for the enclave to start with (e.g. debug mode) (in). + * @enclave_cid: Context ID (CID) for the enclave vsock device. If 0 as + * input, the CID is autogenerated by the hypervisor and + * returned back as output by the driver (in / out). 
+ */ +struct ne_enclave_start_info { + __u64 flags; + __u64 enclave_cid; +}; + +#endif /* _UAPI_LINUX_NITRO_ENCLAVES_H_ */ From patchwork Fri Sep 4 17:37:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758305 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E0BA992C for ; Fri, 4 Sep 2020 18:02:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ACA0E23D58 for ; Fri, 4 Sep 2020 18:02:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="PZi4nwkF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728071AbgIDRiI (ORCPT ); Fri, 4 Sep 2020 13:38:08 -0400 Received: from smtp-fw-4101.amazon.com ([72.21.198.25]:49872 "EHLO smtp-fw-4101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726800AbgIDRiB (ORCPT ); Fri, 4 Sep 2020 13:38:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241078; x=1630777078; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=r/ToL4pElbxPdyhMOFoiLdoIDOKKVPpemQExiSHgjig=; b=PZi4nwkF9KDXfKR3q+2F4oekcwQTvq81mqv/il8vPt+ekUiJCD7dWX35 tCWyst7ztlJH5YCBztUI9z4i6VdsQJFHmRk9HO80e20rRyBV1kA+Dmn/G uBVXK7G0qMRwQWlmyyCVK4QS19bP9FSUHvcm8GE5p4Od23FO8tFx8YHhm o=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52178803" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1e-a70de69e.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 04 Sep 2020 17:37:57 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1e-a70de69e.us-east-1.amazon.com (Postfix) with ESMTPS id 1879DA0629; Fri, 4 Sep 2020 17:37:54 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:37:45 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 02/18] nitro_enclaves: Define the PCI device interface Date: Fri, 4 Sep 2020 20:37:02 +0300 Message-ID: <20200904173718.64857-3-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D13UWB001.ant.amazon.com (10.43.161.156) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The Nitro Enclaves (NE) driver communicates with a new PCI device, that is exposed to a virtual machine (VM) and handles commands meant for handling enclaves lifetime e.g. 
creation, termination, setting memory regions. The communication with the PCI device is handled using a MMIO space and MSI-X interrupts. This device communicates with the hypervisor on the host, where the VM that spawned the enclave itself runs, e.g. to launch a VM that is used for the enclave. Define the MMIO space of the NE PCI device, the commands that are provided by this device. Add an internal data structure used as private data for the PCI device driver and the function for the PCI device command requests handling. Signed-off-by: Alexandru-Catalin Vasile Signed-off-by: Alexandru Ciobotaru Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * Update the documentation to include references to the NE PCI device id and MMIO bar. v5 -> v6 * Update documentation to kernel-doc format. v4 -> v5 * Add a TODO for including flags in the request to the NE PCI device to set a memory region for an enclave. It is not used for now. v3 -> v4 * Remove the "packed" attribute and include padding in the NE data structures. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Update path naming to drivers/virt/nitro_enclaves. * Update NE_ENABLE_OFF / NE_ENABLE_ON defines. --- drivers/virt/nitro_enclaves/ne_pci_dev.h | 327 +++++++++++++++++++++++ 1 file changed, 327 insertions(+) create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h b/drivers/virt/nitro_enclaves/ne_pci_dev.h new file mode 100644 index 000000000000..336fa344d630 --- /dev/null +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h @@ -0,0 +1,327 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + */ + +#ifndef _NE_PCI_DEV_H_ +#define _NE_PCI_DEV_H_ + +#include +#include +#include +#include +#include +#include + +/** + * DOC: Nitro Enclaves (NE) PCI device + */ + +/** + * PCI_DEVICE_ID_NE - Nitro Enclaves PCI device id. + */ +#define PCI_DEVICE_ID_NE (0xe4c1) +/** + * PCI_BAR_NE - Nitro Enclaves PCI device MMIO BAR. + */ +#define PCI_BAR_NE (0x03) + +/** + * DOC: Device registers in the NE PCI device MMIO BAR + */ + +/** + * NE_ENABLE - (1 byte) Register to notify the device that the driver is using + * it (Read/Write). + */ +#define NE_ENABLE (0x0000) +#define NE_ENABLE_OFF (0x00) +#define NE_ENABLE_ON (0x01) + +/** + * NE_VERSION - (2 bytes) Register to select the device run-time version + * (Read/Write). + */ +#define NE_VERSION (0x0002) +#define NE_VERSION_MAX (0x0001) + +/** + * NE_COMMAND - (4 bytes) Register to notify the device what command was + * requested (Write-Only). + */ +#define NE_COMMAND (0x0004) + +/** + * NE_EVTCNT - (4 bytes) Register to notify the driver that a reply or a device + * event is available (Read-Only): + * - Lower half - command reply counter + * - Higher half - out-of-band device event counter + */ +#define NE_EVTCNT (0x000c) +#define NE_EVTCNT_REPLY_SHIFT (0) +#define NE_EVTCNT_REPLY_MASK (0x0000ffff) +#define NE_EVTCNT_REPLY(cnt) (((cnt) & NE_EVTCNT_REPLY_MASK) >> \ + NE_EVTCNT_REPLY_SHIFT) +#define NE_EVTCNT_EVENT_SHIFT (16) +#define NE_EVTCNT_EVENT_MASK (0xffff0000) +#define NE_EVTCNT_EVENT(cnt) (((cnt) & NE_EVTCNT_EVENT_MASK) >> \ + NE_EVTCNT_EVENT_SHIFT) + +/** + * NE_SEND_DATA - (240 bytes) Buffer for sending the command request payload + * (Read/Write). 
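+ * A command request is issued by copying the request payload into this
+ * buffer and then writing the command type to the NE_COMMAND register.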
+ */ +#define NE_SEND_DATA (0x0010) + +/** + * NE_RECV_DATA - (240 bytes) Buffer for receiving the command reply payload + * (Read-Only). + */ +#define NE_RECV_DATA (0x0100) + +/** + * DOC: Device MMIO buffer sizes + */ + +/** + * NE_SEND_DATA_SIZE / NE_RECV_DATA_SIZE - 240 bytes for send / recv buffer. + */ +#define NE_SEND_DATA_SIZE (240) +#define NE_RECV_DATA_SIZE (240) + +/** + * DOC: MSI-X interrupt vectors + */ + +/** + * NE_VEC_REPLY - MSI-X vector used for command reply notification. + */ +#define NE_VEC_REPLY (0) + +/** + * NE_VEC_EVENT - MSI-X vector used for out-of-band events e.g. enclave crash. + */ +#define NE_VEC_EVENT (1) + +/** + * enum ne_pci_dev_cmd_type - Device command types. + * @INVALID_CMD: Invalid command. + * @ENCLAVE_START: Start an enclave, after setting its resources. + * @ENCLAVE_GET_SLOT: Get the slot uid of an enclave. + * @ENCLAVE_STOP: Terminate an enclave. + * @SLOT_ALLOC : Allocate a slot for an enclave. + * @SLOT_FREE: Free the slot allocated for an enclave + * @SLOT_ADD_MEM: Add a memory region to an enclave slot. + * @SLOT_ADD_VCPU: Add a vCPU to an enclave slot. + * @SLOT_COUNT : Get the number of allocated slots. + * @NEXT_SLOT: Get the next slot in the list of allocated slots. + * @SLOT_INFO: Get the info for a slot e.g. slot uid, vCPUs count. + * @SLOT_ADD_BULK_VCPUS: Add a number of vCPUs, not providing CPU ids. + * @MAX_CMD: A gatekeeper for max possible command type. + */ +enum ne_pci_dev_cmd_type { + INVALID_CMD = 0, + ENCLAVE_START = 1, + ENCLAVE_GET_SLOT = 2, + ENCLAVE_STOP = 3, + SLOT_ALLOC = 4, + SLOT_FREE = 5, + SLOT_ADD_MEM = 6, + SLOT_ADD_VCPU = 7, + SLOT_COUNT = 8, + NEXT_SLOT = 9, + SLOT_INFO = 10, + SLOT_ADD_BULK_VCPUS = 11, + MAX_CMD, +}; + +/** + * DOC: Device commands - payload structure for requests and replies. + */ + +/** + * struct enclave_start_req - ENCLAVE_START request. + * @slot_uid: Slot unique id mapped to the enclave to start. + * @enclave_cid: Context ID (CID) for the enclave vsock device. + * If 0, CID is autogenerated. + * @flags: Flags for the enclave to start with (e.g. debug mode). + */ +struct enclave_start_req { + u64 slot_uid; + u64 enclave_cid; + u64 flags; +}; + +/** + * struct enclave_get_slot_req - ENCLAVE_GET_SLOT request. + * @enclave_cid: Context ID (CID) for the enclave vsock device. + */ +struct enclave_get_slot_req { + u64 enclave_cid; +}; + +/** + * struct enclave_stop_req - ENCLAVE_STOP request. + * @slot_uid: Slot unique id mapped to the enclave to stop. + */ +struct enclave_stop_req { + u64 slot_uid; +}; + +/** + * struct slot_alloc_req - SLOT_ALLOC request. + * @unused: In order to avoid weird sizeof edge cases. + */ +struct slot_alloc_req { + u8 unused; +}; + +/** + * struct slot_free_req - SLOT_FREE request. + * @slot_uid: Slot unique id mapped to the slot to free. + */ +struct slot_free_req { + u64 slot_uid; +}; + +/* TODO: Add flags field to the request to add memory region. */ +/** + * struct slot_add_mem_req - SLOT_ADD_MEM request. + * @slot_uid: Slot unique id mapped to the slot to add the memory region to. + * @paddr: Physical address of the memory region to add to the slot. + * @size: Memory size, in bytes, of the memory region to add to the slot. + */ +struct slot_add_mem_req { + u64 slot_uid; + u64 paddr; + u64 size; +}; + +/** + * struct slot_add_vcpu_req - SLOT_ADD_VCPU request. + * @slot_uid: Slot unique id mapped to the slot to add the vCPU to. + * @vcpu_id: vCPU ID of the CPU to add to the enclave. + * @padding: Padding for the overall data structure. 
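+ * The explicit padding keeps the structure size at 16 bytes without
+ * relying on the packed attribute.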
+ */ +struct slot_add_vcpu_req { + u64 slot_uid; + u32 vcpu_id; + u8 padding[4]; +}; + +/** + * struct slot_count_req - SLOT_COUNT request. + * @unused: In order to avoid weird sizeof edge cases. + */ +struct slot_count_req { + u8 unused; +}; + +/** + * struct next_slot_req - NEXT_SLOT request. + * @slot_uid: Slot unique id of the next slot in the iteration. + */ +struct next_slot_req { + u64 slot_uid; +}; + +/** + * struct slot_info_req - SLOT_INFO request. + * @slot_uid: Slot unique id mapped to the slot to get information about. + */ +struct slot_info_req { + u64 slot_uid; +}; + +/** + * struct slot_add_bulk_vcpus_req - SLOT_ADD_BULK_VCPUS request. + * @slot_uid: Slot unique id mapped to the slot to add vCPUs to. + * @nr_vcpus: Number of vCPUs to add to the slot. + */ +struct slot_add_bulk_vcpus_req { + u64 slot_uid; + u64 nr_vcpus; +}; + +/** + * struct ne_pci_dev_cmd_reply - NE PCI device command reply. + * @rc : Return code of the logic that processed the request. + * @padding0: Padding for the overall data structure. + * @slot_uid: Valid for all commands except SLOT_COUNT. + * @enclave_cid: Valid for ENCLAVE_START command. + * @slot_count : Valid for SLOT_COUNT command. + * @mem_regions: Valid for SLOT_ALLOC and SLOT_INFO commands. + * @mem_size: Valid for SLOT_INFO command. + * @nr_vcpus: Valid for SLOT_INFO command. + * @flags: Valid for SLOT_INFO command. + * @state: Valid for SLOT_INFO command. + * @padding1: Padding for the overall data structure. + */ +struct ne_pci_dev_cmd_reply { + s32 rc; + u8 padding0[4]; + u64 slot_uid; + u64 enclave_cid; + u64 slot_count; + u64 mem_regions; + u64 mem_size; + u64 nr_vcpus; + u64 flags; + u16 state; + u8 padding1[6]; +}; + +/** + * struct ne_pci_dev - Nitro Enclaves (NE) PCI device. + * @cmd_reply_avail: Variable set if a reply has been sent by the + * PCI device. + * @cmd_reply_wait_q: Wait queue for handling command reply from the + * PCI device. + * @enclaves_list: List of the enclaves managed by the PCI device. + * @enclaves_list_mutex: Mutex for accessing the list of enclaves. + * @event_wq: Work queue for handling out-of-band events + * triggered by the Nitro Hypervisor which require + * enclave state scanning and propagation to the + * enclave process. + * @iomem_base : MMIO region of the PCI device. + * @notify_work: Work item for every received out-of-band event. + * @pci_dev_mutex: Mutex for accessing the PCI device MMIO space. + * @pdev: PCI device data structure. + */ +struct ne_pci_dev { + atomic_t cmd_reply_avail; + wait_queue_head_t cmd_reply_wait_q; + struct list_head enclaves_list; + struct mutex enclaves_list_mutex; + struct workqueue_struct *event_wq; + void __iomem *iomem_base; + struct work_struct notify_work; + struct mutex pci_dev_mutex; + struct pci_dev *pdev; +}; + +/** + * ne_do_request() - Submit command request to the PCI device based on the command + * type and retrieve the associated reply. + * @pdev: PCI device to send the command to and receive the reply from. + * @cmd_type: Command type of the request sent to the PCI device. + * @cmd_request: Command request payload. + * @cmd_request_size: Size of the command request payload. + * @cmd_reply: Command reply payload. + * @cmd_reply_size: Size of the command reply payload. + * + * Context: Process context. This function uses the ne_pci_dev mutex to handle + * one command at a time. + * Return: + * * 0 on success. + * * Negative return value on failure. 
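+ *
+ * An illustrative call (sketch only, not taken from this patch series):
+ *
+ *	struct slot_alloc_req slot_alloc_req = {};
+ *	struct ne_pci_dev_cmd_reply cmd_reply = {};
+ *	int rc = ne_do_request(pdev, SLOT_ALLOC, &slot_alloc_req,
+ *			       sizeof(slot_alloc_req), &cmd_reply,
+ *			       sizeof(cmd_reply));
+ *
+ * On success, cmd_reply.slot_uid holds the unique id of the newly allocated
+ * slot.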
+ */ +int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type, + void *cmd_request, size_t cmd_request_size, + struct ne_pci_dev_cmd_reply *cmd_reply, + size_t cmd_reply_size); + +/* Nitro Enclaves (NE) PCI device driver */ +extern struct pci_driver ne_pci_driver; + +#endif /* _NE_PCI_DEV_H_ */ From patchwork Fri Sep 4 17:37:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758307 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 547061599 for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2228223D58 for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="MyR6X/1g" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727867AbgIDRiT (ORCPT ); Fri, 4 Sep 2020 13:38:19 -0400 Received: from smtp-fw-2101.amazon.com ([72.21.196.25]:2960 "EHLO smtp-fw-2101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728069AbgIDRiJ (ORCPT ); Fri, 4 Sep 2020 13:38:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241088; x=1630777088; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sy6e6VascrUNe6HJdOQOpPygSVEo53UfSvxaOKmtn5A=; b=MyR6X/1g94ov01x5+6tZdLXGm+PAJUVsRYKbt39tpKu5Af5NkWH8C8LR 2Md5ff60zcBKX1sRH5XHG4gr16Jnr4wazYrfDwCTDkFXJxMVrWHVKip71 JuhGlNZSVECWZNXtSGcQrKfUIZ1fovU3pH5sDh5gfEuvECPTbIYtG/BvU I=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="51938801" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1d-38ae4ad2.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-2101.iad2.amazon.com with ESMTP; 04 Sep 2020 17:38:06 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1d-38ae4ad2.us-east-1.amazon.com (Postfix) with ESMTPS id 5C08DA1837; Fri, 4 Sep 2020 17:38:04 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:37:54 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 03/18] nitro_enclaves: Define enclave info for internal bookkeeping Date: Fri, 4 Sep 2020 20:37:03 +0300 Message-ID: <20200904173718.64857-4-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D13UWB001.ant.amazon.com (10.43.161.156) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
kvm@vger.kernel.org The Nitro Enclaves driver keeps an internal info per each enclave. This is needed to be able to manage enclave resources state, enclave notifications and have a reference of the PCI device that handles command requests for enclave lifetime management. Signed-off-by: Alexandru-Catalin Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Update documentation to kernel-doc format. * Include in the enclave memory region data structure the user space address and size for duplicate user space memory regions checks. v4 -> v5 * Include enclave cores field in the enclave metadata. * Update the vCPU ids data structure to be a cpumask instead of a list. v3 -> v4 * Add NUMA node field for an enclave metadata as the enclave memory and CPUs need to be from the same NUMA node. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Add enclave memory regions and vcpus count for enclave bookkeeping. * Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming update. --- drivers/virt/nitro_enclaves/ne_misc_dev.h | 99 +++++++++++++++++++++++ 1 file changed, 99 insertions(+) create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h b/drivers/virt/nitro_enclaves/ne_misc_dev.h new file mode 100644 index 000000000000..a907924de7ca --- /dev/null +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + */ + +#ifndef _NE_MISC_DEV_H_ +#define _NE_MISC_DEV_H_ + +#include +#include +#include +#include +#include +#include +#include + +/** + * struct ne_mem_region - Entry in the enclave user space memory regions list. + * @mem_region_list_entry: Entry in the list of enclave memory regions. + * @memory_size: Size of the user space memory region. + * @nr_pages: Number of pages that make up the memory region. + * @pages: Pages that make up the user space memory region. + * @userspace_addr: User space address of the memory region. + */ +struct ne_mem_region { + struct list_head mem_region_list_entry; + u64 memory_size; + unsigned long nr_pages; + struct page **pages; + u64 userspace_addr; +}; + +/** + * struct ne_enclave - Per-enclave data used for enclave lifetime management. + * @enclave_info_mutex : Mutex for accessing this internal state. + * @enclave_list_entry : Entry in the list of created enclaves. + * @eventq: Wait queue used for out-of-band event notifications + * triggered from the PCI device event handler to + * the enclave process via the poll function. + * @has_event: Variable used to determine if the out-of-band event + * was triggered. + * @max_mem_regions: The maximum number of memory regions that can be + * handled by the hypervisor. + * @mem_regions_list: Enclave user space memory regions list. + * @mem_size: Enclave memory size. + * @mm : Enclave process abstraction mm data struct. + * @nr_mem_regions: Number of memory regions associated with the enclave. + * @nr_parent_vm_cores : The size of the threads per core array. The + * total number of CPU cores available on the + * parent / primary VM. + * @nr_threads_per_core: The number of threads that a full CPU core has. + * @nr_vcpus: Number of vcpus associated with the enclave. 
+ * @numa_node: NUMA node of the enclave memory and CPUs. + * @pdev: PCI device used for enclave lifetime management. + * @slot_uid: Slot unique id mapped to the enclave. + * @state: Enclave state, updated during enclave lifetime. + * @threads_per_core: Enclave full CPU cores array, indexed by core id, + * consisting of cpumasks with all their threads. + * Full CPU cores are taken from the NE CPU pool + * and are available to the enclave. + * @vcpu_ids: Cpumask of the vCPUs that are set for the enclave. + */ +struct ne_enclave { + struct mutex enclave_info_mutex; + struct list_head enclave_list_entry; + wait_queue_head_t eventq; + bool has_event; + u64 max_mem_regions; + struct list_head mem_regions_list; + u64 mem_size; + struct mm_struct *mm; + unsigned int nr_mem_regions; + unsigned int nr_parent_vm_cores; + unsigned int nr_threads_per_core; + unsigned int nr_vcpus; + int numa_node; + struct pci_dev *pdev; + u64 slot_uid; + u16 state; + cpumask_var_t *threads_per_core; + cpumask_var_t vcpu_ids; +}; + +/** + * enum ne_state - States available for an enclave. + * @NE_STATE_INIT: The enclave has not been started yet. + * @NE_STATE_RUNNING: The enclave was started and is running as expected. + * @NE_STATE_STOPPED: The enclave exited without userspace interaction. + */ +enum ne_state { + NE_STATE_INIT = 0, + NE_STATE_RUNNING = 2, + NE_STATE_STOPPED = U16_MAX, +}; + +/* Nitro Enclaves (NE) misc device */ +extern struct miscdevice ne_misc_dev; + +#endif /* _NE_MISC_DEV_H_ */ From patchwork Fri Sep 4 17:37:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758309 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8238715AB for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5A38D23D58 for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="kD16Nbcp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727937AbgIDRia (ORCPT ); Fri, 4 Sep 2020 13:38:30 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:36661 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728079AbgIDRiW (ORCPT ); Fri, 4 Sep 2020 13:38:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241101; x=1630777101; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JEWfYjE51IeGs4iE1nUxiTR4JpJ31r8IMUhicXekmBw=; b=kD16NbcpFK0Se8mxiXnhUm1hS96LN4sGgk2Um7GNia37ViesxVe+9QVw n/+x8Xjj+cErjw5xSLhL7t0BF2O58EizsA0hO3qqRGcKEdVp2q0pRQDBS Db4W1nhuSMgglv/Ay28gHOeo2RYeqp3MXvZgcNrt0VuazGMBnwzEsLpBF s=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="72479279" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1e-97fdccfd.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 04 Sep 2020 17:38:16 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1e-97fdccfd.us-east-1.amazon.com (Postfix) with ESMTPS id 4DCF0A1D75; Fri, 4 Sep 2020 17:38:13 +0000 (UTC) Received: 
from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:38:03 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 04/18] nitro_enclaves: Init PCI device driver Date: Fri, 4 Sep 2020 20:37:04 +0300 Message-ID: <20200904173718.64857-5-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D13UWB001.ant.amazon.com (10.43.161.156) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The Nitro Enclaves PCI device is used by the kernel driver as a means of communication with the hypervisor on the host where the primary VM and the enclaves run. It handles requests with regard to enclave lifetime. Setup the PCI device driver and add support for MSI-X interrupts. Signed-off-by: Alexandru-Catalin Vasile Signed-off-by: Alexandru Ciobotaru Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * Add NE PCI driver shutdown logic. v6 -> v7 * No changes. v5 -> v6 * Update documentation to kernel-doc format. v4 -> v5 * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Update NE PCI driver name to "nitro_enclaves". v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. * Remove the WARN_ON calls. * Remove linux/bug include that is not needed. * Update static calls sanity checks. * Remove "ratelimited" from the logs that are not in the ioctl call paths. * Update kzfree() calls to kfree(). v1 -> v2 * Add log pattern for NE. * Update PCI device setup functions to receive PCI device data structure and then get private data from it inside the functions logic. * Remove the BUG_ON calls. * Add teardown function for MSI-X setup. * Update goto labels to match their purpose. * Implement TODO for NE PCI device disable state check. * Update function name for NE PCI device probe / remove. --- drivers/virt/nitro_enclaves/ne_pci_dev.c | 298 +++++++++++++++++++++++ 1 file changed, 298 insertions(+) create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c new file mode 100644 index 000000000000..daf8b36383f1 --- /dev/null +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c @@ -0,0 +1,298 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + */ + +/** + * DOC: Nitro Enclaves (NE) PCI device driver. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ne_misc_dev.h" +#include "ne_pci_dev.h" + +/** + * NE_DEFAULT_TIMEOUT_MSECS - Default timeout to wait for a reply from + * the NE PCI device. 
+ */ +#define NE_DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */ + +static const struct pci_device_id ne_pci_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) }, + { 0, } +}; + +MODULE_DEVICE_TABLE(pci, ne_pci_ids); + +/** + * ne_setup_msix() - Setup MSI-X vectors for the PCI device. + * @pdev: PCI device to setup the MSI-X for. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_setup_msix(struct pci_dev *pdev) +{ + int nr_vecs = 0; + int rc = -EINVAL; + + nr_vecs = pci_msix_vec_count(pdev); + if (nr_vecs < 0) { + rc = nr_vecs; + + dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc); + + return rc; + } + + rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX); + if (rc < 0) { + dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc); + + return rc; + } + + return 0; +} + +/** + * ne_teardown_msix() - Teardown MSI-X vectors for the PCI device. + * @pdev: PCI device to teardown the MSI-X for. + * + * Context: Process context. + */ +static void ne_teardown_msix(struct pci_dev *pdev) +{ + pci_free_irq_vectors(pdev); +} + +/** + * ne_pci_dev_enable() - Select the PCI device version and enable it. + * @pdev: PCI device to select version for and then enable. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_pci_dev_enable(struct pci_dev *pdev) +{ + u8 dev_enable_reply = 0; + u16 dev_version_reply = 0; + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + + iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION); + + dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION); + if (dev_version_reply != NE_VERSION_MAX) { + dev_err(&pdev->dev, "Error in pci dev version cmd\n"); + + return -EIO; + } + + iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE); + + dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE); + if (dev_enable_reply != NE_ENABLE_ON) { + dev_err(&pdev->dev, "Error in pci dev enable cmd\n"); + + return -EIO; + } + + return 0; +} + +/** + * ne_pci_dev_disable() - Disable the PCI device. + * @pdev: PCI device to disable. + * + * Context: Process context. + */ +static void ne_pci_dev_disable(struct pci_dev *pdev) +{ + u8 dev_disable_reply = 0; + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + const unsigned int sleep_time = 10; /* 10 ms */ + unsigned int sleep_time_count = 0; + + iowrite8(NE_ENABLE_OFF, ne_pci_dev->iomem_base + NE_ENABLE); + + /* + * Check for NE_ENABLE_OFF in a loop, to handle cases when the device + * state is not immediately set to disabled and going through a + * transitory state of disabling. + */ + while (sleep_time_count < NE_DEFAULT_TIMEOUT_MSECS) { + dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE); + if (dev_disable_reply == NE_ENABLE_OFF) + return; + + msleep_interruptible(sleep_time); + sleep_time_count += sleep_time; + } + + dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE); + if (dev_disable_reply != NE_ENABLE_OFF) + dev_err(&pdev->dev, "Error in pci dev disable cmd\n"); +} + +/** + * ne_pci_probe() - Probe function for the NE PCI device. + * @pdev: PCI device to match with the NE PCI driver. + * @id : PCI device id table associated with the NE PCI driver. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. 
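+ *
+ * The probe path enables the PCI device, requests its regions exclusively,
+ * maps the MMIO BAR, sets up the MSI-X vectors and then runs the NE device
+ * disable / enable sequence before initializing the internal ne_pci_dev state.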
+ */ +static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct ne_pci_dev *ne_pci_dev = NULL; + int rc = -EINVAL; + + ne_pci_dev = kzalloc(sizeof(*ne_pci_dev), GFP_KERNEL); + if (!ne_pci_dev) + return -ENOMEM; + + rc = pci_enable_device(pdev); + if (rc < 0) { + dev_err(&pdev->dev, "Error in pci dev enable [rc=%d]\n", rc); + + goto free_ne_pci_dev; + } + + rc = pci_request_regions_exclusive(pdev, "nitro_enclaves"); + if (rc < 0) { + dev_err(&pdev->dev, "Error in pci request regions [rc=%d]\n", rc); + + goto disable_pci_dev; + } + + ne_pci_dev->iomem_base = pci_iomap(pdev, PCI_BAR_NE, 0); + if (!ne_pci_dev->iomem_base) { + rc = -ENOMEM; + + dev_err(&pdev->dev, "Error in pci iomap [rc=%d]\n", rc); + + goto release_pci_regions; + } + + pci_set_drvdata(pdev, ne_pci_dev); + + rc = ne_setup_msix(pdev); + if (rc < 0) { + dev_err(&pdev->dev, "Error in pci dev msix setup [rc=%d]\n", rc); + + goto iounmap_pci_bar; + } + + ne_pci_dev_disable(pdev); + + rc = ne_pci_dev_enable(pdev); + if (rc < 0) { + dev_err(&pdev->dev, "Error in ne_pci_dev enable [rc=%d]\n", rc); + + goto teardown_msix; + } + + atomic_set(&ne_pci_dev->cmd_reply_avail, 0); + init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q); + INIT_LIST_HEAD(&ne_pci_dev->enclaves_list); + mutex_init(&ne_pci_dev->enclaves_list_mutex); + mutex_init(&ne_pci_dev->pci_dev_mutex); + ne_pci_dev->pdev = pdev; + + return 0; + +teardown_msix: + ne_teardown_msix(pdev); +iounmap_pci_bar: + pci_set_drvdata(pdev, NULL); + pci_iounmap(pdev, ne_pci_dev->iomem_base); +release_pci_regions: + pci_release_regions(pdev); +disable_pci_dev: + pci_disable_device(pdev); +free_ne_pci_dev: + kfree(ne_pci_dev); + + return rc; +} + +/** + * ne_pci_remove() - Remove function for the NE PCI device. + * @pdev: PCI device associated with the NE PCI driver. + * + * Context: Process context. + */ +static void ne_pci_remove(struct pci_dev *pdev) +{ + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + + ne_pci_dev_disable(pdev); + + ne_teardown_msix(pdev); + + pci_set_drvdata(pdev, NULL); + + pci_iounmap(pdev, ne_pci_dev->iomem_base); + + pci_release_regions(pdev); + + pci_disable_device(pdev); + + kfree(ne_pci_dev); +} + +/** + * ne_pci_shutdown() - Shutdown function for the NE PCI device. + * @pdev: PCI device associated with the NE PCI driver. + * + * Context: Process context. + */ +static void ne_pci_shutdown(struct pci_dev *pdev) +{ + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + + if (!ne_pci_dev) + return; + + ne_pci_dev_disable(pdev); + + ne_teardown_msix(pdev); + + pci_set_drvdata(pdev, NULL); + + pci_iounmap(pdev, ne_pci_dev->iomem_base); + + pci_release_regions(pdev); + + pci_disable_device(pdev); + + kfree(ne_pci_dev); +} + +/* + * TODO: Add suspend / resume functions for power management w/ CONFIG_PM, if + * needed. + */ +/* NE PCI device driver. 
*/ +struct pci_driver ne_pci_driver = { + .name = "nitro_enclaves", + .id_table = ne_pci_ids, + .probe = ne_pci_probe, + .remove = ne_pci_remove, + .shutdown = ne_pci_shutdown, +}; From patchwork Fri Sep 4 17:37:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758311 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C8AC292C for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A06F223F34 for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="GF80BcJO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727851AbgIDRin (ORCPT ); Fri, 4 Sep 2020 13:38:43 -0400 Received: from smtp-fw-4101.amazon.com ([72.21.198.25]:50006 "EHLO smtp-fw-4101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728108AbgIDRid (ORCPT ); Fri, 4 Sep 2020 13:38:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241111; x=1630777111; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=L4/dDnWumHZ2zxQUErq4vmdHts70qp38paag+MgVKac=; b=GF80BcJOswdcnPm3fbYDeT4eIGT+cT4lSV/oqdyhN+YXXhrPuZ9D0Zuf 8T2hwyhwDOnKsuulrcCBQnvDiOw4Jah8zaHhD6Ow0qd87NaEKVlv8c7GM 8CbJXAlhOb/WsEOMD8AzdjuZN/xYjXRHzBk7F9UgHMN8vVYSOPG4qcqMP A=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52178914" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 04 Sep 2020 17:38:31 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com (Postfix) with ESMTPS id A7729A2132; Fri, 4 Sep 2020 17:38:28 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:38:18 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 05/18] nitro_enclaves: Handle PCI device command requests Date: Fri, 4 Sep 2020 20:37:05 +0300 Message-ID: <20200904173718.64857-6-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D02UWC003.ant.amazon.com (10.43.162.199) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The Nitro Enclaves PCI device exposes a MMIO space that this driver uses to submit command requests and to receive command replies e.g. 
for enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication event.

Signed-off-by: Alexandru-Catalin Vasile
Signed-off-by: Andra Paraschiv
Reviewed-by: Alexander Graf
---
Changelog

v7 -> v8

* Update function signature for submit request and retrieve reply functions as they only returned 0, no error code.
* Include command type value in the error logs of ne_do_request().

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen, as they are only possible with a buggy system or broken logic.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report: https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 189 +++++++++++++++++++++
 1 file changed, 189 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index daf8b36383f1..e9e3ff882cc7 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -33,6 +33,172 @@ static const struct pci_device_id ne_pci_ids[] = {
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);

+/**
+ * ne_submit_request() - Submit command request to the PCI device based on the
+ * command type.
+ * @pdev: PCI device to send the command to.
+ * @cmd_type: Command type of the request sent to the PCI device.
+ * @cmd_request: Command request payload.
+ * @cmd_request_size: Size of the command request payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ */
+static void ne_submit_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size)
+{
+ struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+ memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request, cmd_request_size);
+
+ iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+}
+
+/**
+ * ne_retrieve_reply() - Retrieve reply from the PCI device.
+ * @pdev: PCI device to receive the reply from.
+ * @cmd_reply: Command reply payload.
+ * @cmd_reply_size: Size of the command reply payload.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ */
+static void ne_retrieve_reply(struct pci_dev *pdev, struct ne_pci_dev_cmd_reply *cmd_reply,
+ size_t cmd_reply_size)
+{
+ struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+ memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA, cmd_reply_size);
+}
+
+/**
+ * ne_wait_for_reply() - Wait for a reply of a PCI device command.
+ * @pdev: PCI device for which a reply is waited.
+ *
+ * Context: Process context. This function is called with the ne_pci_dev mutex held.
+ * Return:
+ * * 0 on success.
+ * * Negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+ struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+ int rc = -EINVAL;
+
+ /*
+ * TODO: Update to _interruptible and handle interrupted wait event
+ * e.g.
-ERESTARTSYS, incoming signals + update timeout, if needed. + */ + rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q, + atomic_read(&ne_pci_dev->cmd_reply_avail) != 0, + msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS)); + if (!rc) + return -ETIMEDOUT; + + return 0; +} + +int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type, + void *cmd_request, size_t cmd_request_size, + struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size) +{ + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + int rc = -EINVAL; + + if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) { + dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n", cmd_type); + + return -EINVAL; + } + + if (!cmd_request) { + dev_err_ratelimited(&pdev->dev, "Null cmd request for cmd type=%u\n", + cmd_type); + + return -EINVAL; + } + + if (cmd_request_size > NE_SEND_DATA_SIZE) { + dev_err_ratelimited(&pdev->dev, "Invalid req size=%zu for cmd type=%u\n", + cmd_request_size, cmd_type); + + return -EINVAL; + } + + if (!cmd_reply) { + dev_err_ratelimited(&pdev->dev, "Null cmd reply for cmd type=%u\n", + cmd_type); + + return -EINVAL; + } + + if (cmd_reply_size > NE_RECV_DATA_SIZE) { + dev_err_ratelimited(&pdev->dev, "Invalid reply size=%zu for cmd type=%u\n", + cmd_reply_size, cmd_type); + + return -EINVAL; + } + + /* + * Use this mutex so that the PCI device handles one command request at + * a time. + */ + mutex_lock(&ne_pci_dev->pci_dev_mutex); + + atomic_set(&ne_pci_dev->cmd_reply_avail, 0); + + ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size); + + rc = ne_wait_for_reply(pdev); + if (rc < 0) { + dev_err_ratelimited(&pdev->dev, "Error in wait for reply for cmd type=%u [rc=%d]\n", + cmd_type, rc); + + goto unlock_mutex; + } + + ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size); + + atomic_set(&ne_pci_dev->cmd_reply_avail, 0); + + if (cmd_reply->rc < 0) { + rc = cmd_reply->rc; + + dev_err_ratelimited(&pdev->dev, "Error in cmd process logic, cmd type=%u [rc=%d]\n", + cmd_type, rc); + + goto unlock_mutex; + } + + rc = 0; + +unlock_mutex: + mutex_unlock(&ne_pci_dev->pci_dev_mutex); + + return rc; +} + +/** + * ne_reply_handler() - Interrupt handler for retrieving a reply matching a + * request sent to the PCI device for enclave lifetime + * management. + * @irq: Received interrupt for a reply sent by the PCI device. + * @args: PCI device private data structure. + * + * Context: Interrupt context. + * Return: + * * IRQ_HANDLED on handled interrupt. + */ +static irqreturn_t ne_reply_handler(int irq, void *args) +{ + struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args; + + atomic_set(&ne_pci_dev->cmd_reply_avail, 1); + + /* TODO: Update to _interruptible. */ + wake_up(&ne_pci_dev->cmd_reply_wait_q); + + return IRQ_HANDLED; +} + /** * ne_setup_msix() - Setup MSI-X vectors for the PCI device. * @pdev: PCI device to setup the MSI-X for. @@ -44,6 +210,7 @@ MODULE_DEVICE_TABLE(pci, ne_pci_ids); */ static int ne_setup_msix(struct pci_dev *pdev) { + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); int nr_vecs = 0; int rc = -EINVAL; @@ -63,7 +230,25 @@ static int ne_setup_msix(struct pci_dev *pdev) return rc; } + /* + * This IRQ gets triggered every time the PCI device responds to a + * command request. The reply is then retrieved, reading from the MMIO + * space of the PCI device. 
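 *
 * The waiter lives in ne_do_request(), which serializes callers on the
 * ne_pci_dev mutex. A minimal usage sketch (illustrative only, modeled on
 * the callers added later in this series, e.g. the SLOT_ALLOC request):
 *
 *	struct ne_pci_dev_cmd_reply cmd_reply = {};
 *	struct slot_alloc_req slot_alloc_req = {};
 *	int rc;
 *
 *	rc = ne_do_request(pdev, SLOT_ALLOC,
 *			   &slot_alloc_req, sizeof(slot_alloc_req),
 *			   &cmd_reply, sizeof(cmd_reply));
 *	if (rc < 0)
 *		return rc;
 *
 * rc is negative on timeout, on invalid arguments or when the device
 * reports an error in cmd_reply.rc.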
+ */ + rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_reply_handler, + 0, "enclave_cmd", ne_pci_dev); + if (rc < 0) { + dev_err(&pdev->dev, "Error in request irq reply [rc=%d]\n", rc); + + goto free_irq_vectors; + } + return 0; + +free_irq_vectors: + pci_free_irq_vectors(pdev); + + return rc; } /** @@ -74,6 +259,10 @@ static int ne_setup_msix(struct pci_dev *pdev) */ static void ne_teardown_msix(struct pci_dev *pdev) { + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + + free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev); + pci_free_irq_vectors(pdev); } From patchwork Fri Sep 4 17:37:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758315 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6A8A415AB for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2C0ED23F5A for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="VSYVCsIK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728157AbgIDRjI (ORCPT ); Fri, 4 Sep 2020 13:39:08 -0400 Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:24326 "EHLO smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725984AbgIDRin (ORCPT ); Fri, 4 Sep 2020 13:38:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241123; x=1630777123; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=IDJPC6h8zTuYLDlXHaoNCskbQsP9fULf+RfT7UZQ5Dc=; b=VSYVCsIKvIBR/3zNkt1uORo9iKTF1e2vTupBHr0zW64zVTtRWsNumgmd SR/M/f2wLEj6s6fA7ZxlKfjifAglC8099m42DeHVJHDouKen5PzEFG3/J 26/W1dIBp338d/8jLjYxgpdRWbH/dGniGxbI1xYv6nQlPaQMC4PKjCfF5 k=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52099875" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1a-807d4a99.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 04 Sep 2020 17:38:42 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1a-807d4a99.us-east-1.amazon.com (Postfix) with ESMTPS id B87E8A248E; Fri, 4 Sep 2020 17:38:38 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:38:27 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 06/18] nitro_enclaves: Handle out-of-band PCI device events Date: Fri, 4 Sep 2020 20:37:06 +0300 Message-ID: <20200904173718.64857-7-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: 
<20200904173718.64857-1-andraprs@amazon.com>
MIME-Version: 1.0
X-Originating-IP: [10.43.161.85]
X-ClientProxiedBy: EX13D02UWC003.ant.amazon.com (10.43.162.199) To EX13D16EUB001.ant.amazon.com (10.43.166.28)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID: X-Mailing-List: kvm@vger.kernel.org

In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band event. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify which enclave(s) the notification is intended for.

Create a workqueue to handle the out-of-band events. Notify the user
space enclave process, which uses a polling mechanism on the enclave fd.

Signed-off-by: Alexandru-Catalin Vasile
Signed-off-by: Andra Paraschiv
Reviewed-by: Alexander Graf
---
Changelog

v7 -> v8

* No changes.

v6 -> v7

* No changes.

v5 -> v6

* Update documentation to kernel-doc format.

v4 -> v5

* Remove sanity checks for situations that shouldn't happen, as they are only possible with a buggy system or broken logic.

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 116 +++++++++++++++++++++
 1 file changed, 116 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index e9e3ff882cc7..dcf529ba509d 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -199,6 +199,88 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
 return IRQ_HANDLED;
 }

+/**
+ * ne_event_work_handler() - Work queue handler for notifying enclaves on a
+ * state change received by the event interrupt
+ * handler.
+ * @work: Item containing the NE PCI device for which an out-of-band event
+ * was issued.
+ *
+ * An out-of-band event is issued by the Nitro Hypervisor when at least one
+ * enclave changes state without client interaction.
+ *
+ * Context: Work queue context.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ struct ne_enclave *ne_enclave = NULL;
+ struct ne_pci_dev *ne_pci_dev =
+ container_of(work, struct ne_pci_dev, notify_work);
+ int rc = -EINVAL;
+ struct slot_info_req slot_info_req = {};
+
+ mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+ /*
+ * Iterate over all enclaves registered for the Nitro Enclaves
+ * PCI device and determine which enclave(s) the out-of-band event
+ * corresponds to.
+ */
+ list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list, enclave_list_entry) {
+ mutex_lock(&ne_enclave->enclave_info_mutex);
+
+ /*
+ * Enclaves that were never started cannot receive out-of-band
+ * events.
+ */ + if (ne_enclave->state != NE_STATE_RUNNING) + goto unlock; + + slot_info_req.slot_uid = ne_enclave->slot_uid; + + rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req, + sizeof(slot_info_req), &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) + dev_err(&ne_enclave->pdev->dev, "Error in slot info [rc=%d]\n", rc); + + /* Notify enclave process that the enclave state changed. */ + if (ne_enclave->state != cmd_reply.state) { + ne_enclave->state = cmd_reply.state; + + ne_enclave->has_event = true; + + wake_up_interruptible(&ne_enclave->eventq); + } + +unlock: + mutex_unlock(&ne_enclave->enclave_info_mutex); + } + + mutex_unlock(&ne_pci_dev->enclaves_list_mutex); +} + +/** + * ne_event_handler() - Interrupt handler for PCI device out-of-band events. + * This interrupt does not supply any data in the MMIO + * region. It notifies a change in the state of any of + * the launched enclaves. + * @irq: Received interrupt for an out-of-band event. + * @args: PCI device private data structure. + * + * Context: Interrupt context. + * Return: + * * IRQ_HANDLED on handled interrupt. + */ +static irqreturn_t ne_event_handler(int irq, void *args) +{ + struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args; + + queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work); + + return IRQ_HANDLED; +} + /** * ne_setup_msix() - Setup MSI-X vectors for the PCI device. * @pdev: PCI device to setup the MSI-X for. @@ -243,8 +325,36 @@ static int ne_setup_msix(struct pci_dev *pdev) goto free_irq_vectors; } + ne_pci_dev->event_wq = create_singlethread_workqueue("ne_pci_dev_wq"); + if (!ne_pci_dev->event_wq) { + rc = -ENOMEM; + + dev_err(&pdev->dev, "Cannot get wq for dev events [rc=%d]\n", rc); + + goto free_reply_irq_vec; + } + + INIT_WORK(&ne_pci_dev->notify_work, ne_event_work_handler); + + /* + * This IRQ gets triggered every time any enclave's state changes. Its + * handler then scans for the changes and propagates them to the user + * space. 
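 *
 * The hard IRQ handler only queues notify_work; the actual scan runs in
 * ne_event_work_handler(), because it takes the enclave mutexes and
 * issues a blocking SLOT_INFO request via ne_do_request(), which cannot
 * be done in interrupt context.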
+ */ + rc = request_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_event_handler, + 0, "enclave_evt", ne_pci_dev); + if (rc < 0) { + dev_err(&pdev->dev, "Error in request irq event [rc=%d]\n", rc); + + goto destroy_wq; + } + return 0; +destroy_wq: + destroy_workqueue(ne_pci_dev->event_wq); +free_reply_irq_vec: + free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev); free_irq_vectors: pci_free_irq_vectors(pdev); @@ -261,6 +371,12 @@ static void ne_teardown_msix(struct pci_dev *pdev) { struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev); + + flush_work(&ne_pci_dev->notify_work); + flush_workqueue(ne_pci_dev->event_wq); + destroy_workqueue(ne_pci_dev->event_wq); + free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev); pci_free_irq_vectors(pdev); From patchwork Fri Sep 4 17:37:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758313 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 153981599 for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E512023F38 for ; Fri, 4 Sep 2020 18:02:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="sMQqw4gW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728149AbgIDRjC (ORCPT ); Fri, 4 Sep 2020 13:39:02 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:36828 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728142AbgIDRiy (ORCPT ); Fri, 4 Sep 2020 13:38:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241134; x=1630777134; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fFdKeLYDWj4ETO2b02HMa6YOvLDpd4VgwKpWlnv2J7o=; b=sMQqw4gWtttEueDBi0mf2NK10ZH9jkbjgfM01KFldc/u4TdqCtlsRu+S IHxmsrGpri36IBcgObg3aQlDkxZE0SThloVLolQ2t27zxI0Pyc3M55c6e hi0Mecvp0vCoilAjli74Ph7vApWqBI/J/rwj1cC9FqJ28OzIF63Qnvg8p I=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="72479435" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1a-715bee71.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 04 Sep 2020 17:38:52 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1a-715bee71.us-east-1.amazon.com (Postfix) with ESMTPS id 9FDDBA0383; Fri, 4 Sep 2020 17:38:49 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:38:38 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 07/18] nitro_enclaves: Init misc device providing the ioctl 
interface Date: Fri, 4 Sep 2020 20:37:07 +0300 Message-ID: <20200904173718.64857-8-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D02UWC003.ant.amazon.com (10.43.162.199) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The Nitro Enclaves driver provides an ioctl interface to the user space for enclave lifetime management e.g. enclave creation / termination and setting enclave resources such as memory and CPU. This ioctl interface is mapped to a Nitro Enclaves misc device. Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * Add define for the CID of the primary / parent VM. * Update the NE PCI driver shutdown logic to include misc device deregister. v6 -> v7 * Set the NE PCI device the parent of the NE misc device to be able to use it in the ioctl logic. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Remove the ioctl to query API version. * Update documentation to kernel-doc format. v4 -> v5 * Update the size of the NE CPU pool string from 4096 to 512 chars. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Remove the NE CPU pool init during kernel module loading, as the CPU pool is now setup at runtime, via a sysfs file for the kernel parameter. * Add minimum enclave memory size definition. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. * Remove the WARN_ON calls. * Remove linux/bug and linux/kvm_host includes that are not needed. * Remove "ratelimited" from the logs that are not in the ioctl call paths. * Remove file ops that do nothing for now - open and release. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Update ne_cpu_pool data structure to include the global mutex. * Update NE misc device mode to 0660. * Check if the CPU siblings are included in the NE CPU pool, as full CPU cores are given for the enclave(s). --- drivers/virt/nitro_enclaves/ne_misc_dev.c | 135 ++++++++++++++++++++++ drivers/virt/nitro_enclaves/ne_pci_dev.c | 21 ++++ 2 files changed, 156 insertions(+) create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c new file mode 100644 index 000000000000..6bb05217b593 --- /dev/null +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + */ + +/** + * DOC: Enclave lifetime management driver for Nitro Enclaves (NE). + * Nitro is a hypervisor that has been developed by Amazon. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ne_misc_dev.h" +#include "ne_pci_dev.h" + +/** + * NE_CPUS_SIZE - Size for max 128 CPUs, for now, in a cpu-list string, comma + * separated. The NE CPU pool includes CPUs from a single NUMA + * node. + */ +#define NE_CPUS_SIZE (512) + +/** + * NE_EIF_LOAD_OFFSET - The offset where to copy the Enclave Image Format (EIF) + * image in enclave memory. 
+ */ +#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL) + +/** + * NE_MIN_ENCLAVE_MEM_SIZE - The minimum memory size an enclave can be launched + * with. + */ +#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL) + +/** + * NE_MIN_MEM_REGION_SIZE - The minimum size of an enclave memory region. + */ +#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL) + +/** + * NE_PARENT_VM_CID - The CID for the vsock device of the primary / parent VM. + */ +#define NE_PARENT_VM_CID (3) + +/* + * TODO: Update logic to create new sysfs entries instead of using + * a kernel parameter e.g. if multiple sysfs files needed. + */ +static const struct kernel_param_ops ne_cpu_pool_ops = { + .get = param_get_string, +}; + +static char ne_cpus[NE_CPUS_SIZE]; +static struct kparam_string ne_cpus_arg = { + .maxlen = sizeof(ne_cpus), + .string = ne_cpus, +}; + +module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644); +/* https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists */ +MODULE_PARM_DESC(ne_cpus, " - CPU pool used for Nitro Enclaves"); + +/** + * struct ne_cpu_pool - CPU pool used for Nitro Enclaves. + * @avail_threads_per_core: Available full CPU cores to be dedicated to + * enclave(s). The cpumasks from the array, indexed + * by core id, contain all the threads from the + * available cores, that are not set for created + * enclave(s). The full CPU cores are part of the + * NE CPU pool. + * @mutex: Mutex for the access to the NE CPU pool. + * @nr_parent_vm_cores : The size of the available threads per core array. + * The total number of CPU cores available on the + * primary / parent VM. + * @nr_threads_per_core: The number of threads that a full CPU core has. + * @numa_node: NUMA node of the CPUs in the pool. + */ +struct ne_cpu_pool { + cpumask_var_t *avail_threads_per_core; + struct mutex mutex; + unsigned int nr_parent_vm_cores; + unsigned int nr_threads_per_core; + int numa_node; +}; + +static struct ne_cpu_pool ne_cpu_pool; + +static const struct file_operations ne_fops = { + .owner = THIS_MODULE, + .llseek = noop_llseek, +}; + +struct miscdevice ne_misc_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "nitro_enclaves", + .fops = &ne_fops, + .mode = 0660, +}; + +static int __init ne_init(void) +{ + mutex_init(&ne_cpu_pool.mutex); + + return pci_register_driver(&ne_pci_driver); +} + +static void __exit ne_exit(void) +{ + pci_unregister_driver(&ne_pci_driver); +} + +module_init(ne_init); +module_exit(ne_exit); + +MODULE_AUTHOR("Amazon.com, Inc. or its affiliates"); +MODULE_DESCRIPTION("Nitro Enclaves Driver"); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c index dcf529ba509d..0d2a49a5d87e 100644 --- a/drivers/virt/nitro_enclaves/ne_pci_dev.c +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c @@ -512,6 +512,16 @@ static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto teardown_msix; } + /* Set the NE PCI device as parent to use it in the ioctl logic. 
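 *
 * The misc device ioctl path added later in this series recovers the PCI
 * device through to_pci_dev(ne_misc_dev.parent) and its driver data
 * through pci_get_drvdata(), so the parent is set before misc_register().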
*/ + ne_misc_dev.parent = &pdev->dev; + + rc = misc_register(&ne_misc_dev); + if (rc < 0) { + dev_err(&pdev->dev, "Error in misc dev register [rc=%d]\n", rc); + + goto disable_ne_pci_dev; + } + atomic_set(&ne_pci_dev->cmd_reply_avail, 0); init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q); INIT_LIST_HEAD(&ne_pci_dev->enclaves_list); @@ -521,6 +531,9 @@ static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) return 0; +disable_ne_pci_dev: + ne_misc_dev.parent = NULL; + ne_pci_dev_disable(pdev); teardown_msix: ne_teardown_msix(pdev); iounmap_pci_bar: @@ -546,6 +559,10 @@ static void ne_pci_remove(struct pci_dev *pdev) { struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev); + misc_deregister(&ne_misc_dev); + + ne_misc_dev.parent = NULL; + ne_pci_dev_disable(pdev); ne_teardown_msix(pdev); @@ -574,6 +591,10 @@ static void ne_pci_shutdown(struct pci_dev *pdev) if (!ne_pci_dev) return; + misc_deregister(&ne_misc_dev); + + ne_misc_dev.parent = NULL; + ne_pci_dev_disable(pdev); ne_teardown_msix(pdev); From patchwork Fri Sep 4 17:37:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758317 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9928C92C for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 711CF23F5A for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="Dd6HnfQr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728164AbgIDRjJ (ORCPT ); Fri, 4 Sep 2020 13:39:09 -0400 Received: from smtp-fw-4101.amazon.com ([72.21.198.25]:50152 "EHLO smtp-fw-4101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728084AbgIDRjD (ORCPT ); Fri, 4 Sep 2020 13:39:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241142; x=1630777142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tGrC+SlxjJeG3uHG/01gS56i7/7e0xoTW543TJVmOGg=; b=Dd6HnfQrEcy11BMtuYkre8xkOdNc08F4DSAj1eOFBbKIhTICcbCEgs6M u7BKtJmtc3Jwo8I2j/9qesdk1x3Bl/frIHBsSYyZDWVwyTBPNXKDM6QCs 69j59GVeaue+EMjLboXqWgroOOH7is+lsp3AdULjfoY2YTX4zouH2PW0G c=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52179026" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1e-17c49630.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 04 Sep 2020 17:39:01 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1e-17c49630.us-east-1.amazon.com (Postfix) with ESMTPS id 57D9EA0679; Fri, 4 Sep 2020 17:38:59 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:38:49 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella 
, "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 08/18] nitro_enclaves: Add logic for creating an enclave VM Date: Fri, 4 Sep 2020 20:37:08 +0300 Message-ID: <20200904173718.64857-9-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D02UWC003.ant.amazon.com (10.43.162.199) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add ioctl command logic for enclave VM creation. It triggers a slot allocation. The enclave resources will be associated with this slot and it will be used as an identifier for triggering enclave run. Return a file descriptor, namely enclave fd. This is further used by the associated user space enclave process to set enclave resources and trigger enclave termination. The poll function is implemented in order to notify the enclave process when an enclave exits without a specific enclave termination command trigger e.g. when an enclave crashes. Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * Use the NE misc device parent field to get the NE PCI device. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Update the code base to init the ioctl function in this patch. * Update documentation to kernel-doc format. v4 -> v5 * Release the reference to the NE PCI device on create VM error. * Close enclave fd on copy_to_user() failure; rename fd to enclave fd while at it. * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. * Remove log on copy_to_user() failure. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Update the NE ioctl call to match the decoupling from the KVM API. * Add metadata for the NUMA node for the enclave memory and CPUs. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). * Remove file ops that do nothing for now - open. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Remove the BUG_ON calls. --- drivers/virt/nitro_enclaves/ne_misc_dev.c | 226 ++++++++++++++++++++++ 1 file changed, 226 insertions(+) diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c index 6bb05217b593..7ad3f1eb75d4 100644 --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -103,9 +103,235 @@ struct ne_cpu_pool { static struct ne_cpu_pool ne_cpu_pool; +/** + * ne_enclave_poll() - Poll functionality used for enclave out-of-band events. + * @file: File associated with this poll function. + * @wait: Poll table data structure. + * + * Context: Process context. + * Return: + * * Poll mask. 
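 *
 * A minimal user space sketch (illustrative only, not part of this patch),
 * assuming the UAPI header added earlier in this series:
 *
 *	#include <fcntl.h>
 *	#include <poll.h>
 *	#include <sys/ioctl.h>
 *	#include <linux/nitro_enclaves.h>
 *
 *	int ne_dev_fd = open("/dev/nitro_enclaves", O_RDWR | O_CLOEXEC);
 *	__u64 slot_uid = 0;
 *	int enclave_fd = ioctl(ne_dev_fd, NE_CREATE_VM, &slot_uid);
 *	struct pollfd pfd = { .fd = enclave_fd, .events = POLLIN };
 *
 *	poll(&pfd, 1, -1);
 *
 * On return, pfd.revents has POLLHUP set if the enclave exited
 * out-of-band, matching the mask reported below.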
+ */ +static __poll_t ne_enclave_poll(struct file *file, poll_table *wait) +{ + __poll_t mask = 0; + struct ne_enclave *ne_enclave = file->private_data; + + poll_wait(file, &ne_enclave->eventq, wait); + + if (!ne_enclave->has_event) + return mask; + + mask = POLLHUP; + + return mask; +} + +static const struct file_operations ne_enclave_fops = { + .owner = THIS_MODULE, + .llseek = noop_llseek, + .poll = ne_enclave_poll, +}; + +/** + * ne_create_vm_ioctl() - Alloc slot to be associated with an enclave. Create + * enclave file descriptor to be further used for enclave + * resources handling e.g. memory regions and CPUs. + * @pdev: PCI device used for enclave lifetime management. + * @ne_pci_dev : Private data associated with the PCI device. + * @slot_uid: Generated unique slot id associated with an enclave. + * + * Context: Process context. This function is called with the ne_pci_dev enclave + * mutex held. + * Return: + * * Enclave fd on success. + * * Negative return value on failure. + */ +static int ne_create_vm_ioctl(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev, + u64 *slot_uid) +{ + struct ne_pci_dev_cmd_reply cmd_reply = {}; + int enclave_fd = -1; + struct file *enclave_file = NULL; + unsigned int i = 0; + struct ne_enclave *ne_enclave = NULL; + int rc = -EINVAL; + struct slot_alloc_req slot_alloc_req = {}; + + mutex_lock(&ne_cpu_pool.mutex); + + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) + if (!cpumask_empty(ne_cpu_pool.avail_threads_per_core[i])) + break; + + if (i == ne_cpu_pool.nr_parent_vm_cores) { + dev_err_ratelimited(ne_misc_dev.this_device, + "No CPUs available in CPU pool\n"); + + mutex_unlock(&ne_cpu_pool.mutex); + + return -NE_ERR_NO_CPUS_AVAIL_IN_POOL; + } + + mutex_unlock(&ne_cpu_pool.mutex); + + ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL); + if (!ne_enclave) + return -ENOMEM; + + mutex_lock(&ne_cpu_pool.mutex); + + ne_enclave->nr_parent_vm_cores = ne_cpu_pool.nr_parent_vm_cores; + ne_enclave->nr_threads_per_core = ne_cpu_pool.nr_threads_per_core; + ne_enclave->numa_node = ne_cpu_pool.numa_node; + + mutex_unlock(&ne_cpu_pool.mutex); + + ne_enclave->threads_per_core = kcalloc(ne_enclave->nr_parent_vm_cores, + sizeof(*ne_enclave->threads_per_core), GFP_KERNEL); + if (!ne_enclave->threads_per_core) { + rc = -ENOMEM; + + goto free_ne_enclave; + } + + for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) + if (!zalloc_cpumask_var(&ne_enclave->threads_per_core[i], GFP_KERNEL)) { + rc = -ENOMEM; + + goto free_cpumask; + } + + if (!zalloc_cpumask_var(&ne_enclave->vcpu_ids, GFP_KERNEL)) { + rc = -ENOMEM; + + goto free_cpumask; + } + + ne_enclave->pdev = pdev; + + enclave_fd = get_unused_fd_flags(O_CLOEXEC); + if (enclave_fd < 0) { + rc = enclave_fd; + + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in getting unused fd [rc=%d]\n", rc); + + goto free_cpumask; + } + + enclave_file = anon_inode_getfile("ne-vm", &ne_enclave_fops, ne_enclave, O_RDWR); + if (IS_ERR(enclave_file)) { + rc = PTR_ERR(enclave_file); + + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in anon inode get file [rc=%d]\n", rc); + + goto put_fd; + } + + rc = ne_do_request(ne_enclave->pdev, SLOT_ALLOC, &slot_alloc_req, sizeof(slot_alloc_req), + &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in slot alloc [rc=%d]\n", rc); + + goto put_file; + } + + init_waitqueue_head(&ne_enclave->eventq); + ne_enclave->has_event = false; + mutex_init(&ne_enclave->enclave_info_mutex); + ne_enclave->max_mem_regions = 
cmd_reply.mem_regions; + INIT_LIST_HEAD(&ne_enclave->mem_regions_list); + ne_enclave->mm = current->mm; + ne_enclave->slot_uid = cmd_reply.slot_uid; + ne_enclave->state = NE_STATE_INIT; + + list_add(&ne_enclave->enclave_list_entry, &ne_pci_dev->enclaves_list); + + *slot_uid = ne_enclave->slot_uid; + + fd_install(enclave_fd, enclave_file); + + return enclave_fd; + +put_file: + fput(enclave_file); +put_fd: + put_unused_fd(enclave_fd); +free_cpumask: + free_cpumask_var(ne_enclave->vcpu_ids); + for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) + free_cpumask_var(ne_enclave->threads_per_core[i]); + kfree(ne_enclave->threads_per_core); +free_ne_enclave: + kfree(ne_enclave); + + return rc; +} + +/** + * ne_ioctl() - Ioctl function provided by the NE misc device. + * @file: File associated with this ioctl function. + * @cmd: The command that is set for the ioctl call. + * @arg: The argument that is provided for the ioctl call. + * + * Context: Process context. + * Return: + * * Ioctl result (e.g. enclave file descriptor) on success. + * * Negative return value on failure. + */ +static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + switch (cmd) { + case NE_CREATE_VM: { + int enclave_fd = -1; + struct file *enclave_file = NULL; + struct ne_pci_dev *ne_pci_dev = NULL; + struct pci_dev *pdev = to_pci_dev(ne_misc_dev.parent); + int rc = -EINVAL; + u64 slot_uid = 0; + + ne_pci_dev = pci_get_drvdata(pdev); + + mutex_lock(&ne_pci_dev->enclaves_list_mutex); + + enclave_fd = ne_create_vm_ioctl(pdev, ne_pci_dev, &slot_uid); + if (enclave_fd < 0) { + rc = enclave_fd; + + mutex_unlock(&ne_pci_dev->enclaves_list_mutex); + + return rc; + } + + mutex_unlock(&ne_pci_dev->enclaves_list_mutex); + + if (copy_to_user((void __user *)arg, &slot_uid, sizeof(slot_uid))) { + enclave_file = fget(enclave_fd); + /* Decrement file refs to have release() called. 
*/ + fput(enclave_file); + fput(enclave_file); + put_unused_fd(enclave_fd); + + return -EFAULT; + } + + return enclave_fd; + } + + default: + return -ENOTTY; + } + + return 0; +} + static const struct file_operations ne_fops = { .owner = THIS_MODULE, .llseek = noop_llseek, + .unlocked_ioctl = ne_ioctl, }; struct miscdevice ne_misc_dev = { From patchwork Fri Sep 4 17:37:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758319 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED8F71599 for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C766F2405A for ; Fri, 4 Sep 2020 18:02:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="oFWQ+sA3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728141AbgIDRjo (ORCPT ); Fri, 4 Sep 2020 13:39:44 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:36949 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728191AbgIDRjS (ORCPT ); Fri, 4 Sep 2020 13:39:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241156; x=1630777156; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=M9VSxAAq5b6t/rz6JqYdglWzpAZxeeerj4f6IiFPC1o=; b=oFWQ+sA3NKedCSD0nhIuGdeXMGkjOtX1xfclnXVY+jsG54K/RdqhDgyR IqtlU1Y2CNPjmConDczxd7eem9PqMBfZe2omLl6tfifk0X6ESWMK/0rR9 3vqVBjwJuY10tKJQ+mXy1gafcUcl4+RxuaD86wek6Dh9rgW8W5M4ob+tF U=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="72479568" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 04 Sep 2020 17:39:15 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com (Postfix) with ESMTPS id 1012AA266E; Fri, 4 Sep 2020 17:39:11 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.85) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:38:58 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 09/18] nitro_enclaves: Add logic for setting an enclave vCPU Date: Fri, 4 Sep 2020 20:37:09 +0300 Message-ID: <20200904173718.64857-10-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.85] X-ClientProxiedBy: EX13D02UWC003.ant.amazon.com (10.43.162.199) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk 
List-ID: X-Mailing-List: kvm@vger.kernel.org An enclave, before being started, has its resources set. One of its resources is CPU. A NE CPU pool is set and enclave CPUs are chosen from it. Offline the CPUs from the NE CPU pool during the pool setup and online them back during the NE CPU pool teardown. The CPU offline is necessary so that there would not be more vCPUs than physical CPUs available to the primary / parent VM. In that case the CPUs would be overcommitted and would change the initial configuration of the primary / parent VM of having dedicated vCPUs to physical CPUs. The enclave CPUs need to be full cores and from the same NUMA node. CPU 0 and its siblings have to remain available to the primary / parent VM. Add ioctl command logic for setting an enclave vCPU. Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * Check for error return value when setting the kernel parameter string. * Use the NE misc device parent field to get the NE PCI device. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. * Calculate the number of threads per core and not use smp_num_siblings that is x86 specific. v5 -> v6 * Check CPUs are from the same NUMA node before going through CPU siblings during the NE CPU pool setup. * Update documentation to kernel-doc format. v4 -> v5 * Set empty string in case of invalid NE CPU pool. * Clear NE CPU pool mask on pool setup failure. * Setup NE CPU cores out of the NE CPU pool. * Early exit on NE CPU pool setup if enclave(s) already running. * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. * Add check for maximum vCPU id possible before looking into the CPU pool. * Remove log on copy_from_user() / copy_to_user() failure and on admin capability check for setting the NE CPU pool. * Update the ioctl call to not create a file descriptor for the vCPU. * Split the CPU pool usage logic in 2 separate functions - one to get a CPU from the pool and the other to check the given CPU is available in the pool. v3 -> v4 * Setup the NE CPU pool at runtime via a sysfs file for the kernel parameter. * Check enclave CPUs to be from the same NUMA node. * Use dev_err instead of custom NE log pattern. * Update the NE ioctl call to match the decoupling from the KVM API. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). * Remove file ops that do nothing for now - open, ioctl and release. v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Remove the BUG_ON calls. * Check if enclave state is init when setting enclave vCPU. --- drivers/virt/nitro_enclaves/ne_misc_dev.c | 702 ++++++++++++++++++++++ 1 file changed, 702 insertions(+) diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c index 7ad3f1eb75d4..0477b11bf15d 100644 --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -64,8 +64,11 @@ * TODO: Update logic to create new sysfs entries instead of using * a kernel parameter e.g. if multiple sysfs files needed. 
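 *
 * The pool is set at runtime by writing a cpu-list (see the cpu-lists
 * format referenced above the module parameter description) to
 * /sys/module/nitro_enclaves/parameters/ne_cpus, e.g. "2,3,6,7",
 * assuming those CPUs form full cores on one NUMA node. The value is
 * validated and the CPUs are offlined in ne_setup_cpu_pool() below.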
*/ +static int ne_set_kernel_param(const char *val, const struct kernel_param *kp); + static const struct kernel_param_ops ne_cpu_pool_ops = { .get = param_get_string, + .set = ne_set_kernel_param, }; static char ne_cpus[NE_CPUS_SIZE]; @@ -103,6 +106,702 @@ struct ne_cpu_pool { static struct ne_cpu_pool ne_cpu_pool; +/** + * ne_check_enclaves_created() - Verify if at least one enclave has been created. + * @void: No parameters provided. + * + * Context: Process context. + * Return: + * * True if at least one enclave is created. + * * False otherwise. + */ +static bool ne_check_enclaves_created(void) +{ + struct ne_pci_dev *ne_pci_dev = NULL; + struct pci_dev *pdev = NULL; + bool ret = false; + + if (!ne_misc_dev.parent) + return ret; + + pdev = to_pci_dev(ne_misc_dev.parent); + if (!pdev) + return ret; + + ne_pci_dev = pci_get_drvdata(pdev); + if (!ne_pci_dev) + return ret; + + mutex_lock(&ne_pci_dev->enclaves_list_mutex); + + if (!list_empty(&ne_pci_dev->enclaves_list)) + ret = true; + + mutex_unlock(&ne_pci_dev->enclaves_list_mutex); + + return ret; +} + +/** + * ne_setup_cpu_pool() - Set the NE CPU pool after handling sanity checks such + * as not sharing CPU cores with the primary / parent VM + * or not using CPU 0, which should remain available for + * the primary / parent VM. Offline the CPUs from the + * pool after the checks passed. + * @ne_cpu_list: The CPU list used for setting NE CPU pool. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_setup_cpu_pool(const char *ne_cpu_list) +{ + int core_id = -1; + unsigned int cpu = 0; + cpumask_var_t cpu_pool; + unsigned int cpu_sibling = 0; + unsigned int i = 0; + int numa_node = -1; + int rc = -EINVAL; + + if (!zalloc_cpumask_var(&cpu_pool, GFP_KERNEL)) + return -ENOMEM; + + mutex_lock(&ne_cpu_pool.mutex); + + rc = cpulist_parse(ne_cpu_list, cpu_pool); + if (rc < 0) { + pr_err("%s: Error in cpulist parse [rc=%d]\n", ne_misc_dev.name, rc); + + goto free_pool_cpumask; + } + + cpu = cpumask_any(cpu_pool); + if (cpu >= nr_cpu_ids) { + pr_err("%s: No CPUs available in CPU pool\n", ne_misc_dev.name); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + + /* + * Check if the CPUs are online, to further get info about them + * e.g. numa node, core id, siblings. + */ + for_each_cpu(cpu, cpu_pool) + if (cpu_is_offline(cpu)) { + pr_err("%s: CPU %d is offline, has to be online to get its metadata\n", + ne_misc_dev.name, cpu); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + + /* + * Check if the CPUs from the NE CPU pool are from the same NUMA node. + */ + for_each_cpu(cpu, cpu_pool) + if (numa_node < 0) { + numa_node = cpu_to_node(cpu); + if (numa_node < 0) { + pr_err("%s: Invalid NUMA node %d\n", + ne_misc_dev.name, numa_node); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + } else { + if (numa_node != cpu_to_node(cpu)) { + pr_err("%s: CPUs with different NUMA nodes\n", + ne_misc_dev.name); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + } + + /* + * Check if CPU 0 and its siblings are included in the provided CPU pool + * They should remain available for the primary / parent VM. 
+ */ + if (cpumask_test_cpu(0, cpu_pool)) { + pr_err("%s: CPU 0 has to remain available\n", ne_misc_dev.name); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + + for_each_cpu(cpu_sibling, topology_sibling_cpumask(0)) { + if (cpumask_test_cpu(cpu_sibling, cpu_pool)) { + pr_err("%s: CPU sibling %d for CPU 0 is in CPU pool\n", + ne_misc_dev.name, cpu_sibling); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + } + + /* + * Check if CPU siblings are included in the provided CPU pool. The + * expectation is that full CPU cores are made available in the CPU pool + * for enclaves. + */ + for_each_cpu(cpu, cpu_pool) { + for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) { + if (!cpumask_test_cpu(cpu_sibling, cpu_pool)) { + pr_err("%s: CPU %d is not in CPU pool\n", + ne_misc_dev.name, cpu_sibling); + + rc = -EINVAL; + + goto free_pool_cpumask; + } + } + } + + /* Calculate the number of threads from a full CPU core. */ + cpu = cpumask_any(cpu_pool); + for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) + ne_cpu_pool.nr_threads_per_core++; + + ne_cpu_pool.nr_parent_vm_cores = nr_cpu_ids / ne_cpu_pool.nr_threads_per_core; + + ne_cpu_pool.avail_threads_per_core = kcalloc(ne_cpu_pool.nr_parent_vm_cores, + sizeof(*ne_cpu_pool.avail_threads_per_core), + GFP_KERNEL); + if (!ne_cpu_pool.avail_threads_per_core) { + rc = -ENOMEM; + + goto free_pool_cpumask; + } + + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) + if (!zalloc_cpumask_var(&ne_cpu_pool.avail_threads_per_core[i], GFP_KERNEL)) { + rc = -ENOMEM; + + goto free_cores_cpumask; + } + + /* + * Split the NE CPU pool in threads per core to keep the CPU topology + * after offlining the CPUs. + */ + for_each_cpu(cpu, cpu_pool) { + core_id = topology_core_id(cpu); + if (core_id < 0 || core_id >= ne_cpu_pool.nr_parent_vm_cores) { + pr_err("%s: Invalid core id %d for CPU %d\n", + ne_misc_dev.name, core_id, cpu); + + rc = -EINVAL; + + goto clear_cpumask; + } + + cpumask_set_cpu(cpu, ne_cpu_pool.avail_threads_per_core[core_id]); + } + + /* + * CPUs that are given to enclave(s) should not be considered online + * by Linux anymore, as the hypervisor will degrade them to floating. + * The physical CPUs (full cores) are carved out of the primary / parent + * VM and given to the enclave VM. The same number of vCPUs would run + * on less pCPUs for the primary / parent VM. + * + * We offline them here, to not degrade performance and expose correct + * topology to Linux and user space. + */ + for_each_cpu(cpu, cpu_pool) { + rc = remove_cpu(cpu); + if (rc != 0) { + pr_err("%s: CPU %d is not offlined [rc=%d]\n", + ne_misc_dev.name, cpu, rc); + + goto online_cpus; + } + } + + free_cpumask_var(cpu_pool); + + ne_cpu_pool.numa_node = numa_node; + + mutex_unlock(&ne_cpu_pool.mutex); + + return 0; + +online_cpus: + for_each_cpu(cpu, cpu_pool) + add_cpu(cpu); +clear_cpumask: + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) + cpumask_clear(ne_cpu_pool.avail_threads_per_core[i]); +free_cores_cpumask: + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) + free_cpumask_var(ne_cpu_pool.avail_threads_per_core[i]); + kfree(ne_cpu_pool.avail_threads_per_core); +free_pool_cpumask: + free_cpumask_var(cpu_pool); + ne_cpu_pool.nr_parent_vm_cores = 0; + ne_cpu_pool.nr_threads_per_core = 0; + ne_cpu_pool.numa_node = -1; + mutex_unlock(&ne_cpu_pool.mutex); + + return rc; +} + +/** + * ne_teardown_cpu_pool() - Online the CPUs from the NE CPU pool and cleanup the + * CPU pool. + * @void: No parameters provided. + * + * Context: Process context. 
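 *
 * Called when the ne_cpus parameter is rewritten, before the new pool is
 * set up, and from the module exit path.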
+ */ +static void ne_teardown_cpu_pool(void) +{ + unsigned int cpu = 0; + unsigned int i = 0; + int rc = -EINVAL; + + mutex_lock(&ne_cpu_pool.mutex); + + if (!ne_cpu_pool.nr_parent_vm_cores) { + mutex_unlock(&ne_cpu_pool.mutex); + + return; + } + + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) { + for_each_cpu(cpu, ne_cpu_pool.avail_threads_per_core[i]) { + rc = add_cpu(cpu); + if (rc != 0) + pr_err("%s: CPU %d is not onlined [rc=%d]\n", + ne_misc_dev.name, cpu, rc); + } + + cpumask_clear(ne_cpu_pool.avail_threads_per_core[i]); + + free_cpumask_var(ne_cpu_pool.avail_threads_per_core[i]); + } + + kfree(ne_cpu_pool.avail_threads_per_core); + ne_cpu_pool.nr_parent_vm_cores = 0; + ne_cpu_pool.nr_threads_per_core = 0; + ne_cpu_pool.numa_node = -1; + + mutex_unlock(&ne_cpu_pool.mutex); +} + +/** + * ne_set_kernel_param() - Set the NE CPU pool value via the NE kernel parameter. + * @val: NE CPU pool string value. + * @kp : NE kernel parameter associated with the NE CPU pool. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_set_kernel_param(const char *val, const struct kernel_param *kp) +{ + char error_val[] = ""; + int rc = -EINVAL; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (ne_check_enclaves_created()) { + pr_err("%s: The CPU pool is used by enclave(s)\n", ne_misc_dev.name); + + return -EPERM; + } + + ne_teardown_cpu_pool(); + + rc = ne_setup_cpu_pool(val); + if (rc < 0) { + pr_err("%s: Error in setup CPU pool [rc=%d]\n", ne_misc_dev.name, rc); + + param_set_copystring(error_val, kp); + + return rc; + } + + rc = param_set_copystring(val, kp); + if (rc < 0) { + pr_err("%s: Error in param set copystring [rc=%d]\n", ne_misc_dev.name, rc); + + ne_teardown_cpu_pool(); + + param_set_copystring(error_val, kp); + + return rc; + } + + return 0; +} + +/** + * ne_donated_cpu() - Check if the provided CPU is already used by the enclave. + * @ne_enclave : Private data associated with the current enclave. + * @cpu: CPU to check if already used. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * True if the provided CPU is already used by the enclave. + * * False otherwise. + */ +static bool ne_donated_cpu(struct ne_enclave *ne_enclave, unsigned int cpu) +{ + if (cpumask_test_cpu(cpu, ne_enclave->vcpu_ids)) + return true; + + return false; +} + +/** + * ne_get_unused_core_from_cpu_pool() - Get the id of a full core from the + * NE CPU pool. + * @void: No parameters provided. + * + * Context: Process context. This function is called with the ne_enclave and + * ne_cpu_pool mutexes held. + * Return: + * * Core id. + * * -1 if no CPU core available in the pool. + */ +static int ne_get_unused_core_from_cpu_pool(void) +{ + int core_id = -1; + unsigned int i = 0; + + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) + if (!cpumask_empty(ne_cpu_pool.avail_threads_per_core[i])) { + core_id = i; + + break; + } + + return core_id; +} + +/** + * ne_set_enclave_threads_per_core() - Set the threads of the provided core in + * the enclave data structure. + * @ne_enclave : Private data associated with the current enclave. + * @core_id: Core id to get its threads from the NE CPU pool. + * @vcpu_id: vCPU id part of the provided core. + * + * Context: Process context. This function is called with the ne_enclave and + * ne_cpu_pool mutexes held. + * Return: + * * 0 on success. + * * Negative return value on failure. 
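 *
 * Worked example, assuming 2 threads per core and CPUs 2 and 6 being the
 * two siblings of one pool core: once this function runs for that core,
 * both 2 and 6 move from ne_cpu_pool.avail_threads_per_core[] into the
 * enclave's threads_per_core[], so the whole core belongs to this enclave
 * and the not-yet-donated sibling is handed out first on a later
 * "any CPU" (vcpu_id 0) request.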
+ */ +static int ne_set_enclave_threads_per_core(struct ne_enclave *ne_enclave, + int core_id, u32 vcpu_id) +{ + unsigned int cpu = 0; + + if (core_id < 0 && vcpu_id == 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "No CPUs available in NE CPU pool\n"); + + return -NE_ERR_NO_CPUS_AVAIL_IN_POOL; + } + + if (core_id < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "CPU %d is not in NE CPU pool\n", vcpu_id); + + return -NE_ERR_VCPU_NOT_IN_CPU_POOL; + } + + if (core_id >= ne_enclave->nr_parent_vm_cores) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Invalid core id %d - ne_enclave\n", core_id); + + return -NE_ERR_VCPU_INVALID_CPU_CORE; + } + + for_each_cpu(cpu, ne_cpu_pool.avail_threads_per_core[core_id]) + cpumask_set_cpu(cpu, ne_enclave->threads_per_core[core_id]); + + cpumask_clear(ne_cpu_pool.avail_threads_per_core[core_id]); + + return 0; +} + +/** + * ne_get_cpu_from_cpu_pool() - Get a CPU from the NE CPU pool, either from the + * remaining sibling(s) of a CPU core or the first + * sibling of a new CPU core. + * @ne_enclave : Private data associated with the current enclave. + * @vcpu_id: vCPU to get from the NE CPU pool. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_get_cpu_from_cpu_pool(struct ne_enclave *ne_enclave, u32 *vcpu_id) +{ + int core_id = -1; + unsigned int cpu = 0; + unsigned int i = 0; + int rc = -EINVAL; + + /* + * If previously allocated a thread of a core to this enclave, first + * check remaining sibling(s) for new CPU allocations, so that full + * CPU cores are used for the enclave. + */ + for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) + for_each_cpu(cpu, ne_enclave->threads_per_core[i]) + if (!ne_donated_cpu(ne_enclave, cpu)) { + *vcpu_id = cpu; + + return 0; + } + + mutex_lock(&ne_cpu_pool.mutex); + + /* + * If no remaining siblings, get a core from the NE CPU pool and keep + * track of all the threads in the enclave threads per core data structure. + */ + core_id = ne_get_unused_core_from_cpu_pool(); + + rc = ne_set_enclave_threads_per_core(ne_enclave, core_id, *vcpu_id); + if (rc < 0) + goto unlock_mutex; + + *vcpu_id = cpumask_any(ne_enclave->threads_per_core[core_id]); + + rc = 0; + +unlock_mutex: + mutex_unlock(&ne_cpu_pool.mutex); + + return rc; +} + +/** + * ne_get_vcpu_core_from_cpu_pool() - Get from the NE CPU pool the id of the + * core associated with the provided vCPU. + * @vcpu_id: Provided vCPU id to get its associated core id. + * + * Context: Process context. This function is called with the ne_enclave and + * ne_cpu_pool mutexes held. + * Return: + * * Core id. + * * -1 if the provided vCPU is not in the pool. + */ +static int ne_get_vcpu_core_from_cpu_pool(u32 vcpu_id) +{ + int core_id = -1; + unsigned int i = 0; + + for (i = 0; i < ne_cpu_pool.nr_parent_vm_cores; i++) + if (cpumask_test_cpu(vcpu_id, ne_cpu_pool.avail_threads_per_core[i])) { + core_id = i; + + break; + } + + return core_id; +} + +/** + * ne_check_cpu_in_cpu_pool() - Check if the given vCPU is in the available CPUs + * from the pool. + * @ne_enclave : Private data associated with the current enclave. + * @vcpu_id: ID of the vCPU to check if available in the NE CPU pool. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. 
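 *
 * This is the path taken when user space passes an explicit, non-zero
 * vCPU id to the add vCPU ioctl call, instead of asking the driver to
 * pick one from the pool.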
+ */ +static int ne_check_cpu_in_cpu_pool(struct ne_enclave *ne_enclave, u32 vcpu_id) +{ + int core_id = -1; + unsigned int i = 0; + int rc = -EINVAL; + + if (ne_donated_cpu(ne_enclave, vcpu_id)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "CPU %d already used\n", vcpu_id); + + return -NE_ERR_VCPU_ALREADY_USED; + } + + /* + * If previously allocated a thread of a core to this enclave, but not + * the full core, first check remaining sibling(s). + */ + for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) + if (cpumask_test_cpu(vcpu_id, ne_enclave->threads_per_core[i])) + return 0; + + mutex_lock(&ne_cpu_pool.mutex); + + /* + * If no remaining siblings, get from the NE CPU pool the core + * associated with the vCPU and keep track of all the threads in the + * enclave threads per core data structure. + */ + core_id = ne_get_vcpu_core_from_cpu_pool(vcpu_id); + + rc = ne_set_enclave_threads_per_core(ne_enclave, core_id, vcpu_id); + if (rc < 0) + goto unlock_mutex; + + rc = 0; + +unlock_mutex: + mutex_unlock(&ne_cpu_pool.mutex); + + return rc; +} + +/** + * ne_add_vcpu_ioctl() - Add a vCPU to the slot associated with the current + * enclave. + * @ne_enclave : Private data associated with the current enclave. + * @vcpu_id: ID of the CPU to be associated with the given slot, + * apic id on x86. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_add_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id) +{ + struct ne_pci_dev_cmd_reply cmd_reply = {}; + int rc = -EINVAL; + struct slot_add_vcpu_req slot_add_vcpu_req = {}; + + if (ne_enclave->mm != current->mm) + return -EIO; + + slot_add_vcpu_req.slot_uid = ne_enclave->slot_uid; + slot_add_vcpu_req.vcpu_id = vcpu_id; + + rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_VCPU, &slot_add_vcpu_req, + sizeof(slot_add_vcpu_req), &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in slot add vCPU [rc=%d]\n", rc); + + return rc; + } + + cpumask_set_cpu(vcpu_id, ne_enclave->vcpu_ids); + + ne_enclave->nr_vcpus++; + + return 0; +} + +/** + * ne_enclave_ioctl() - Ioctl function provided by the enclave file. + * @file: File associated with this ioctl function. + * @cmd: The command that is set for the ioctl call. + * @arg: The argument that is provided for the ioctl call. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct ne_enclave *ne_enclave = file->private_data; + + switch (cmd) { + case NE_ADD_VCPU: { + int rc = -EINVAL; + u32 vcpu_id = 0; + + if (copy_from_user(&vcpu_id, (void __user *)arg, sizeof(vcpu_id))) + return -EFAULT; + + mutex_lock(&ne_enclave->enclave_info_mutex); + + if (ne_enclave->state != NE_STATE_INIT) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave is not in init state\n"); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return -NE_ERR_NOT_IN_INIT_STATE; + } + + if (vcpu_id >= (ne_enclave->nr_parent_vm_cores * + ne_enclave->nr_threads_per_core)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "vCPU id higher than max CPU id\n"); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return -NE_ERR_INVALID_VCPU; + } + + if (!vcpu_id) { + /* Use the CPU pool for choosing a CPU for the enclave. 
*/ + rc = ne_get_cpu_from_cpu_pool(ne_enclave, &vcpu_id); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in get CPU from pool [rc=%d]\n", + rc); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return rc; + } + } else { + /* Check if the provided vCPU is available in the NE CPU pool. */ + rc = ne_check_cpu_in_cpu_pool(ne_enclave, vcpu_id); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in check CPU %d in pool [rc=%d]\n", + vcpu_id, rc); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return rc; + } + } + + rc = ne_add_vcpu_ioctl(ne_enclave, vcpu_id); + if (rc < 0) { + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return rc; + } + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + if (copy_to_user((void __user *)arg, &vcpu_id, sizeof(vcpu_id))) + return -EFAULT; + + return 0; + } + + default: + return -ENOTTY; + } + + return 0; +} + /** * ne_enclave_poll() - Poll functionality used for enclave out-of-band events. * @file: File associated with this poll function. @@ -131,6 +830,7 @@ static const struct file_operations ne_enclave_fops = { .owner = THIS_MODULE, .llseek = noop_llseek, .poll = ne_enclave_poll, + .unlocked_ioctl = ne_enclave_ioctl, }; /** @@ -351,6 +1051,8 @@ static int __init ne_init(void) static void __exit ne_exit(void) { pci_unregister_driver(&ne_pci_driver); + + ne_teardown_cpu_pool(); } module_init(ne_init); From patchwork Fri Sep 4 17:37:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758337 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EBEE138E for ; Fri, 4 Sep 2020 18:02:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6797624640 for ; Fri, 4 Sep 2020 18:02:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="p+gTGLD5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728089AbgIDRmJ (ORCPT ); Fri, 4 Sep 2020 13:42:09 -0400 Received: from smtp-fw-9101.amazon.com ([207.171.184.25]:5287 "EHLO smtp-fw-9101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728209AbgIDRjc (ORCPT ); Fri, 4 Sep 2020 13:39:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241172; x=1630777172; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Hz4U1g0+78SNWFckpMwulwqEiiWdBffw1BYOxHtKAuA=; b=p+gTGLD546HgHMlSC1r3S0/i76iLL7xSVp1+bNHxlraRjxScRe7Xd32O zlOCRShG92XkkIK4G/ctXszrdmXJx9yRYRooSRtv+fd4ap3O7kDdJoFa2 qMB4bXpxC6Ds1HdV9ymBH+C+++cK/QVjrXSMOrsvBu6/LgTFGjYFyA9i6 A=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="65515420" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-74cf8b49.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 04 Sep 2020 17:39:30 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-74cf8b49.us-east-1.amazon.com (Postfix) with ESMTPS id 06CC9C19E5; Fri, 4 Sep 2020 17:39:27 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com 
(10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:39:16 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 10/18] nitro_enclaves: Add logic for getting the enclave image load info Date: Fri, 4 Sep 2020 20:37:10 +0300 Message-ID: <20200904173718.64857-11-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D08UWC002.ant.amazon.com (10.43.162.168) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Before setting the memory regions for the enclave, the enclave image needs to be placed in memory. After the memory regions are set, this memory cannot be used anymore by the VM, as it is carved out. Add ioctl command logic to get the offset in enclave memory at which to place the enclave image. The user space tooling then copies the enclave image into memory at the given offset. Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * Add custom error code for incorrect enclave image load info flag. v6 -> v7 * No changes. v5 -> v6 * Check for invalid enclave image load flags. v4 -> v5 * Check for the enclave not being started when invoking this ioctl call. * Remove log on copy_from_user() / copy_to_user() failure. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Set enclave image load offset based on flags. * Update the naming for the ioctl command from metadata to info. v2 -> v3 * No changes. v1 -> v2 * New in v2.
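For reference, a minimal user space sketch of the image load flow added by this patch (not part of the patch itself): it assumes an enclave fd already obtained from the enclave creation path and enclave backing memory already mmap()-ed by the caller; the names enclave_fd, ne_mem, image and image_size are placeholders that do not exist in the series.

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

#include <linux/nitro_enclaves.h>

/*
 * Sketch only: enclave_fd, ne_mem (mmap()-ed enclave backing memory), image
 * and image_size are assumed to be set up by the caller.
 */
static int ne_load_enclave_image(int enclave_fd, void *ne_mem,
				 const void *image, size_t image_size)
{
	struct ne_image_load_info info = {
		.flags = NE_EIF_IMAGE,	/* Enclave Image Format (EIF) file. */
	};

	if (ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &info) < 0) {
		perror("NE_GET_IMAGE_LOAD_INFO");

		return -1;
	}

	/* The driver reports the offset in enclave memory for the image. */
	memcpy((char *)ne_mem + info.memory_offset, image, image_size);

	return 0;
}

The driver only reports the load offset; copying the image and afterwards registering the memory regions via NE_SET_USER_MEMORY_REGION remain the responsibility of the user space tooling.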
--- drivers/virt/nitro_enclaves/ne_misc_dev.c | 36 +++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c index 0477b11bf15d..0248db07fd6a 100644 --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -795,6 +795,42 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long return 0; } + case NE_GET_IMAGE_LOAD_INFO: { + struct ne_image_load_info image_load_info = {}; + + if (copy_from_user(&image_load_info, (void __user *)arg, sizeof(image_load_info))) + return -EFAULT; + + mutex_lock(&ne_enclave->enclave_info_mutex); + + if (ne_enclave->state != NE_STATE_INIT) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave is not in init state\n"); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return -NE_ERR_NOT_IN_INIT_STATE; + } + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + if (!image_load_info.flags || + image_load_info.flags >= NE_IMAGE_LOAD_MAX_FLAG_VAL) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Incorrect flag in enclave image load info\n"); + + return -NE_ERR_INVALID_FLAG_VALUE; + } + + if (image_load_info.flags == NE_EIF_IMAGE) + image_load_info.memory_offset = NE_EIF_LOAD_OFFSET; + + if (copy_to_user((void __user *)arg, &image_load_info, sizeof(image_load_info))) + return -EFAULT; + + return 0; + } + default: return -ENOTTY; } From patchwork Fri Sep 4 17:37:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758321 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E7EE15AB for ; Fri, 4 Sep 2020 18:02:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 137F42405A for ; Fri, 4 Sep 2020 18:02:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="WgObfEim" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728193AbgIDRjz (ORCPT ); Fri, 4 Sep 2020 13:39:55 -0400 Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:24564 "EHLO smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727963AbgIDRjl (ORCPT ); Fri, 4 Sep 2020 13:39:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241180; x=1630777180; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rO1RD3swUWSwwGKlW8S7yBoh9aSmUUdMhAbwXbKQAQI=; b=WgObfEimqguK+bCjgA7O4fTFOqJu3Grlu0Ej46jn4CR4uhMEyhYC4rOM wZTFHw4X0Txo+U4CaM0pAtHR93CklNYgXadQOU8DQBg/EWEl1QW5yDp24 DxvZA0we2jO+udGzGYL9mEh5oK3Syk/VBjXu5yjx7BTT6sWpzfSDvQP95 Q=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52100087" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1a-67b371d8.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-6002.iad6.amazon.com with ESMTP; 04 Sep 2020 17:39:39 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1a-67b371d8.us-east-1.amazon.com (Postfix) with ESMTPS id 027E3A1BFA; Fri, 4 Sep 2020 17:39:37 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by 
EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:39:27 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 11/18] nitro_enclaves: Add logic for setting an enclave memory region Date: Fri, 4 Sep 2020 20:37:11 +0300 Message-ID: <20200904173718.64857-12-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D08UWC002.ant.amazon.com (10.43.162.168) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Another resource that is set for an enclave is memory. The user space memory regions associated with the enclave need to be backed by contiguous physical memory regions. One solution for allocating / reserving contiguous memory regions, used for the current integration, is hugetlbfs. The user space process associated with the enclave passes these memory regions to the driver. The enclave memory regions need to be from the same NUMA node as the enclave CPUs. Add ioctl command logic for setting a user space memory region for an enclave. Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * Add early check, while getting user pages, to be multiple of 2 MiB for the pages that back the user space memory region. * Add custom error code for incorrect user space memory region flag. * Include in a separate function the sanity checks for each page of the user space memory region. v6 -> v7 * Update check for duplicate user space memory regions to cover additional possible scenarios. v5 -> v6 * Check for max number of pages allocated for the internal data structure for pages. * Check for invalid memory region flags. * Check for aligned physical memory regions. * Update documentation to kernel-doc format. * Check for duplicate user space memory regions. * Use directly put_page() instead of unpin_user_pages(), to match the get_user_pages() calls. v4 -> v5 * Add early exit on set memory region ioctl function call error. * Remove log on copy_from_user() failure. * Exit without unpinning the pages on NE PCI dev request failure as memory regions from the user space range may have already been added. * Add check for the memory region user space address to be 2 MiB aligned. * Update logic to not have a hardcoded check for 2 MiB memory regions. v3 -> v4 * Check enclave memory regions are from the same NUMA node as the enclave CPUs. * Use dev_err instead of custom NE log pattern. * Update the NE ioctl call to match the decoupling from the KVM API. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). v1 -> v2 * Add log pattern for NE. * Update goto labels to match their purpose. * Remove the BUG_ON calls. * Check if enclave max memory regions is reached when setting an enclave memory region.
* Check if enclave state is init when setting an enclave memory region. --- drivers/virt/nitro_enclaves/ne_misc_dev.c | 316 ++++++++++++++++++++++ 1 file changed, 316 insertions(+) diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c index 0248db07fd6a..9912d78c0905 100644 --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -710,6 +710,285 @@ static int ne_add_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id) return 0; } +/** + * ne_sanity_check_user_mem_region() - Sanity check the user space memory + * region received during the set user + * memory region ioctl call. + * @ne_enclave : Private data associated with the current enclave. + * @mem_region : User space memory region to be sanity checked. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave, + struct ne_user_memory_region mem_region) +{ + struct ne_mem_region *ne_mem_region = NULL; + + if (ne_enclave->mm != current->mm) + return -EIO; + + if (mem_region.memory_size & (NE_MIN_MEM_REGION_SIZE - 1)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "User space memory size is not multiple of 2 MiB\n"); + + return -NE_ERR_INVALID_MEM_REGION_SIZE; + } + + if (!IS_ALIGNED(mem_region.userspace_addr, NE_MIN_MEM_REGION_SIZE)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "User space address is not 2 MiB aligned\n"); + + return -NE_ERR_UNALIGNED_MEM_REGION_ADDR; + } + + if ((mem_region.userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) || + !access_ok((void __user *)(unsigned long)mem_region.userspace_addr, + mem_region.memory_size)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Invalid user space address range\n"); + + return -NE_ERR_INVALID_MEM_REGION_ADDR; + } + + list_for_each_entry(ne_mem_region, &ne_enclave->mem_regions_list, + mem_region_list_entry) { + u64 memory_size = ne_mem_region->memory_size; + u64 userspace_addr = ne_mem_region->userspace_addr; + + if ((userspace_addr <= mem_region.userspace_addr && + mem_region.userspace_addr < (userspace_addr + memory_size)) || + (mem_region.userspace_addr <= userspace_addr && + (mem_region.userspace_addr + mem_region.memory_size) > userspace_addr)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "User space memory region already used\n"); + + return -NE_ERR_MEM_REGION_ALREADY_USED; + } + } + + return 0; +} + +/** + * ne_sanity_check_user_mem_region_page() - Sanity check a page from the user space + * memory region received during the set + * user memory region ioctl call. + * @ne_enclave : Private data associated with the current enclave. + * @mem_region_page: Page from the user space memory region to be sanity checked. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. 
+ */ +static int ne_sanity_check_user_mem_region_page(struct ne_enclave *ne_enclave, + struct page *mem_region_page) +{ + if (!PageHuge(mem_region_page)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Not a hugetlbfs page\n"); + + return -NE_ERR_MEM_NOT_HUGE_PAGE; + } + + if (page_size(mem_region_page) & (NE_MIN_MEM_REGION_SIZE - 1)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Page size not multiple of 2 MiB\n"); + + return -NE_ERR_INVALID_PAGE_SIZE; + } + + if (ne_enclave->numa_node != page_to_nid(mem_region_page)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Page is not from NUMA node %d\n", + ne_enclave->numa_node); + + return -NE_ERR_MEM_DIFFERENT_NUMA_NODE; + } + + return 0; +} + +/** + * ne_set_user_memory_region_ioctl() - Add user space memory region to the slot + * associated with the current enclave. + * @ne_enclave : Private data associated with the current enclave. + * @mem_region : User space memory region to be associated with the given slot. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave, + struct ne_user_memory_region mem_region) +{ + long gup_rc = 0; + unsigned long i = 0; + unsigned long max_nr_pages = 0; + unsigned long memory_size = 0; + struct ne_mem_region *ne_mem_region = NULL; + unsigned long nr_phys_contig_mem_regions = 0; + struct page **phys_contig_mem_regions = NULL; + int rc = -EINVAL; + + rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region); + if (rc < 0) + return rc; + + ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL); + if (!ne_mem_region) + return -ENOMEM; + + max_nr_pages = mem_region.memory_size / NE_MIN_MEM_REGION_SIZE; + + ne_mem_region->pages = kcalloc(max_nr_pages, sizeof(*ne_mem_region->pages), + GFP_KERNEL); + if (!ne_mem_region->pages) { + rc = -ENOMEM; + + goto free_mem_region; + } + + phys_contig_mem_regions = kcalloc(max_nr_pages, sizeof(*phys_contig_mem_regions), + GFP_KERNEL); + if (!phys_contig_mem_regions) { + rc = -ENOMEM; + + goto free_mem_region; + } + + do { + i = ne_mem_region->nr_pages; + + if (i == max_nr_pages) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Reached max nr of pages in the pages data struct\n"); + + rc = -ENOMEM; + + goto put_pages; + } + + gup_rc = get_user_pages(mem_region.userspace_addr + memory_size, 1, FOLL_GET, + ne_mem_region->pages + i, NULL); + if (gup_rc < 0) { + rc = gup_rc; + + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in get user pages [rc=%d]\n", rc); + + goto put_pages; + } + + rc = ne_sanity_check_user_mem_region_page(ne_enclave, ne_mem_region->pages[i]); + if (rc < 0) + goto put_pages; + + /* + * TODO: Update once handled non-contiguous memory regions + * received from user space or contiguous physical memory regions + * larger than 2 MiB e.g. 8 MiB. + */ + phys_contig_mem_regions[i] = ne_mem_region->pages[i]; + + memory_size += page_size(ne_mem_region->pages[i]); + + ne_mem_region->nr_pages++; + } while (memory_size < mem_region.memory_size); + + /* + * TODO: Update once handled non-contiguous memory regions received + * from user space or contiguous physical memory regions larger than + * 2 MiB e.g. 8 MiB. 
+ */ + nr_phys_contig_mem_regions = ne_mem_region->nr_pages; + + if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) > + ne_enclave->max_mem_regions) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Reached max memory regions %lld\n", + ne_enclave->max_mem_regions); + + rc = -NE_ERR_MEM_MAX_REGIONS; + + goto put_pages; + } + + for (i = 0; i < nr_phys_contig_mem_regions; i++) { + u64 phys_region_addr = page_to_phys(phys_contig_mem_regions[i]); + u64 phys_region_size = page_size(phys_contig_mem_regions[i]); + + if (phys_region_size & (NE_MIN_MEM_REGION_SIZE - 1)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Physical mem region size is not multiple of 2 MiB\n"); + + rc = -EINVAL; + + goto put_pages; + } + + if (!IS_ALIGNED(phys_region_addr, NE_MIN_MEM_REGION_SIZE)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Physical mem region address is not 2 MiB aligned\n"); + + rc = -EINVAL; + + goto put_pages; + } + } + + ne_mem_region->memory_size = mem_region.memory_size; + ne_mem_region->userspace_addr = mem_region.userspace_addr; + + list_add(&ne_mem_region->mem_region_list_entry, &ne_enclave->mem_regions_list); + + for (i = 0; i < nr_phys_contig_mem_regions; i++) { + struct ne_pci_dev_cmd_reply cmd_reply = {}; + struct slot_add_mem_req slot_add_mem_req = {}; + + slot_add_mem_req.slot_uid = ne_enclave->slot_uid; + slot_add_mem_req.paddr = page_to_phys(phys_contig_mem_regions[i]); + slot_add_mem_req.size = page_size(phys_contig_mem_regions[i]); + + rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM, + &slot_add_mem_req, sizeof(slot_add_mem_req), + &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in slot add mem [rc=%d]\n", rc); + + kfree(phys_contig_mem_regions); + + /* + * Exit here without put pages as memory regions may + * already been added. + */ + return rc; + } + + ne_enclave->mem_size += slot_add_mem_req.size; + ne_enclave->nr_mem_regions++; + } + + kfree(phys_contig_mem_regions); + + return 0; + +put_pages: + for (i = 0; i < ne_mem_region->nr_pages; i++) + put_page(ne_mem_region->pages[i]); +free_mem_region: + kfree(phys_contig_mem_regions); + kfree(ne_mem_region->pages); + kfree(ne_mem_region); + + return rc; +} + /** * ne_enclave_ioctl() - Ioctl function provided by the enclave file. * @file: File associated with this ioctl function. 
@@ -831,6 +1110,43 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long return 0; } + case NE_SET_USER_MEMORY_REGION: { + struct ne_user_memory_region mem_region = {}; + int rc = -EINVAL; + + if (copy_from_user(&mem_region, (void __user *)arg, sizeof(mem_region))) + return -EFAULT; + + if (mem_region.flags >= NE_MEMORY_REGION_MAX_FLAG_VAL) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Incorrect flag for user memory region\n"); + + return -NE_ERR_INVALID_FLAG_VALUE; + } + + mutex_lock(&ne_enclave->enclave_info_mutex); + + if (ne_enclave->state != NE_STATE_INIT) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave is not in init state\n"); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return -NE_ERR_NOT_IN_INIT_STATE; + } + + rc = ne_set_user_memory_region_ioctl(ne_enclave, mem_region); + if (rc < 0) { + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return rc; + } + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return 0; + } + default: return -ENOTTY; } From patchwork Fri Sep 4 17:37:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758325 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 01D2F1599 for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9D77D24170 for ; Fri, 4 Sep 2020 18:02:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="FyjrU+HS" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728249AbgIDRkU (ORCPT ); Fri, 4 Sep 2020 13:40:20 -0400 Received: from smtp-fw-9101.amazon.com ([207.171.184.25]:5381 "EHLO smtp-fw-9101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728213AbgIDRkB (ORCPT ); Fri, 4 Sep 2020 13:40:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241200; x=1630777200; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yi9Y28k8+KNZA7HbOZBu8uKIvfwREPwb/zQHiET4WYo=; b=FyjrU+HSEObzq8AOvptic93CRyVHiFls2BGNgHAv4Epeo3XK6ilz7JSa faojz0MtCVTf7QO6v8Z2/LO0S/sZpVxILHscDQlSSLo2MMPVUy7n34rCd 09JKK+U/Xx/68x3QtXMsUDdNUBlryqGRoOvLYwGCd8EWVxW+pWDUKLVgW 4=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="65515504" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1a-821c648d.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 04 Sep 2020 17:39:50 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1a-821c648d.us-east-1.amazon.com (Postfix) with ESMTPS id DB443A2097; Fri, 4 Sep 2020 17:39:47 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:39:37 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano 
Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 12/18] nitro_enclaves: Add logic for starting an enclave Date: Fri, 4 Sep 2020 20:37:12 +0300 Message-ID: <20200904173718.64857-13-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D08UWC002.ant.amazon.com (10.43.162.168) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org After all the enclave resources are set, the enclave is ready for beginning to run. Add ioctl command logic for starting an enclave after all its resources, memory regions and CPUs, have been set. The enclave start information includes the local channel addressing - vsock CID - and the flags associated with the enclave. Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * Add check for invalid enclave CID value e.g. well-known CIDs and parent VM CID. * Add custom error code for incorrect flag in enclave start info and invalid enclave CID. v6 -> v7 * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Check for invalid enclave start flags. * Update documentation to kernel-doc format. v4 -> v5 * Add early exit on enclave start ioctl function call error. * Move sanity checks in the enclave start ioctl function, outside of the switch-case block. * Remove log on copy_from_user() / copy_to_user() failure. v3 -> v4 * Use dev_err instead of custom NE log pattern. * Update the naming for the ioctl command from metadata to info. * Check for minimum enclave memory size. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. v1 -> v2 * Add log pattern for NE. * Check if enclave state is init when starting an enclave. * Remove the BUG_ON calls. --- drivers/virt/nitro_enclaves/ne_misc_dev.c | 155 ++++++++++++++++++++++ 1 file changed, 155 insertions(+) diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c index 9912d78c0905..5ec7fdf9d08e 100644 --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -989,6 +989,77 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave, return rc; } +/** + * ne_start_enclave_ioctl() - Trigger enclave start after the enclave resources, + * such as memory and CPU, have been set. + * @ne_enclave : Private data associated with the current enclave. + * @enclave_start_info : Enclave info that includes enclave cid and flags. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + * Return: + * * 0 on success. + * * Negative return value on failure. 
+ */ +static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave, + struct ne_enclave_start_info *enclave_start_info) +{ + struct ne_pci_dev_cmd_reply cmd_reply = {}; + unsigned int cpu = 0; + struct enclave_start_req enclave_start_req = {}; + unsigned int i = 0; + int rc = -EINVAL; + + if (!ne_enclave->nr_mem_regions) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave has no mem regions\n"); + + return -NE_ERR_NO_MEM_REGIONS_ADDED; + } + + if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave memory is less than %ld\n", + NE_MIN_ENCLAVE_MEM_SIZE); + + return -NE_ERR_ENCLAVE_MEM_MIN_SIZE; + } + + if (!ne_enclave->nr_vcpus) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave has no vCPUs\n"); + + return -NE_ERR_NO_VCPUS_ADDED; + } + + for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) + for_each_cpu(cpu, ne_enclave->threads_per_core[i]) + if (!cpumask_test_cpu(cpu, ne_enclave->vcpu_ids)) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Full CPU cores not used\n"); + + return -NE_ERR_FULL_CORES_NOT_USED; + } + + enclave_start_req.enclave_cid = enclave_start_info->enclave_cid; + enclave_start_req.flags = enclave_start_info->flags; + enclave_start_req.slot_uid = ne_enclave->slot_uid; + + rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req, + sizeof(enclave_start_req), &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in enclave start [rc=%d]\n", rc); + + return rc; + } + + ne_enclave->state = NE_STATE_RUNNING; + + enclave_start_info->enclave_cid = cmd_reply.enclave_cid; + + return 0; +} + /** * ne_enclave_ioctl() - Ioctl function provided by the enclave file. * @file: File associated with this ioctl function. @@ -1147,6 +1218,90 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long return 0; } + case NE_START_ENCLAVE: { + struct ne_enclave_start_info enclave_start_info = {}; + int rc = -EINVAL; + + if (copy_from_user(&enclave_start_info, (void __user *)arg, + sizeof(enclave_start_info))) + return -EFAULT; + + if (enclave_start_info.flags >= NE_ENCLAVE_START_MAX_FLAG_VAL) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Incorrect flag in enclave start info\n"); + + return -NE_ERR_INVALID_FLAG_VALUE; + } + + /* + * Do not use well-known CIDs - 0, 1, 2 - for enclaves. + * VMADDR_CID_ANY = -1U + * VMADDR_CID_HYPERVISOR = 0 + * VMADDR_CID_LOCAL = 1 + * VMADDR_CID_HOST = 2 + * Note: 0 is used as a placeholder to auto-generate an enclave CID. + * http://man7.org/linux/man-pages/man7/vsock.7.html + */ + if (enclave_start_info.enclave_cid > 0 && + enclave_start_info.enclave_cid <= VMADDR_CID_HOST) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Well-known CID value, not to be used for enclaves\n"); + + return -NE_ERR_INVALID_ENCLAVE_CID; + } + + if (enclave_start_info.enclave_cid == U32_MAX) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Well-known CID value, not to be used for enclaves\n"); + + return -NE_ERR_INVALID_ENCLAVE_CID; + } + + /* + * Do not use the CID of the primary / parent VM for enclaves. + */ + if (enclave_start_info.enclave_cid == NE_PARENT_VM_CID) { + dev_err_ratelimited(ne_misc_dev.this_device, + "CID of the parent VM, not to be used for enclaves\n"); + + return -NE_ERR_INVALID_ENCLAVE_CID; + } + + /* 64-bit CIDs are not yet supported for the vsock device. 
*/ + if (enclave_start_info.enclave_cid > U32_MAX) { + dev_err_ratelimited(ne_misc_dev.this_device, + "64-bit CIDs not yet supported for the vsock device\n"); + + return -NE_ERR_INVALID_ENCLAVE_CID; + } + + mutex_lock(&ne_enclave->enclave_info_mutex); + + if (ne_enclave->state != NE_STATE_INIT) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Enclave is not in init state\n"); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return -NE_ERR_NOT_IN_INIT_STATE; + } + + rc = ne_start_enclave_ioctl(ne_enclave, &enclave_start_info); + if (rc < 0) { + mutex_unlock(&ne_enclave->enclave_info_mutex); + + return rc; + } + + mutex_unlock(&ne_enclave->enclave_info_mutex); + + if (copy_to_user((void __user *)arg, &enclave_start_info, + sizeof(enclave_start_info))) + return -EFAULT; + + return 0; + } + default: return -ENOTTY; } From patchwork Fri Sep 4 17:37:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758323 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BCD6992C for ; Fri, 4 Sep 2020 18:02:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5D11F2415A for ; Fri, 4 Sep 2020 18:02:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="CuiA0Wk1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728215AbgIDRkT (ORCPT ); Fri, 4 Sep 2020 13:40:19 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:37119 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728245AbgIDRkC (ORCPT ); Fri, 4 Sep 2020 13:40:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241202; x=1630777202; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nbyWTFeOZZR57KSpqNP9YA6sOZzSZMtSryytZtxqtRw=; b=CuiA0Wk1kPK8G1cq+zKDNdgGgJiBzoS863UjddvlmkWgNX8IBxsDzYNn 5naoedlgzWFtwe4V64Wwjz2oFneWu9LDvZzQyc4r42OSm9w6OZb7hZxyj XGYHiYHxOkz9iyqZHb/fGB4an1r4YocgG0/B/yhVYPzzzGl7VjPwkgxgS M=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="72479778" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 04 Sep 2020 17:40:01 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com (Postfix) with ESMTPS id 12884A2105; Fri, 4 Sep 2020 17:39:57 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:39:47 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 13/18] nitro_enclaves: Add logic for 
terminating an enclave Date: Fri, 4 Sep 2020 20:37:13 +0300 Message-ID: <20200904173718.64857-14-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D08UWC002.ant.amazon.com (10.43.162.168) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org An enclave is associated with an fd that is returned after the enclave creation logic is completed. This enclave fd is further used to setup enclave resources. Once the enclave needs to be terminated, the enclave fd is closed. Add logic for enclave termination, that is mapped to the enclave fd release callback. Free the internal enclave info used for bookkeeping. Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * Remove the pci_dev_put() call as the NE misc device parent field is used now to get the NE PCI device. * Update the naming and add more comments to make more clear the logic of handling full CPU cores and dedicating them to the enclave. v5 -> v6 * Update documentation to kernel-doc format. * Use directly put_page() instead of unpin_user_pages(), to match the get_user_pages() calls. v4 -> v5 * Release the reference to the NE PCI device on enclave fd release. * Adapt the logic to cpumask enclave vCPU ids and CPU cores. * Remove sanity checks for situations that shouldn't happen, only if buggy system or broken logic at all. v3 -> v4 * Use dev_err instead of custom NE log pattern. v2 -> v3 * Remove the WARN_ON calls. * Update static calls sanity checks. * Update kzfree() calls to kfree(). v1 -> v2 * Add log pattern for NE. * Remove the BUG_ON calls. * Update goto labels to match their purpose. * Add early exit in release() if there was a slot alloc error in the fd creation path. --- drivers/virt/nitro_enclaves/ne_misc_dev.c | 166 ++++++++++++++++++++++ 1 file changed, 166 insertions(+) diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c index 5ec7fdf9d08e..1fb194d3ab62 100644 --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c @@ -1309,6 +1309,171 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd, unsigned long return 0; } +/** + * ne_enclave_remove_all_mem_region_entries() - Remove all memory region entries + * from the enclave data structure. + * @ne_enclave : Private data associated with the current enclave. + * + * Context: Process context. This function is called with the ne_enclave mutex held. + */ +static void ne_enclave_remove_all_mem_region_entries(struct ne_enclave *ne_enclave) +{ + unsigned long i = 0; + struct ne_mem_region *ne_mem_region = NULL; + struct ne_mem_region *ne_mem_region_tmp = NULL; + + list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp, + &ne_enclave->mem_regions_list, + mem_region_list_entry) { + list_del(&ne_mem_region->mem_region_list_entry); + + for (i = 0; i < ne_mem_region->nr_pages; i++) + put_page(ne_mem_region->pages[i]); + + kfree(ne_mem_region->pages); + + kfree(ne_mem_region); + } +} + +/** + * ne_enclave_remove_all_vcpu_id_entries() - Remove all vCPU id entries from + * the enclave data structure. + * @ne_enclave : Private data associated with the current enclave. + * + * Context: Process context. 
This function is called with the ne_enclave mutex held. + */ +static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave *ne_enclave) +{ + unsigned int cpu = 0; + unsigned int i = 0; + + mutex_lock(&ne_cpu_pool.mutex); + + for (i = 0; i < ne_enclave->nr_parent_vm_cores; i++) { + for_each_cpu(cpu, ne_enclave->threads_per_core[i]) + /* Update the available NE CPU pool. */ + cpumask_set_cpu(cpu, ne_cpu_pool.avail_threads_per_core[i]); + + free_cpumask_var(ne_enclave->threads_per_core[i]); + } + + mutex_unlock(&ne_cpu_pool.mutex); + + kfree(ne_enclave->threads_per_core); + + free_cpumask_var(ne_enclave->vcpu_ids); +} + +/** + * ne_pci_dev_remove_enclave_entry() - Remove the enclave entry from the data + * structure that is part of the NE PCI + * device private data. + * @ne_enclave : Private data associated with the current enclave. + * @ne_pci_dev : Private data associated with the PCI device. + * + * Context: Process context. This function is called with the ne_pci_dev enclave + * mutex held. + */ +static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave, + struct ne_pci_dev *ne_pci_dev) +{ + struct ne_enclave *ne_enclave_entry = NULL; + struct ne_enclave *ne_enclave_entry_tmp = NULL; + + list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp, + &ne_pci_dev->enclaves_list, enclave_list_entry) { + if (ne_enclave_entry->slot_uid == ne_enclave->slot_uid) { + list_del(&ne_enclave_entry->enclave_list_entry); + + break; + } + } +} + +/** + * ne_enclave_release() - Release function provided by the enclave file. + * @inode: Inode associated with this file release function. + * @file: File associated with this release function. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_enclave_release(struct inode *inode, struct file *file) +{ + struct ne_pci_dev_cmd_reply cmd_reply = {}; + struct enclave_stop_req enclave_stop_request = {}; + struct ne_enclave *ne_enclave = file->private_data; + struct ne_pci_dev *ne_pci_dev = NULL; + int rc = -EINVAL; + struct slot_free_req slot_free_req = {}; + + if (!ne_enclave) + return 0; + + /* + * Early exit in case there is an error in the enclave creation logic + * and fput() is called on the cleanup path. + */ + if (!ne_enclave->slot_uid) + return 0; + + ne_pci_dev = pci_get_drvdata(ne_enclave->pdev); + + /* + * Acquire the enclave list mutex before the enclave mutex + * in order to avoid deadlocks with @ref ne_event_work_handler. 
+ */ + mutex_lock(&ne_pci_dev->enclaves_list_mutex); + mutex_lock(&ne_enclave->enclave_info_mutex); + + if (ne_enclave->state != NE_STATE_INIT && ne_enclave->state != NE_STATE_STOPPED) { + enclave_stop_request.slot_uid = ne_enclave->slot_uid; + + rc = ne_do_request(ne_enclave->pdev, ENCLAVE_STOP, + &enclave_stop_request, sizeof(enclave_stop_request), + &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in enclave stop [rc=%d]\n", rc); + + goto unlock_mutex; + } + + memset(&cmd_reply, 0, sizeof(cmd_reply)); + } + + slot_free_req.slot_uid = ne_enclave->slot_uid; + + rc = ne_do_request(ne_enclave->pdev, SLOT_FREE, &slot_free_req, sizeof(slot_free_req), + &cmd_reply, sizeof(cmd_reply)); + if (rc < 0) { + dev_err_ratelimited(ne_misc_dev.this_device, + "Error in slot free [rc=%d]\n", rc); + + goto unlock_mutex; + } + + ne_pci_dev_remove_enclave_entry(ne_enclave, ne_pci_dev); + ne_enclave_remove_all_mem_region_entries(ne_enclave); + ne_enclave_remove_all_vcpu_id_entries(ne_enclave); + + mutex_unlock(&ne_enclave->enclave_info_mutex); + mutex_unlock(&ne_pci_dev->enclaves_list_mutex); + + kfree(ne_enclave); + + return 0; + +unlock_mutex: + mutex_unlock(&ne_enclave->enclave_info_mutex); + mutex_unlock(&ne_pci_dev->enclaves_list_mutex); + + return rc; +} + /** * ne_enclave_poll() - Poll functionality used for enclave out-of-band events. * @file: File associated with this poll function. @@ -1338,6 +1503,7 @@ static const struct file_operations ne_enclave_fops = { .llseek = noop_llseek, .poll = ne_enclave_poll, .unlocked_ioctl = ne_enclave_ioctl, + .release = ne_enclave_release, }; /** From patchwork Fri Sep 4 17:37:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758335 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 411A815AB for ; Fri, 4 Sep 2020 18:02:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D7B024640 for ; Fri, 4 Sep 2020 18:02:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="MwcQkMZ8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727986AbgIDRmH (ORCPT ); Fri, 4 Sep 2020 13:42:07 -0400 Received: from smtp-fw-2101.amazon.com ([72.21.196.25]:3558 "EHLO smtp-fw-2101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728198AbgIDRkL (ORCPT ); Fri, 4 Sep 2020 13:40:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241210; x=1630777210; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uCCUSEPsgK7LpKM7i1LnBPb3b+0b1zudShw68xsNyIE=; b=MwcQkMZ8qIqpRGO5pn7jnTm2WBXX3CbA6U0OJ1N4wd5x1ZDoKmW1T7ns B/NrIG1b1bbGE154vuXvqcrYF21tdC26XOeaaNk39bL0/YywbXqc3c3ND qmiOU6h9Bn/pyWiUgIzCpiwvaKGiTaiFu5u2C1+yag7LJRi3ECbzBwt4a g=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="51939252" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-1e-97fdccfd.us-east-1.amazon.com) ([10.43.8.2]) by smtp-border-fw-out-2101.iad2.amazon.com with ESMTP; 04 Sep 2020 17:40:10 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com 
[10.40.159.162]) by email-inbound-relay-1e-97fdccfd.us-east-1.amazon.com (Postfix) with ESMTPS id EC7BCA269E; Fri, 4 Sep 2020 17:40:07 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:39:57 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver Date: Fri, 4 Sep 2020 20:37:14 +0300 Message-ID: <20200904173718.64857-15-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D08UWC002.ant.amazon.com (10.43.162.168) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * Remove, for now, the dependency on ARM64 arch. x86 is currently supported, with Arm to come afterwards. The NE kernel driver can be built for aarch64 arch. v5 -> v6 * No changes. v4 -> v5 * Add arch dependency for Arm / x86. v3 -> v4 * Add PCI and SMP dependencies. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * Update path to Kconfig to match the drivers/virt/nitro_enclaves directory. * Update help in Kconfig. --- drivers/virt/Kconfig | 2 ++ drivers/virt/nitro_enclaves/Kconfig | 20 ++++++++++++++++++++ 2 files changed, 22 insertions(+) create mode 100644 drivers/virt/nitro_enclaves/Kconfig diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig index cbc1f25c79ab..80c5f9c16ec1 100644 --- a/drivers/virt/Kconfig +++ b/drivers/virt/Kconfig @@ -32,4 +32,6 @@ config FSL_HV_MANAGER partition shuts down. source "drivers/virt/vboxguest/Kconfig" + +source "drivers/virt/nitro_enclaves/Kconfig" endif diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig new file mode 100644 index 000000000000..8c9387a232df --- /dev/null +++ b/drivers/virt/nitro_enclaves/Kconfig @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + +# Amazon Nitro Enclaves (NE) support. +# Nitro is a hypervisor that has been developed by Amazon. + +# TODO: Add dependency for ARM64 once NE is supported on Arm platforms. For now, +# the NE kernel driver can be built for aarch64 arch. +# depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP + +config NITRO_ENCLAVES + tristate "Nitro Enclaves Support" + depends on X86 && HOTPLUG_CPU && PCI && SMP + help + This driver consists of support for enclave lifetime management + for Nitro Enclaves (NE). + + To compile this driver as a module, choose M here. + The module will be called nitro_enclaves. 
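Taken together, the ioctls added in patches 10-13 give the following user space bring-up sequence. This is a minimal sketch, not part of the patches: it assumes the enclave fd has already been obtained from the enclave creation path on /dev/nitro_enclaves, that the enclave image has already been copied into the hugetlbfs-backed memory at the offset reported by NE_GET_IMAGE_LOAD_INFO, and that mem_addr / mem_size describe a 2 MiB aligned hugetlbfs mapping from the same NUMA node as the enclave CPUs; the helper name and its parameters are placeholders.

#include <sys/ioctl.h>

#include <linux/nitro_enclaves.h>

/*
 * Sketch of the ioctl ordering only; error handling is reduced to early
 * returns and the memory is registered as a single user space region.
 */
static int ne_setup_and_start(int enclave_fd, void *mem_addr,
			      unsigned long long mem_size, unsigned int nr_vcpus)
{
	struct ne_user_memory_region mem_region = {
		.flags = 0,		/* Assumption: default memory region flags. */
		.memory_size = mem_size,
		.userspace_addr = (unsigned long long)(unsigned long)mem_addr,
	};
	struct ne_enclave_start_info start_info = {
		.flags = 0,
		.enclave_cid = 0,	/* 0: the driver auto-generates the CID. */
	};
	unsigned int i = 0;

	/* The enclave image is assumed to be in place before regions are set. */
	if (ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &mem_region) < 0)
		return -1;

	/* A vcpu_id of 0 asks the driver to pick a CPU from the NE CPU pool. */
	for (i = 0; i < nr_vcpus; i++) {
		unsigned int vcpu_id = 0;

		if (ioctl(enclave_fd, NE_ADD_VCPU, &vcpu_id) < 0)
			return -1;
	}

	if (ioctl(enclave_fd, NE_START_ENCLAVE, &start_info) < 0)
		return -1;

	return 0;
}

Note that nr_vcpus should cover full CPU cores, otherwise the start ioctl fails with NE_ERR_FULL_CORES_NOT_USED; the complete flow, including image loading and the vsock heartbeat, is shown in the sample added later in this series.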
From patchwork Fri Sep 4 17:37:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758327 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4881515AB for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6DCD2415A for ; Fri, 4 Sep 2020 18:02:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="EHCRkOUl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728252AbgIDRka (ORCPT ); Fri, 4 Sep 2020 13:40:30 -0400 Received: from smtp-fw-9102.amazon.com ([207.171.184.29]:31235 "EHLO smtp-fw-9102.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727964AbgIDRk3 (ORCPT ); Fri, 4 Sep 2020 13:40:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241229; x=1630777229; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Lm0xp2nyoJ7iwxMrT0gLboEtI9wzxivQfh/SLbvjtB4=; b=EHCRkOUlyHmb+L43osaH5kjMMbCGkFvK4ISAaRhNSbia473/NC/qT8Wf H1o+JkZYSodz4Ld949uzoI4WSOFFM7IGD8gvPw1UCdjat6YDAkzGaH/bM +DGlH9x5tZbNFe7lMfVRSHS2gvQPVdVvnHxRM6LrpyGltHeuG6e4uJdkV A=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="73692053" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1a-67b371d8.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9102.sea19.amazon.com with ESMTP; 04 Sep 2020 17:40:24 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1a-67b371d8.us-east-1.amazon.com (Postfix) with ESMTPS id 167CFA1BD7; Fri, 4 Sep 2020 17:40:23 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:40:12 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver Date: Fri, 4 Sep 2020 20:37:15 +0300 Message-ID: <20200904173718.64857-16-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D06UWC001.ant.amazon.com (10.43.162.91) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * No changes. v5 -> v6 * No changes. v4 -> v5 * No changes. v3 -> v4 * No changes. v2 -> v3 * Remove the GPL additional wording as SPDX-License-Identifier is already in place. 
v1 -> v2 * Update path to Makefile to match the drivers/virt/nitro_enclaves directory. --- drivers/virt/Makefile | 2 ++ drivers/virt/nitro_enclaves/Makefile | 11 +++++++++++ 2 files changed, 13 insertions(+) create mode 100644 drivers/virt/nitro_enclaves/Makefile diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile index fd331247c27a..f28425ce4b39 100644 --- a/drivers/virt/Makefile +++ b/drivers/virt/Makefile @@ -5,3 +5,5 @@ obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o obj-y += vboxguest/ + +obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/ diff --git a/drivers/virt/nitro_enclaves/Makefile b/drivers/virt/nitro_enclaves/Makefile new file mode 100644 index 000000000000..e9f4fcd1591e --- /dev/null +++ b/drivers/virt/nitro_enclaves/Makefile @@ -0,0 +1,11 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + +# Enclave lifetime management support for Nitro Enclaves (NE). + +obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o + +nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o + +ccflags-y += -Wall From patchwork Fri Sep 4 17:37:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758331 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A94B1138E for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7060224640 for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="AbT4QyAP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728292AbgIDRkv (ORCPT ); Fri, 4 Sep 2020 13:40:51 -0400 Received: from smtp-fw-4101.amazon.com ([72.21.198.25]:50639 "EHLO smtp-fw-4101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728279AbgIDRkl (ORCPT ); Fri, 4 Sep 2020 13:40:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241237; x=1630777237; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JHWbSdfDvt8NNhmcMfRK8xBmbggJxJMKuHA88kVk7Fo=; b=AbT4QyAPdG78XKKuzv8qpnmimOdEk6D6kyq6Bwt2XM7taZIkYJd4BkbW kUw6EgGxS77xFsEYhuMi1Lnm5GhLJBHKmm+0G4l2IKHDLuaFpKh9ChqeV p/BauCVuYtoIeBxR6oEno/VtN0LAkLq94CZkzL9MaEIKCYT4iLWm+1en+ Y=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="52179432" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1e-97fdccfd.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 04 Sep 2020 17:40:36 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1e-97fdccfd.us-east-1.amazon.com (Postfix) with ESMTPS id 37CB3A25A2; Fri, 4 Sep 2020 17:40:33 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:40:22 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , 
Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 16/18] nitro_enclaves: Add sample for ioctl interface usage Date: Fri, 4 Sep 2020 20:37:16 +0300 Message-ID: <20200904173718.64857-17-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D06UWC001.ant.amazon.com (10.43.162.91) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Alexandru Vasile Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * Track NE custom error codes for invalid page size, invalid flags and enclave CID. * Update the heartbeat logic to have a listener fd first, then start the enclave and then accept connection to get the heartbeat. * Update the reference link to the hugetlb documentation. v6 -> v7 * Track POLLNVAL as poll event in addition to POLLHUP. v5 -> v6 * Remove "rc" mentioning when printing errno string. * Remove the ioctl to query API version. * Include usage info for NUMA-aware hugetlb configuration. * Update documentation to kernel-doc format. * Add logic for enclave image loading. v4 -> v5 * Print enclave vCPU ids when they are created. * Update logic to map the modified vCPU ioctl call. * Add check for the path to the enclave image to be less than PATH_MAX. * Update the ioctl calls error checking logic to match the NE specific error codes. v3 -> v4 * Update usage details to match the updates in v4. * Update NE ioctl interface usage. v2 -> v3 * Remove the include directory to use the uapi from the kernel. * Remove the GPL additional wording as SPDX-License-Identifier is already in place. v1 -> v2 * New in v2. --- samples/nitro_enclaves/.gitignore | 2 + samples/nitro_enclaves/Makefile | 16 + samples/nitro_enclaves/ne_ioctl_sample.c | 883 +++++++++++++++++++++++ 3 files changed, 901 insertions(+) create mode 100644 samples/nitro_enclaves/.gitignore create mode 100644 samples/nitro_enclaves/Makefile create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore new file mode 100644 index 000000000000..827934129c90 --- /dev/null +++ b/samples/nitro_enclaves/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +ne_ioctl_sample diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile new file mode 100644 index 000000000000..a3ec78fefb52 --- /dev/null +++ b/samples/nitro_enclaves/Makefile @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. + +# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample +# usage. + +.PHONY: all clean + +CFLAGS += -Wall + +all: + $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread + +clean: + rm -f ne_ioctl_sample diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c new file mode 100644 index 000000000000..480b763142b3 --- /dev/null +++ b/samples/nitro_enclaves/ne_ioctl_sample.c @@ -0,0 +1,883 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+ */ + +/** + * DOC: Sample flow of using the ioctl interface provided by the Nitro Enclaves (NE) + * kernel driver. + * + * Usage + * ----- + * + * Load the nitro_enclaves module, setting also the enclave CPU pool. The + * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its + * siblings have to remain available for the primary / parent VM, so they + * cannot be included in the enclave CPU pool. + * + * See the cpu list section from the kernel documentation. + * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#cpu-lists + * + * insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko + * lsmod + * + * The CPU pool can be set at runtime, after the kernel module is loaded. + * + * echo > /sys/module/nitro_enclaves/parameters/ne_cpus + * + * NUMA and CPU siblings information can be found using: + * + * lscpu + * /proc/cpuinfo + * + * Check the online / offline CPU list. The CPUs from the pool should be + * offlined. + * + * lscpu + * + * Check dmesg for any warnings / errors through the NE driver lifetime / usage. + * The NE logs contain the "nitro_enclaves" or "pci 0000:00:02.0" pattern. + * + * dmesg + * + * Setup hugetlbfs huge pages. The memory needs to be from the same NUMA node as + * the enclave CPUs. + * + * https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html + * + * By default, the allocation of hugetlb pages are distributed on all possible + * NUMA nodes. Use the following configuration files to set the number of huge + * pages from a NUMA node: + * + * /sys/devices/system/node/node/hugepages/hugepages-2048kB/nr_hugepages + * /sys/devices/system/node/node/hugepages/hugepages-1048576kB/nr_hugepages + * + * or, if not on a system with multiple NUMA nodes, can also set the number + * of 2 MiB / 1 GiB huge pages using + * + * /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages + * /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages + * + * In this example 256 hugepages of 2 MiB are used. + * + * Build and run the NE sample. + * + * make -C samples/nitro_enclaves clean + * make -C samples/nitro_enclaves + * ./samples/nitro_enclaves/ne_ioctl_sample + * + * Unload the nitro_enclaves module. + * + * rmmod nitro_enclaves + * lsmod + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +/** + * NE_DEV_NAME - Nitro Enclaves (NE) misc device that provides the ioctl interface. + */ +#define NE_DEV_NAME "/dev/nitro_enclaves" + +/** + * NE_POLL_WAIT_TIME - Timeout in seconds for each poll event. + */ +#define NE_POLL_WAIT_TIME (60) +/** + * NE_POLL_WAIT_TIME_MS - Timeout in milliseconds for each poll event. + */ +#define NE_POLL_WAIT_TIME_MS (NE_POLL_WAIT_TIME * 1000) + +/** + * NE_SLEEP_TIME - Amount of time in seconds for the process to keep the enclave alive. + */ +#define NE_SLEEP_TIME (300) + +/** + * NE_DEFAULT_NR_VCPUS - Default number of vCPUs set for an enclave. + */ +#define NE_DEFAULT_NR_VCPUS (2) + +/** + * NE_MIN_MEM_REGION_SIZE - Minimum size of a memory region - 2 MiB. + */ +#define NE_MIN_MEM_REGION_SIZE (2 * 1024 * 1024) + +/** + * NE_DEFAULT_NR_MEM_REGIONS - Default number of memory regions of 2 MiB set for + * an enclave. + */ +#define NE_DEFAULT_NR_MEM_REGIONS (256) + +/** + * NE_IMAGE_LOAD_HEARTBEAT_CID - Vsock CID for enclave image loading heartbeat logic. 
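+ * Note: this is the vsock CID of the primary / parent VM; the enclave init
+ * process connects to it to send the heartbeat once the enclave has booted.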
+ */ +#define NE_IMAGE_LOAD_HEARTBEAT_CID (3) +/** + * NE_IMAGE_LOAD_HEARTBEAT_PORT - Vsock port for enclave image loading heartbeat logic. + */ +#define NE_IMAGE_LOAD_HEARTBEAT_PORT (9000) +/** + * NE_IMAGE_LOAD_HEARTBEAT_VALUE - Heartbeat value for enclave image loading. + */ +#define NE_IMAGE_LOAD_HEARTBEAT_VALUE (0xb7) + +/** + * struct ne_user_mem_region - User space memory region set for an enclave. + * @userspace_addr: Address of the user space memory region. + * @memory_size: Size of the user space memory region. + */ +struct ne_user_mem_region { + void *userspace_addr; + size_t memory_size; +}; + +/** + * ne_create_vm() - Create a slot for the enclave VM. + * @ne_dev_fd: The file descriptor of the NE misc device. + * @slot_uid: The generated slot uid for the enclave. + * @enclave_fd : The generated file descriptor for the enclave. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_create_vm(int ne_dev_fd, unsigned long *slot_uid, int *enclave_fd) +{ + int rc = -EINVAL; + *enclave_fd = ioctl(ne_dev_fd, NE_CREATE_VM, slot_uid); + + if (*enclave_fd < 0) { + rc = *enclave_fd; + switch (errno) { + case NE_ERR_NO_CPUS_AVAIL_IN_POOL: { + printf("Error in create VM, no CPUs available in the NE CPU pool\n"); + + break; + } + + default: + printf("Error in create VM [%m]\n"); + } + + return rc; + } + + return 0; +} + + +/** + * ne_poll_enclave_fd() - Thread function for polling the enclave fd. + * @data: Argument provided for the polling function. + * + * Context: Process context. + * Return: + * * NULL on success / failure. + */ +void *ne_poll_enclave_fd(void *data) +{ + int enclave_fd = *(int *)data; + struct pollfd fds[1] = {}; + int i = 0; + int rc = -EINVAL; + + printf("Running from poll thread, enclave fd %d\n", enclave_fd); + + fds[0].fd = enclave_fd; + fds[0].events = POLLIN | POLLERR | POLLHUP; + + /* Keep on polling until the current process is terminated. */ + while (1) { + printf("[iter %d] Polling ...\n", i); + + rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS); + if (rc < 0) { + printf("Error in poll [%m]\n"); + + return NULL; + } + + i++; + + if (!rc) { + printf("Poll: %d seconds elapsed\n", + i * NE_POLL_WAIT_TIME); + + continue; + } + + printf("Poll received value 0x%x\n", fds[0].revents); + + if (fds[0].revents & POLLHUP) { + printf("Received POLLHUP\n"); + + return NULL; + } + + if (fds[0].revents & POLLNVAL) { + printf("Received POLLNVAL\n"); + + return NULL; + } + } + + return NULL; +} + +/** + * ne_alloc_user_mem_region() - Allocate a user space memory region for an enclave. + * @ne_user_mem_region: User space memory region allocated using hugetlbfs. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_alloc_user_mem_region(struct ne_user_mem_region *ne_user_mem_region) +{ + /** + * Check available hugetlb encodings for different huge page sizes in + * include/uapi/linux/mman.h. + */ + ne_user_mem_region->userspace_addr = mmap(NULL, ne_user_mem_region->memory_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | + MAP_HUGETLB | MAP_HUGE_2MB, -1, 0); + if (ne_user_mem_region->userspace_addr == MAP_FAILED) { + printf("Error in mmap memory [%m]\n"); + + return -1; + } + + return 0; +} + +/** + * ne_load_enclave_image() - Place the enclave image in the enclave memory. + * @enclave_fd : The file descriptor associated with the enclave. + * @ne_user_mem_regions: User space memory regions allocated for the enclave. 
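+ *                       (in this sample, NE_DEFAULT_NR_MEM_REGIONS hugetlbfs-backed
+ *                       regions of NE_MIN_MEM_REGION_SIZE bytes each)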
+ * @enclave_image_path : The file path of the enclave image. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_load_enclave_image(int enclave_fd, struct ne_user_mem_region ne_user_mem_regions[], + char *enclave_image_path) +{ + unsigned char *enclave_image = NULL; + int enclave_image_fd = -1; + size_t enclave_image_size = 0; + size_t enclave_memory_size = 0; + unsigned long i = 0; + size_t image_written_bytes = 0; + struct ne_image_load_info image_load_info = { + .flags = NE_EIF_IMAGE, + }; + struct stat image_stat_buf = {}; + int rc = -EINVAL; + size_t temp_image_offset = 0; + + for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) + enclave_memory_size += ne_user_mem_regions[i].memory_size; + + rc = stat(enclave_image_path, &image_stat_buf); + if (rc < 0) { + printf("Error in get image stat info [%m]\n"); + + return rc; + } + + enclave_image_size = image_stat_buf.st_size; + + if (enclave_memory_size < enclave_image_size) { + printf("The enclave memory is smaller than the enclave image size\n"); + + return -ENOMEM; + } + + rc = ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &image_load_info); + if (rc < 0) { + switch (errno) { + case NE_ERR_NOT_IN_INIT_STATE: { + printf("Error in get image load info, enclave not in init state\n"); + + break; + } + + case NE_ERR_INVALID_FLAG_VALUE: { + printf("Error in get image load info, provided invalid flag\n"); + + break; + } + + default: + printf("Error in get image load info [%m]\n"); + } + + return rc; + } + + printf("Enclave image offset in enclave memory is %lld\n", + image_load_info.memory_offset); + + enclave_image_fd = open(enclave_image_path, O_RDONLY); + if (enclave_image_fd < 0) { + printf("Error in open enclave image file [%m]\n"); + + return enclave_image_fd; + } + + enclave_image = mmap(NULL, enclave_image_size, PROT_READ, + MAP_PRIVATE, enclave_image_fd, 0); + if (enclave_image == MAP_FAILED) { + printf("Error in mmap enclave image [%m]\n"); + + return -1; + } + + temp_image_offset = image_load_info.memory_offset; + + for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) { + size_t bytes_to_write = 0; + size_t memory_offset = 0; + size_t memory_size = ne_user_mem_regions[i].memory_size; + size_t remaining_bytes = 0; + void *userspace_addr = ne_user_mem_regions[i].userspace_addr; + + if (temp_image_offset >= memory_size) { + temp_image_offset -= memory_size; + + continue; + } else if (temp_image_offset != 0) { + memory_offset = temp_image_offset; + memory_size -= temp_image_offset; + temp_image_offset = 0; + } + + remaining_bytes = enclave_image_size - image_written_bytes; + bytes_to_write = memory_size < remaining_bytes ? + memory_size : remaining_bytes; + + memcpy(userspace_addr + memory_offset, + enclave_image + image_written_bytes, bytes_to_write); + + image_written_bytes += bytes_to_write; + + if (image_written_bytes == enclave_image_size) + break; + } + + munmap(enclave_image, enclave_image_size); + + close(enclave_image_fd); + + return 0; +} + +/** + * ne_set_user_mem_region() - Set a user space memory region for the given enclave. + * @enclave_fd : The file descriptor associated with the enclave. + * @ne_user_mem_region : User space memory region to be set for the enclave. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. 
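+ *
+ * Note: the memory region is expected to be backed by huge pages, with a size
+ * that is a multiple of 2 MiB, from the same NUMA node as the enclave vCPUs;
+ * the NE error codes handled below cover these constraints.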
+ */ +static int ne_set_user_mem_region(int enclave_fd, struct ne_user_mem_region ne_user_mem_region) +{ + struct ne_user_memory_region mem_region = { + .flags = NE_DEFAULT_MEMORY_REGION, + .memory_size = ne_user_mem_region.memory_size, + .userspace_addr = (__u64)ne_user_mem_region.userspace_addr, + }; + int rc = -EINVAL; + + rc = ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &mem_region); + if (rc < 0) { + switch (errno) { + case NE_ERR_NOT_IN_INIT_STATE: { + printf("Error in set user memory region, enclave not in init state\n"); + + break; + } + + case NE_ERR_INVALID_MEM_REGION_SIZE: { + printf("Error in set user memory region, mem size not multiple of 2 MiB\n"); + + break; + } + + case NE_ERR_INVALID_MEM_REGION_ADDR: { + printf("Error in set user memory region, invalid user space address\n"); + + break; + } + + case NE_ERR_UNALIGNED_MEM_REGION_ADDR: { + printf("Error in set user memory region, unaligned user space address\n"); + + break; + } + + case NE_ERR_MEM_REGION_ALREADY_USED: { + printf("Error in set user memory region, memory region already used\n"); + + break; + } + + case NE_ERR_MEM_NOT_HUGE_PAGE: { + printf("Error in set user memory region, not backed by huge pages\n"); + + break; + } + + case NE_ERR_MEM_DIFFERENT_NUMA_NODE: { + printf("Error in set user memory region, different NUMA node than CPUs\n"); + + break; + } + + case NE_ERR_MEM_MAX_REGIONS: { + printf("Error in set user memory region, max memory regions reached\n"); + + break; + } + + case NE_ERR_INVALID_PAGE_SIZE: { + printf("Error in set user memory region, has page not multiple of 2 MiB\n"); + + break; + } + + case NE_ERR_INVALID_FLAG_VALUE: { + printf("Error in set user memory region, provided invalid flag\n"); + + break; + } + + default: + printf("Error in set user memory region [%m]\n"); + } + + return rc; + } + + return 0; +} + +/** + * ne_free_mem_regions() - Unmap all the user space memory regions that were set + * aside for the enclave. + * @ne_user_mem_regions: The user space memory regions associated with an enclave. + * + * Context: Process context. + */ +static void ne_free_mem_regions(struct ne_user_mem_region ne_user_mem_regions[]) +{ + unsigned int i = 0; + + for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) + munmap(ne_user_mem_regions[i].userspace_addr, + ne_user_mem_regions[i].memory_size); +} + +/** + * ne_add_vcpu() - Add a vCPU to the given enclave. + * @enclave_fd : The file descriptor associated with the enclave. + * @vcpu_id: vCPU id to be set for the enclave, either provided or + * auto-generated (if provided vCPU id is 0). + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. 
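+ *
+ * Note: a vcpu_id of 0 lets the NE driver choose a CPU from the NE CPU pool;
+ * a non-zero id has to be part of the pool, and enclaves use full CPU cores.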
+ */ +static int ne_add_vcpu(int enclave_fd, unsigned int *vcpu_id) +{ + int rc = -EINVAL; + + rc = ioctl(enclave_fd, NE_ADD_VCPU, vcpu_id); + if (rc < 0) { + switch (errno) { + case NE_ERR_NO_CPUS_AVAIL_IN_POOL: { + printf("Error in add vcpu, no CPUs available in the NE CPU pool\n"); + + break; + } + + case NE_ERR_VCPU_ALREADY_USED: { + printf("Error in add vcpu, the provided vCPU is already used\n"); + + break; + } + + case NE_ERR_VCPU_NOT_IN_CPU_POOL: { + printf("Error in add vcpu, the provided vCPU is not in the NE CPU pool\n"); + + break; + } + + case NE_ERR_VCPU_INVALID_CPU_CORE: { + printf("Error in add vcpu, the core id of the provided vCPU is invalid\n"); + + break; + } + + case NE_ERR_NOT_IN_INIT_STATE: { + printf("Error in add vcpu, enclave not in init state\n"); + + break; + } + + case NE_ERR_INVALID_VCPU: { + printf("Error in add vcpu, the provided vCPU is out of avail CPUs range\n"); + + break; + } + + default: + printf("Error in add vcpu [%m]\n"); + + } + return rc; + } + + return 0; +} + +/** + * ne_start_enclave() - Start the given enclave. + * @enclave_fd : The file descriptor associated with the enclave. + * @enclave_start_info : Enclave metadata used for starting e.g. vsock CID. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. + */ +static int ne_start_enclave(int enclave_fd, struct ne_enclave_start_info *enclave_start_info) +{ + int rc = -EINVAL; + + rc = ioctl(enclave_fd, NE_START_ENCLAVE, enclave_start_info); + if (rc < 0) { + switch (errno) { + case NE_ERR_NOT_IN_INIT_STATE: { + printf("Error in start enclave, enclave not in init state\n"); + + break; + } + + case NE_ERR_NO_MEM_REGIONS_ADDED: { + printf("Error in start enclave, no memory regions have been added\n"); + + break; + } + + case NE_ERR_NO_VCPUS_ADDED: { + printf("Error in start enclave, no vCPUs have been added\n"); + + break; + } + + case NE_ERR_FULL_CORES_NOT_USED: { + printf("Error in start enclave, enclave has no full cores set\n"); + + break; + } + + case NE_ERR_ENCLAVE_MEM_MIN_SIZE: { + printf("Error in start enclave, enclave memory is less than min size\n"); + + break; + } + + case NE_ERR_INVALID_FLAG_VALUE: { + printf("Error in start enclave, provided invalid flag\n"); + + break; + } + + case NE_ERR_INVALID_ENCLAVE_CID: { + printf("Error in start enclave, provided invalid enclave CID\n"); + + break; + } + + default: + printf("Error in start enclave [%m]\n"); + } + + return rc; + } + + return 0; +} + +/** + * ne_start_enclave_check_booted() - Start the enclave and wait for a hearbeat + * from it, on a newly created vsock channel, + * to check it has booted. + * @enclave_fd : The file descriptor associated with the enclave. + * + * Context: Process context. + * Return: + * * 0 on success. + * * Negative return value on failure. 
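+ *
+ * Note: the vsock listener is set up before the enclave is started, so the
+ * heartbeat sent by the enclave init process right after boot is not missed.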
+ */ +static int ne_start_enclave_check_booted(int enclave_fd) +{ + struct sockaddr_vm client_vsock_addr = {}; + int client_vsock_fd = -1; + socklen_t client_vsock_len = sizeof(client_vsock_addr); + struct ne_enclave_start_info enclave_start_info = {}; + struct pollfd fds[1] = {}; + int rc = -EINVAL; + unsigned char recv_buf = 0; + struct sockaddr_vm server_vsock_addr = { + .svm_family = AF_VSOCK, + .svm_cid = NE_IMAGE_LOAD_HEARTBEAT_CID, + .svm_port = NE_IMAGE_LOAD_HEARTBEAT_PORT, + }; + int server_vsock_fd = -1; + + server_vsock_fd = socket(AF_VSOCK, SOCK_STREAM, 0); + if (server_vsock_fd < 0) { + rc = server_vsock_fd; + + printf("Error in socket [%m]\n"); + + return rc; + } + + rc = bind(server_vsock_fd, (struct sockaddr *)&server_vsock_addr, + sizeof(server_vsock_addr)); + if (rc < 0) { + printf("Error in bind [%m]\n"); + + goto out; + } + + rc = listen(server_vsock_fd, 1); + if (rc < 0) { + printf("Error in listen [%m]\n"); + + goto out; + } + + rc = ne_start_enclave(enclave_fd, &enclave_start_info); + if (rc < 0) + goto out; + + printf("Enclave started, CID %llu\n", enclave_start_info.enclave_cid); + + fds[0].fd = server_vsock_fd; + fds[0].events = POLLIN; + + rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS); + if (rc < 0) { + printf("Error in poll [%m]\n"); + + goto out; + } + + if (!rc) { + printf("Poll timeout, %d seconds elapsed\n", NE_POLL_WAIT_TIME); + + rc = -ETIMEDOUT; + + goto out; + } + + if ((fds[0].revents & POLLIN) == 0) { + printf("Poll received value %d\n", fds[0].revents); + + rc = -EINVAL; + + goto out; + } + + rc = accept(server_vsock_fd, (struct sockaddr *)&client_vsock_addr, + &client_vsock_len); + if (rc < 0) { + printf("Error in accept [%m]\n"); + + goto out; + } + + client_vsock_fd = rc; + + /* + * Read the heartbeat value that the init process in the enclave sends + * after vsock connect. + */ + rc = read(client_vsock_fd, &recv_buf, sizeof(recv_buf)); + if (rc < 0) { + printf("Error in read [%m]\n"); + + goto out; + } + + if (rc != sizeof(recv_buf) || recv_buf != NE_IMAGE_LOAD_HEARTBEAT_VALUE) { + printf("Read %d instead of %d\n", recv_buf, + NE_IMAGE_LOAD_HEARTBEAT_VALUE); + + goto out; + } + + /* Write the heartbeat value back. 
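+ * (the same value that was just received is sent back over the vsock
+ * connection)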
*/ + rc = write(client_vsock_fd, &recv_buf, sizeof(recv_buf)); + if (rc < 0) { + printf("Error in write [%m]\n"); + + goto out; + } + + rc = 0; + +out: + close(server_vsock_fd); + + return rc; +} + +int main(int argc, char *argv[]) +{ + int enclave_fd = -1; + unsigned int i = 0; + int ne_dev_fd = -1; + struct ne_user_mem_region ne_user_mem_regions[NE_DEFAULT_NR_MEM_REGIONS] = {}; + unsigned int ne_vcpus[NE_DEFAULT_NR_VCPUS] = {}; + int rc = -EINVAL; + pthread_t thread_id = 0; + unsigned long slot_uid = 0; + + if (argc != 2) { + printf("Usage: %s \n", argv[0]); + + exit(EXIT_FAILURE); + } + + if (strlen(argv[1]) >= PATH_MAX) { + printf("The size of the path to enclave image is higher than max path\n"); + + exit(EXIT_FAILURE); + } + + ne_dev_fd = open(NE_DEV_NAME, O_RDWR | O_CLOEXEC); + if (ne_dev_fd < 0) { + printf("Error in open NE device [%m]\n"); + + exit(EXIT_FAILURE); + } + + printf("Creating enclave slot ...\n"); + + rc = ne_create_vm(ne_dev_fd, &slot_uid, &enclave_fd); + + close(ne_dev_fd); + + if (rc < 0) + exit(EXIT_FAILURE); + + printf("Enclave fd %d\n", enclave_fd); + + rc = pthread_create(&thread_id, NULL, ne_poll_enclave_fd, (void *)&enclave_fd); + if (rc < 0) { + printf("Error in thread create [%m]\n"); + + close(enclave_fd); + + exit(EXIT_FAILURE); + } + + for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) { + ne_user_mem_regions[i].memory_size = NE_MIN_MEM_REGION_SIZE; + + rc = ne_alloc_user_mem_region(&ne_user_mem_regions[i]); + if (rc < 0) { + printf("Error in alloc userspace memory region, iter %d\n", i); + + goto release_enclave_fd; + } + } + + rc = ne_load_enclave_image(enclave_fd, ne_user_mem_regions, argv[1]); + if (rc < 0) + goto release_enclave_fd; + + for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) { + rc = ne_set_user_mem_region(enclave_fd, ne_user_mem_regions[i]); + if (rc < 0) { + printf("Error in set memory region, iter %d\n", i); + + goto release_enclave_fd; + } + } + + printf("Enclave memory regions were added\n"); + + for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++) { + /* + * The vCPU is chosen from the enclave vCPU pool, if the value + * of the vcpu_id is 0. 
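+ * Otherwise, the provided vCPU id has to be part of the NE CPU pool set for
+ * the nitro_enclaves kernel module.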
+ */ + ne_vcpus[i] = 0; + rc = ne_add_vcpu(enclave_fd, &ne_vcpus[i]); + if (rc < 0) { + printf("Error in add vcpu, iter %d\n", i); + + goto release_enclave_fd; + } + + printf("Added vCPU %d to the enclave\n", ne_vcpus[i]); + } + + printf("Enclave vCPUs were added\n"); + + rc = ne_start_enclave_check_booted(enclave_fd); + if (rc < 0) { + printf("Error in the enclave start / image loading heartbeat logic [rc=%d]\n", rc); + + goto release_enclave_fd; + } + + printf("Entering sleep for %d seconds ...\n", NE_SLEEP_TIME); + + sleep(NE_SLEEP_TIME); + + close(enclave_fd); + + ne_free_mem_regions(ne_user_mem_regions); + + exit(EXIT_SUCCESS); + +release_enclave_fd: + close(enclave_fd); + ne_free_mem_regions(ne_user_mem_regions); + + exit(EXIT_FAILURE); +} From patchwork Fri Sep 4 17:37:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758329 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E34692C for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 20CAF24171 for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="fAN+eeXE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728305AbgIDRkv (ORCPT ); Fri, 4 Sep 2020 13:40:51 -0400 Received: from smtp-fw-9102.amazon.com ([207.171.184.29]:31337 "EHLO smtp-fw-9102.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728290AbgIDRks (ORCPT ); Fri, 4 Sep 2020 13:40:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241247; x=1630777247; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nq00NqV/xKNnhaG9cnl5+WXQq9WlTRCESodejl73kQg=; b=fAN+eeXE5xWRUUKUkylRBoHFm/OW7dzO07Y/ZpqkDjvFUzL/EHdP1UhW GIqPldJ8OVPd4KIu7mgABdGZqk0PgxnYcTZi5cHTlxZSdkCF/4sIThWpp 0AHhadlcbZWhWuQzNi4fMoPIQ6hxbRBUJ0dWplYQLCvSnjgji1tl5SS09 g=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="73692168" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1a-e34f1ddc.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9102.sea19.amazon.com with ESMTP; 04 Sep 2020 17:40:46 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1a-e34f1ddc.us-east-1.amazon.com (Postfix) with ESMTPS id 625CBA21D3; Fri, 4 Sep 2020 17:40:43 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:40:33 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 17/18] nitro_enclaves: Add overview documentation Date: Fri, 4 Sep 2020 20:37:17 +0300 Message-ID: 
<20200904173718.64857-18-andraprs@amazon.com>
X-Mailer: git-send-email 2.20.1 (Apple Git-117)
In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com>
References: <20200904173718.64857-1-andraprs@amazon.com>
MIME-Version: 1.0
X-Originating-IP: [10.43.161.145]
X-ClientProxiedBy: EX13D06UWC001.ant.amazon.com (10.43.162.91) To EX13D16EUB001.ant.amazon.com (10.43.166.28)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Signed-off-by: Andra Paraschiv
Reviewed-by: Alexander Graf
---
Changelog

v7 -> v8

* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.

v6 -> v7

* No changes.

v5 -> v6

* No changes.

v4 -> v5

* No changes.

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 Documentation/nitro_enclaves/ne_overview.rst | 95 ++++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

diff --git a/Documentation/nitro_enclaves/ne_overview.rst b/Documentation/nitro_enclaves/ne_overview.rst
new file mode 100644
index 000000000000..39b0c8fe2654
--- /dev/null
+++ b/Documentation/nitro_enclaves/ne_overview.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Nitro Enclaves
+==============
+
+Overview
+========
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM
+can be separated from other applications running in the same VM. This
+application then runs in a separate VM from the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup matches the needs
+of low-latency applications. The resources that are allocated for the enclave,
+such as memory and CPUs, are carved out of the primary VM. Each enclave is
+mapped to a process running in the primary VM, which communicates with the NE
+driver via an ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+There is an NE emulated PCI device exposed to the primary VM. The driver for
+this new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
+ioctl maps to an enclave start PCI command. The PCI device commands are then
+translated into actions taken on the hypervisor side; that's the Nitro
+hypervisor running on the host where the primary VM is running. The Nitro
+hypervisor is based on core KVM technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+to the enclave VM. An enclave does not have persistent storage attached.
+
+The memory regions carved out of the primary VM and given to an enclave need
+to be 2 MiB / 1 GiB aligned, physically contiguous memory regions (or a
+multiple of this size, e.g. 8 MiB). The memory can be allocated e.g. by using
+hugetlbfs from user space [2][3]. The memory size for an enclave needs to be
+at least 64 MiB. The enclave memory and CPUs need to be from the same NUMA
+node.
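+
+As an illustration only, a minimal user space sketch for allocating and
+registering one such memory region, assuming 2 MiB hugetlbfs pages are
+available (the helper name is hypothetical; the complete flow, including error
+handling, is in samples/nitro_enclaves/ne_ioctl_sample.c)::
+
+    #include <sys/ioctl.h>
+    #include <sys/mman.h>
+
+    #include <linux/mman.h>
+    #include <linux/nitro_enclaves.h>
+
+    /*
+     * Hypothetical helper, for illustration only: enclave_fd is the file
+     * descriptor returned by the NE_CREATE_VM ioctl on /dev/nitro_enclaves.
+     */
+    static int ne_add_2mib_region(int enclave_fd)
+    {
+        struct ne_user_memory_region region = {
+            .flags = NE_DEFAULT_MEMORY_REGION,
+            .memory_size = 2 * 1024 * 1024,
+        };
+        void *addr = mmap(NULL, region.memory_size, PROT_READ | PROT_WRITE,
+                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
+                          MAP_HUGE_2MB, -1, 0);
+
+        if (addr == MAP_FAILED)
+            return -1;
+
+        region.userspace_addr = (__u64)addr;
+
+        /* On failure, -1 is returned and errno is set to an NE error code. */
+        return ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &region);
+    }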
+
+An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
+available for the primary VM. A CPU pool has to be set for NE purposes by a
+user with admin capability. See the CPU lists section in the kernel
+documentation [4] for the expected CPU pool format.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
+virtio-mmio device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS (e.g. kernel, ramdisk, init) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol [6].
+
+The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
+Enclave Image Format (EIF), together with an EIF header that includes metadata
+such as magic number, EIF version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). They are used, for example, to check that the enclave image loaded
+in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of
+the enclave; KMS is an example of a service that NE is integrated with and
+that checks the attestation document.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted. The CID of the
+primary VM is 3.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received
+by the NE driver. This event is then forwarded to the user space enclave
+process running in the primary VM via a poll notification mechanism. The user
+space enclave process can then exit.
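+
+The sample in samples/nitro_enclaves/ne_ioctl_sample.c implements the complete
+parent-side logic, including poll timeouts and the NE-specific error handling.
+A condensed, illustrative sketch of the boot check described above (the helper
+name is hypothetical) could look like::
+
+    #include <unistd.h>
+    #include <sys/socket.h>
+
+    #include <linux/vm_sockets.h>
+
+    /* Hypothetical helper, for illustration only. */
+    static int ne_wait_for_heartbeat(void)
+    {
+        struct sockaddr_vm addr = {
+            .svm_family = AF_VSOCK,
+            .svm_cid = 3,       /* CID of the primary / parent VM */
+            .svm_port = 9000,   /* predefined heartbeat port */
+        };
+        unsigned char val = 0;
+        int client_fd = -1;
+        int rc = -1;
+        int server_fd = socket(AF_VSOCK, SOCK_STREAM, 0);
+
+        if (server_fd < 0)
+            return -1;
+
+        if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
+            goto out;
+        if (listen(server_fd, 1) < 0)
+            goto out;
+
+        /* NE_START_ENCLAVE would be issued here, between listen() and accept(). */
+
+        client_fd = accept(server_fd, NULL, NULL);
+        if (client_fd < 0)
+            goto out;
+
+        /* Expect the 0xb7 heartbeat byte and echo it back to the enclave. */
+        if (read(client_fd, &val, 1) == 1 && val == 0xb7 &&
+            write(client_fd, &val, 1) == 1)
+            rc = 0;
+
+        close(client_fd);
+    out:
+        close(server_fd);
+        return rc;
+    }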
+ +[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/ +[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html +[3] https://lwn.net/Articles/807108/ +[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html +[5] https://man7.org/linux/man-pages/man7/vsock.7.html +[6] https://www.kernel.org/doc/html/latest/x86/boot.html From patchwork Fri Sep 4 17:37:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paraschiv, Andra-Irina" X-Patchwork-Id: 11758333 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E75C61599 for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C547024655 for ; Fri, 4 Sep 2020 18:02:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="MV0bTC2X" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728277AbgIDRlW (ORCPT ); Fri, 4 Sep 2020 13:41:22 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:37493 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728088AbgIDRk4 (ORCPT ); Fri, 4 Sep 2020 13:40:56 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1599241257; x=1630777257; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3WH1JXdCdbpYmzmEhgenY7vg+t0Pb4yn9+A2i1nvGdc=; b=MV0bTC2XzubinStFWBjlVJF8LC0xaung2G1F1iv9VywCOeO2P3J1zpks w3y8+4/kGrByzn/Ifm44b57TTI41X2Qdnk7jRA0XC+PkEeemPeclD46r0 TYGYAlhDZkVxwJDpQyLBZM8fiya2wdBA/fxo7d3SMQEww+yrqxrF0/nSh 8=; X-IronPort-AV: E=Sophos;i="5.76,390,1592870400"; d="scan'208";a="72480161" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-37fd6b3d.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 04 Sep 2020 17:40:56 +0000 Received: from EX13D16EUB001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-37fd6b3d.us-east-1.amazon.com (Postfix) with ESMTPS id 37C00289B9C; Fri, 4 Sep 2020 17:40:52 +0000 (UTC) Received: from 38f9d34ed3b1.ant.amazon.com (10.43.161.145) by EX13D16EUB001.ant.amazon.com (10.43.166.28) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 4 Sep 2020 17:40:42 +0000 From: Andra Paraschiv To: linux-kernel CC: Anthony Liguori , Benjamin Herrenschmidt , Colm MacCarthaigh , "David Duncan" , Bjoern Doebel , "David Woodhouse" , Frank van der Linden , Alexander Graf , Greg KH , "Karen Noel" , Martin Pohlack , Matt Wilson , Paolo Bonzini , Balbir Singh , Stefano Garzarella , "Stefan Hajnoczi" , Stewart Smith , "Uwe Dannowski" , Vitaly Kuznetsov , kvm , ne-devel-upstream , Andra Paraschiv Subject: [PATCH v8 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver Date: Fri, 4 Sep 2020 20:37:18 +0300 Message-ID: <20200904173718.64857-19-andraprs@amazon.com> X-Mailer: git-send-email 2.20.1 (Apple Git-117) In-Reply-To: <20200904173718.64857-1-andraprs@amazon.com> References: <20200904173718.64857-1-andraprs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.161.145] X-ClientProxiedBy: EX13D06UWC001.ant.amazon.com (10.43.162.91) To EX13D16EUB001.ant.amazon.com (10.43.166.28) Sender: kvm-owner@vger.kernel.org 
Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Andra Paraschiv Reviewed-by: Alexander Graf --- Changelog v7 -> v8 * No changes. v6 -> v7 * No changes. v5 -> v6 * No changes. v4 -> v5 * No changes. v3 -> v4 * No changes. v2 -> v3 * Update file entries to be in alphabetical order. v1 -> v2 * No changes. --- MAINTAINERS | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index e4647c84c987..73a9b4e9b04b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12269,6 +12269,19 @@ S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git F: arch/nios2/ +NITRO ENCLAVES (NE) +M: Andra Paraschiv +M: Alexandru Vasile +M: Alexandru Ciobotaru +L: linux-kernel@vger.kernel.org +S: Supported +W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/ +F: Documentation/nitro_enclaves/ +F: drivers/virt/nitro_enclaves/ +F: include/linux/nitro_enclaves.h +F: include/uapi/linux/nitro_enclaves.h +F: samples/nitro_enclaves/ + NOHZ, DYNTICKS SUPPORT M: Frederic Weisbecker M: Thomas Gleixner