[v24,08/24] x86/sgx: Enumerate and track EPC sections

Message ID 20191129231326.18076-9-jarkko.sakkinen@linux.intel.com (mailing list archive)
State New, archived
Series Intel SGX foundations

Commit Message

Jarkko Sakkinen Nov. 29, 2019, 11:13 p.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

Enumerate Enclave Page Cache (EPC) sections via CPUID and add the data
structures necessary to track EPC pages so that they can be allocated,
freed and managed. As a system may have multiple EPC sections, invoke CPUID
on SGX sub-leafs until an invalid leaf is encountered.
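As an illustration of the enumeration described above, here is a minimal
userspace sketch. It replaces CPUID with a made-up table of register values,
so struct fake_sub_leaf, enumerate_sections() and check_enumeration() are
hypothetical names, and the sub-leaf type constants are assumed values (the
real ones live in a header that is not part of this patch). Only the metric
arithmetic is taken from sgx_calc_section_metric().

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits l..h set. */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Assumed values; the real constants live in a header not shown here. */
#define SUB_LEAF_TYPE_MASK	0xfu
#define SUB_LEAF_INVALID	0x0u
#define SUB_LEAF_EPC_SECTION	0x1u
#define MAX_EPC_SECTIONS	8

/* Hypothetical stand-in for one CPUID sub-leaf result. */
struct fake_sub_leaf {
	uint32_t eax, ebx, ecx, edx;
};

/* Same arithmetic as sgx_calc_section_metric() in the patch. */
static uint64_t calc_section_metric(uint64_t low, uint64_t high)
{
	return (low & GENMASK_ULL(31, 12)) +
	       ((high & GENMASK_ULL(19, 0)) << 32);
}

/*
 * Walk the sub-leafs until an invalid (or unknown) type is found, the table
 * ends, or the section limit is reached. Returns the number of sections and
 * fills pa[] and size[] for each one.
 */
static int enumerate_sections(const struct fake_sub_leaf *leafs, int nr_leafs,
			      uint64_t *pa, uint64_t *size)
{
	int i, found = 0;

	for (i = 0; i < nr_leafs && found < MAX_EPC_SECTIONS; i++) {
		uint32_t type = leafs[i].eax & SUB_LEAF_TYPE_MASK;

		if (type != SUB_LEAF_EPC_SECTION)
			break;

		pa[found] = calc_section_metric(leafs[i].eax, leafs[i].ebx);
		size[found] = calc_section_metric(leafs[i].ecx, leafs[i].edx);
		found++;
	}

	return found;
}

/* Self-check with made-up register values, not real hardware output. */
static void check_enumeration(void)
{
	const struct fake_sub_leaf leafs[] = {
		{ 0x80200001u, 0x0u, 0x05d80001u, 0x0u },	/* EPC section */
		{ 0x00000001u, 0x1u, 0x00100001u, 0x0u },	/* EPC section */
		{ 0x00000000u, 0x0u, 0x00000000u, 0x0u },	/* invalid: stop */
	};
	uint64_t pa[MAX_EPC_SECTIONS], size[MAX_EPC_SECTIONS];

	assert(enumerate_sections(leafs, 3, pa, size) == 2);
	assert(pa[0] == 0x80200000ULL && size[0] == 0x05d80000ULL);
	assert(pa[1] == 0x100000000ULL && size[1] == 0x00100000ULL);
}
```

Unlike the patch, the sketch does not warn when more sections exist than the
limit allows; it simply stops filling the arrays.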

For simplicity, support a maximum of eight EPC sections. Existing client
hardware supports only a single section, while upcoming server hardware
will support at most eight sections. Bounding the number of sections also
allows the section ID to be embedded along with a page's offset in a single
unsigned long, enabling easy retrieval of both the VA and PA for a given
page.
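The descriptor packing can be sketched in userspace roughly as follows. The
names (make_desc(), page_section(), page_pa(), page_va(),
check_desc_roundtrip()) are hypothetical and the addresses are made up; the
layout follows the patch, where the page-aligned physical address and the
section index share one word. Note that the mask in the patch is
GENMASK_ULL(3, 0), i.e. four bits, although three would be enough for eight
sections.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_MASK	(~((1ULL << PAGE_SHIFT) - 1))
#define SECTION_MASK	0xfULL	/* GENMASK_ULL(3, 0) in the patch */
#define MAX_SECTIONS	8

/* Reduced stand-ins for struct sgx_epc_section and struct sgx_epc_page. */
struct epc_section {
	uint64_t pa;		/* physical base of the section */
	unsigned char *va;	/* where the section is mapped */
};

struct epc_page {
	uint64_t desc;		/* physical page address | section index */
};

static struct epc_section sections[MAX_SECTIONS];

/* Build a descriptor; assumes section_pa is page aligned and index < 8. */
static uint64_t make_desc(uint64_t section_pa, unsigned long page_nr,
			  unsigned int index)
{
	return (section_pa + ((uint64_t)page_nr << PAGE_SHIFT)) | index;
}

/* The low bits select the section. */
static struct epc_section *page_section(const struct epc_page *page)
{
	return &sections[page->desc & SECTION_MASK];
}

/* The page-aligned bits are the physical address. */
static uint64_t page_pa(const struct epc_page *page)
{
	return page->desc & PAGE_MASK;
}

/* The virtual address is the mapping base plus the offset in the section. */
static void *page_va(const struct epc_page *page)
{
	struct epc_section *section = page_section(page);

	return section->va + (page_pa(page) - section->pa);
}

/* Self-check with made-up addresses. */
static void check_desc_roundtrip(void)
{
	static unsigned char backing[4 << PAGE_SHIFT];
	struct epc_page page;

	sections[3].pa = 0x80200000ULL;
	sections[3].va = backing;

	page.desc = make_desc(sections[3].pa, 2, 3);

	assert(page_section(&page) == &sections[3]);
	assert(page_pa(&page) == 0x80202000ULL);
	assert(page_va(&page) == (void *)(backing + (2 << PAGE_SHIFT)));
}
```

Because every EPC page is 4K aligned, the low twelve bits of the physical
address are always zero, which is what leaves room for the section index.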

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Serge Ayoun <serge.ayoun@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/Kconfig                  |  14 +++
 arch/x86/kernel/cpu/Makefile      |   1 +
 arch/x86/kernel/cpu/sgx/Makefile  |   3 +
 arch/x86/kernel/cpu/sgx/main.c    | 154 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/reclaim.c |  87 +++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h     |  70 ++++++++++++++
 6 files changed, 329 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/Makefile
 create mode 100644 arch/x86/kernel/cpu/sgx/main.c
 create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c
 create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h

Comments

Borislav Petkov Dec. 18, 2019, 9:18 a.m. UTC | #1
On Sat, Nov 30, 2019 at 01:13:10AM +0200, Jarkko Sakkinen wrote:
> +static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
> +					 unsigned long index,
> +					 struct sgx_epc_section *section)
> +{
> +	unsigned long nr_pages = size >> PAGE_SHIFT;

I'm assuming here that size which gets communicated through CPUID -
which is an interesting way to communicate SGX settings in itself :-) - is
in multiples of 4K? SDM doesn't say...

And last time I asked:

"This size comes from CPUID but it might be prudent to sanity-check it
nevertheless, before doing the memremap()."

but it was left uncommented.
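
For illustration only, a check of this kind could look like the userspace
sketch below. epc_section_range_ok() and check_range() are hypothetical
names, not part of the patch, and the sample values are made up. Since
sgx_calc_section_metric() keeps only bits 12-51, the alignment and overflow
tests cannot fail for values that come out of it; the zero-size test is the
one that still matters there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL

/* Hypothetical helper: reject ranges that cannot describe an EPC section. */
static bool epc_section_range_ok(uint64_t addr, uint64_t size)
{
	/* An empty section has no pages to track. */
	if (!size)
		return false;

	/* EPC pages are 4K sized and 4K aligned. */
	if ((addr | size) & (PAGE_SIZE - 1))
		return false;

	/* The end of the range must not wrap around. */
	if (addr + size < addr)
		return false;

	return true;
}

/* Self-check with made-up values. */
static void check_range(void)
{
	assert(epc_section_range_ok(0x80200000ULL, 0x05d80000ULL));
	assert(!epc_section_range_ok(0x80200000ULL, 0));
	assert(!epc_section_range_ok(0x80200800ULL, 0x1000ULL));
	assert(!epc_section_range_ok(0xfffffffffffff000ULL, 0x2000ULL));
}
```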

> +/**
> + * A section metric is concatenated in a way that @low bits 12-31 define the
> + * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
> + * metric.
> + */
> +static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
> +{
> +	return (low & GENMASK_ULL(31, 12)) +
> +	       ((high & GENMASK_ULL(19, 0)) << 32);
> +}
> +
> +static bool __init sgx_page_cache_init(void)
> +{
> +	u32 eax, ebx, ecx, edx, type;
> +	u64 pa, size;
> +	int i;
> +
> +	BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1));
> +
> +	for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) {

Those brackets are still here from the last time. You said:

"For nothing :-)

I'll change it as:

  for (i = 0; i <= SGX_MAX_EPC_SECTIONS; i++) {"

but probably forgot...

and looking at my review comments here:

https://lkml.kernel.org/r/20191005092627.GA25699@zn.tnic

and your reply:

https://lkml.kernel.org/r/20191007115850.GA20830@linux.intel.com

you clearly missed addressing some so I'm going to stop reviewing here.

Please have a look at those review comments again and check whether they
apply - and then do them - or they don't, and then pls explain why they
don't.

And do that for the rest of the patchset, please, before you send it
again.

Thx.

Sean Christopherson Dec. 18, 2019, 3:19 p.m. UTC | #2
On Wed, Dec 18, 2019 at 10:18:56AM +0100, Borislav Petkov wrote:
> On Sat, Nov 30, 2019 at 01:13:10AM +0200, Jarkko Sakkinen wrote:
> > +static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
> > +					 unsigned long index,
> > +					 struct sgx_epc_section *section)
> > +{
> > +	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
> I'm assuming here that size which gets communicated through CPUID -
> which is an interesting way to communicate SGX settings in itself :-) - is
> in multiples of 4K? SDM doesn't say...

Yes, EPC pages are architecturally defined to be 4k sized and aligned.

  36.5 Enclave Page Cache

  The EPC is divided into EPC pages. An EPC page is 4KB in size and always
  aligned on a 4KB boundary.

Borislav Petkov Dec. 18, 2019, 4:18 p.m. UTC | #3
On Wed, Dec 18, 2019 at 07:19:44AM -0800, Sean Christopherson wrote:
> Yes, EPC pages are architecturally defined to be 4k sized and aligned.
> 
>   36.5 Enclave Page Cache
> 
>   The EPC is divided into EPC pages. An EPC page is 4KB in size and always
>   aligned on a 4KB boundary.

Aha, thx!

Jarkko Sakkinen Dec. 19, 2019, 12:53 a.m. UTC | #4
On Wed, 2019-12-18 at 10:18 +0100, Borislav Petkov wrote:
> On Sat, Nov 30, 2019 at 01:13:10AM +0200, Jarkko Sakkinen wrote:
> > +static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
> > +					 unsigned long index,
> > +					 struct sgx_epc_section *section)
> > +{
> > +	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
> I'm assuming here that size which gets communicated through CPUID -
> which is an interesting way to communicate SGX settings in itself :-) - is
> in multiples of 4K? SDM doesn't say...

Yes.

> And last time I asked:
> 
> "This size comes from CPUID but it might be prudent to sanity-check it
> nevertheless, before doing the memremap()."
> 
> but it was left uncommented.

I'm sorry about that. Not intended. I just forgot to deal with it or
missed it.

> > +/**
> > + * A section metric is concatenated in a way that @low bits 12-31 define the
> > + * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
> > + * metric.
> > + */
> > +static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
> > +{
> > +	return (low & GENMASK_ULL(31, 12)) +
> > +	       ((high & GENMASK_ULL(19, 0)) << 32);
> > +}
> > +
> > +static bool __init sgx_page_cache_init(void)
> > +{
> > +	u32 eax, ebx, ecx, edx, type;
> > +	u64 pa, size;
> > +	int i;
> > +
> > +	BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1));
> > +
> > +	for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) {
> 
> Those brackets are still here from the last time. You said:
> 
> "For nothing :-)
> 
> I'll change it as:
> 
>   for (i = 0; i <= SGX_MAX_EPC_SECTIONS; i++) {"
> 
> but probably forgot...
> 
> and looking at my review comments here:
> 
> https://lkml.kernel.org/r/20191005092627.GA25699@zn.tnic
> 
> and your reply:
> 
> https://lkml.kernel.org/r/20191007115850.GA20830@linux.intel.com
> 
> you clearly missed addressing some so I'm going to stop reviewing here.
> 
> Please have a look at those review comments again and check whether they
> apply - and then do them - or they don't, and then pls explain why they
> don't.
> 
> And do that for the rest of the patchset, please, before you send it
> again.
> 
> Thx.

It is unintentional, and I seriously do my best to keep track of
things. Sometimes, when you multitask between maintaining other subsystems
and refactoring a huge patch set like this, it just happens, no matter how
well you try to organize your work.

I'll take the time to go through the v23 comments before sending v25.

/Jarkko

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..8f2faadc447e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1940,6 +1940,20 @@  config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config INTEL_SGX
+	bool "Intel SGX"
+	depends on X86_64 && CPU_SUP_INTEL
+	select SRCU
+	select MMU_NOTIFIER
+	help
+	  Intel(R) SGX is a set of CPU instructions that can be used by
+	  applications to set aside private regions of code and data, referred
+	  to as enclaves. An enclave's private memory can only be accessed by
+	  code running within the enclave. Accesses from outside the enclave,
+	  including other enclaves, are disallowed by hardware.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index d7a1e5a9331c..97deac5108df 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -45,6 +45,7 @@  obj-$(CONFIG_X86_MCE)			+= mce/
 obj-$(CONFIG_MTRR)			+= mtrr/
 obj-$(CONFIG_MICROCODE)			+= microcode/
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= resctrl/
+obj-$(CONFIG_INTEL_SGX)			+= sgx/
 
 obj-$(CONFIG_X86_LOCAL_APIC)		+= perfctr-watchdog.o
 
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
new file mode 100644
index 000000000000..2dec75916a5e
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -0,0 +1,3 @@ 
+obj-y += \
+	main.o \
+	reclaim.o
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
new file mode 100644
index 000000000000..f8ba10516eaf
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -0,0 +1,154 @@ 
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-17 Intel Corporation.
+
+#include <linux/freezer.h>
+#include <linux/highmem.h>
+#include <linux/kthread.h>
+#include <linux/pagemap.h>
+#include <linux/ratelimit.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include "encls.h"
+
+struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
+int sgx_nr_epc_sections;
+
+static void __init sgx_free_epc_section(struct sgx_epc_section *section)
+{
+	struct sgx_epc_page *page;
+
+	while (!list_empty(&section->page_list)) {
+		page = list_first_entry(&section->page_list,
+					struct sgx_epc_page, list);
+		list_del(&page->list);
+		kfree(page);
+	}
+
+	while (!list_empty(&section->unsanitized_page_list)) {
+		page = list_first_entry(&section->unsanitized_page_list,
+					struct sgx_epc_page, list);
+		list_del(&page->list);
+		kfree(page);
+	}
+
+	memunmap(section->va);
+}
+
+static bool __init sgx_alloc_epc_section(u64 addr, u64 size,
+					 unsigned long index,
+					 struct sgx_epc_section *section)
+{
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct sgx_epc_page *page;
+	unsigned long i;
+
+	section->va = memremap(addr, size, MEMREMAP_WB);
+	if (!section->va)
+		return false;
+
+	section->pa = addr;
+	spin_lock_init(&section->lock);
+	INIT_LIST_HEAD(&section->page_list);
+	INIT_LIST_HEAD(&section->unsanitized_page_list);
+
+	for (i = 0; i < nr_pages; i++) {
+		page = kzalloc(sizeof(*page), GFP_KERNEL);
+		if (!page)
+			goto err_out;
+
+		page->desc = (addr + (i << PAGE_SHIFT)) | index;
+		list_add_tail(&page->list, &section->unsanitized_page_list);
+	}
+
+	return true;
+
+err_out:
+	sgx_free_epc_section(section);
+	return false;
+}
+
+static void __init sgx_page_cache_teardown(void)
+{
+	int i;
+
+	for (i = 0; i < sgx_nr_epc_sections; i++)
+		sgx_free_epc_section(&sgx_epc_sections[i]);
+}
+
+/**
+ * A section metric is concatenated in a way that @low bits 12-31 define the
+ * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
+ * metric.
+ */
+static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
+{
+	return (low & GENMASK_ULL(31, 12)) +
+	       ((high & GENMASK_ULL(19, 0)) << 32);
+}
+
+static bool __init sgx_page_cache_init(void)
+{
+	u32 eax, ebx, ecx, edx, type;
+	u64 pa, size;
+	int i;
+
+	BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1));
+
+	for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) {
+		cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF,
+			    &eax, &ebx, &ecx, &edx);
+
+		type = eax & SGX_CPUID_SUB_LEAF_TYPE_MASK;
+		if (type == SGX_CPUID_SUB_LEAF_INVALID)
+			break;
+
+		if (type != SGX_CPUID_SUB_LEAF_EPC_SECTION) {
+			pr_err_once("Unknown sub-leaf type: %u\n", type);
+			break;
+		}
+
+		if (i == SGX_MAX_EPC_SECTIONS) {
+			pr_warn("More than %d EPC sections\n",
+				SGX_MAX_EPC_SECTIONS);
+			break;
+		}
+
+		pa = sgx_calc_section_metric(eax, ebx);
+		size = sgx_calc_section_metric(ecx, edx);
+
+		pr_info("EPC section 0x%llx-0x%llx\n", pa, pa + size - 1);
+
+		if (!sgx_alloc_epc_section(pa, size, i, &sgx_epc_sections[i])) {
+			pr_err("No memory for the EPC section\n");
+			break;
+		}
+
+		sgx_nr_epc_sections++;
+	}
+
+	if (!sgx_nr_epc_sections) {
+		pr_err("There are zero EPC sections.\n");
+		return false;
+	}
+
+	return true;
+}
+
+static void __init sgx_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_SGX))
+		return;
+
+	if (!sgx_page_cache_init())
+		return;
+
+	if (!sgx_page_reclaimer_init())
+		goto err_page_cache;
+
+	return;
+
+err_page_cache:
+	sgx_page_cache_teardown();
+}
+
+arch_initcall(sgx_init);
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
new file mode 100644
index 000000000000..f071158d34f6
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -0,0 +1,87 @@ 
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-19 Intel Corporation.
+
+#include <linux/freezer.h>
+#include <linux/highmem.h>
+#include <linux/kthread.h>
+#include <linux/pagemap.h>
+#include <linux/ratelimit.h>
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include "encls.h"
+
+struct task_struct *ksgxswapd_tsk;
+
+/*
+ * Reset all pages to the uninitialized state. Pages could be left in an
+ * initialized state across kexec.
+ */
+static void sgx_sanitize_section(struct sgx_epc_section *section)
+{
+	struct sgx_epc_page *page, *tmp;
+	LIST_HEAD(secs_list);
+	int ret;
+
+	while (!list_empty(&section->unsanitized_page_list)) {
+		if (kthread_should_stop())
+			return;
+
+		spin_lock(&section->lock);
+
+		page = list_first_entry(&section->unsanitized_page_list,
+					struct sgx_epc_page, list);
+
+		ret = __eremove(sgx_epc_addr(page));
+		if (!ret)
+			list_move(&page->list, &section->page_list);
+		else
+			list_move_tail(&page->list, &secs_list);
+
+		spin_unlock(&section->lock);
+
+		cond_resched();
+	}
+
+	list_for_each_entry_safe(page, tmp, &secs_list, list) {
+		if (kthread_should_stop())
+			return;
+
+		ret = __eremove(sgx_epc_addr(page));
+		if (!WARN_ON_ONCE(ret)) {
+			spin_lock(&section->lock);
+			list_move(&page->list, &section->page_list);
+			spin_unlock(&section->lock);
+		} else {
+			list_del(&page->list);
+			kfree(page);
+		}
+
+		cond_resched();
+	}
+}
+
+static int ksgxswapd(void *p)
+{
+	int i;
+
+	set_freezable();
+
+	for (i = 0; i < sgx_nr_epc_sections; i++)
+		sgx_sanitize_section(&sgx_epc_sections[i]);
+
+	return 0;
+}
+
+bool __init sgx_page_reclaimer_init(void)
+{
+	struct task_struct *tsk;
+
+	tsk = kthread_run(ksgxswapd, NULL, "ksgxswapd");
+	if (IS_ERR(tsk))
+		return false;
+
+	ksgxswapd_tsk = tsk;
+
+	return true;
+}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
new file mode 100644
index 000000000000..9d8036f997b1
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -0,0 +1,70 @@ 
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef _X86_SGX_H
+#define _X86_SGX_H
+
+#include <linux/bitops.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/rwsem.h>
+#include <linux/types.h>
+#include <asm/asm.h>
+#include "arch.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) "sgx: " fmt
+
+struct sgx_epc_page {
+	unsigned long desc;
+	struct list_head list;
+};
+
+/**
+ * struct sgx_epc_section
+ *
+ * The firmware can define multiple chunks of EPC in different areas of
+ * physical memory, e.g. for the memory areas of each node. This structure is
+ * used to store the EPC pages for one EPC section and the virtual memory area
+ * where the pages have been mapped.
+ */
+struct sgx_epc_section {
+	unsigned long pa;
+	void *va;
+	struct list_head page_list;
+	struct list_head unsanitized_page_list;
+	spinlock_t lock;
+};
+
+#define SGX_MAX_EPC_SECTIONS	8
+
+extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
+
+/**
+ * enum sgx_epc_page_desc - bits and masks for an EPC page's descriptor
+ * %SGX_EPC_SECTION_MASK:	SGX allows multiple EPC sections in physical
+ *				memory. Existing and near-future hardware
+ *				defines at most eight sections, hence three
+ *				bits to hold a section.
+ */
+enum sgx_epc_page_desc {
+	SGX_EPC_SECTION_MASK			= GENMASK_ULL(3, 0),
+	/* bits 12-63 are reserved for the physical page address of the page */
+};
+
+static inline struct sgx_epc_section *sgx_epc_section(struct sgx_epc_page *page)
+{
+	return &sgx_epc_sections[page->desc & SGX_EPC_SECTION_MASK];
+}
+
+static inline void *sgx_epc_addr(struct sgx_epc_page *page)
+{
+	struct sgx_epc_section *section = sgx_epc_section(page);
+
+	return section->va + (page->desc & PAGE_MASK) - section->pa;
+}
+
+extern int sgx_nr_epc_sections;
+extern struct task_struct *ksgxswapd_tsk;
+
+bool __init sgx_page_reclaimer_init(void);
+
+#endif /* _X86_SGX_H */