Message ID | 20190903142655.21943-9-jarkko.sakkinen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Intel SGX foundations | expand |
On Tue, Sep 03, 2019 at 05:26:39PM +0300, Jarkko Sakkinen wrote: > From: Sean Christopherson <sean.j.christopherson@intel.com> > > Enumerate Enclave Page Cache (EPC) sections via CPUID and add the data > structures necessary to track EPC pages so that they can be allocated, > freed and managed. As a system may have multiple EPC sections, invoke > CPUID on SGX sub-leafs until an invalid leaf is encountered. > > On NUMA systems, a node can have at most one bank. A bank can be at Is that a DRAM bank or what exactly is a "bank" here? > most part of two nodes. SGX supports both nodes with a single memory > controller and also sub-cluster nodes with severals memory controllers s/severals/several/ > on a single die. > > For simplicity, support a maximum of eight EPC sections. Exisiting s/Exisiting/Existing/g Please introduce a spellchecker into your patch creation workflow and run all your text through it. > client hardware supports only a single section, while upcoming server > hardware will support at most eight sections. Bounding the number of > sections also allows the section ID to be embedded along with a page's > offset in a single unsigned long, enabling easy retrieval of both the > VA and PA for a given page. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> > Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> > Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> As before, your SOB needs to come last as you're handling the patch now but you know already. :) > --- > arch/x86/Kconfig | 14 +++ > arch/x86/kernel/cpu/Makefile | 1 + > arch/x86/kernel/cpu/sgx/Makefile | 2 +- > arch/x86/kernel/cpu/sgx/main.c | 158 ++++++++++++++++++++++++++++++ > arch/x86/kernel/cpu/sgx/reclaim.c | 84 ++++++++++++++++ > arch/x86/kernel/cpu/sgx/sgx.h | 67 +++++++++++++ > 6 files changed, 325 insertions(+), 1 deletion(-) > create mode 100644 arch/x86/kernel/cpu/sgx/main.c > create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c > create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 222855cc0158..2a8988aaa074 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1934,6 +1934,20 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS > > If unsure, say y. > > +config INTEL_SGX > + bool "Intel SGX core functionality" This sounds like there's other functionality which will have a separate config option(s) ? It is not in this patchset though... > + depends on X86_64 && CPU_SUP_INTEL > + select SRCU > + select MMU_NOTIFIER > + help > + Intel(R) SGX is a set of CPU instructions that can be used by > + applications to set aside private regions of code and data, referred > + to as enclaves. An enclave's private memory can only be accessed by > + code running within the enclave. Accesses from outside the enclave, > + including other enclaves, are disallowed by hardware. > + > + If unsure, say N. > + > config EFI > bool "EFI runtime service support" > depends on ACPI > diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile > index d7a1e5a9331c..97deac5108df 100644 > --- a/arch/x86/kernel/cpu/Makefile > +++ b/arch/x86/kernel/cpu/Makefile > @@ -45,6 +45,7 @@ obj-$(CONFIG_X86_MCE) += mce/ > obj-$(CONFIG_MTRR) += mtrr/ > obj-$(CONFIG_MICROCODE) += microcode/ > obj-$(CONFIG_X86_CPU_RESCTRL) += resctrl/ > +obj-$(CONFIG_INTEL_SGX) += sgx/ > > obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o > > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile > index 4432d935894e..fa930e292110 100644 > --- a/arch/x86/kernel/cpu/sgx/Makefile > +++ b/arch/x86/kernel/cpu/sgx/Makefile > @@ -1 +1 @@ > -obj-y += encls.o > +obj-y += encls.o main.o reclaim.o > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > new file mode 100644 > index 000000000000..e2317f6e4374 > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -0,0 +1,158 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > +// Copyright(c) 2016-17 Intel Corporation. > + > +#include <linux/freezer.h> > +#include <linux/highmem.h> > +#include <linux/kthread.h> > +#include <linux/pagemap.h> > +#include <linux/ratelimit.h> > +#include <linux/sched/signal.h> > +#include <linux/slab.h> > +#include "arch.h" > +#include "sgx.h" > + > +struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; > +EXPORT_SYMBOL_GPL(sgx_epc_sections); This export gets removed again in patch 12. Please audit your whole patchset for stuff being added and later removed and kill it so that the diffstat is decreased. > + > +int sgx_nr_epc_sections; > + > +static __init void sgx_free_epc_section(struct sgx_epc_section *section) > +{ > + struct sgx_epc_page *page; > + > + while (!list_empty(§ion->page_list)) { > + page = list_first_entry(§ion->page_list, > + struct sgx_epc_page, list); > + list_del(&page->list); > + kfree(page); > + } > + > + while (!list_empty(§ion->unsanitized_page_list)) { > + page = list_first_entry(§ion->unsanitized_page_list, > + struct sgx_epc_page, list); > + list_del(&page->list); > + kfree(page); > + } > + > + memunmap(section->va); > +} > + > +static __init int sgx_init_epc_section(u64 addr, u64 size, unsigned long index, > + struct sgx_epc_section *section) If the "free" function above is the counterpart of this, then this should be called sgx_alloc_epc_section() or so. > +{ > + unsigned long nr_pages = size >> PAGE_SHIFT; > + struct sgx_epc_page *page; > + unsigned long i; > + > + section->va = memremap(addr, size, MEMREMAP_WB); > + if (!section->va) > + return -ENOMEM; > + > + section->pa = addr; > + spin_lock_init(§ion->lock); > + INIT_LIST_HEAD(§ion->page_list); > + INIT_LIST_HEAD(§ion->unsanitized_page_list); > + > + for (i = 0; i < nr_pages; i++) { > + page = kzalloc(sizeof(*page), GFP_KERNEL); > + if (!page) > + goto out; <---- newline here. > + page->desc = (addr + (i << PAGE_SHIFT)) | index; > + list_add_tail(&page->list, §ion->unsanitized_page_list); > + section->free_cnt++; > + } > + > + return 0; > +out: > + sgx_free_epc_section(section); > + return -ENOMEM; > +} > + > +static __init void sgx_page_cache_teardown(void) > +{ > + int i; > + > + for (i = 0; i < sgx_nr_epc_sections; i++) > + sgx_free_epc_section(&sgx_epc_sections[i]); > +} > + > +/** > + * A section metric is concatenated in a way that @low bits 12-31 define the > + * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the > + * metric. > + */ > +static inline u64 sgx_calc_section_metric(u64 low, u64 high) > +{ > + return (low & GENMASK_ULL(31, 12)) + > + ((high & GENMASK_ULL(19, 0)) << 32); > +} > + > +static __init int sgx_page_cache_init(void) > +{ > + u32 eax, ebx, ecx, edx, type; > + u64 pa, size; > + int ret; > + int i; > + > + BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1)); > + > + for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) { ^ ^ - what are those brackets for? > + cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF, > + &eax, &ebx, &ecx, &edx); > + > + type = eax & SGX_CPUID_SUB_LEAF_TYPE_MASK; > + if (type == SGX_CPUID_SUB_LEAF_INVALID) > + break; <---- newline here. > + if (type != SGX_CPUID_SUB_LEAF_EPC_SECTION) { > + pr_err_once("sgx: Unknown sub-leaf type: %u\n", type); ^^^^ That's done with: #undef pr_fmt #define pr_fmt(fmt) "sgx: " fmt for the whole compilation unit or you can simply raise it into sgx.h for the whole sgx pile. > + return -ENODEV; > + } <---- newline here. Yeah, let's space out those a bit, for better readability. > + if (i == SGX_MAX_EPC_SECTIONS) { > + pr_warn("sgx: More than " > + __stringify(SGX_MAX_EPC_SECTIONS) > + " EPC sections\n"); Huh, what's wrong with using "%d" like a normal printk does? > + break; > + } > + > + pa = sgx_calc_section_metric(eax, ebx); > + size = sgx_calc_section_metric(ecx, edx); This size comes from CPUID but it might be prudent to sanity-check it nevertheless, before doing the memremap(). > + pr_info("sgx: EPC section 0x%llx-0x%llx\n", pa, pa + size - 1); > + > + ret = sgx_init_epc_section(pa, size, i, &sgx_epc_sections[i]); > + if (ret) { > + sgx_page_cache_teardown(); So even if one section fails to allocate, we teardown the whole thing? I.e., can't run with only 7 or so? IOW, do we absolutely have to fail here or can we fail more gracefully? > + return ret; > + } > + > + sgx_nr_epc_sections++; > + } > + > + if (!sgx_nr_epc_sections) { > + pr_err("sgx: There are zero EPC sections.\n"); > + return -ENODEV; > + } > + > + return 0; > +} > + > +static __init int sgx_init(void) > +{ > + int ret; > + > + if (!boot_cpu_has(X86_FEATURE_SGX)) > + return false; > + > + ret = sgx_page_cache_init(); > + if (ret) > + return ret; > + > + ret = sgx_page_reclaimer_init(); > + if (ret) { > + sgx_page_cache_teardown(); > + return ret; > + } > + > + return 0; > +} > + > +arch_initcall(sgx_init); Why does this have to be an arch initcall and can't it run after detect_sgx() in init_intel()? You'd need to run it only once but that's easy. > diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c > new file mode 100644 > index 000000000000..042769f03be9 > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/reclaim.c > @@ -0,0 +1,84 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > +// Copyright(c) 2016-19 Intel Corporation. > + > +#include <linux/freezer.h> > +#include <linux/highmem.h> > +#include <linux/kthread.h> > +#include <linux/pagemap.h> > +#include <linux/ratelimit.h> > +#include <linux/slab.h> > +#include <linux/sched/mm.h> > +#include <linux/sched/signal.h> > +#include "encls.h" > +#include "sgx.h" > + > +static struct task_struct *ksgxswapd_tsk; > + > +static void sgx_sanitize_section(struct sgx_epc_section *section) > +{ > + struct sgx_epc_page *page, *tmp; > + LIST_HEAD(secs_list); > + int ret; > + > + while (!list_empty(§ion->unsanitized_page_list)) { > + if (kthread_should_stop()) > + return; > + > + spin_lock(§ion->lock); > + > + page = list_first_entry(§ion->unsanitized_page_list, > + struct sgx_epc_page, list); > + > + ret = __eremove(sgx_epc_addr(page)); > + if (!ret) > + list_move(&page->list, §ion->page_list); > + else > + list_move_tail(&page->list, &secs_list); > + > + spin_unlock(§ion->lock); > + > + cond_resched(); > + } > + > + list_for_each_entry_safe(page, tmp, &secs_list, list) { > + if (kthread_should_stop()) > + return; > + > + ret = __eremove(sgx_epc_addr(page)); > + if (!WARN_ON_ONCE(ret)) { > + spin_lock(§ion->lock); > + list_move(&page->list, §ion->page_list); > + spin_unlock(§ion->lock); > + } else { > + list_del(&page->list); > + kfree(page); > + } > + > + cond_resched(); > + } > +} I could use a sentence or two above this function explaining what the idea behind those page lists is and why we're moving off pages to and from lists, what the unsanitized_page_list is, how it is being used, etc. That probably has come up already so pointing me to the text would suffice too. > + > +static int ksgxswapd(void *p) > +{ > + int i; > + > + set_freezable(); > + > + for (i = 0; i < sgx_nr_epc_sections; i++) > + sgx_sanitize_section(&sgx_epc_sections[i]); > + > + return 0; > +} > + > +int sgx_page_reclaimer_init(void) > +{ > + struct task_struct *tsk; > + > + tsk = kthread_run(ksgxswapd, NULL, "ksgxswapd"); > + if (IS_ERR(tsk)) > + return PTR_ERR(tsk); > + > + ksgxswapd_tsk = tsk; > + > + return 0; > +} > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h > new file mode 100644 > index 000000000000..3009ec816339 > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > @@ -0,0 +1,67 @@ > +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ > +#ifndef _X86_SGX_H > +#define _X86_SGX_H > + > +#include <linux/bitops.h> > +#include <linux/err.h> > +#include <linux/io.h> > +#include <linux/rwsem.h> > +#include <linux/types.h> > +#include <asm/asm.h> > +#include <uapi/asm/sgx_errno.h> > + > +struct sgx_epc_page { > + unsigned long desc; > + struct list_head list; > +}; > + > +/** > + * struct sgx_epc_section > + * > + * The firmware can define multiple chunks of EPC to the different areas of the My usual question: what if fw doesn't? Can we define our own chunks or do we need special firmware support for the whole EPC thing to even exist? > + * physical memory e.g. for memory areas of the each node. This structure is > + * used to store EPC pages for one EPC section and virtual memory area where > + * the pages have been mapped. > + */ Thx.
On Sat, Oct 05, 2019 at 11:26:27AM +0200, Borislav Petkov wrote: > On Tue, Sep 03, 2019 at 05:26:39PM +0300, Jarkko Sakkinen wrote: > > From: Sean Christopherson <sean.j.christopherson@intel.com> > > > > Enumerate Enclave Page Cache (EPC) sections via CPUID and add the data > > structures necessary to track EPC pages so that they can be allocated, > > freed and managed. As a system may have multiple EPC sections, invoke > > CPUID on SGX sub-leafs until an invalid leaf is encountered. > > > > On NUMA systems, a node can have at most one bank. A bank can be at > > Is that a DRAM bank or what exactly is a "bank" here? > > > most part of two nodes. SGX supports both nodes with a single memory > > controller and also sub-cluster nodes with severals memory controllers > > s/severals/several/ > > > on a single die. > > > > For simplicity, support a maximum of eight EPC sections. Exisiting > > s/Exisiting/Existing/g > > Please introduce a spellchecker into your patch creation workflow and > run all your text through it. Cannot disagree. Will do for the next version. > > client hardware supports only a single section, while upcoming server > > hardware will support at most eight sections. Bounding the number of > > sections also allows the section ID to be embedded along with a page's > > offset in a single unsigned long, enabling easy retrieval of both the > > VA and PA for a given page. > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > > Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> > > Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com> > > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> > > Co-developed-by: Serge Ayoun <serge.ayoun@intel.com> > > Signed-off-by: Serge Ayoun <serge.ayoun@intel.com> > > As before, your SOB needs to come last as you're handling the patch now > but you know already. :) Already fixed :) https://github.com/jsakkine-intel/linux-sgx/commit/20d378e8e55d821fcef8e53babfe26c40388ca04 Did go through all commits, there were a few these. The whole CDB is a new thing for me. Thanks for pointing out how to use it correctly. > > > --- > > arch/x86/Kconfig | 14 +++ > > arch/x86/kernel/cpu/Makefile | 1 + > > arch/x86/kernel/cpu/sgx/Makefile | 2 +- > > arch/x86/kernel/cpu/sgx/main.c | 158 ++++++++++++++++++++++++++++++ > > arch/x86/kernel/cpu/sgx/reclaim.c | 84 ++++++++++++++++ > > arch/x86/kernel/cpu/sgx/sgx.h | 67 +++++++++++++ > > 6 files changed, 325 insertions(+), 1 deletion(-) > > create mode 100644 arch/x86/kernel/cpu/sgx/main.c > > create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c > > create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index 222855cc0158..2a8988aaa074 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -1934,6 +1934,20 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS > > > > If unsure, say y. > > > > +config INTEL_SGX > > + bool "Intel SGX core functionality" > > This sounds like there's other functionality which will have a separate > config option(s) ? > > It is not in this patchset though... It's cruft when there still was a conditional driver. Will fix. > > + depends on X86_64 && CPU_SUP_INTEL > > + select SRCU > > + select MMU_NOTIFIER > > + help > > + Intel(R) SGX is a set of CPU instructions that can be used by > > + applications to set aside private regions of code and data, referred > > + to as enclaves. An enclave's private memory can only be accessed by > > + code running within the enclave. Accesses from outside the enclave, > > + including other enclaves, are disallowed by hardware. > > + > > + If unsure, say N. > > + > > config EFI > > bool "EFI runtime service support" > > depends on ACPI > > diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile > > index d7a1e5a9331c..97deac5108df 100644 > > --- a/arch/x86/kernel/cpu/Makefile > > +++ b/arch/x86/kernel/cpu/Makefile > > @@ -45,6 +45,7 @@ obj-$(CONFIG_X86_MCE) += mce/ > > obj-$(CONFIG_MTRR) += mtrr/ > > obj-$(CONFIG_MICROCODE) += microcode/ > > obj-$(CONFIG_X86_CPU_RESCTRL) += resctrl/ > > +obj-$(CONFIG_INTEL_SGX) += sgx/ > > > > obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o > > > > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile > > index 4432d935894e..fa930e292110 100644 > > --- a/arch/x86/kernel/cpu/sgx/Makefile > > +++ b/arch/x86/kernel/cpu/sgx/Makefile > > @@ -1 +1 @@ > > -obj-y += encls.o > > +obj-y += encls.o main.o reclaim.o > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > > new file mode 100644 > > index 000000000000..e2317f6e4374 > > --- /dev/null > > +++ b/arch/x86/kernel/cpu/sgx/main.c > > @@ -0,0 +1,158 @@ > > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > > +// Copyright(c) 2016-17 Intel Corporation. > > + > > +#include <linux/freezer.h> > > +#include <linux/highmem.h> > > +#include <linux/kthread.h> > > +#include <linux/pagemap.h> > > +#include <linux/ratelimit.h> > > +#include <linux/sched/signal.h> > > +#include <linux/slab.h> > > +#include "arch.h" > > +#include "sgx.h" > > + > > +struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; > > +EXPORT_SYMBOL_GPL(sgx_epc_sections); > > This export gets removed again in patch 12. Please audit your whole > patchset for stuff being added and later removed and kill it so that the > diffstat is decreased. Yeah, I'll try to carefully go through all patches and check these. The easily slip when you reorganize constantly a huge patch set. > > > + > > +int sgx_nr_epc_sections; > > + > > +static __init void sgx_free_epc_section(struct sgx_epc_section *section) > > +{ > > + struct sgx_epc_page *page; > > + > > + while (!list_empty(§ion->page_list)) { > > + page = list_first_entry(§ion->page_list, > > + struct sgx_epc_page, list); > > + list_del(&page->list); > > + kfree(page); > > + } > > + > > + while (!list_empty(§ion->unsanitized_page_list)) { > > + page = list_first_entry(§ion->unsanitized_page_list, > > + struct sgx_epc_page, list); > > + list_del(&page->list); > > + kfree(page); > > + } > > + > > + memunmap(section->va); > > +} > > + > > +static __init int sgx_init_epc_section(u64 addr, u64 size, unsigned long index, > > + struct sgx_epc_section *section) > > If the "free" function above is the counterpart of this, then this > should be called sgx_alloc_epc_section() or so. Agreed. > > +{ > > + unsigned long nr_pages = size >> PAGE_SHIFT; > > + struct sgx_epc_page *page; > > + unsigned long i; > > + > > + section->va = memremap(addr, size, MEMREMAP_WB); > > + if (!section->va) > > + return -ENOMEM; > > + > > + section->pa = addr; > > + spin_lock_init(§ion->lock); > > + INIT_LIST_HEAD(§ion->page_list); > > + INIT_LIST_HEAD(§ion->unsanitized_page_list); > > + > > + for (i = 0; i < nr_pages; i++) { > > + page = kzalloc(sizeof(*page), GFP_KERNEL); > > + if (!page) > > + goto out; > > <---- newline here. > > > + page->desc = (addr + (i << PAGE_SHIFT)) | index; > > + list_add_tail(&page->list, §ion->unsanitized_page_list); > > + section->free_cnt++; > > + } > > + > > + return 0; > > +out: > > + sgx_free_epc_section(section); > > + return -ENOMEM; > > +} > > + > > +static __init void sgx_page_cache_teardown(void) > > +{ > > + int i; > > + > > + for (i = 0; i < sgx_nr_epc_sections; i++) > > + sgx_free_epc_section(&sgx_epc_sections[i]); > > +} > > + > > +/** > > + * A section metric is concatenated in a way that @low bits 12-31 define the > > + * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the > > + * metric. > > + */ > > +static inline u64 sgx_calc_section_metric(u64 low, u64 high) > > +{ > > + return (low & GENMASK_ULL(31, 12)) + > > + ((high & GENMASK_ULL(19, 0)) << 32); > > +} > > + > > +static __init int sgx_page_cache_init(void) > > +{ > > + u32 eax, ebx, ecx, edx, type; > > + u64 pa, size; > > + int ret; > > + int i; > > + > > + BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1)); > > + > > + for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) { > > ^ ^ - what are those brackets for? For nothing :-) I'll change it as: for (i = 0; i <= SGX_MAX_EPC_SECTIONS; i++) { > > > + cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF, > > + &eax, &ebx, &ecx, &edx); > > + > > + type = eax & SGX_CPUID_SUB_LEAF_TYPE_MASK; > > + if (type == SGX_CPUID_SUB_LEAF_INVALID) > > + break; > > <---- newline here. > > > + if (type != SGX_CPUID_SUB_LEAF_EPC_SECTION) { > > + pr_err_once("sgx: Unknown sub-leaf type: %u\n", type); > ^^^^ > > That's done with: > > #undef pr_fmt > #define pr_fmt(fmt) "sgx: " fmt > > for the whole compilation unit or you can simply raise it into sgx.h for > the whole sgx pile. I think I had that in some earlier patch set version. Do not see any good reason why not anymore so I'll just add it back. > > > > + return -ENODEV; > > + } > > <---- newline here. > > Yeah, let's space out those a bit, for better readability. > > > + if (i == SGX_MAX_EPC_SECTIONS) { > > + pr_warn("sgx: More than " > > + __stringify(SGX_MAX_EPC_SECTIONS) > > + " EPC sections\n"); > > Huh, what's wrong with using "%d" like a normal printk does? Will change. > > > + break; > > + } > > + > > + pa = sgx_calc_section_metric(eax, ebx); > > + size = sgx_calc_section_metric(ecx, edx); > > This size comes from CPUID but it might be prudent to sanity-check it > nevertheless, before doing the memremap(). > > > + pr_info("sgx: EPC section 0x%llx-0x%llx\n", pa, pa + size - 1); > > + > > + ret = sgx_init_epc_section(pa, size, i, &sgx_epc_sections[i]); > > + if (ret) { > > + sgx_page_cache_teardown(); > > So even if one section fails to allocate, we teardown the whole thing? > I.e., can't run with only 7 or so? IOW, do we absolutely have to fail > here or can we fail more gracefully? A good point. I think what you say makes sense. I think we can initialize even if not all EPC sectionns get allocated (with a warning to klog). > > > + return ret; > > + } > > + > > + sgx_nr_epc_sections++; > > + } > > + > > + if (!sgx_nr_epc_sections) { > > + pr_err("sgx: There are zero EPC sections.\n"); > > + return -ENODEV; > > + } > > + > > + return 0; > > +} > > + > > +static __init int sgx_init(void) > > +{ > > + int ret; > > + > > + if (!boot_cpu_has(X86_FEATURE_SGX)) > > + return false; > > + > > + ret = sgx_page_cache_init(); > > + if (ret) > > + return ret; > > + > > + ret = sgx_page_reclaimer_init(); > > + if (ret) { > > + sgx_page_cache_teardown(); > > + return ret; > > + } > > + > > + return 0; > > +} > > + > > +arch_initcall(sgx_init); > > Why does this have to be an arch initcall and can't it run after > detect_sgx() in init_intel()? You'd need to run it only once but that's > easy. Yeah, why not. I like the idea. Would make things more static and predictable. > > > diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c > > new file mode 100644 > > index 000000000000..042769f03be9 > > --- /dev/null > > +++ b/arch/x86/kernel/cpu/sgx/reclaim.c > > @@ -0,0 +1,84 @@ > > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > > +// Copyright(c) 2016-19 Intel Corporation. > > + > > +#include <linux/freezer.h> > > +#include <linux/highmem.h> > > +#include <linux/kthread.h> > > +#include <linux/pagemap.h> > > +#include <linux/ratelimit.h> > > +#include <linux/slab.h> > > +#include <linux/sched/mm.h> > > +#include <linux/sched/signal.h> > > +#include "encls.h" > > +#include "sgx.h" > > + > > +static struct task_struct *ksgxswapd_tsk; > > + > > +static void sgx_sanitize_section(struct sgx_epc_section *section) > > +{ > > + struct sgx_epc_page *page, *tmp; > > + LIST_HEAD(secs_list); > > + int ret; > > + > > + while (!list_empty(§ion->unsanitized_page_list)) { > > + if (kthread_should_stop()) > > + return; > > + > > + spin_lock(§ion->lock); > > + > > + page = list_first_entry(§ion->unsanitized_page_list, > > + struct sgx_epc_page, list); > > + > > + ret = __eremove(sgx_epc_addr(page)); > > + if (!ret) > > + list_move(&page->list, §ion->page_list); > > + else > > + list_move_tail(&page->list, &secs_list); > > + > > + spin_unlock(§ion->lock); > > + > > + cond_resched(); > > + } > > + > > + list_for_each_entry_safe(page, tmp, &secs_list, list) { > > + if (kthread_should_stop()) > > + return; > > + > > + ret = __eremove(sgx_epc_addr(page)); > > + if (!WARN_ON_ONCE(ret)) { > > + spin_lock(§ion->lock); > > + list_move(&page->list, §ion->page_list); > > + spin_unlock(§ion->lock); > > + } else { > > + list_del(&page->list); > > + kfree(page); > > + } > > + > > + cond_resched(); > > + } > > +} > > I could use a sentence or two above this function explaining what the > idea behind those page lists is and why we're moving off pages to and > from lists, what the unsanitized_page_list is, how it is being used, > etc. That probably has come up already so pointing me to the text would > suffice too. Sure. > > > + > > +static int ksgxswapd(void *p) > > +{ > > + int i; > > + > > + set_freezable(); > > + > > + for (i = 0; i < sgx_nr_epc_sections; i++) > > + sgx_sanitize_section(&sgx_epc_sections[i]); > > + > > + return 0; > > +} > > + > > +int sgx_page_reclaimer_init(void) > > +{ > > + struct task_struct *tsk; > > + > > + tsk = kthread_run(ksgxswapd, NULL, "ksgxswapd"); > > + if (IS_ERR(tsk)) > > + return PTR_ERR(tsk); > > + > > + ksgxswapd_tsk = tsk; > > + > > + return 0; > > +} > > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h > > new file mode 100644 > > index 000000000000..3009ec816339 > > --- /dev/null > > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > > @@ -0,0 +1,67 @@ > > +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ > > +#ifndef _X86_SGX_H > > +#define _X86_SGX_H > > + > > +#include <linux/bitops.h> > > +#include <linux/err.h> > > +#include <linux/io.h> > > +#include <linux/rwsem.h> > > +#include <linux/types.h> > > +#include <asm/asm.h> > > +#include <uapi/asm/sgx_errno.h> > > + > > +struct sgx_epc_page { > > + unsigned long desc; > > + struct list_head list; > > +}; > > + > > +/** > > + * struct sgx_epc_section > > + * > > + * The firmware can define multiple chunks of EPC to the different areas of the > > My usual question: what if fw doesn't? Can we define our own chunks or > do we need special firmware support for the whole EPC thing to even > exist? Yes, firmware needs to define the sections by writing them to protected memory range registers (PRMRR). > > > + * physical memory e.g. for memory areas of the each node. This structure is > > + * used to store EPC pages for one EPC section and virtual memory area where > > + * the pages have been mapped. > > + */ > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette Thanks a lot! /Jarkko
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 222855cc0158..2a8988aaa074 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1934,6 +1934,20 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS If unsure, say y. +config INTEL_SGX + bool "Intel SGX core functionality" + depends on X86_64 && CPU_SUP_INTEL + select SRCU + select MMU_NOTIFIER + help + Intel(R) SGX is a set of CPU instructions that can be used by + applications to set aside private regions of code and data, referred + to as enclaves. An enclave's private memory can only be accessed by + code running within the enclave. Accesses from outside the enclave, + including other enclaves, are disallowed by hardware. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index d7a1e5a9331c..97deac5108df 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -45,6 +45,7 @@ obj-$(CONFIG_X86_MCE) += mce/ obj-$(CONFIG_MTRR) += mtrr/ obj-$(CONFIG_MICROCODE) += microcode/ obj-$(CONFIG_X86_CPU_RESCTRL) += resctrl/ +obj-$(CONFIG_INTEL_SGX) += sgx/ obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 4432d935894e..fa930e292110 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -1 +1 @@ -obj-y += encls.o +obj-y += encls.o main.o reclaim.o diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c new file mode 100644 index 000000000000..e2317f6e4374 --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -0,0 +1,158 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-17 Intel Corporation. + +#include <linux/freezer.h> +#include <linux/highmem.h> +#include <linux/kthread.h> +#include <linux/pagemap.h> +#include <linux/ratelimit.h> +#include <linux/sched/signal.h> +#include <linux/slab.h> +#include "arch.h" +#include "sgx.h" + +struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; +EXPORT_SYMBOL_GPL(sgx_epc_sections); + +int sgx_nr_epc_sections; + +static __init void sgx_free_epc_section(struct sgx_epc_section *section) +{ + struct sgx_epc_page *page; + + while (!list_empty(§ion->page_list)) { + page = list_first_entry(§ion->page_list, + struct sgx_epc_page, list); + list_del(&page->list); + kfree(page); + } + + while (!list_empty(§ion->unsanitized_page_list)) { + page = list_first_entry(§ion->unsanitized_page_list, + struct sgx_epc_page, list); + list_del(&page->list); + kfree(page); + } + + memunmap(section->va); +} + +static __init int sgx_init_epc_section(u64 addr, u64 size, unsigned long index, + struct sgx_epc_section *section) +{ + unsigned long nr_pages = size >> PAGE_SHIFT; + struct sgx_epc_page *page; + unsigned long i; + + section->va = memremap(addr, size, MEMREMAP_WB); + if (!section->va) + return -ENOMEM; + + section->pa = addr; + spin_lock_init(§ion->lock); + INIT_LIST_HEAD(§ion->page_list); + INIT_LIST_HEAD(§ion->unsanitized_page_list); + + for (i = 0; i < nr_pages; i++) { + page = kzalloc(sizeof(*page), GFP_KERNEL); + if (!page) + goto out; + page->desc = (addr + (i << PAGE_SHIFT)) | index; + list_add_tail(&page->list, §ion->unsanitized_page_list); + section->free_cnt++; + } + + return 0; +out: + sgx_free_epc_section(section); + return -ENOMEM; +} + +static __init void sgx_page_cache_teardown(void) +{ + int i; + + for (i = 0; i < sgx_nr_epc_sections; i++) + sgx_free_epc_section(&sgx_epc_sections[i]); +} + +/** + * A section metric is concatenated in a way that @low bits 12-31 define the + * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the + * metric. + */ +static inline u64 sgx_calc_section_metric(u64 low, u64 high) +{ + return (low & GENMASK_ULL(31, 12)) + + ((high & GENMASK_ULL(19, 0)) << 32); +} + +static __init int sgx_page_cache_init(void) +{ + u32 eax, ebx, ecx, edx, type; + u64 pa, size; + int ret; + int i; + + BUILD_BUG_ON(SGX_MAX_EPC_SECTIONS > (SGX_EPC_SECTION_MASK + 1)); + + for (i = 0; i < (SGX_MAX_EPC_SECTIONS + 1); i++) { + cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF, + &eax, &ebx, &ecx, &edx); + + type = eax & SGX_CPUID_SUB_LEAF_TYPE_MASK; + if (type == SGX_CPUID_SUB_LEAF_INVALID) + break; + if (type != SGX_CPUID_SUB_LEAF_EPC_SECTION) { + pr_err_once("sgx: Unknown sub-leaf type: %u\n", type); + return -ENODEV; + } + if (i == SGX_MAX_EPC_SECTIONS) { + pr_warn("sgx: More than " + __stringify(SGX_MAX_EPC_SECTIONS) + " EPC sections\n"); + break; + } + + pa = sgx_calc_section_metric(eax, ebx); + size = sgx_calc_section_metric(ecx, edx); + pr_info("sgx: EPC section 0x%llx-0x%llx\n", pa, pa + size - 1); + + ret = sgx_init_epc_section(pa, size, i, &sgx_epc_sections[i]); + if (ret) { + sgx_page_cache_teardown(); + return ret; + } + + sgx_nr_epc_sections++; + } + + if (!sgx_nr_epc_sections) { + pr_err("sgx: There are zero EPC sections.\n"); + return -ENODEV; + } + + return 0; +} + +static __init int sgx_init(void) +{ + int ret; + + if (!boot_cpu_has(X86_FEATURE_SGX)) + return false; + + ret = sgx_page_cache_init(); + if (ret) + return ret; + + ret = sgx_page_reclaimer_init(); + if (ret) { + sgx_page_cache_teardown(); + return ret; + } + + return 0; +} + +arch_initcall(sgx_init); diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c new file mode 100644 index 000000000000..042769f03be9 --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/reclaim.c @@ -0,0 +1,84 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-19 Intel Corporation. + +#include <linux/freezer.h> +#include <linux/highmem.h> +#include <linux/kthread.h> +#include <linux/pagemap.h> +#include <linux/ratelimit.h> +#include <linux/slab.h> +#include <linux/sched/mm.h> +#include <linux/sched/signal.h> +#include "encls.h" +#include "sgx.h" + +static struct task_struct *ksgxswapd_tsk; + +static void sgx_sanitize_section(struct sgx_epc_section *section) +{ + struct sgx_epc_page *page, *tmp; + LIST_HEAD(secs_list); + int ret; + + while (!list_empty(§ion->unsanitized_page_list)) { + if (kthread_should_stop()) + return; + + spin_lock(§ion->lock); + + page = list_first_entry(§ion->unsanitized_page_list, + struct sgx_epc_page, list); + + ret = __eremove(sgx_epc_addr(page)); + if (!ret) + list_move(&page->list, §ion->page_list); + else + list_move_tail(&page->list, &secs_list); + + spin_unlock(§ion->lock); + + cond_resched(); + } + + list_for_each_entry_safe(page, tmp, &secs_list, list) { + if (kthread_should_stop()) + return; + + ret = __eremove(sgx_epc_addr(page)); + if (!WARN_ON_ONCE(ret)) { + spin_lock(§ion->lock); + list_move(&page->list, §ion->page_list); + spin_unlock(§ion->lock); + } else { + list_del(&page->list); + kfree(page); + } + + cond_resched(); + } +} + +static int ksgxswapd(void *p) +{ + int i; + + set_freezable(); + + for (i = 0; i < sgx_nr_epc_sections; i++) + sgx_sanitize_section(&sgx_epc_sections[i]); + + return 0; +} + +int sgx_page_reclaimer_init(void) +{ + struct task_struct *tsk; + + tsk = kthread_run(ksgxswapd, NULL, "ksgxswapd"); + if (IS_ERR(tsk)) + return PTR_ERR(tsk); + + ksgxswapd_tsk = tsk; + + return 0; +} diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h new file mode 100644 index 000000000000..3009ec816339 --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ +#ifndef _X86_SGX_H +#define _X86_SGX_H + +#include <linux/bitops.h> +#include <linux/err.h> +#include <linux/io.h> +#include <linux/rwsem.h> +#include <linux/types.h> +#include <asm/asm.h> +#include <uapi/asm/sgx_errno.h> + +struct sgx_epc_page { + unsigned long desc; + struct list_head list; +}; + +/** + * struct sgx_epc_section + * + * The firmware can define multiple chunks of EPC to the different areas of the + * physical memory e.g. for memory areas of the each node. This structure is + * used to store EPC pages for one EPC section and virtual memory area where + * the pages have been mapped. + */ +struct sgx_epc_section { + unsigned long pa; + void *va; + struct list_head page_list; + struct list_head unsanitized_page_list; + unsigned long free_cnt; + spinlock_t lock; +}; + +#define SGX_MAX_EPC_SECTIONS 8 + +extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; + +/** + * enum sgx_epc_page_desc - bits and masks for an EPC page's descriptor + * %SGX_EPC_SECTION_MASK: SGX allows to have multiple EPC sections in the + * physical memory. The existing and near-future + * hardware defines at most eight sections, hence + * three bits to hold a section. + */ +enum sgx_epc_page_desc { + SGX_EPC_SECTION_MASK = GENMASK_ULL(3, 0), + /* bits 12-63 are reserved for the physical page address of the page */ +}; + +static inline struct sgx_epc_section *sgx_epc_section(struct sgx_epc_page *page) +{ + return &sgx_epc_sections[page->desc & SGX_EPC_SECTION_MASK]; +} + +static inline void *sgx_epc_addr(struct sgx_epc_page *page) +{ + struct sgx_epc_section *section = sgx_epc_section(page); + + return section->va + (page->desc & PAGE_MASK) - section->pa; +} + +extern int sgx_nr_epc_sections; + +int sgx_page_reclaimer_init(void); + +#endif /* _X86_SGX_H */