Message ID | 20200623043931.157156-1-jarkko.sakkinen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] x86/sgx: Allocate form local NUMA node first | expand |
On Tue, Jun 23, 2020 at 07:39:31AM +0300, Jarkko Sakkinen wrote: > Create a pointer array for each NUMA node with the references to the > contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the > current NUMA node before the others. > > Cc: Dave Hansen <dave.hansen@intel.com> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> I took a reverse approach, i.e. have only pointer arrays in the numa structs. I think we should just include this. It makes the overall structure more legit, meaning that we do something useful with the sections. It clarifies more than dissolves so to speak... /Jarkko
On 6/22/20 9:39 PM, Jarkko Sakkinen wrote: > +static int __init sgx_pfn_to_nid(unsigned long pfn) > +{ > + pg_data_t *pgdat; > + int nid; > + > + for (nid = 0; nid < nr_node_ids; nid++) { > + pgdat = NODE_DATA(nid); > + > + if (pfn >= pgdat->node_start_pfn && > + pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages)) pgdat_end_pfn(), perhaps? > + return nid; > + } > + > + return 0; > +} Does this actually work? The node span (->node_start_pfn through start+->node_spanned_pages) only contains pages which the OS is actively managing, usually RAM but sometimes also persistent memory. This has some assumption that the SGX PFNs are within the node's span. I would only _expect_ that to happen if the node was built like this: | Node-X RAM | EPC | Node-X RAM | If the EPC was on either end: | Node-X RAM | EPC | or | EPC | Node-X RAM | I suspect that the pgdat span wouldn't include EPC. EPC is, if I remember correctly, a E820_RESERVED region.
On Wed, Jun 24, 2020 at 04:24:25PM -0700, Dave Hansen wrote: > On 6/22/20 9:39 PM, Jarkko Sakkinen wrote: > > +static int __init sgx_pfn_to_nid(unsigned long pfn) > > +{ > > + pg_data_t *pgdat; > > + int nid; > > + > > + for (nid = 0; nid < nr_node_ids; nid++) { > > + pgdat = NODE_DATA(nid); > > + > > + if (pfn >= pgdat->node_start_pfn && > > + pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages)) > > pgdat_end_pfn(), perhaps? > > > + return nid; > > + } > > + > > + return 0; > > +} > > Does this actually work? > > The node span (->node_start_pfn through start+->node_spanned_pages) only > contains pages which the OS is actively managing, usually RAM but > sometimes also persistent memory. This has some assumption that the SGX > PFNs are within the node's span. I would only _expect_ that to happen > if the node was built like this: > > | Node-X RAM | EPC | Node-X RAM | > > If the EPC was on either end: > > | Node-X RAM | EPC | > or > | EPC | Node-X RAM | > > I suspect that the pgdat span wouldn't include EPC. EPC is, if I > remember correctly, a E820_RESERVED region. It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions should be enumerated in ACPI SRAT along with regular memory. But, I haven't actually verified that info makes its way into the kernel's pgdata stuff.
On 6/24/20 4:54 PM, Sean Christopherson wrote: >> Does this actually work? >> >> The node span (->node_start_pfn through start+->node_spanned_pages) only >> contains pages which the OS is actively managing, usually RAM but >> sometimes also persistent memory. This has some assumption that the SGX >> PFNs are within the node's span. I would only _expect_ that to happen >> if the node was built like this: >> >> | Node-X RAM | EPC | Node-X RAM | >> >> If the EPC was on either end: >> >> | Node-X RAM | EPC | >> or >> | EPC | Node-X RAM | >> >> I suspect that the pgdat span wouldn't include EPC. EPC is, if I >> remember correctly, a E820_RESERVED region. > It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions > should be enumerated in ACPI SRAT along with regular memory. > > But, I haven't actually verified that info makes its way into the kernel's > pgdata stuff. Considering this, are we all agreed that this patch is in no condition to be submitted upstream?
On Wed, Jun 24, 2020 at 05:25:59PM -0700, Dave Hansen wrote: > On 6/24/20 4:54 PM, Sean Christopherson wrote: > >> Does this actually work? > >> > >> The node span (->node_start_pfn through start+->node_spanned_pages) only > >> contains pages which the OS is actively managing, usually RAM but > >> sometimes also persistent memory. This has some assumption that the SGX > >> PFNs are within the node's span. I would only _expect_ that to happen > >> if the node was built like this: > >> > >> | Node-X RAM | EPC | Node-X RAM | > >> > >> If the EPC was on either end: > >> > >> | Node-X RAM | EPC | > >> or > >> | EPC | Node-X RAM | > >> > >> I suspect that the pgdat span wouldn't include EPC. EPC is, if I > >> remember correctly, a E820_RESERVED region. > > It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions > > should be enumerated in ACPI SRAT along with regular memory. > > > > But, I haven't actually verified that info makes its way into the kernel's > > pgdata stuff. > > Considering this, are we all agreed that this patch is in no condition > to be submitted upstream? Yes, it needs to be tested first. I like the resulting code more than what we have now, but I see no reason to change it at this stage unless one of the maintainers actually complains.
On Wed, Jun 24, 2020 at 05:57:52PM -0700, Sean Christopherson wrote: > On Wed, Jun 24, 2020 at 05:25:59PM -0700, Dave Hansen wrote: > > On 6/24/20 4:54 PM, Sean Christopherson wrote: > > >> Does this actually work? > > >> > > >> The node span (->node_start_pfn through start+->node_spanned_pages) only > > >> contains pages which the OS is actively managing, usually RAM but > > >> sometimes also persistent memory. This has some assumption that the SGX > > >> PFNs are within the node's span. I would only _expect_ that to happen > > >> if the node was built like this: > > >> > > >> | Node-X RAM | EPC | Node-X RAM | > > >> > > >> If the EPC was on either end: > > >> > > >> | Node-X RAM | EPC | > > >> or > > >> | EPC | Node-X RAM | > > >> > > >> I suspect that the pgdat span wouldn't include EPC. EPC is, if I > > >> remember correctly, a E820_RESERVED region. > > > It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions > > > should be enumerated in ACPI SRAT along with regular memory. > > > > > > But, I haven't actually verified that info makes its way into the kernel's > > > pgdata stuff. > > > > Considering this, are we all agreed that this patch is in no condition > > to be submitted upstream? > > Yes, it needs to be tested first. > > I like the resulting code more than what we have now, but I see no reason to > change it at this stage unless one of the maintainers actually complains. I'm cool with this. I think that this patch shows that the current candidate patch set (v33) is in a good shape: the diff adheres very cleanly on what we have. /Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 3594d37d545f..c9b47cf87730 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -13,6 +13,13 @@ #include "encl.h" #include "encls.h" +struct sgx_numa_node { + struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS]; + int nr_sections; +}; + +static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES]; +static int sgx_nr_numa_nodes; struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; static int sgx_nr_epc_sections; static struct task_struct *ksgxswapd_tsk; @@ -502,6 +509,27 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec return page; } +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid) +{ + struct sgx_numa_node *node = &sgx_numa_nodes[nid]; + struct sgx_epc_section *section; + struct sgx_epc_page *page; + int i; + + for (i = 0; i < node->nr_sections; i++) { + section = node->sections[i]; + spin_lock(§ion->lock); + page = __sgx_alloc_epc_page_from_section(section); + spin_unlock(§ion->lock); + + if (page) + return page; + } + + return NULL; +} + + /** * __sgx_alloc_epc_page() - Allocate an EPC page * @@ -514,16 +542,19 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec */ struct sgx_epc_page *__sgx_alloc_epc_page(void) { - struct sgx_epc_section *section; struct sgx_epc_page *page; + int nid = numa_node_id(); int i; - for (i = 0; i < sgx_nr_epc_sections; i++) { - section = &sgx_epc_sections[i]; - spin_lock(§ion->lock); - page = __sgx_alloc_epc_page_from_section(section); - spin_unlock(§ion->lock); + page = __sgx_alloc_epc_page_from_node(nid); + if (page) + return page; + for (i = 0; i < sgx_nr_numa_nodes; i++) { + if (i == nid) + continue; + + page = __sgx_alloc_epc_page_from_node(i); if (page) return page; } @@ -684,11 +715,28 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high) ((high & GENMASK_ULL(19, 0)) << 32); } +static int __init sgx_pfn_to_nid(unsigned long pfn) +{ + pg_data_t *pgdat; + int nid; + + for (nid = 0; nid < nr_node_ids; nid++) { + pgdat = NODE_DATA(nid); + + if (pfn >= pgdat->node_start_pfn && + pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages)) + return nid; + } + + return 0; +} + static bool __init sgx_page_cache_init(void) { u32 eax, ebx, ecx, edx, type; + struct sgx_numa_node *node; u64 pa, size; - int i; + int i, nid; for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) { cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF, @@ -714,6 +762,14 @@ static bool __init sgx_page_cache_init(void) } sgx_nr_epc_sections++; + + nid = sgx_pfn_to_nid(PFN_DOWN(pa)); + node = &sgx_numa_nodes[nid]; + + node->sections[node->nr_sections] = &sgx_epc_sections[i]; + node->nr_sections++; + + sgx_nr_numa_nodes = max(sgx_nr_numa_nodes, nid + 1); } if (!sgx_nr_epc_sections) {