
[RFC] x86/sgx: Allocate from local NUMA node first

Message ID 20200623043931.157156-1-jarkko.sakkinen@linux.intel.com (mailing list archive)
State New, archived
Series [RFC] x86/sgx: Allocate from local NUMA node first

Commit Message

Jarkko Sakkinen June 23, 2020, 4:39 a.m. UTC
Create a pointer array for each NUMA node with references to the EPC
sections it contains. Use this in __sgx_alloc_epc_page() to try the
current NUMA node before falling back to the others.

Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 70 ++++++++++++++++++++++++++++++----
 1 file changed, 63 insertions(+), 7 deletions(-)

Comments

Jarkko Sakkinen June 24, 2020, 11:12 p.m. UTC | #1
On Tue, Jun 23, 2020 at 07:39:31AM +0300, Jarkko Sakkinen wrote:
> Create a pointer array for each NUMA node with references to the EPC
> sections it contains. Use this in __sgx_alloc_epc_page() to try the
> current NUMA node before falling back to the others.
> 
> Cc: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

I took the reverse approach, i.e. kept only pointer arrays in the NUMA
structs.

I think we should just include this. It makes the overall structure more
legitimate, in the sense that we now do something useful with the sections.

It clarifies more than it obscures, so to speak...

/Jarkko
Dave Hansen June 24, 2020, 11:24 p.m. UTC | #2
On 6/22/20 9:39 PM, Jarkko Sakkinen wrote:
> +static int __init sgx_pfn_to_nid(unsigned long pfn)
> +{
> +	pg_data_t *pgdat;
> +	int nid;
> +
> +	for (nid = 0; nid < nr_node_ids; nid++) {
> +		pgdat = NODE_DATA(nid);
> +
> +		if (pfn >= pgdat->node_start_pfn &&
> +		    pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))

pgdat_end_pfn(), perhaps?

> +			return nid;
> +	}
> +
> +	return 0;
> +}

Does this actually work?

The node span (->node_start_pfn through start+->node_spanned_pages) only
contains pages which the OS is actively managing, usually RAM but
sometimes also persistent memory.  This has some assumption that the SGX
PFNs are within the node's span.  I would only _expect_ that to happen
if the node was built like this:

	| Node-X RAM | EPC | Node-X RAM |

If the EPC was on either end:

	| Node-X RAM | EPC |
or
	| EPC | Node-X RAM |

I suspect that the pgdat span wouldn't include EPC.  EPC is, if I
remember correctly, an E820_RESERVED region.
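
For concreteness, a minimal sketch of sgx_pfn_to_nid() rewritten with the
pgdat_end_pfn() helper (assuming linux/mmzone.h is already visible in this
file, as the existing NODE_DATA()/nr_node_ids usage suggests; this does not
change the span-coverage concern raised above):

static int __init sgx_pfn_to_nid(unsigned long pfn)
{
        pg_data_t *pgdat;
        int nid;

        for (nid = 0; nid < nr_node_ids; nid++) {
                pgdat = NODE_DATA(nid);

                /* Same span test as the open-coded version above. */
                if (pfn >= pgdat->node_start_pfn && pfn < pgdat_end_pfn(pgdat))
                        return nid;
        }

        /* PFN not covered by any node span; fall back to node 0. */
        return 0;
}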
Sean Christopherson June 24, 2020, 11:54 p.m. UTC | #3
On Wed, Jun 24, 2020 at 04:24:25PM -0700, Dave Hansen wrote:
> On 6/22/20 9:39 PM, Jarkko Sakkinen wrote:
> > +static int __init sgx_pfn_to_nid(unsigned long pfn)
> > +{
> > +	pg_data_t *pgdat;
> > +	int nid;
> > +
> > +	for (nid = 0; nid < nr_node_ids; nid++) {
> > +		pgdat = NODE_DATA(nid);
> > +
> > +		if (pfn >= pgdat->node_start_pfn &&
> > +		    pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))
> 
> pgdat_end_pfn(), perhaps?
> 
> > +			return nid;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> Does this actually work?
> 
> The node span (->node_start_pfn through start+->node_spanned_pages) only
> contains pages which the OS is actively managing, usually RAM but
> sometimes also persistent memory.  This has some assumption that the SGX
> PFNs are within the node's span.  I would only _expect_ that to happen
> if the node was built like this:
> 
> 	| Node-X RAM | EPC | Node-X RAM |
> 
> If the EPC was on either end:
> 
> 	| Node-X RAM | EPC |
> or
> 	| EPC | Node-X RAM |
> 
> I suspect that the pgdat span wouldn't include EPC.  EPC is, if I
> remember correctly, an E820_RESERVED region.

It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions
should be enumerated in ACPI SRAT along with regular memory.

But, I haven't actually verified that info makes its way into the kernel's
pgdata stuff.
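
One way to check that empirically would be a boot-time debug helper along
these lines (a sketch only, not part of the patch; sgx_check_epc_node() is a
hypothetical name, called e.g. as sgx_check_epc_node(PFN_DOWN(pa), i) from
sgx_page_cache_init() right after "pa" is computed):

static void __init sgx_check_epc_node(unsigned long pfn, int section)
{
        pg_data_t *pgdat;
        int nid;

        for (nid = 0; nid < nr_node_ids; nid++) {
                pgdat = NODE_DATA(nid);

                /* Report the first node whose span covers this EPC PFN. */
                if (pfn >= pgdat->node_start_pfn && pfn < pgdat_end_pfn(pgdat)) {
                        pr_info("sgx: EPC section %d (PFN %#lx) inside node %d span\n",
                                section, pfn, nid);
                        return;
                }
        }

        pr_info("sgx: EPC section %d (PFN %#lx) not covered by any node span\n",
                section, pfn);
}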
Dave Hansen June 25, 2020, 12:25 a.m. UTC | #4
On 6/24/20 4:54 PM, Sean Christopherson wrote:
>> Does this actually work?
>>
>> The node span (->node_start_pfn through start+->node_spanned_pages) only
>> contains pages which the OS is actively managing, usually RAM but
>> sometimes also persistent memory.  This has some assumption that the SGX
>> PFNs are within the node's span.  I would only _expect_ that to happen
>> if the node was built like this:
>>
>> 	| Node-X RAM | EPC | Node-X RAM |
>>
>> If the EPC was on either end:
>>
>> 	| Node-X RAM | EPC |
>> or
>> 	| EPC | Node-X RAM |
>>
>> I suspect that the pgdat span wouldn't include EPC.  EPC is, if I
>> remember correctly, an E820_RESERVED region.
> It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions
> should be enumerated in ACPI SRAT along with regular memory.
> 
> But, I haven't actually verified that info makes its way into the kernel's
> pgdata stuff.

Considering this, are we all agreed that this patch is in no condition
to be submitted upstream?
Sean Christopherson June 25, 2020, 12:57 a.m. UTC | #5
On Wed, Jun 24, 2020 at 05:25:59PM -0700, Dave Hansen wrote:
> On 6/24/20 4:54 PM, Sean Christopherson wrote:
> >> Does this actually work?
> >>
> >> The node span (->node_start_pfn through start+->node_spanned_pages) only
> >> contains pages which the OS is actively managing, usually RAM but
> >> sometimes also persistent memory.  This has some assumption that the SGX
> >> PFNs are within the node's span.  I would only _expect_ that to happen
> >> if the node was built like this:
> >>
> >> 	| Node-X RAM | EPC | Node-X RAM |
> >>
> >> If the EPC was on either end:
> >>
> >> 	| Node-X RAM | EPC |
> >> or
> >> 	| EPC | Node-X RAM |
> >>
> >> I suspect that the pgdat span wouldn't include EPC.  EPC is, if I
> >> remember correctly, an E820_RESERVED region.
> > It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions
> > should be enumerated in ACPI SRAT along with regular memory.
> > 
> > But, I haven't actually verified that info makes its way into the kernel's
> > pgdata stuff.
> 
> Considering this, are we all agreed that this patch is in no condition
> to be submitted upstream?

Yes, it needs to be tested first.

I like the resulting code more than what we have now, but I see no reason to
change it at this stage unless one of the maintainers actually complains.
Jarkko Sakkinen June 25, 2020, 2:57 a.m. UTC | #6
On Wed, Jun 24, 2020 at 05:57:52PM -0700, Sean Christopherson wrote:
> On Wed, Jun 24, 2020 at 05:25:59PM -0700, Dave Hansen wrote:
> > On 6/24/20 4:54 PM, Sean Christopherson wrote:
> > >> Does this actually work?
> > >>
> > >> The node span (->node_start_pfn through start+->node_spanned_pages) only
> > >> contains pages which the OS is actively managing, usually RAM but
> > >> sometimes also persistent memory.  This has some assumption that the SGX
> > >> PFNs are within the node's span.  I would only _expect_ that to happen
> > >> if the node was built like this:
> > >>
> > >> 	| Node-X RAM | EPC | Node-X RAM |
> > >>
> > >> If the EPC was on either end:
> > >>
> > >> 	| Node-X RAM | EPC |
> > >> or
> > >> 	| EPC | Node-X RAM |
> > >>
> > >> I suspect that the pgdat span wouldn't include EPC.  EPC is, if I
> > >> remember correctly, an E820_RESERVED region.
> > > It is indeed E820_RESERVED, but the BIOS WG for ICX states that EPC regions
> > > should be enumerated in ACPI SRAT along with regular memory.
> > > 
> > > But, I haven't actually verified that info makes its way into the kernel's
> > > pgdata stuff.
> > 
> > Considering this, are we all agreed that this patch is in no condition
> > to be submitted upstream?
> 
> Yes, it needs to be tested first.
> 
> I like the resulting code more than what we have now, but I see no reason to
> change it at this stage unless one of the maintainers actually complains.

I'm cool with this. I think this patch shows that the current candidate
patch set (v33) is in good shape: the diff fits very cleanly on top of
what we have.

/Jarkko

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 3594d37d545f..c9b47cf87730 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -13,6 +13,13 @@ 
 #include "encl.h"
 #include "encls.h"
 
+struct sgx_numa_node {
+	struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS];
+	int nr_sections;
+};
+
+static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES];
+static int sgx_nr_numa_nodes;
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxswapd_tsk;
@@ -502,6 +509,27 @@  static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
 	return page;
 }
 
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
+{
+	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
+	struct sgx_epc_section *section;
+	struct sgx_epc_page *page;
+	int i;
+
+	for (i = 0; i < node->nr_sections; i++) {
+		section = node->sections[i];
+		spin_lock(&section->lock);
+		page = __sgx_alloc_epc_page_from_section(section);
+		spin_unlock(&section->lock);
+
+		if (page)
+			return page;
+	}
+
+	return NULL;
+}
+
+
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
  *
@@ -514,16 +542,19 @@  static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
  */
 struct sgx_epc_page *__sgx_alloc_epc_page(void)
 {
-	struct sgx_epc_section *section;
 	struct sgx_epc_page *page;
+	int nid = numa_node_id();
 	int i;
 
-	for (i = 0; i < sgx_nr_epc_sections; i++) {
-		section = &sgx_epc_sections[i];
-		spin_lock(&section->lock);
-		page = __sgx_alloc_epc_page_from_section(section);
-		spin_unlock(&section->lock);
+	page = __sgx_alloc_epc_page_from_node(nid);
+	if (page)
+		return page;
 
+	for (i = 0; i < sgx_nr_numa_nodes; i++) {
+		if (i == nid)
+			continue;
+
+		page = __sgx_alloc_epc_page_from_node(i);
 		if (page)
 			return page;
 	}
@@ -684,11 +715,28 @@  static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
 	       ((high & GENMASK_ULL(19, 0)) << 32);
 }
 
+static int __init sgx_pfn_to_nid(unsigned long pfn)
+{
+	pg_data_t *pgdat;
+	int nid;
+
+	for (nid = 0; nid < nr_node_ids; nid++) {
+		pgdat = NODE_DATA(nid);
+
+		if (pfn >= pgdat->node_start_pfn &&
+		    pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))
+			return nid;
+	}
+
+	return 0;
+}
+
 static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
+	struct sgx_numa_node *node;
 	u64 pa, size;
-	int i;
+	int i, nid;
 
 	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
 		cpuid_count(SGX_CPUID, i + SGX_CPUID_FIRST_VARIABLE_SUB_LEAF,
@@ -714,6 +762,14 @@  static bool __init sgx_page_cache_init(void)
 		}
 
 		sgx_nr_epc_sections++;
+
+		nid = sgx_pfn_to_nid(PFN_DOWN(pa));
+		node = &sgx_numa_nodes[nid];
+
+		node->sections[node->nr_sections] = &sgx_epc_sections[i];
+		node->nr_sections++;
+
+		sgx_nr_numa_nodes = max(sgx_nr_numa_nodes, nid + 1);
 	}
 
 	if (!sgx_nr_epc_sections) {