diff mbox series

[v3,4/6] xen/x86: use arch_get_ram_range to get information from E820 map

Message ID 20220822025810.2240707-5-wei.chen@arm.com (mailing list archive)
State Superseded
Headers show
Series Device tree based NUMA support for Arm - Part#2 | expand

Commit Message

Wei Chen Aug. 22, 2022, 2:58 a.m. UTC
The sanity check of nodes_cover_memory is also a requirement of
other architectures that support NUMA. But now, the code of
nodes_cover_memory is tied to the x86 E820. In this case, we
introduce arch_get_ram_range to decouple architecture specific
memory map from this function. This means, other architectures
like Arm can also use it to check its node and memory coverage
from bootmem info.

Depends arch_get_ram_range, we make nodes_cover_memory become
architecture independent. We also use neutral words to replace
SRAT and E820 in the print message of this function. This will
to make the massage seems more common.

As arch_get_ram_range use unsigned int for index, we also adjust
the index in nodes_cover_memory from int to unsigned int.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v2 -> v3:
1. Rename arch_get_memory_map to arch_get_ram_range.
2. Use -ENOENT instead of -ENODEV to indicate end of memory map.
3. Add description to code comment that arch_get_ram_range returns
   RAM range in [start, end) format.
v1 -> v2:
1. Use arch_get_memory_map to replace arch_get_memory_bank_range
   and arch_get_memory_bank_number.
2. Remove the !start || !end check, because caller guarantee
   these two pointers will not be NULL.
---
 xen/arch/x86/numa.c    | 25 +++++++++++++++++++++++++
 xen/arch/x86/srat.c    | 18 +++++++++++-------
 xen/include/xen/numa.h |  3 +++
 3 files changed, 39 insertions(+), 7 deletions(-)

Comments

Jan Beulich Aug. 25, 2022, 12:14 p.m. UTC | #1
On 22.08.2022 04:58, Wei Chen wrote:
> @@ -96,3 +97,27 @@ unsigned int __init arch_get_dma_bitsize(void)
>                   flsl(node_start_pfn(node) + node_spanned_pages(node) / 4 - 1)
>                   + PAGE_SHIFT, 32);
>  }
> +
> +/*
> + * This function provides the ability for caller to get one RAM entry
> + * from architectural memory map by index.
> + *
> + * This function will return zero if it can return a proper RAM entry.
> + * otherwise it will return -ENOENT for out of scope index, or return
> + * -EINVAL for non-RAM type memory entry.
> + *
> + * Note: the range is exclusive at the end, e.g. [start, end).
> + */
> +int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)

Since the comment is intended to apply to all architectures providing,
I think it should go with the declaration (once) rather than the
definition (at least one instance per arch).

> +{
> +    if ( idx >= e820.nr_map )
> +        return -ENOENT;
> +
> +    if ( e820.map[idx].type != E820_RAM )
> +        return -EINVAL;

EINVAL is so heavily (over)loaded that I'm inclined to ask to use e.g.
-ENODATA here.

> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -428,18 +428,22 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>     Make sure the PXMs cover all memory. */
>  static int __init nodes_cover_memory(void)
>  {
> -	int i;
> +	unsigned int i;
>  
> -	for (i = 0; i < e820.nr_map; i++) {
> +	for (i = 0; ; i++) {
>  		int j, found;
>  		paddr_t start, end;
>  
> -		if (e820.map[i].type != E820_RAM) {
> +		/* Try to loop memory map from index 0 to end to get RAM ranges. */
> +		found = arch_get_ram_range(i, &start, &end);
> +
> +		/* Index relate entry is not RAM, skip it. */
> +		if (found == -EINVAL)
>  			continue;
> -		}
>  
> -		start = e820.map[i].addr;
> -		end = e820.map[i].addr + e820.map[i].size;
> +		/* Reach the end of arch's memory map */
> +		if (found == -ENOENT)
> +			break;

What if an arch returns a 3rd error indicator? The way you've written
it code below would assume success and use uninitialized data. I'd
like to suggest to only special-case -ENOENT and treat all other
errors the same. But of course the variable (re)used doesn't really
fit that:

		/* Reach the end of arch's memory map */
		if (found == -ENOENT)
			break;

		/* Index relate entry is not RAM, skip it. */
		if (found)
			continue;

because here really you mean "not found". Since in fact "found" would
want to be of "bool" type in the function, and "j" would want to be
"unsigned int" just like "i" is, I recommend introducing a new local
variable, e.g. "err".

Jan

>  		do {
>  			found = 0;
Wei Chen Aug. 29, 2022, 10:17 a.m. UTC | #2
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年8月25日 20:14
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; George Dunlap
> <george.dunlap@citrix.com>; Julien Grall <julien@xen.org>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v3 4/6] xen/x86: use arch_get_ram_range to get
> information from E820 map
> 
> On 22.08.2022 04:58, Wei Chen wrote:
> > @@ -96,3 +97,27 @@ unsigned int __init arch_get_dma_bitsize(void)
> >                   flsl(node_start_pfn(node) + node_spanned_pages(node) /
> 4 - 1)
> >                   + PAGE_SHIFT, 32);
> >  }
> > +
> > +/*
> > + * This function provides the ability for caller to get one RAM entry
> > + * from architectural memory map by index.
> > + *
> > + * This function will return zero if it can return a proper RAM entry.
> > + * otherwise it will return -ENOENT for out of scope index, or return
> > + * -EINVAL for non-RAM type memory entry.
> > + *
> > + * Note: the range is exclusive at the end, e.g. [start, end).
> > + */
> > +int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t
> *end)
> 
> Since the comment is intended to apply to all architectures providing,
> I think it should go with the declaration (once) rather than the
> definition (at least one instance per arch).
> 

Ok, I will move the comment to header file.

> > +{
> > +    if ( idx >= e820.nr_map )
> > +        return -ENOENT;
> > +
> > +    if ( e820.map[idx].type != E820_RAM )
> > +        return -EINVAL;
> 
> EINVAL is so heavily (over)loaded that I'm inclined to ask to use e.g.
> -ENODATA here.
> 

Ok.

> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -428,18 +428,22 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
> >     Make sure the PXMs cover all memory. */
> >  static int __init nodes_cover_memory(void)
> >  {
> > -	int i;
> > +	unsigned int i;
> >
> > -	for (i = 0; i < e820.nr_map; i++) {
> > +	for (i = 0; ; i++) {
> >  		int j, found;
> >  		paddr_t start, end;
> >
> > -		if (e820.map[i].type != E820_RAM) {
> > +		/* Try to loop memory map from index 0 to end to get RAM
> ranges. */
> > +		found = arch_get_ram_range(i, &start, &end);
> > +
> > +		/* Index relate entry is not RAM, skip it. */
> > +		if (found == -EINVAL)
> >  			continue;
> > -		}
> >
> > -		start = e820.map[i].addr;
> > -		end = e820.map[i].addr + e820.map[i].size;
> > +		/* Reach the end of arch's memory map */
> > +		if (found == -ENOENT)
> > +			break;
> 
> What if an arch returns a 3rd error indicator? The way you've written
> it code below would assume success and use uninitialized data. I'd
> like to suggest to only special-case -ENOENT and treat all other

Yes, I hadn't taken the 3rd error case into account. I will follow your
Comment in next version.

> errors the same. But of course the variable (re)used doesn't really
> fit that:
> 
> 		/* Reach the end of arch's memory map */
> 		if (found == -ENOENT)
> 			break;
> 
> 		/* Index relate entry is not RAM, skip it. */
> 		if (found)
> 			continue;
> 
> because here really you mean "not found". Since in fact "found" would
> want to be of "bool" type in the function, and "j" would want to be
> "unsigned int" just like "i" is, I recommend introducing a new local
> variable, e.g. "err".
> 

Ok.

Cheers,
Wei Chen

> Jan
> 
> >  		do {
> >  			found = 0;
diff mbox series

Patch

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 9a9090e99a..c28c7388e4 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -9,6 +9,7 @@ 
 #include <xen/nodemask.h>
 #include <xen/numa.h>
 #include <asm/acpi.h>
+#include <asm/e820.h>
 
 #ifndef Dprintk
 #define Dprintk(x...)
@@ -96,3 +97,27 @@  unsigned int __init arch_get_dma_bitsize(void)
                  flsl(node_start_pfn(node) + node_spanned_pages(node) / 4 - 1)
                  + PAGE_SHIFT, 32);
 }
+
+/*
+ * This function provides the ability for caller to get one RAM entry
+ * from architectural memory map by index.
+ *
+ * This function will return zero if it can return a proper RAM entry.
+ * otherwise it will return -ENOENT for out of scope index, or return
+ * -EINVAL for non-RAM type memory entry.
+ *
+ * Note: the range is exclusive at the end, e.g. [start, end).
+ */
+int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
+{
+    if ( idx >= e820.nr_map )
+        return -ENOENT;
+
+    if ( e820.map[idx].type != E820_RAM )
+        return -EINVAL;
+
+    *start = e820.map[idx].addr;
+    *end = *start + e820.map[idx].size;
+
+    return 0;
+}
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 4c7f0c547e..6bebfb344f 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -428,18 +428,22 @@  acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
    Make sure the PXMs cover all memory. */
 static int __init nodes_cover_memory(void)
 {
-	int i;
+	unsigned int i;
 
-	for (i = 0; i < e820.nr_map; i++) {
+	for (i = 0; ; i++) {
 		int j, found;
 		paddr_t start, end;
 
-		if (e820.map[i].type != E820_RAM) {
+		/* Try to loop memory map from index 0 to end to get RAM ranges. */
+		found = arch_get_ram_range(i, &start, &end);
+
+		/* Index relate entry is not RAM, skip it. */
+		if (found == -EINVAL)
 			continue;
-		}
 
-		start = e820.map[i].addr;
-		end = e820.map[i].addr + e820.map[i].size;
+		/* Reach the end of arch's memory map */
+		if (found == -ENOENT)
+			break;
 
 		do {
 			found = 0;
@@ -458,7 +462,7 @@  static int __init nodes_cover_memory(void)
 		} while (found && start < end);
 
 		if (start < end) {
-			printk(KERN_ERR "SRAT: No PXM for e820 range: "
+			printk(KERN_ERR "NUMA: No NODE for RAM range: "
 				"[%"PRIpaddr", %"PRIpaddr"]\n", start, end - 1);
 			return 0;
 		}
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index b779e68787..d64006483a 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -80,6 +80,9 @@  static inline __attribute_pure__ nodeid_t phys_to_nid(paddr_t addr)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
                                 NODE_DATA(nid)->node_spanned_pages)
 
+extern int arch_get_ram_range(unsigned int idx,
+                              paddr_t *start, paddr_t *end);
+
 #endif
 
 #endif /* _XEN_NUMA_H */