
x86/numa: Make numa_fill_memblks() @end parameter exclusive

Message ID 20240102213206.1493733-1-alison.schofield@intel.com
State New, archived

Commit Message

Alison Schofield Jan. 2, 2024, 9:32 p.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

numa_fill_memblks() expects inclusive [start, end] parameters, but
its only caller, acpi_parse_cfmws(), sends an exclusive end
parameter. This means that numa_fill_memblks() can create an overlap
between different NUMA nodes with adjacent memblks. That overlap is
discovered in numa_cleanup_meminfo() and numa initialization fails
like this:

[] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xffffffffff]
[] ACPI: SRAT: Node 1 PXM 1 [mem 0x10000000000-0x1ffffffffff]
[] node 0 [mem 0x100000000-0xffffffffff] overlaps with node 1 [mem 0x100000000-0x1ffffffffff]

Changing the call site to send the expected inclusive @end parameter
was considered and rejected. Rather, numa_fill_memblks() is made to
handle the exclusive @end, thereby making it consistent with its
neighbor numa_add_memblk().

Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 arch/x86/mm/numa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: 659c07b7699a6e50af05a3bdcc201ff000fbcada

Comments

Dan Williams Jan. 2, 2024, 11:42 p.m. UTC | #1
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> numa_fill_memblks() expects inclusive [start, end] parameters, but
> its only caller, acpi_parse_cfmws(), sends an exclusive end
> parameter.

This reads backwards to me: numa_fill_memblks() is currently doing
*exclusive* math on an inclusive parameter, and the fix is to make it do
inclusive math instead, right?

> This means that numa_fill_memblks() can create an overlap

Perhaps:

s/This means/This confusion means/

s/create an overlap/create a false positive overlap/

> between different NUMA nodes with adjacent memblks. That overlap is
> discovered in numa_cleanup_meminfo() and numa initialization fails
> like this:
> 
> [] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xffffffffff]
> [] ACPI: SRAT: Node 1 PXM 1 [mem 0x10000000000-0x1ffffffffff]
> [] node 0 [mem 0x100000000-0xffffffffff] overlaps with node 1 [mem 0x100000000-0x1ffffffffff]
> 
> Changing the call site to send the expected inclusive @end parameter
> was considered and rejected. Rather numa_fill_memblks() is made to
> handle the exclusive @end, thereby making it the same as its neighbor
> numa_add_memblks().
> 
> Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
> Suggested-by: "Huang, Ying" <ying.huang@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  arch/x86/mm/numa.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index b29ceb19e46e..4f81f75e4328 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -974,9 +974,9 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
>   * @start: address to begin fill
>   * @end: address to end fill
>   *
> - * Find and extend numa_meminfo memblks to cover the @start-@end
> + * Find and extend numa_meminfo memblks to cover the [start, end)
>   * physical address range, such that the first memblk includes
> - * @start, the last memblk includes @end, and any gaps in between
> + * @start, the last memblk excludes @end, and any gaps in between
>   * are filled.
>   *
>   * RETURNS:
> @@ -1003,7 +1003,7 @@ int __init numa_fill_memblks(u64 start, u64 end)
>  	for (int i = 0; i < mi->nr_blks; i++) {
>  		struct numa_memblk *bi = &mi->blk[i];
>  
> -		if (start < bi->end && end >= bi->start) {
> +		if (start < bi->end && end > bi->start) {
>  			blk[count] = &mi->blk[i];
>  			count++;
>  		}

So I realize I asked for the minimal fix before a wider cleanup to
'struct numa_memblk' to use 'struct range', but I want to see some of
that cleanup now. How about the below (only compile-tested) to at least
convert numa_fill_memblks(), since it is new, and then circle back
sometime later to move 'struct numa_memblk' to embed a 'struct range'
directly? I.e. save touching legacy code for later, but fix the bug in
the new code with some modernization. This also documents that
numa_add_memblk() expects an inclusive argument.

Note that this would be 3 patches: the minimal logic fix as you have it,
without the comment changes since they become moot later, the
introduction of range_overlaps() with a namespace conflict fixup for the
btrfs version, and using range_overlaps() to clean up numa_fill_memblks().

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 1be13b2dfe8b..9e2279762eaa 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -37,7 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
 #define phys_to_target_node phys_to_target_node
 extern int memory_add_physaddr_to_nid(u64 start);
 #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
-extern int numa_fill_memblks(u64 start, u64 end);
+struct range;
+extern int numa_fill_memblks(struct range *range);
 #define numa_fill_memblks numa_fill_memblks
 #endif
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b29ceb19e46e..ce1ccda6fee4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/topology.h>
 #include <linux/sort.h>
+#include <linux/range.h>
 
 #include <asm/e820/api.h>
 #include <asm/proto.h>
@@ -971,20 +972,17 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
 
 /**
  * numa_fill_memblks - Fill gaps in numa_meminfo memblks
- * @start: address to begin fill
- * @end: address to end fill
+ * @range: range to fill
  *
- * Find and extend numa_meminfo memblks to cover the @start-@end
- * physical address range, such that the first memblk includes
- * @start, the last memblk includes @end, and any gaps in between
- * are filled.
+ * Find and extend numa_meminfo memblks to cover the physical address
+ * range and fill any gaps.
  *
  * RETURNS:
  * 0		  : Success
- * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
+ * NUMA_NO_MEMBLK : No memblk exists in @range
  */
 
-int __init numa_fill_memblks(u64 start, u64 end)
+int __init numa_fill_memblks(struct range *range)
 {
 	struct numa_memblk **blk = &numa_memblk_list[0];
 	struct numa_meminfo *mi = &numa_meminfo;
@@ -993,17 +991,17 @@ int __init numa_fill_memblks(u64 start, u64 end)
 
 	/*
 	 * Create a list of pointers to numa_meminfo memblks that
-	 * overlap start, end. Exclude (start == bi->end) since
-	 * end addresses in both a CFMWS range and a memblk range
-	 * are exclusive.
-	 *
-	 * This list of pointers is used to make in-place changes
-	 * that fill out the numa_meminfo memblks.
+	 * overlap start, end. This list is used to make in-place
+	 * changes that fill out the numa_meminfo memblks.
 	 */
 	for (int i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
+		struct range bi_range = {
+			.start = bi->start,
+			.end = bi->end - 1,
+		};
 
-		if (start < bi->end && end >= bi->start) {
+		if (range_overlaps(range, &bi_range)) {
 			blk[count] = &mi->blk[i];
 			count++;
 		}
@@ -1015,8 +1013,8 @@ int __init numa_fill_memblks(u64 start, u64 end)
 	sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);
 
 	/* Make sure the first/last memblks include start/end */
-	blk[0]->start = min(blk[0]->start, start);
-	blk[count - 1]->end = max(blk[count - 1]->end, end);
+	blk[0]->start = min(blk[0]->start, range->start);
+	blk[count - 1]->end = max(blk[count - 1]->end, range->end + 1);
 
 	/*
 	 * Fill any gaps by tracking the previous memblks
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 12f330b0eac0..6213a15c2a8d 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -307,8 +307,10 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 	int node;
 
 	cfmws = (struct acpi_cedt_cfmws *)header;
-	start = cfmws->base_hpa;
-	end = cfmws->base_hpa + cfmws->window_size;
+	struct range range = {
+		.start = cfmws->base_hpa,
+		.end = cfmws->base_hpa + cfmws->window_size - 1,
+	};
 
 	/*
 	 * The SRAT may have already described NUMA details for all,
@@ -316,7 +318,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 	 * found for any portion of the window to cover the entire
 	 * window.
 	 */
-	if (!numa_fill_memblks(start, end))
+	if (!numa_fill_memblks(&range))
 		return 0;
 
 	/* No SRAT description. Create a new node. */
@@ -327,7 +329,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 		return -EINVAL;
 	}
 
-	if (numa_add_memblk(node, start, end) < 0) {
+	if (numa_add_memblk(node, range.start, range.end + 1) < 0) {
 		/* CXL driver must handle the NUMA_NO_NODE case */
 		pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
 			node, start, end);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index a82e1417c4d2..e6c4ffc6003d 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -111,7 +111,7 @@ static struct rb_node *__tree_search(struct rb_root *root, u64 file_offset,
 	return NULL;
 }
 
-static int range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
+static int btrfs_range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
 			  u64 len)
 {
 	if (file_offset + len <= entry->file_offset ||
@@ -913,7 +913,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 
 	while (1) {
 		entry = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			break;
 
 		if (entry->file_offset >= file_offset + len) {
@@ -1042,12 +1042,12 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
 	}
 	if (prev) {
 		entry = rb_entry(prev, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	if (next) {
 		entry = rb_entry(next, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	/* No ordered extent in the range */
diff --git a/include/linux/numa.h b/include/linux/numa.h
index a904861de800..ef35c974e5f2 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -45,7 +45,7 @@ static inline int phys_to_target_node(u64 start)
 }
 #endif
 #ifndef numa_fill_memblks
-static inline int __init numa_fill_memblks(u64 start, u64 end)
+static inline int __init numa_fill_memblks(struct range *range)
 {
 	return NUMA_NO_MEMBLK;
 }
diff --git a/include/linux/range.h b/include/linux/range.h
index 6ad0b73cb7ad..981fd4f7731e 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -13,11 +13,18 @@ static inline u64 range_len(const struct range *range)
 	return range->end - range->start + 1;
 }
 
+/* True if r1 completely contains r2 */
 static inline bool range_contains(struct range *r1, struct range *r2)
 {
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
+/* True if any part of r1 overlaps r2 */
+static inline bool range_overlaps(const struct range *r1, const struct range *r2)
+{
+       return r1->start <= r2->end && r1->end >= r2->start;
+}
+
 int add_range(struct range *range, int az, int nr_range,
 		u64 start, u64 end);
Alison Schofield Jan. 3, 2024, 6:49 p.m. UTC | #2
On Tue, Jan 02, 2024 at 03:42:50PM -0800, Dan Williams wrote:
> 
> So I realize I asked for the minimal fix before a wider cleanup to
> 'struct numa_memblk' to use 'struct range', but I want to see some of
> that cleanup now. How about the below (only compile-tested) to at least
> convert numa_fill_memblks(), since it is new, and then circle back
> sometime later to move 'struct numa_memblk' to embed a 'struct range'
> directly? I.e. save touching legacy code for later, but fix the bug in
> the new code with some modernization. This also documents that
> numa_add_memblk() expects an inclusive argument.
> 
> Note that this would be 3 patches, the minimal logic fix as you have it
> without the comment changes since they become moot later, the
> introduction of range_overlaps() with namespace conflict fixup with the
> btrfs version, and using range_overlaps() to cleanup numa_fill_memblks()

Hi Dan,

Your idea to use struct range in struct numa_memblk comes as a tremendous
relief after staring at the open-coded usage of start/end in numa_memblk.

With that acknowledged, the minimal fix to the overlap issue could be
either in x86 or acpi. We're at x86 because we wanted to be 'like'
numa_add_memblks(). A partial cleanup now makes numa_fill_memblks()
diverge further. And, to keep the partial cleanup from being
throw-away work, it seems we'd want to define the full cleanup ahead
of time.

Also recall that we have a bigger hope of replacing this whole
numa memblk usage with the "Memblock" (with an 'o') API.

Maybe I need clarification on what you are suggesting wrt 3 patches?
Are you saying you would not merge the minimal fix unless the partial
clean up patches are included in the same set?

Also, is this something that has to be merged through x86, or might
you be able to take it through the cxl tree? Yes, that's me worrying
about the review and intake burden this puts on x86.

Alison

> 
> diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
> index 1be13b2dfe8b..9e2279762eaa 100644
> --- a/arch/x86/include/asm/sparsemem.h
> +++ b/arch/x86/include/asm/sparsemem.h
> @@ -37,7 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
>  #define phys_to_target_node phys_to_target_node
>  extern int memory_add_physaddr_to_nid(u64 start);
>  #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
> -extern int numa_fill_memblks(u64 start, u64 end);
> +struct range;
> +extern int numa_fill_memblks(struct range *range);
>  #define numa_fill_memblks numa_fill_memblks
>  #endif
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index b29ceb19e46e..ce1ccda6fee4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -13,6 +13,7 @@
>  #include <linux/sched.h>
>  #include <linux/topology.h>
>  #include <linux/sort.h>
> +#include <linux/range.h>
>  
>  #include <asm/e820/api.h>
>  #include <asm/proto.h>
> @@ -971,20 +972,17 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
>  
>  /**
>   * numa_fill_memblks - Fill gaps in numa_meminfo memblks
> - * @start: address to begin fill
> - * @end: address to end fill
> + * @range: range to fill
>   *
> - * Find and extend numa_meminfo memblks to cover the @start-@end
> - * physical address range, such that the first memblk includes
> - * @start, the last memblk includes @end, and any gaps in between
> - * are filled.
> + * Find and extend numa_meminfo memblks to cover the physical address
> + * range and fill any gaps.
>   *
>   * RETURNS:
>   * 0		  : Success
> - * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
> + * NUMA_NO_MEMBLK : No memblk exists in @range
>   */
>  
> -int __init numa_fill_memblks(u64 start, u64 end)
> +int __init numa_fill_memblks(struct range *range)
>  {
>  	struct numa_memblk **blk = &numa_memblk_list[0];
>  	struct numa_meminfo *mi = &numa_meminfo;
> @@ -993,17 +991,17 @@ int __init numa_fill_memblks(u64 start, u64 end)
>  
>  	/*
>  	 * Create a list of pointers to numa_meminfo memblks that
> -	 * overlap start, end. Exclude (start == bi->end) since
> -	 * end addresses in both a CFMWS range and a memblk range
> -	 * are exclusive.
> -	 *
> -	 * This list of pointers is used to make in-place changes
> -	 * that fill out the numa_meminfo memblks.
> +	 * overlap start, end. This list is used to make in-place
> +	 * changes that fill out the numa_meminfo memblks.
>  	 */
>  	for (int i = 0; i < mi->nr_blks; i++) {
>  		struct numa_memblk *bi = &mi->blk[i];
> +		struct range bi_range = {
> +			.start = bi->start,
> +			.end = bi->end - 1,
> +		};
>  
> -		if (start < bi->end && end >= bi->start) {
> +		if (range_overlaps(range, &bi_range)) {
>  			blk[count] = &mi->blk[i];
>  			count++;
>  		}
> @@ -1015,8 +1013,8 @@ int __init numa_fill_memblks(u64 start, u64 end)
>  	sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);
>  
>  	/* Make sure the first/last memblks include start/end */
> -	blk[0]->start = min(blk[0]->start, start);
> -	blk[count - 1]->end = max(blk[count - 1]->end, end);
> +	blk[0]->start = min(blk[0]->start, range->start);
> +	blk[count - 1]->end = max(blk[count - 1]->end, range->end + 1);
>  
>  	/*
>  	 * Fill any gaps by tracking the previous memblks
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 12f330b0eac0..6213a15c2a8d 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -307,8 +307,10 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
>  	int node;
>  
>  	cfmws = (struct acpi_cedt_cfmws *)header;
> -	start = cfmws->base_hpa;
> -	end = cfmws->base_hpa + cfmws->window_size;
> +	struct range range = {
> +		.start = cfmws->base_hpa,
> +		.end = cfmws->base_hpa + cfmws->window_size - 1,
> +	};
>  
>  	/*
>  	 * The SRAT may have already described NUMA details for all,
> @@ -316,7 +318,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
>  	 * found for any portion of the window to cover the entire
>  	 * window.
>  	 */
> -	if (!numa_fill_memblks(start, end))
> +	if (!numa_fill_memblks(&range))
>  		return 0;
>  
>  	/* No SRAT description. Create a new node. */
> @@ -327,7 +329,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
>  		return -EINVAL;
>  	}
>  
> -	if (numa_add_memblk(node, start, end) < 0) {
> +	if (numa_add_memblk(node, range.start, range.end + 1) < 0) {
>  		/* CXL driver must handle the NUMA_NO_NODE case */
>  		pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
>  			node, start, end);
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index a82e1417c4d2..e6c4ffc6003d 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -111,7 +111,7 @@ static struct rb_node *__tree_search(struct rb_root *root, u64 file_offset,
>  	return NULL;
>  }
>  
> -static int range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
> +static int btrfs_range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
>  			  u64 len)
>  {
>  	if (file_offset + len <= entry->file_offset ||
> @@ -913,7 +913,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
>  
>  	while (1) {
>  		entry = rb_entry(node, struct btrfs_ordered_extent, rb_node);
> -		if (range_overlaps(entry, file_offset, len))
> +		if (btrfs_range_overlaps(entry, file_offset, len))
>  			break;
>  
>  		if (entry->file_offset >= file_offset + len) {
> @@ -1042,12 +1042,12 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
>  	}
>  	if (prev) {
>  		entry = rb_entry(prev, struct btrfs_ordered_extent, rb_node);
> -		if (range_overlaps(entry, file_offset, len))
> +		if (btrfs_range_overlaps(entry, file_offset, len))
>  			goto out;
>  	}
>  	if (next) {
>  		entry = rb_entry(next, struct btrfs_ordered_extent, rb_node);
> -		if (range_overlaps(entry, file_offset, len))
> +		if (btrfs_range_overlaps(entry, file_offset, len))
>  			goto out;
>  	}
>  	/* No ordered extent in the range */
> diff --git a/include/linux/numa.h b/include/linux/numa.h
> index a904861de800..ef35c974e5f2 100644
> --- a/include/linux/numa.h
> +++ b/include/linux/numa.h
> @@ -45,7 +45,7 @@ static inline int phys_to_target_node(u64 start)
>  }
>  #endif
>  #ifndef numa_fill_memblks
> -static inline int __init numa_fill_memblks(u64 start, u64 end)
> +static inline int __init numa_fill_memblks(struct range *range)
>  {
>  	return NUMA_NO_MEMBLK;
>  }
> diff --git a/include/linux/range.h b/include/linux/range.h
> index 6ad0b73cb7ad..981fd4f7731e 100644
> --- a/include/linux/range.h
> +++ b/include/linux/range.h
> @@ -13,11 +13,18 @@ static inline u64 range_len(const struct range *range)
>  	return range->end - range->start + 1;
>  }
>  
> +/* True if r1 completely contains r2 */
>  static inline bool range_contains(struct range *r1, struct range *r2)
>  {
>  	return r1->start <= r2->start && r1->end >= r2->end;
>  }
>  
> +/* True if any part of r1 overlaps r2 */
> +static inline bool range_overlaps(const struct range *r1, const struct range *r2)
> +{
> +       return r1->start <= r2->end && r1->end >= r2->start;
> +}
> +
>  int add_range(struct range *range, int az, int nr_range,
>  		u64 start, u64 end);
>
Dan Williams Jan. 3, 2024, 8:18 p.m. UTC | #3
Alison Schofield wrote:
> On Tue, Jan 02, 2024 at 03:42:50PM -0800, Dan Williams wrote:
> > 
> > So I realize I asked for the minimal fix before a wider cleanup to
> > 'struct numa_memblk' to use 'struct range', but I want to see some of
> > that cleanup now. How about the below (only compile-tested) to at least
> > convert numa_fill_memblks(), since it is new, and then circle back
> > sometime later to move 'struct numa_memblk' to embed a 'struct range'
> > directly? I.e. save touching legacy code for later, but fix the bug in
> > the new code with some modernization. This also documents that
> > numa_add_memblk() expects an inclusive argument.
> > 
> > Note that this would be 3 patches, the minimal logic fix as you have it
> > without the comment changes since they become moot later, the
> > introduction of range_overlaps() with namespace conflict fixup with the
> > btrfs version, and using range_overlaps() to cleanup numa_fill_memblks()
> 
> Hi Dan,
> 
> Your idea to use struct range in struct numa_memblk comes as tremendous
> relief after staring at open coded usage of start/end in numa_memblk.
> 
> With that acknowledged, the minimal fix to the overlap issue could be
> either in x86 or acpi.

I don't see how it could be in acpi since there are currently
conflicting {in,ex}clusive usages of @end in numa_fill_memblks(), right?

I.e.:
...
start < bi->end && end >= bi->start)
...
blk[count - 1]->end = max(blk[count - 1]->end, end);

> We're at x86 because we wanted to be 'like'
> numa_add_memblks(). A partial cleanup now makes numa_fill_memblks()
> diverge further.

Yes, but numa_add_memblks() parity is not necessarily a virtue; however,
memblock (with an 'o') parity might be, see below:

>  And, to avoid the partial cleanup from being throw-away work, it
>  seems we'd want to define the full cleanup ahead of time.

If you are referring to throwing away this cleanup work for the eventual
memblock conversion, then yes, I agree; if this is just talking about
the numa_memblk with 'struct range' conversion, then I disagree.

> Also recall that we have a bigger hope of replacing this whole
> numa memblk usage with the "Memblock" (with an 'o') API.

2 wrinkles here...

1/ From a practical standpoint is anyone actively working on this? I
   would definitely stand down from the numa_memblk conversion in favor of
   memblock, but it has been years since this idea was floated, and I doubt
   it shows up anytime soon. However, "be more like memblock" is a good
   idea.

2/ memblock appears to be doing inclusive range math, which shoots holes
   in my position that numa_add_memblk() is an anti-pattern that merits
   conversion. I.e. "be more like memblock" contra-indicates a 'struct
   range' conversion.

> Maybe I need clarification on what you are suggesting wrt 3 patches?
> Are you saying you would not merge the minimal fix unless the partial
> clean up patches are included in the same set?

I think your memblock observation points to a third way. Let's uplevel
and steal memblock_addrs_overlap() and use it for this broken
comparison. That nods to an eventual full memblock conversion and keeps
the inclusive math status quo in memblock and numa_memblk related code.

> Also, is this something that has to be merged through x86 or might
> you be able to take it though cxl tree? Yes, that's me worrying 
> about the review and intake burden this puts on x86.

I can always merge things through cxl.git with an x86 Acked-by, or a
"holler if you don't want this to go through cxl.git" approach. Let's
keep maintainer busy-ness concerns separate from the question of the
best long-term place for the changes to land.
Alison Schofield Jan. 8, 2024, 5:41 p.m. UTC | #4
On Wed, Jan 03, 2024 at 12:18:17PM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> > On Tue, Jan 02, 2024 at 03:42:50PM -0800, Dan Williams wrote:
> > > 
> > > So I realize I asked for the minimal fix before a wider cleanup to
> > > 'struct numa_memblk' to use 'struct range', but I want to see some of
> > > that cleanup now. How about the below (only compile-tested) to at least
> > > convert numa_fill_memblks(), since it is new, and then circle back
> > > sometime later to move 'struct numa_memblk' to embed a 'struct range'
> > > directly? I.e. save touching legacy code for later, but fix the bug in
> > > the new code with some modernization. This also documents that
> > > numa_add_memblk() expects an inclusive argument.
> > > 
> > > Note that this would be 3 patches, the minimal logic fix as you have it
> > > without the comment changes since they become moot later, the
> > > introduction of range_overlaps() with namespace conflict fixup with the
> > > btrfs version, and using range_overlaps() to cleanup numa_fill_memblks()
> > 
> > Hi Dan,
> > 
> > Your idea to use struct range in struct numa_memblk comes as tremendous
> > relief after staring at open coded usage of start/end in numa_memblk.
> > 
> > With that acknowledged, the minimal fix to the overlap issue could be
> > either in x86 or acpi.
> 
> I don't see how it could be in acpi since there are currently
> conflicting {in,ex}clusive usages of @end in numa_fill_memblks(), right?
> 

You're right.

> I.e.:
> ...
> start < bi->end && end >= bi->start)
> ...
> blk[count - 1]->end = max(blk[count - 1]->end, end);
> 
> > We're at x86 because we wanted to be 'like'
> > numa_add_memblks(). A partial cleanup now makes numa_fill_memblks()
> > diverge further.
> 
> Yes, but numa_add_memblks() parity is not necessarily a virtue, however
> memblock (with an 'o') parity might be, see below:
> 
> >  And, to avoid the partial cleanup from being throw-away work, it
> >  seems we'd want to define the full cleanup ahead of time.
> 
> Are you referring to throwing away this cleanup work for the eventual
> memblock conversion, then yes, I agree, if this is just talking about
> the numa_memblk with 'struct range' conversion, then I disagree.
> 
> > Also recall that we have a bigger hope of replacing this whole
> > numa memblk usage with the "Memblock" (with an 'o') API.
> 
> 2 wrinkles here...
> 
> 1/ From a practical standpoint is anyone actively working on this? I
>    would definitely stand down from the numa_memblk conversion in favor of
>    memblock, but it has been years since this idea was floated, and I doubt
>    it shows up anytime soon. However, "be more like memblock" is a good
>    idea.
> 
> 2/ memblock appears to be doing inclusive range math, which shoots holes
>    in my position that numa_add_memblk() is an anti-pattern that merits
>    conversion. I.e. "be more like memblock" contra-indicates a 'struct
>    range' conversion.
> 
> > Maybe I need clarification on what you are suggesting wrt 3 patches?
> > Are you saying you would not merge the minimal fix unless the partial
> > clean up patches are included in the same set?
> 
> I think your memblock observation points to a third way. Lets uplevel
> and steal memblock_addrs_overlap() and use it for this broken
> comparison. That nods to an eventual full memblock conversion and keeps
> the inclusive math status quo in memblock and numa_memblk related code.
> 

Thanks for calling my attention to the available memblock helper.
I went ahead and used memblock_addrs_overlap() for the comparison
and verified that it fixes the case where an existing memblk is
extended rather than creating a new node and memblock. It is actually
not clear how that relates to what the user reported (numa init
failure due to nodes overlapping), so I'm holding off rev'ing this
patch until that user-reported failure is understood.

> > Also, is this something that has to be merged through x86, or might
> > you be able to take it through the cxl tree? Yes, that's me worrying
> > about the review and intake burden this puts on x86.
> 
> I can always merge things through cxl.git with an x86 Acked-by, or a
> "holler if you don't want this to go through cxl.git" approach. Lets
> keep maintainer busy-ness concerns separate from where the best long
> term place for the changes to land.

Got it.
Dan Williams Jan. 8, 2024, 5:59 p.m. UTC | #5
Alison Schofield wrote:
[..]
> > I think your memblock observation points to a third way. Lets uplevel
> > and steal memblock_addrs_overlap() and use it for this broken
> > comparison. That nods to an eventual full memblock conversion and keeps
> > the inclusive math status quo in memblock and numa_memblk related code.
> > 
> 
> Thanks for calling my attention to the available memblock helper.  
> I went ahead and used memblock_addrs_overlap() for the comparison
> and verified that it fixes the case where an existing memblk is
> extended rather than creating a new node and memblock. It actually
> is not clear how that relates to what the user reported: numa init
> failure due to nodes overlapping, so I'm holding off rev'ing this
> patch until that user reported failure is understood.

Oh, I understood the problem as: numa_cleanup_meminfo() finds an
unexpected overlap and fails (returns -EINVAL), which causes NUMA
operation to be disabled altogether.

Otherwise, yes, it would be good to get confirmation from $reporter that
this fixes their issue.

Patch

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b29ceb19e46e..4f81f75e4328 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -974,9 +974,9 @@  static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
  * @start: address to begin fill
  * @end: address to end fill
  *
- * Find and extend numa_meminfo memblks to cover the @start-@end
+ * Find and extend numa_meminfo memblks to cover the [start, end)
  * physical address range, such that the first memblk includes
- * @start, the last memblk includes @end, and any gaps in between
+ * @start, the last memblk excludes @end, and any gaps in between
  * are filled.
  *
  * RETURNS:
@@ -1003,7 +1003,7 @@  int __init numa_fill_memblks(u64 start, u64 end)
 	for (int i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
 
-		if (start < bi->end && end >= bi->start) {
+		if (start < bi->end && end > bi->start) {
 			blk[count] = &mi->blk[i];
 			count++;
 		}