Message ID: 20240102213206.1493733-1-alison.schofield@intel.com
State: New, archived
Series: x86/numa: Make numa_fill_memblks() @end parameter exclusive
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> numa_fill_memblks() expects inclusive [start, end] parameters but
> its only caller, acpi_parse_cfmws(), is sending an exclusive end
> parameter.

This reads backwards to me; numa_fill_memblks() is currently doing
*exclusive* math on an inclusive parameter, and the fix is to make it
do inclusive math instead, right?

> This means that numa_fill_memblks() can create an overlap

Perhaps:

s/This means/This confusion means/
s/create an overlap/create a false positive overlap/

> between different NUMA nodes with adjacent memblks. That overlap is
> discovered in numa_cleanup_meminfo() and numa initialization fails
> like this:
>
> [] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xffffffffff]
> [] ACPI: SRAT: Node 1 PXM 1 [mem 0x10000000000-0x1ffffffffff]
> [] node 0 [mem 0x100000000-0xffffffffff] overlaps with node 1 [mem 0x100000000-0x1ffffffffff]
>
> Changing the call site to send the expected inclusive @end parameter
> was considered and rejected. Rather, numa_fill_memblks() is made to
> handle the exclusive @end, thereby making it the same as its neighbor
> numa_add_memblk().
>
> Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
> Suggested-by: "Huang, Ying" <ying.huang@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  arch/x86/mm/numa.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index b29ceb19e46e..4f81f75e4328 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -974,9 +974,9 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
>   * @start: address to begin fill
>   * @end: address to end fill
>   *
> - * Find and extend numa_meminfo memblks to cover the @start-@end
> + * Find and extend numa_meminfo memblks to cover the [start, end)
>   * physical address range, such that the first memblk includes
> - * @start, the last memblk includes @end, and any gaps in between
> + * @start, the last memblk excludes @end, and any gaps in between
>   * are filled.
>   *
>   * RETURNS:
> @@ -1003,7 +1003,7 @@ int __init numa_fill_memblks(u64 start, u64 end)
>  	for (int i = 0; i < mi->nr_blks; i++) {
>  		struct numa_memblk *bi = &mi->blk[i];
>
> -		if (start < bi->end && end >= bi->start) {
> +		if (start < bi->end && end > bi->start) {
>  			blk[count] = &mi->blk[i];
>  			count++;
>  		}

So I realize I asked for the minimal fix before a wider cleanup to
'struct numa_memblk' to use 'struct range', but I want to see some of
that cleanup now. How about the below (only compile-tested) to at least
convert numa_fill_memblks(), since it is new, and then circle back
sometime later to move 'struct numa_memblk' to embed a 'struct range'
directly? I.e. save touching legacy code for later, but fix the bug in
the new code with some modernization. This also documents that
numa_add_memblk() expects an inclusive argument.
Note that this would be 3 patches: the minimal logic fix as you have it
(without the comment changes, since they become moot later), the
introduction of range_overlaps() with a namespace-conflict fixup for
the btrfs version, and using range_overlaps() to clean up
numa_fill_memblks().

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 1be13b2dfe8b..9e2279762eaa 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -37,7 +37,8 @@ extern int phys_to_target_node(phys_addr_t start);
 #define phys_to_target_node phys_to_target_node
 extern int memory_add_physaddr_to_nid(u64 start);
 #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
-extern int numa_fill_memblks(u64 start, u64 end);
+struct range;
+extern int numa_fill_memblks(struct range *range);
 #define numa_fill_memblks numa_fill_memblks
 #endif
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b29ceb19e46e..ce1ccda6fee4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -13,6 +13,7 @@
 #include <linux/sched.h>
 #include <linux/topology.h>
 #include <linux/sort.h>
+#include <linux/range.h>
 
 #include <asm/e820/api.h>
 #include <asm/proto.h>
@@ -971,20 +972,17 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
 
 /**
  * numa_fill_memblks - Fill gaps in numa_meminfo memblks
- * @start: address to begin fill
- * @end: address to end fill
+ * @range: range to fill
  *
- * Find and extend numa_meminfo memblks to cover the @start-@end
- * physical address range, such that the first memblk includes
- * @start, the last memblk includes @end, and any gaps in between
- * are filled.
+ * Find and extend numa_meminfo memblks to cover the physical address
+ * range and fill any gaps.
  *
  * RETURNS:
  * 0		  : Success
- * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
+ * NUMA_NO_MEMBLK : No memblk exists in @range
  */
-int __init numa_fill_memblks(u64 start, u64 end)
+int __init numa_fill_memblks(struct range *range)
 {
 	struct numa_memblk **blk = &numa_memblk_list[0];
 	struct numa_meminfo *mi = &numa_meminfo;
@@ -993,17 +991,17 @@ int __init numa_fill_memblks(u64 start, u64 end)
 
 	/*
 	 * Create a list of pointers to numa_meminfo memblks that
-	 * overlap start, end. Exclude (start == bi->end) since
-	 * end addresses in both a CFMWS range and a memblk range
-	 * are exclusive.
-	 *
-	 * This list of pointers is used to make in-place changes
-	 * that fill out the numa_meminfo memblks.
+	 * overlap start, end. This list is used to make in-place
+	 * changes that fill out the numa_meminfo memblks.
	 */
 	for (int i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
+		struct range bi_range = {
+			.start = bi->start,
+			.end = bi->end - 1,
+		};
 
-		if (start < bi->end && end >= bi->start) {
+		if (range_overlaps(range, &bi_range)) {
 			blk[count] = &mi->blk[i];
 			count++;
 		}
@@ -1015,8 +1013,8 @@ int __init numa_fill_memblks(u64 start, u64 end)
 	sort(&blk[0], count, sizeof(blk[0]), cmp_memblk, NULL);
 
 	/* Make sure the first/last memblks include start/end */
-	blk[0]->start = min(blk[0]->start, start);
-	blk[count - 1]->end = max(blk[count - 1]->end, end);
+	blk[0]->start = min(blk[0]->start, range->start);
+	blk[count - 1]->end = max(blk[count - 1]->end, range->end + 1);
 
 	/*
 	 * Fill any gaps by tracking the previous memblks
diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index 12f330b0eac0..6213a15c2a8d 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -307,8 +307,10 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 	int node;
 
 	cfmws = (struct acpi_cedt_cfmws *)header;
-	start = cfmws->base_hpa;
-	end = cfmws->base_hpa + cfmws->window_size;
+	struct range range = {
+		.start = cfmws->base_hpa,
+		.end = cfmws->base_hpa + cfmws->window_size - 1,
+	};
 
 	/*
 	 * The SRAT may have already described NUMA details for all,
@@ -316,7 +318,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 	 * found for any portion of the window to cover the entire
	 * window.
 	 */
-	if (!numa_fill_memblks(start, end))
+	if (!numa_fill_memblks(&range))
 		return 0;
 
 	/* No SRAT description. Create a new node. */
@@ -327,7 +329,7 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
 		return -EINVAL;
 	}
 
-	if (numa_add_memblk(node, start, end) < 0) {
+	if (numa_add_memblk(node, range.start, range.end + 1) < 0) {
 		/* CXL driver must handle the NUMA_NO_NODE case */
 		pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
 			node, start, end);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index a82e1417c4d2..e6c4ffc6003d 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -111,7 +111,7 @@ static struct rb_node *__tree_search(struct rb_root *root, u64 file_offset,
 	return NULL;
 }
 
-static int range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
+static int btrfs_range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
 		   u64 len)
 {
 	if (file_offset + len <= entry->file_offset ||
@@ -913,7 +913,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 
 	while (1) {
 		entry = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			break;
 
 		if (entry->file_offset >= file_offset + len) {
@@ -1042,12 +1042,12 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
 	}
 	if (prev) {
 		entry = rb_entry(prev, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	if (next) {
 		entry = rb_entry(next, struct btrfs_ordered_extent, rb_node);
-		if (range_overlaps(entry, file_offset, len))
+		if (btrfs_range_overlaps(entry, file_offset, len))
 			goto out;
 	}
 	/* No ordered extent in the range */
diff --git a/include/linux/numa.h b/include/linux/numa.h
index a904861de800..ef35c974e5f2 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -45,7 +45,7 @@ static inline int phys_to_target_node(u64 start)
 }
 #endif
 #ifndef numa_fill_memblks
-static inline int __init numa_fill_memblks(u64 start, u64 end)
+static inline int __init numa_fill_memblks(struct range *range)
 {
 	return NUMA_NO_MEMBLK;
 }
diff --git a/include/linux/range.h b/include/linux/range.h
index 6ad0b73cb7ad..981fd4f7731e 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -13,11 +13,18 @@ static inline u64 range_len(const struct range *range)
 	return range->end - range->start + 1;
 }
 
+/* True if r1 completely contains r2 */
 static inline bool range_contains(struct range *r1, struct range *r2)
 {
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
+/* True if any part of r1 overlaps r2 */
+static inline bool range_overlaps(const struct range *r1, const struct range *r2)
+{
+	return r1->start <= r2->end && r1->end >= r2->start;
+}
+
 int add_range(struct range *range, int az, int nr_range,
 		u64 start, u64 end);
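[ As a sanity check of the inclusive math in the proposed range_overlaps()
helper, here is a small userspace sketch; 'struct range' and the helper
are re-declared locally, and the addresses are taken from the failure
log above converted to inclusive ends: ]

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local re-declaration of the kernel's inclusive 'struct range' */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Same logic as the proposed range_overlaps() helper */
static bool range_overlaps(const struct range *r1, const struct range *r2)
{
	return r1->start <= r2->end && r1->end >= r2->start;
}

int main(void)
{
	/* Two adjacent CFMWS-like windows, expressed with inclusive ends */
	struct range node0 = { .start = 0x100000000ULL,   .end = 0xffffffffffULL };
	struct range node1 = { .start = 0x10000000000ULL, .end = 0x1ffffffffffULL };

	/* Adjacent but not overlapping: no false positive */
	assert(!range_overlaps(&node0, &node1));

	/* A range that truly intersects both is detected */
	struct range probe = { .start = 0xff00000000ULL, .end = 0x100ffffffffULL };
	assert(range_overlaps(&probe, &node0));
	assert(range_overlaps(&probe, &node1));

	return 0;
}

[ The end-exclusive memblk values feed this helper via the '.end =
bi->end - 1' conversion in the diff above, so the adjacency case
(bi->end == range->start) correctly reports no overlap. ]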
On Tue, Jan 02, 2024 at 03:42:50PM -0800, Dan Williams wrote:
>
> So I realize I asked for the minimal fix before a wider cleanup to
> 'struct numa_memblk' to use 'struct range', but I want to see some of
> that cleanup now. How about the below (only compile-tested) to at least
> convert numa_fill_memblks(), since it is new, and then circle back
> sometime later to move 'struct numa_memblk' to embed a 'struct range'
> directly? I.e. save touching legacy code for later, but fix the bug in
> the new code with some modernization. This also documents that
> numa_add_memblk() expects an inclusive argument.
>
> Note that this would be 3 patches, the minimal logic fix as you have it
> without the comment changes since they become moot later, the
> introduction of range_overlaps() with namespace conflict fixup with the
> btrfs version, and using range_overlaps() to cleanup numa_fill_memblks()

Hi Dan,

Your idea to use struct range in struct numa_memblk comes as tremendous
relief after staring at open coded usage of start/end in numa_memblk.

With that acknowledged, the minimal fix to the overlap issue could be
either in x86 or acpi. We're at x86 because we wanted to be 'like'
numa_add_memblk(). A partial cleanup now makes numa_fill_memblks()
diverge further.

And, to avoid the partial cleanup from being throw-away work, it seems
we'd want to define the full cleanup ahead of time.

Also recall that we have a bigger hope of replacing this whole numa
memblk usage with the "Memblock" (with an 'o') API.

Maybe I need clarification on what you are suggesting wrt the 3 patches.
Are you saying you would not merge the minimal fix unless the partial
cleanup patches are included in the same set?

Also, is this something that has to be merged through x86, or might you
be able to take it through the cxl tree? Yes, that's me worrying about
the review and intake burden this puts on x86.
Alison
Alison Schofield wrote:
> On Tue, Jan 02, 2024 at 03:42:50PM -0800, Dan Williams wrote:
> >
> > So I realize I asked for the minimal fix before a wider cleanup to
> > 'struct numa_memblk' to use 'struct range', but I want to see some of
> > that cleanup now. How about the below (only compile-tested) to at least
> > convert numa_fill_memblks(), since it is new, and then circle back
> > sometime later to move 'struct numa_memblk' to embed a 'struct range'
> > directly? I.e. save touching legacy code for later, but fix the bug in
> > the new code with some modernization. This also documents that
> > numa_add_memblk() expects an inclusive argument.
> >
> > Note that this would be 3 patches, the minimal logic fix as you have it
> > without the comment changes since they become moot later, the
> > introduction of range_overlaps() with namespace conflict fixup with the
> > btrfs version, and using range_overlaps() to cleanup numa_fill_memblks()
>
> Hi Dan,
>
> Your idea to use struct range in struct numa_memblk comes as tremendous
> relief after staring at open coded usage of start/end in numa_memblk.
>
> With that acknowledged, the minimal fix to the overlap issue could be
> either in x86 or acpi.

I don't see how it could be in acpi since there are currently
conflicting {in,ex}clusive usages of @end in numa_fill_memblks(), right?

I.e.:

	...
	start < bi->end && end >= bi->start)
	...
	blk[count - 1]->end = max(blk[count - 1]->end, end);

> We're at x86 because we wanted to be 'like'
> numa_add_memblk(). A partial cleanup now makes numa_fill_memblks()
> diverge further.

Yes, but numa_add_memblk() parity is not necessarily a virtue; however,
memblock (with an 'o') parity might be, see below:

> And, to avoid the partial cleanup from being throw-away work, it
> seems we'd want to define the full cleanup ahead of time.

If you are referring to throwing away this cleanup work for the
eventual memblock conversion, then yes, I agree; if this is just
talking about the numa_memblk 'struct range' conversion, then I
disagree.

> Also recall that we have a bigger hope of replacing this whole
> numa memblk usage with the "Memblock" (with an 'o') API.

2 wrinkles here...

1/ From a practical standpoint, is anyone actively working on this? I
would definitely stand down from the numa_memblk conversion in favor of
memblock, but it has been years since this idea was floated, and I
doubt it shows up anytime soon. However, "be more like memblock" is a
good idea.

2/ memblock appears to be doing inclusive range math, which shoots
holes in my position that numa_add_memblk() is an anti-pattern that
merits conversion. I.e. "be more like memblock" contra-indicates a
'struct range' conversion.

> Maybe I need clarification on what you are suggesting wrt 3 patches?
> Are you saying you would not merge the minimal fix unless the partial
> clean up patches are included in the same set?

I think your memblock observation points to a third way. Let's uplevel
and steal memblock_addrs_overlap() and use it for this broken
comparison. That nods to an eventual full memblock conversion and keeps
the inclusive math status quo in memblock and numa_memblk related code.

> Also, is this something that has to be merged through x86 or might
> you be able to take it though cxl tree? Yes, that's me worrying
> about the review and intake burden this puts on x86.

I can always merge things through cxl.git with an x86 Acked-by, or a
"holler if you don't want this to go through cxl.git" approach. Let's
keep maintainer busy-ness concerns separate from the best long-term
place for the changes to land.
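[ For reference, the memblock helper suggested here does exclusive
base/size math rather than inclusive start/end math. A minimal
userspace restatement, paraphrased from mm/memblock.c with types
simplified (not the kernel's exact code): ]

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Paraphrase of memblock_addrs_overlap(): exclusive base/size math */
static bool memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
				   phys_addr_t base2, phys_addr_t size2)
{
	return base1 < (base2 + size2) && base2 < (base1 + size1);
}

int main(void)
{
	/* Two adjacent windows expressed as base/size: no overlap */
	assert(!memblock_addrs_overlap(0x0, 0x1000, 0x1000, 0x1000));

	/* One byte of intersection is enough to report overlap */
	assert(memblock_addrs_overlap(0x0, 0x1001, 0x1000, 0x1000));

	return 0;
}

[ Because the CFMWS code already has a base (cfmws->base_hpa) and a
size (cfmws->window_size), and memblks store an exclusive end, this
helper can be fed directly without any +1/-1 conversion, which is the
appeal of the "third way". ]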
On Wed, Jan 03, 2024 at 12:18:17PM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> > On Tue, Jan 02, 2024 at 03:42:50PM -0800, Dan Williams wrote:
> > >
> > > So I realize I asked for the minimal fix before a wider cleanup to
> > > 'struct numa_memblk' to use 'struct range', but I want to see some of
> > > that cleanup now. How about the below (only compile-tested) to at least
> > > convert numa_fill_memblks(), since it is new, and then circle back
> > > sometime later to move 'struct numa_memblk' to embed a 'struct range'
> > > directly? I.e. save touching legacy code for later, but fix the bug in
> > > the new code with some modernization. This also documents that
> > > numa_add_memblk() expects an inclusive argument.
> > >
> > > Note that this would be 3 patches, the minimal logic fix as you have it
> > > without the comment changes since they become moot later, the
> > > introduction of range_overlaps() with namespace conflict fixup with the
> > > btrfs version, and using range_overlaps() to cleanup numa_fill_memblks()
> >
> > Hi Dan,
> >
> > Your idea to use struct range in struct numa_memblk comes as tremendous
> > relief after staring at open coded usage of start/end in numa_memblk.
> >
> > With that acknowledged, the minimal fix to the overlap issue could be
> > either in x86 or acpi.
>
> I don't see how it could be in acpi since there are currently
> conflicting {in,ex}clusive usages of @end in numa_fill_memblks(), right?

You're right.

> I.e.:
>
> 	...
> 	start < bi->end && end >= bi->start)
> 	...
> 	blk[count - 1]->end = max(blk[count - 1]->end, end);
>
> > We're at x86 because we wanted to be 'like'
> > numa_add_memblk(). A partial cleanup now makes numa_fill_memblks()
> > diverge further.
>
> Yes, but numa_add_memblk() parity is not necessarily a virtue; however,
> memblock (with an 'o') parity might be, see below:
>
> > And, to avoid the partial cleanup from being throw-away work, it
> > seems we'd want to define the full cleanup ahead of time.
>
> If you are referring to throwing away this cleanup work for the
> eventual memblock conversion, then yes, I agree; if this is just
> talking about the numa_memblk 'struct range' conversion, then I
> disagree.
>
> > Also recall that we have a bigger hope of replacing this whole
> > numa memblk usage with the "Memblock" (with an 'o') API.
>
> 2 wrinkles here...
>
> 1/ From a practical standpoint, is anyone actively working on this? I
> would definitely stand down from the numa_memblk conversion in favor of
> memblock, but it has been years since this idea was floated, and I
> doubt it shows up anytime soon. However, "be more like memblock" is a
> good idea.
>
> 2/ memblock appears to be doing inclusive range math, which shoots
> holes in my position that numa_add_memblk() is an anti-pattern that
> merits conversion. I.e. "be more like memblock" contra-indicates a
> 'struct range' conversion.
>
> > Maybe I need clarification on what you are suggesting wrt 3 patches?
> > Are you saying you would not merge the minimal fix unless the partial
> > clean up patches are included in the same set?
>
> I think your memblock observation points to a third way. Let's uplevel
> and steal memblock_addrs_overlap() and use it for this broken
> comparison. That nods to an eventual full memblock conversion and keeps
> the inclusive math status quo in memblock and numa_memblk related code.

Thanks for calling my attention to the available memblock helper.
I went ahead and used memblock_addrs_overlap() for the comparison
and verified that it fixes the case where an existing memblk is
extended rather than creating a new node and memblock. It actually
is not clear how that relates to what the user reported (numa init
failure due to nodes overlapping), so I'm holding off rev'ing this
patch until that user-reported failure is understood.

> > Also, is this something that has to be merged through x86 or might
> > you be able to take it though cxl tree? Yes, that's me worrying
> > about the review and intake burden this puts on x86.
>
> I can always merge things through cxl.git with an x86 Acked-by, or a
> "holler if you don't want this to go through cxl.git" approach. Let's
> keep maintainer busy-ness concerns separate from the best long-term
> place for the changes to land.

Got it.
Alison Schofield wrote:
[..]
> > I think your memblock observation points to a third way. Let's uplevel
> > and steal memblock_addrs_overlap() and use it for this broken
> > comparison. That nods to an eventual full memblock conversion and keeps
> > the inclusive math status quo in memblock and numa_memblk related code.
>
> Thanks for calling my attention to the available memblock helper.
> I went ahead and used memblock_addrs_overlap() for the comparison
> and verified that it fixes the case where an existing memblk is
> extended rather than creating a new node and memblock. It actually
> is not clear how that relates to what the user reported: numa init
> failure due to nodes overlapping, so I'm holding off rev'ing this
> patch until that user reported failure is understood.

Oh, I understood the problem as numa_cleanup_meminfo() finding an
unexpected overlap and failing (returning -EINVAL), which causes NUMA
operation to be disabled altogether. Otherwise, yes, it would be good
to get confirmation from $reporter that this fixes their issue.
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b29ceb19e46e..4f81f75e4328 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -974,9 +974,9 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
  * @start: address to begin fill
  * @end: address to end fill
  *
- * Find and extend numa_meminfo memblks to cover the @start-@end
+ * Find and extend numa_meminfo memblks to cover the [start, end)
  * physical address range, such that the first memblk includes
- * @start, the last memblk includes @end, and any gaps in between
+ * @start, the last memblk excludes @end, and any gaps in between
  * are filled.
  *
  * RETURNS:
@@ -1003,7 +1003,7 @@ int __init numa_fill_memblks(u64 start, u64 end)
 	for (int i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
 
-		if (start < bi->end && end >= bi->start) {
+		if (start < bi->end && end > bi->start) {
 			blk[count] = &mi->blk[i];
 			count++;
 		}
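[ The effect of the one-character fix above ('>=' to '>') can be
checked in isolation. A userspace sketch with both predicates side by
side, using the adjacency case from the reported failure (memblk ends
are exclusive, as in the kernel): ]

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A memblk with the kernel's exclusive end convention */
struct numa_memblk {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Old predicate: treats @end as inclusive -> false positive on adjacency */
static bool overlaps_old(uint64_t start, uint64_t end, const struct numa_memblk *bi)
{
	return start < bi->end && end >= bi->start;
}

/* Fixed predicate: treats @end as exclusive, matching the caller */
static bool overlaps_new(uint64_t start, uint64_t end, const struct numa_memblk *bi)
{
	return start < bi->end && end > bi->start;
}

int main(void)
{
	/* memblk for node 1; node 0's window ends exactly where it starts */
	struct numa_memblk node1 = { .start = 0x10000000000ULL, .end = 0x20000000000ULL };
	uint64_t start = 0x100000000ULL;
	uint64_t end = 0x10000000000ULL;	/* exclusive end, adjacent to node1 */

	assert(overlaps_old(start, end, &node1));	/* false positive */
	assert(!overlaps_new(start, end, &node1));	/* fixed */

	/* A genuinely overlapping window is still caught by both */
	assert(overlaps_old(start, end + 1, &node1));
	assert(overlaps_new(start, end + 1, &node1));

	return 0;
}

[ With the old predicate, numa_fill_memblks() would pull node 1's
memblk into the fill list for node 0's window and extend it backwards,
producing the node overlap that numa_cleanup_meminfo() then rejects. ]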