
[GIT,PULL] memblock:fix validation of NUMA coverage

Message ID Zmr9oBecxdufMTeP@kernel.org
State New
Series [GIT,PULL] memblock:fix validation of NUMA coverage

Pull-request

https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Message

Mike Rapoport June 13, 2024, 2:09 p.m. UTC
Hi Linus,

The following changes since commit 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0:

  Linux 6.10-rc1 (2024-05-26 15:20:12 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock 

for you to fetch changes up to 3ac36aa7307363b7247ccb6f6a804e11496b2b36:

  x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node() (2024-06-06 22:20:39 +0300)

----------------------------------------------------------------
Jan Beulich (2):
      memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
      x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()

 arch/x86/mm/numa.c | 6 +++---
 mm/memblock.c      | 4 ++++
 2 files changed, 7 insertions(+), 3 deletions(-)

Comments

Linus Torvalds June 13, 2024, 5:09 p.m. UTC | #1
On Thu, 13 Jun 2024 at 07:11, Mike Rapoport <rppt@kernel.org> wrote:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

What's going on? This is the second pull request recently that doesn't
actually mention where to pull from.

I can do a "git ls-remote", and I see that you have a tag called
"fixes-2024-06-13" that then points to the commit you mention:

> for you to fetch changes up to 3ac36aa7307363b7247ccb6f6a804e11496b2b36:

but that tag name isn't actually in the pull request.

Is there some broken scripting that people have started using (or have
been using for a while and was recently broken)?

                          Linus
Linus Torvalds June 13, 2024, 5:38 p.m. UTC | #2
On Thu, 13 Jun 2024 at 10:09, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Is there some broken scripting that people have started using (or have
> been using for a while and was recently broken)?

... and then when I actually pull the code, I note that the problem
where it checked _one_ bogus value has just been replaced with
checking _another_ bogus value.

Christ.

What if people use a node ID that is simply outside the range
entirely, instead of one of those special node IDs?

And now for memblock_set_node() you should apparently use NUMA_NO_NODE
to not get a warning, but for memblock_set_region_node() apparently
the right random constant to use is MAX_NUMNODES.

Does *any* of this make sense? No.

How about instead of having two random constants - and not having any
range checking that I see - just have *one* random constant for "I
have no range", call that NUMA_NO_NODE, and then have a simple helper
for "do I have a valid range", and make that be

   static inline bool numa_valid_node(int nid)
   { return (unsigned int)nid < MAX_NUMNODES; }

or something like that? Notice that now *all* of

 - NUMA_NO_NODE (explicitly no node)

 - MAX_NUMNODES (randomly used no node)

 - out of range node (who knows wth firmware tables do?)

will get the same result from that "numa_valid_node()" function.

And at that point you don't need to care, you don't need to warn, and
you don't need to have these insane rules where "sometimes you *HAVE*
to use NUMA_NO_NODE, or we warn, in other cases MAX_NUMNODES is the
thing".

Please? IOW, instead of adding a warning for fragile code, then change
some caller to follow the new rules, JUST FIX THE STUPID FRAGILITY!

Or hey, just do

    #define NUMA_NO_NODE MAX_NUMNODES

and have two names for the *same* constant, instead of having two
different constants with strange semantic differences that seem to
make no sense and where the memblock code itself seems to go
back-and-forth on it in different contexts.

              Linus
pr-tracker-bot@kernel.org June 13, 2024, 7:30 p.m. UTC | #3
The pull request you sent on Thu, 13 Jun 2024 17:09:36 +0300:

> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3572597ca844f625a3c9ba629ed0872b64c16179

Thank you!
Jan Beulich June 14, 2024, 6:01 a.m. UTC | #4
On 13.06.2024 19:38, Linus Torvalds wrote:
> On Thu, 13 Jun 2024 at 10:09, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Is there some broken scripting that people have started using (or have
>> been using for a while and was recently broken)?
> 
> ... and then when I actually pull the code, I note that the problem
> where it checked _one_ bogus value has just been replaced with
> checking _another_ bogus value.
> 
> Christ.
> 
> What if people use a node ID that is simply outside the range
> entirely, instead of one of those special node IDs?
> 
> And now for memblock_set_node() you should apparently use NUMA_NO_NODE
> to not get a warning, but for memblock_set_region_node() apparently
> the right random constant to use is MAX_NUMNODES.
> 
> Does *any* of this make sense? No.
> 
> How about instead of having two random constants - and not having any
> range checking that I see - just have *one* random constant for "I
> have no range", call that NUMA_NO_NODE,

Just to mention it - my understanding is that this is an ongoing process
heading in this very direction. I'm not an mm person at all, so I can't
tell why the conversion wasn't done / can't be done all in one go.

Jan

> and then have a simple helper
> for "do I have a valid range", and make that be
> 
>    static inline bool numa_valid_node(int nid)
>    { return (unsigned int)nid < MAX_NUMNODES; }
> 
> or something like that? Notice that now *all* of
> 
>  - NUMA_NO_NODE (explicitly no node)
> 
>  - MAX_NUMNODES (randomly used no node)
> 
>  - out of range node (who knows wth firmware tables do?)
> 
> will get the same result from that "numa_valid_node()" function.
> 
> And at that point you don't need to care, you don't need to warn, and
> you don't need to have these insane rules where "sometimes you *HAVE*
> to use NUMA_NO_NODE, or we warn, in other cases MAX_NUMNODES is the
> thing".
> 
> Please? IOW, instead of adding a warning for fragile code, then change
> some caller to follow the new rules, JUST FIX THE STUPID FRAGILITY!
> 
> Or hey, just do
> 
>     #define NUMA_NO_NODE MAX_NUMNODES
> 
> and have two names for the *same* constant, instead of having two
> different constants with strange semantic differences that seem to
> make no sense and where the memblock code itself seems to go
> back-and-forth on it in different contexts.
> 
>               Linus
Mike Rapoport June 14, 2024, 7:31 a.m. UTC | #5
On Fri, Jun 14, 2024 at 08:01:33AM +0200, Jan Beulich wrote:
> On 13.06.2024 19:38, Linus Torvalds wrote:
> > On Thu, 13 Jun 2024 at 10:09, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> Is there some broken scripting that people have started using (or have
> >> been using for a while and was recently broken)?
> > 
> > ... and then when I actually pull the code, I note that the problem
> > where it checked _one_ bogus value has just been replaced with
> > checking _another_ bogus value.
> > 
> > Christ.
> > 
> > What if people use a node ID that is simply outside the range
> > entirely, instead of one of those special node IDs?
> > 
> > And now for memblock_set_node() you should apparently use NUMA_NO_NODE
> > to not get a warning, but for memblock_set_region_node() apparently
> > the right random constant to use is MAX_NUMNODES.
> > 
> > Does *any* of this make sense? No.
> > 
> > How about instead of having two random constants - and not having any
> > range checking that I see - just have *one* random constant for "I
> > have no range", call that NUMA_NO_NODE,
> 
> Just to mention it - my understanding is that this is an ongoing process
> heading in this very direction. I'm not an mm person at all, so I can't
> tell why the conversion wasn't done / can't be done all in one go.

Nah, it's an historical mess and my oversight.
 
> Jan
Mike Rapoport June 14, 2024, 8:17 a.m. UTC | #6
On Thu, Jun 13, 2024 at 10:38:28AM -0700, Linus Torvalds wrote:
> On Thu, 13 Jun 2024 at 10:09, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Is there some broken scripting that people have started using (or have
> > been using for a while and was recently broken)?
> 
> ... and then when I actually pull the code, I note that the problem
> where it checked _one_ bogus value has just been replaced with
> checking _another_ bogus value.
> 
> Christ.
> 
> What if people use a node ID that is simply outside the range
> entirely, instead of one of those special node IDs?
> 
> And now for memblock_set_node() you should apparently use NUMA_NO_NODE
> to not get a warning, but for memblock_set_region_node() apparently
> the right random constant to use is MAX_NUMNODES.
> 
> Does *any* of this make sense? No.
> 
> How about instead of having two random constants - and not having any
> range checking that I see - just have *one* random constant for "I
> have no range", call that NUMA_NO_NODE, and then have a simple helper
> for "do I have a valid range", and make that be
> 
>    static inline bool numa_valid_node(int nid)
>    { return (unsigned int)nid < MAX_NUMNODES; }
> 
> or something like that? Notice that now *all* of
> 
>  - NUMA_NO_NODE (explicitly no node)
> 
>  - MAX_NUMNODES (randomly used no node)
> 
>  - out of range node (who knows wth firmware tables do?)
> 
> will get the same result from that "numa_valid_node()" function.
> 
> And at that point you don't need to care, you don't need to warn, and
> you don't need to have these insane rules where "sometimes you *HAVE*
> to use NUMA_NO_NODE, or we warn, in other cases MAX_NUMNODES is the
> thing".
> 
> Please? IOW, instead of adding a warning for fragile code, then change
> some caller to follow the new rules, JUST FIX THE STUPID FRAGILITY!
> 
> Or hey, just do
> 
>     #define NUMA_NO_NODE MAX_NUMNODES
> 
> and have two names for the *same* constant, instead of having two
> different constants with strange semantic differences that seem to
> make no sense and where the memblock code itself seems to go
> back-and-forth on it in different contexts.

A single constant is likely to backfire because I remember seeing checks
like 'if (nid < 0)' so redefining NUMA_NO_NODE will require auditing all
those.

But a helper function works great.
I could only lightly test it as I don't have a fleet of machines with a
variety of memory layouts, so I'm planning to push it into -next early next
week (with the subject replaced by a more informative one).

From 319eddd74b372cae840782c7d53832ab30533a6b Mon Sep 17 00:00:00 2001
From: "Mike Rapoport (IBM)" <rppt@kernel.org>
Date: Fri, 14 Jun 2024 11:05:43 +0300
Subject: [PATCH] memblock: FIX THE STUPID FRAGILITY

Introduce numa_valid_node(nid), which verifies that nid is a valid node ID,
and use it instead of comparing the nid parameter with either NUMA_NO_NODE
or MAX_NUMNODES.

This makes the checks for valid node IDs consistent and more robust, and
allows getting rid of multiple WARNings.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
---
 include/linux/numa.h |  5 +++++
 mm/memblock.c        | 28 +++++++---------------------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/include/linux/numa.h b/include/linux/numa.h
index 1d43371fafd2..eb19503604fe 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -15,6 +15,11 @@
 #define	NUMA_NO_NODE	(-1)
 #define	NUMA_NO_MEMBLK	(-1)
 
+static inline bool numa_valid_node(int nid)
+{
+	return nid >= 0 && nid < MAX_NUMNODES;
+}
+
 /* optionally keep NUMA memory info available post init */
 #ifdef CONFIG_NUMA_KEEP_MEMINFO
 #define __initdata_or_meminfo
diff --git a/mm/memblock.c b/mm/memblock.c
index 08e9806b1cf9..e81fb68f7f88 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -754,7 +754,7 @@ bool __init_memblock memblock_validate_numa_coverage(unsigned long threshold_byt
 
 	/* calculate lose page */
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
-		if (nid == NUMA_NO_NODE)
+		if (!numa_valid_node(nid))
 			nr_pages += end_pfn - start_pfn;
 	}
 
@@ -1061,7 +1061,7 @@ static bool should_skip_region(struct memblock_type *type,
 		return false;
 
 	/* only memory regions are associated with nodes, check it */
-	if (nid != NUMA_NO_NODE && nid != m_nid)
+	if (numa_valid_node(nid) && nid != m_nid)
 		return true;
 
 	/* skip hotpluggable memory regions if needed */
@@ -1118,10 +1118,6 @@ void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags,
 	int idx_a = *idx & 0xffffffff;
 	int idx_b = *idx >> 32;
 
-	if (WARN_ONCE(nid == MAX_NUMNODES,
-	"Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
-		nid = NUMA_NO_NODE;
-
 	for (; idx_a < type_a->cnt; idx_a++) {
 		struct memblock_region *m = &type_a->regions[idx_a];
 
@@ -1215,9 +1211,6 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid,
 	int idx_a = *idx & 0xffffffff;
 	int idx_b = *idx >> 32;
 
-	if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
-		nid = NUMA_NO_NODE;
-
 	if (*idx == (u64)ULLONG_MAX) {
 		idx_a = type_a->cnt - 1;
 		if (type_b != NULL)
@@ -1303,7 +1296,7 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 
 		if (PFN_UP(r->base) >= PFN_DOWN(r->base + r->size))
 			continue;
-		if (nid == MAX_NUMNODES || nid == r_nid)
+		if (!numa_valid_node(nid) || nid == r_nid)
 			break;
 	}
 	if (*idx >= type->cnt) {
@@ -1339,10 +1332,6 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 	int start_rgn, end_rgn;
 	int i, ret;
 
-	if (WARN_ONCE(nid == MAX_NUMNODES,
-		      "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
-		nid = NUMA_NO_NODE;
-
 	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
 	if (ret)
 		return ret;
@@ -1452,9 +1441,6 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 	enum memblock_flags flags = choose_memblock_flags();
 	phys_addr_t found;
 
-	if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
-		nid = NUMA_NO_NODE;
-
 	if (!align) {
 		/* Can't use WARNs this early in boot on powerpc */
 		dump_stack();
@@ -1467,7 +1453,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 	if (found && !memblock_reserve(found, size))
 		goto done;
 
-	if (nid != NUMA_NO_NODE && !exact_nid) {
+	if (numa_valid_node(nid) && !exact_nid) {
 		found = memblock_find_in_range_node(size, align, start,
 						    end, NUMA_NO_NODE,
 						    flags);
@@ -1987,7 +1973,7 @@ static void __init_memblock memblock_dump(struct memblock_type *type)
 		end = base + size - 1;
 		flags = rgn->flags;
 #ifdef CONFIG_NUMA
-		if (memblock_get_region_node(rgn) != MAX_NUMNODES)
+		if (numa_valid_node(memblock_get_region_node(rgn)))
 			snprintf(nid_buf, sizeof(nid_buf), " on node %d",
 				 memblock_get_region_node(rgn));
 #endif
@@ -2181,7 +2167,7 @@ static void __init memmap_init_reserved_pages(void)
 			start = region->base;
 			end = start + region->size;
 
-			if (nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)
+			if (!numa_valid_node(nid))
 				nid = early_pfn_to_nid(PFN_DOWN(start));
 
 			reserve_bootmem_region(start, end, nid);
@@ -2272,7 +2258,7 @@ static int memblock_debug_show(struct seq_file *m, void *private)
 
 		seq_printf(m, "%4d: ", i);
 		seq_printf(m, "%pa..%pa ", &reg->base, &end);
-		if (nid != MAX_NUMNODES)
+		if (numa_valid_node(nid))
 			seq_printf(m, "%4d ", nid);
 		else
 			seq_printf(m, "%4c ", 'x');
Linus Torvalds June 14, 2024, 4:28 p.m. UTC | #7
On Fri, 14 Jun 2024 at 01:20, Mike Rapoport <rppt@kernel.org> wrote:
>
> A single constant is likely to backfire because I remember seeing checks
> like 'if (nid < 0)' so redefining NUMA_NO_NODE will require auditing all
> those.

Yeah, fair enough.

> But a helper function works great.

Thanks, that patch looks like a nice improvement to me.

                Linus