
x86/NUMA: correct memnode_shift calculation for single node system

Message ID 87c5e6be-5ad8-fe2f-d729-4f9904a4a027@suse.com (mailing list archive)
State New, archived
Series x86/NUMA: correct memnode_shift calculation for single node system

Commit Message

Jan Beulich Sept. 27, 2022, 2:15 p.m. UTC
SRAT may describe even a single node system (including such with
multiple nodes, but only one having any memory) using multiple ranges.
Hence simply counting the number of ranges (note that function
parameters are mis-named) is not an indication of the number of nodes in
use. Since we only care about knowing whether we're on a single node
system, accounting for this is easy: Increment the local variable only
when adjacent ranges are for different nodes. That way the count may
still end up larger than the number of nodes in use, but it won't be
larger than 1 when only a single node has any memory.

To compensate populate_memnodemap() now needs to be prepared to find
the correct node ID already in place for a range. (This could of course
also happen when there's more than one node with memory, while at least
one node has multiple adjacent ranges, provided extract_lsb_from_nodes()
would also know to recognize this case.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
On my Skylake system this changes memnodemapsize from 17 to 1 (and the
shift from 20 to 63).
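
A minimal standalone sketch of the adjusted counting, using simplified
stand-in types and an illustrative function name (count_nodes_used) rather
than the Xen definitions; the ranges and node IDs are hypothetical. With
all non-empty ranges belonging to one node, nodes_used ends up as 1, which
is what allows the maximum possible shift to be used and the memnode map to
shrink to a single entry, matching the numbers quoted above:

    /* Standalone sketch (stand-in types, not the Xen ones) of the adjusted
     * nodes_used counting, written in the if() form the review converged on. */
    #include <stdio.h>

    typedef unsigned char nodeid_t;
    struct node { unsigned long start, end; };  /* pdx-style range bounds */

    static int count_nodes_used(const struct node *nodes, int numnodes,
                                const nodeid_t *nodeids)
    {
        int i, nodes_used = 0;

        for ( i = 0; i < numnodes; i++ )
        {
            if ( nodes[i].start >= nodes[i].end )
                continue;                      /* skip empty ranges */
            /* Only a change of node ID between adjacent ranges (or the
             * absence of an ID array) bumps the count. */
            if ( i == 0 || !nodeids || nodeids[i - 1] != nodeids[i] )
                nodes_used++;
        }
        return nodes_used;
    }

    int main(void)
    {
        /* Hypothetical SRAT layout: a single node described by three ranges. */
        const struct node nodes[] = {
            { 0x000000, 0x080000 },
            { 0x100000, 0x180000 },
            { 0x180000, 0x200000 },
        };
        const nodeid_t ids[] = { 0, 0, 0 };

        /* Counting ranges would have said 3; counting node ID changes says 1. */
        printf("nodes_used = %d\n", count_nodes_used(nodes, 3, ids));
        return 0;
    }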

Comments

Alex Bennée Sept. 27, 2022, 2:44 p.m. UTC | #1
Jan Beulich <jbeulich@suse.com> writes:

> SRAT may describe even a single node system (including such with
> multiple nodes, but only one having any memory) using multiple ranges.
> Hence simply counting the number of ranges (note that function
> parameters are mis-named) is not an indication of the number of nodes in
> use. Since we only care about knowing whether we're on a single node
> system, accounting for this is easy: Increment the local variable only
> when adjacent ranges are for different nodes. That way the count may
> still end up larger than the number of nodes in use, but it won't be
> larger than 1 when only a single node has any memory.
>
> To compensate populate_memnodemap() now needs to be prepared to find
> the correct node ID already in place for a range. (This could of course
> also happen when there's more than one node with memory, while at least
> one node has multiple adjacent ranges, provided extract_lsb_from_nodes()
> would also know to recognize this case.)
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> On my Skylake system this changes memnodemapsize from 17 to 1 (and the
> shift from 20 to 63).
>
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -78,7 +78,8 @@ static int __init populate_memnodemap(co
>          if ( (epdx >> shift) >= memnodemapsize )
>              return 0;
>          do {
> -            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
> +            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE &&
> +                 (!nodeids || memnodemap[spdx >> shift] != nodeids[i]) )
>                  return -1;
>  
>              if ( !nodeids )
> @@ -114,7 +115,7 @@ static int __init allocate_cachealigned_
>   * maximum possible shift.
>   */
>  static int __init extract_lsb_from_nodes(const struct node *nodes,
> -                                         int numnodes)
> +                                         int numnodes, const nodeid_t *nodeids)
>  {
>      int i, nodes_used = 0;
>      unsigned long spdx, epdx;
> @@ -127,7 +128,7 @@ static int __init extract_lsb_from_nodes
>          if ( spdx >= epdx )
>              continue;
>          bitfield |= spdx;
> -        nodes_used++;
> +        nodes_used += i == 0 || !nodeids || nodeids[i - 1] != nodeids[i];

Is that boolean short cutting worth it instead of a more easily
readable:

        if (i == 0 || !nodeids || nodeids[i - 1] != nodeids[i])
           nodes_used++;

?

>          if ( epdx > memtop )
>              memtop = epdx;
>      }
> @@ -144,7 +145,7 @@ int __init compute_hash_shift(struct nod
>  {
>      int shift;
>  
> -    shift = extract_lsb_from_nodes(nodes, numnodes);
> +    shift = extract_lsb_from_nodes(nodes, numnodes, nodeids);
>      if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
>          memnodemap = _memnodemap;
>      else if ( allocate_cachealigned_memnodemap() )
Jan Beulich Sept. 27, 2022, 2:52 p.m. UTC | #2
On 27.09.2022 16:44, Alex Bennée wrote:
> Jan Beulich <jbeulich@suse.com> writes:
>> @@ -127,7 +128,7 @@ static int __init extract_lsb_from_nodes
>>          if ( spdx >= epdx )
>>              continue;
>>          bitfield |= spdx;
>> -        nodes_used++;
>> +        nodes_used += i == 0 || !nodeids || nodeids[i - 1] != nodeids[i];
> 
> Is that boolean short cutting worth it instead of a more easily
> readable:
> 
>         if (i == 0 || !nodeids || nodeids[i - 1] != nodeids[i])
>            nodes_used++;
> 
> ?

If others (especially my co-maintainers) agree, I'd be willing to
switch. Generally I've come to prefer that form as it often serves
as an indication to compilers to try to avoid branches. (That said,
I've neither checked that this has this effect here, nor would it
really matter much, as this code is run exactly once during boot.)

Jan
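
The two spellings under discussion are equivalent, since a C
relational/logical expression evaluates to 0 or 1. A small self-contained
check, purely illustrative (function names count_add/count_if are not from
the patch):

    #include <assert.h>

    /* Patch form: add the 0/1 value of the condition directly. */
    static int count_add(const unsigned char *ids, int n)
    {
        int i, used = 0;

        for ( i = 0; i < n; i++ )
            used += i == 0 || !ids || ids[i - 1] != ids[i];
        return used;
    }

    /* Review-suggested form: explicit if(), same result. */
    static int count_if(const unsigned char *ids, int n)
    {
        int i, used = 0;

        for ( i = 0; i < n; i++ )
            if ( i == 0 || !ids || ids[i - 1] != ids[i] )
                used++;
        return used;
    }

    int main(void)
    {
        const unsigned char ids[] = { 0, 0, 1, 1, 1 };

        /* Both count two groups of distinct adjacent IDs. */
        assert(count_add(ids, 5) == 2 && count_if(ids, 5) == 2);
        return 0;
    }
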
Roger Pau Monne Sept. 27, 2022, 3:53 p.m. UTC | #3
On Tue, Sep 27, 2022 at 04:15:19PM +0200, Jan Beulich wrote:
> SRAT may describe even a single node system (including such with
> multiple nodes, but only one having any memory) using multiple ranges.
> Hence simply counting the number of ranges (note that function
> parameters are mis-named) is not an indication of the number of nodes in
> use. Since we only care about knowing whether we're on a single node
> system, accounting for this is easy: Increment the local variable only
> when adjacent ranges are for different nodes. That way the count may
> still end up larger than the number of nodes in use, but it won't be
> larger than 1 when only a single node has any memory.
> 
> To compensate populate_memnodemap() now needs to be prepared to find
> the correct node ID already in place for a range. (This could of course
> also happen when there's more than one node with memory, while at least
> one node has multiple adjacent ranges, provided extract_lsb_from_nodes()
> would also know to recognize this case.)
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> On my Skylake system this changes memnodemapsize from 17 to 1 (and the
> shift from 20 to 63).
> 
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -78,7 +78,8 @@ static int __init populate_memnodemap(co
>          if ( (epdx >> shift) >= memnodemapsize )
>              return 0;
>          do {
> -            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
> +            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE &&
> +                 (!nodeids || memnodemap[spdx >> shift] != nodeids[i]) )
>                  return -1;
>  
>              if ( !nodeids )
> @@ -114,7 +115,7 @@ static int __init allocate_cachealigned_
>   * maximum possible shift.
>   */
>  static int __init extract_lsb_from_nodes(const struct node *nodes,
> -                                         int numnodes)
> +                                         int numnodes, const nodeid_t *nodeids)
>  {
>      int i, nodes_used = 0;
>      unsigned long spdx, epdx;
> @@ -127,7 +128,7 @@ static int __init extract_lsb_from_nodes
>          if ( spdx >= epdx )
>              continue;
>          bitfield |= spdx;
> -        nodes_used++;
> +        nodes_used += i == 0 || !nodeids || nodeids[i - 1] != nodeids[i];

I think I would also prefer the `if ( ... ) nodes_used++;` form, as
it's clearer.

Thanks, Roger.
Jan Beulich Sept. 27, 2022, 4:08 p.m. UTC | #4
On 27.09.2022 17:53, Roger Pau Monné wrote:
> On Tue, Sep 27, 2022 at 04:15:19PM +0200, Jan Beulich wrote:
>> SRAT may describe even a single node system (including such with
>> multiple nodes, but only one having any memory) using multiple ranges.
>> Hence simply counting the number of ranges (note that function
>> parameters are mis-named) is not an indication of the number of nodes in
>> use. Since we only care about knowing whether we're on a single node
>> system, accounting for this is easy: Increment the local variable only
>> when adjacent ranges are for different nodes. That way the count may
>> still end up larger than the number of nodes in use, but it won't be
>> larger than 1 when only a single node has any memory.
>>
>> To compensate populate_memnodemap() now needs to be prepared to find
>> the correct node ID already in place for a range. (This could of course
>> also happen when there's more than one node with memory, while at least
>> one node has multiple adjacent ranges, provided extract_lsb_from_nodes()
>> would also know to recognize this case.)
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/xen/arch/x86/numa.c
>> +++ b/xen/arch/x86/numa.c
>> @@ -78,7 +78,8 @@ static int __init populate_memnodemap(co
>>          if ( (epdx >> shift) >= memnodemapsize )
>>              return 0;
>>          do {
>> -            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
>> +            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE &&
>> +                 (!nodeids || memnodemap[spdx >> shift] != nodeids[i]) )
>>                  return -1;
>>  
>>              if ( !nodeids )
>> @@ -114,7 +115,7 @@ static int __init allocate_cachealigned_
>>   * maximum possible shift.
>>   */
>>  static int __init extract_lsb_from_nodes(const struct node *nodes,
>> -                                         int numnodes)
>> +                                         int numnodes, const nodeid_t *nodeids)
>>  {
>>      int i, nodes_used = 0;
>>      unsigned long spdx, epdx;
>> @@ -127,7 +128,7 @@ static int __init extract_lsb_from_nodes
>>          if ( spdx >= epdx )
>>              continue;
>>          bitfield |= spdx;
>> -        nodes_used++;
>> +        nodes_used += i == 0 || !nodeids || nodeids[i - 1] != nodeids[i];
> 
> I think I would also prefer the `if ( ... ) nodes_used++;` form, as
> it's clearer.

Okay, will switch then. This isn't for 4.17 anyway (I think), so
there's no rush.

Jan
Jan Beulich Sept. 28, 2022, 12:19 p.m. UTC | #5
On 27.09.2022 18:08, Jan Beulich wrote:
> On 27.09.2022 17:53, Roger Pau Monné wrote:
>> On Tue, Sep 27, 2022 at 04:15:19PM +0200, Jan Beulich wrote:
>>> @@ -127,7 +128,7 @@ static int __init extract_lsb_from_nodes
>>>          if ( spdx >= epdx )
>>>              continue;
>>>          bitfield |= spdx;
>>> -        nodes_used++;
>>> +        nodes_used += i == 0 || !nodeids || nodeids[i - 1] != nodeids[i];
>>
>> I think I would also prefer the `if ( ... ) nodes_used++;` form, as
>> it's clearer.
> 
> Okay, will switch then. This isn't for 4.17 anyway (I think), so
> there's no rush.

Actually I'm not so sure anymore as to 4.17 - we're in feature freeze until
the end of the week, not in code freeze. So I guess this (and the other two
related patches, provided they would get acked) ought to still be eligible.
I guess I'll give it a day for objections to surface, but otherwise commit
v2 perhaps during the afternoon tomorrow.

Jan

Patch

--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -78,7 +78,8 @@  static int __init populate_memnodemap(co
         if ( (epdx >> shift) >= memnodemapsize )
             return 0;
         do {
-            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
+            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE &&
+                 (!nodeids || memnodemap[spdx >> shift] != nodeids[i]) )
                 return -1;
 
             if ( !nodeids )
@@ -114,7 +115,7 @@  static int __init allocate_cachealigned_
  * maximum possible shift.
  */
 static int __init extract_lsb_from_nodes(const struct node *nodes,
-                                         int numnodes)
+                                         int numnodes, const nodeid_t *nodeids)
 {
     int i, nodes_used = 0;
     unsigned long spdx, epdx;
@@ -127,7 +128,7 @@  static int __init extract_lsb_from_nodes
         if ( spdx >= epdx )
             continue;
         bitfield |= spdx;
-        nodes_used++;
+        nodes_used += i == 0 || !nodeids || nodeids[i - 1] != nodeids[i];
         if ( epdx > memtop )
             memtop = epdx;
     }
@@ -144,7 +145,7 @@  int __init compute_hash_shift(struct nod
 {
     int shift;
 
-    shift = extract_lsb_from_nodes(nodes, numnodes);
+    shift = extract_lsb_from_nodes(nodes, numnodes, nodeids);
     if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
         memnodemap = _memnodemap;
     else if ( allocate_cachealigned_memnodemap() )
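
With the larger shift the fix enables, several ranges of the same node can
land in the same memnodemap slot, so the population loop has to tolerate
finding its own node ID already in place, which is what the added condition
in populate_memnodemap() does. A simplified sketch of the per-range step,
with stand-in types, names, and constants rather than the actual Xen
function:

    #include <string.h>

    #define FAKE_NO_NODE 0xff  /* stand-in for NUMA_NO_NODE */

    /* Populate one range into a small map; returns -1 on a genuine conflict,
     * 1 on success, 0 if the map is too small for the given shift. */
    static int populate_one(unsigned char *map, unsigned long mapsize,
                            unsigned int shift, unsigned long spdx,
                            unsigned long epdx, unsigned char nodeid)
    {
        if ( (epdx >> shift) >= mapsize )
            return 0;
        do {
            /* A slot already holding this range's own node ID is fine now;
             * only a different, already-assigned ID is a conflict. */
            if ( map[spdx >> shift] != FAKE_NO_NODE &&
                 map[spdx >> shift] != nodeid )
                return -1;
            map[spdx >> shift] = nodeid;
            spdx += 1UL << shift;
        } while ( spdx < epdx );
        return 1;
    }

    int main(void)
    {
        unsigned char map[1];

        memset(map, FAKE_NO_NODE, sizeof(map));

        /* Two ranges of the same node with shift 63: both hit slot 0, and
         * the second call no longer reports a conflict. */
        return (populate_one(map, 1, 63, 0x000000, 0x080000, 0) == 1 &&
                populate_one(map, 1, 63, 0x100000, 0x200000, 0) == 1) ? 0 : 1;
    }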