diff mbox series

[v2,2/9] xen/x86: Use enumerations to indicate NUMA status

Message ID 20220708145424.1848572-3-wei.chen@arm.com (mailing list archive)
State Superseded
Headers show
Series Device tree based NUMA support for Arm - Part#2 | expand

Commit Message

Wei Chen July 8, 2022, 2:54 p.m. UTC
In current code, x86 is using two variables, numa_off and acpi_numa,
to indicate the NUMA status. This is because NUMA is not coupled with
ACPI, and ACPI still can work without NUMA on x86. With these two
variables' combinations, x86 can have several NUMA status:
NUMA swith on,
NUMA swith off,
NUMA swith on with NUMA emulation,
NUMA swith on with no-ACPI,
NUMA swith on with ACPI.

In this case, we introduce an enumeration numa_mode in this patch
to indicate above NUMA status, except NUMA on with emulation. Because
NUMA emulation has another variable, numa_fake, to indicate the number
of nodes for emulation. We can't use the enumeration to replace it at
the same time. But it still can be indicated by numa_on and numa_fake
as what it has been indicated.

Based on the enumeration we introduce numa_enabled_with_firmware for
callers to check NUMA status is enabled + ACPI. Using this helper is
because some NUMA implementation will use other firmware, this helper
will be easy to them to check enabled + others.

As we have touched srat_disabled, we have corrected its return value
from int to bool.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. Remove fw_numa.
2. Use enumeration to replace numa_off and acpi_numa.
3. Correct return value of srat_disabled.
4. Introduce numa_enabled_with_firmware.
---
 xen/arch/x86/include/asm/acpi.h |  1 -
 xen/arch/x86/include/asm/numa.h | 16 +++++++++++++---
 xen/arch/x86/numa.c             | 28 +++++++++++++++-------------
 xen/arch/x86/setup.c            |  3 ++-
 xen/arch/x86/srat.c             | 13 +++++++------
 5 files changed, 37 insertions(+), 24 deletions(-)

Comments

Jan Beulich July 12, 2022, 12:49 p.m. UTC | #1
On 08.07.2022 16:54, Wei Chen wrote:
> In current code, x86 is using two variables, numa_off and acpi_numa,
> to indicate the NUMA status. This is because NUMA is not coupled with
> ACPI, and ACPI still can work without NUMA on x86. With these two
> variables' combinations, x86 can have several NUMA status:
> NUMA swith on,
> NUMA swith off,
> NUMA swith on with NUMA emulation,
> NUMA swith on with no-ACPI,
> NUMA swith on with ACPI.

Hmm, with both this and the actual change I'm not able to convince
myself that you've expressed the prior combinations correctly. May
I suggest that you make table representing the 6 (I think)
combinations of original states with their mapping to the new
enumerators? (It doesn't need to be 6 different enumerators, but
all 6 existing states need a [proper] representation in the new
model.)

As an aside - I think you mean "switched" in all five of these
lines.

> --- a/xen/arch/x86/include/asm/numa.h
> +++ b/xen/arch/x86/include/asm/numa.h
> @@ -28,12 +28,22 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
>  #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
>  #define VIRTUAL_BUG_ON(x) 
>  
> +/* Enumerations for NUMA status. */
> +enum numa_mode {
> +	numa_on = 0,
> +	numa_off,

May I suggest to switch these two around, such that "off" becomes
the meaning of 0, potentially allowing ! to be used in a boolean-
like fashion here or there? And please omit the "= 0" part - it's
only non-zero first values which actually need spelling out.

> +	/* NUMA turns on, but ACPI table is bad or disabled. */
> +	numa_no_acpi,
> +	/* NUMA turns on, and ACPI table works well. */
> +	numa_acpi,

As to the names of these: In the description you already say that
you want to re-use the code for non-ACPI cases. Wouldn't you better
avoid "acpi" in the names then (rather than perhaps renaming these
another time later on)?

I'd also like to understand what useful state "numa_no_acpi" is.
I realize this was a state expressable by the two original
variables, but does it make sense?

> @@ -528,7 +528,8 @@ int __init acpi_scan_nodes(paddr_t start, paddr_t end)
>  	for (i = 0; i < MAX_NUMNODES; i++)
>  		cutoff_node(i, start, end);
>  
> -	if (acpi_numa <= 0)
> +	/* Only when numa_on with good firmware, we can do numa scan nodes. */
> +	if (!numa_enabled_with_firmware())
>  		return -1;

Nit: Perhaps drop "do numa" from the comment?

Jan
Wei Chen July 14, 2022, 9:03 a.m. UTC | #2
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年7月12日 20:49
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v2 2/9] xen/x86: Use enumerations to indicate NUMA
> status
> 
> On 08.07.2022 16:54, Wei Chen wrote:
> > In current code, x86 is using two variables, numa_off and acpi_numa,
> > to indicate the NUMA status. This is because NUMA is not coupled with
> > ACPI, and ACPI still can work without NUMA on x86. With these two
> > variables' combinations, x86 can have several NUMA status:
> > NUMA swith on,
> > NUMA swith off,
> > NUMA swith on with NUMA emulation,
> > NUMA swith on with no-ACPI,
> > NUMA swith on with ACPI.
> 
> Hmm, with both this and the actual change I'm not able to convince
> myself that you've expressed the prior combinations correctly. May
> I suggest that you make table representing the 6 (I think)
> combinations of original states with their mapping to the new
> enumerators? (It doesn't need to be 6 different enumerators, but
> all 6 existing states need a [proper] representation in the new
> model.)
> 

Sorry for replying later, I paid sometime to check the code again,
and drew a table like below, I ignore two columns when numa_off=true
and acpi_numa=1/-1. Because when numa_off = true, ACPI srat table
will not be parsed:
+-----------+---------+---------------+-----------+------------+----------+
| original  |  col1   |  col2         |  col3     |  col4      |  col5    |
+-----------+---------+---------------+-----------+------------+----------+
|numa_off   |true     |false          |false      |false       |false     |
|acpi_numa  |0        |0              |1          |-1          |x         |
|numa_fake  |x        |x              |x          |x           |fake_nodes|
|enum state |numa_off |numa_fw_nodata |numa_fw_ok |numa_fw_bad |numa_emu  |
+-----------+---------+---------------+-----------+------------+----------+

Notes:
The following scenarios will make acpi_numa=0:
1. Xen disable ACPI through acpi_disabled = 1.
2. ACPI table doesn't have SRAT, or SRAT doesn't contain CPU and memory
   NUMA affinity information.
3. Emulate numa through "numa=fake" command line parameter. But I found
   that when numa_fake is enabled, current code will not prevent ACPI
   srat parsers to parse NUMA information. So acpi_numa still may be
   changed later. Is there any further reasons that we still need to
   parse NUMA info from SRAT table when numa=fake? Or can we set
   acpi_numa = -1 in nume_setup when "numa=fake" to make srat_disabled
   can prevent ACPI SRAT parsing.

If meet the following conditions, then acpi_numa = 1:
1. Xen enable ACPI through acpi_disabled = 0.
2. ACPI SRAT parser can parse CPU and Memory NUMA affinities successfully
   from SRAT table.
3. Pass the memmory blocks' sanity check and hash computing in
   acpi_scan_nodes.

The following scenarios will make acpi_numa=-1:
1. ACPI table is disabled  by "numa=noacpi" command like parameter.
2. Xen enable ACPI through acpi_disabled = 0 but ACPI SRAT parser can not
   parse CPU or Memory NUMA affinities successfully from SRAT table.
3. The memmory blocks' sanity check or hash computing in acpi_scan_nodes
   is failed.

> As an aside - I think you mean "switched" in all five of these
> lines.
> 

Yes, I will fix them in next version.

> > --- a/xen/arch/x86/include/asm/numa.h
> > +++ b/xen/arch/x86/include/asm/numa.h
> > @@ -28,12 +28,22 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
> >  #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> >  #define VIRTUAL_BUG_ON(x)
> >
> > +/* Enumerations for NUMA status. */
> > +enum numa_mode {
> > +	numa_on = 0,
> > +	numa_off,
> 
> May I suggest to switch these two around, such that "off" becomes
> the meaning of 0, potentially allowing ! to be used in a boolean-
> like fashion here or there? And please omit the "= 0" part - it's
> only non-zero first values which actually need spelling out.
> 

Ok.

> > +	/* NUMA turns on, but ACPI table is bad or disabled. */
> > +	numa_no_acpi,
> > +	/* NUMA turns on, and ACPI table works well. */
> > +	numa_acpi,
> 
> As to the names of these: In the description you already say that
> you want to re-use the code for non-ACPI cases. Wouldn't you better
> avoid "acpi" in the names then (rather than perhaps renaming these
> another time later on)?
> 
> I'd also like to understand what useful state "numa_no_acpi" is.
> I realize this was a state expressable by the two original
> variables, but does it make sense?
> 

I have updated the new names in above table. And "numa_no_acpi"
is mapping to " numa_fw_bad" enum state.

> > @@ -528,7 +528,8 @@ int __init acpi_scan_nodes(paddr_t start, paddr_t
> end)
> >  	for (i = 0; i < MAX_NUMNODES; i++)
> >  		cutoff_node(i, start, end);
> >
> > -	if (acpi_numa <= 0)
> > +	/* Only when numa_on with good firmware, we can do numa scan nodes.
> */
> > +	if (!numa_enabled_with_firmware())
> >  		return -1;
> 
> Nit: Perhaps drop "do numa" from the comment?
> 

Ok, I will do it in next version.

Cheers,
Wei Chen

> Jan
Jan Beulich July 14, 2022, 9:32 a.m. UTC | #3
On 14.07.2022 11:03, Wei Chen wrote:
> Hi Jan,
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年7月12日 20:49
>> To: Wei Chen <Wei.Chen@arm.com>
>> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
>> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v2 2/9] xen/x86: Use enumerations to indicate NUMA
>> status
>>
>> On 08.07.2022 16:54, Wei Chen wrote:
>>> In current code, x86 is using two variables, numa_off and acpi_numa,
>>> to indicate the NUMA status. This is because NUMA is not coupled with
>>> ACPI, and ACPI still can work without NUMA on x86. With these two
>>> variables' combinations, x86 can have several NUMA status:
>>> NUMA swith on,
>>> NUMA swith off,
>>> NUMA swith on with NUMA emulation,
>>> NUMA swith on with no-ACPI,
>>> NUMA swith on with ACPI.
>>
>> Hmm, with both this and the actual change I'm not able to convince
>> myself that you've expressed the prior combinations correctly. May
>> I suggest that you make table representing the 6 (I think)
>> combinations of original states with their mapping to the new
>> enumerators? (It doesn't need to be 6 different enumerators, but
>> all 6 existing states need a [proper] representation in the new
>> model.)
>>
> 
> Sorry for replying later, I paid sometime to check the code again,
> and drew a table like below, I ignore two columns when numa_off=true
> and acpi_numa=1/-1. Because when numa_off = true, ACPI srat table
> will not be parsed:

While I agree with this fact, the problem is that there are two
independent command line options driving the then single variable.
IOW numa_off and acpi_numa both turned on may still need a
representation. In fact I'm not convinced we can eliminate the
original variables. What we ought to be able to do is consolidate
their values into the one single new variable you add, before
ever evaluating anything. _Then_ I think I agree with the table.

Jan

> +-----------+---------+---------------+-----------+------------+----------+
> | original  |  col1   |  col2         |  col3     |  col4      |  col5    |
> +-----------+---------+---------------+-----------+------------+----------+
> |numa_off   |true     |false          |false      |false       |false     |
> |acpi_numa  |0        |0              |1          |-1          |x         |
> |numa_fake  |x        |x              |x          |x           |fake_nodes|
> |enum state |numa_off |numa_fw_nodata |numa_fw_ok |numa_fw_bad |numa_emu  |
> +-----------+---------+---------------+-----------+------------+----------+
> 
> Notes:
> The following scenarios will make acpi_numa=0:
> 1. Xen disable ACPI through acpi_disabled = 1.
> 2. ACPI table doesn't have SRAT, or SRAT doesn't contain CPU and memory
>    NUMA affinity information.
> 3. Emulate numa through "numa=fake" command line parameter. But I found
>    that when numa_fake is enabled, current code will not prevent ACPI
>    srat parsers to parse NUMA information. So acpi_numa still may be
>    changed later. Is there any further reasons that we still need to
>    parse NUMA info from SRAT table when numa=fake? Or can we set
>    acpi_numa = -1 in nume_setup when "numa=fake" to make srat_disabled
>    can prevent ACPI SRAT parsing.
> 
> If meet the following conditions, then acpi_numa = 1:
> 1. Xen enable ACPI through acpi_disabled = 0.
> 2. ACPI SRAT parser can parse CPU and Memory NUMA affinities successfully
>    from SRAT table.
> 3. Pass the memmory blocks' sanity check and hash computing in
>    acpi_scan_nodes.
> 
> The following scenarios will make acpi_numa=-1:
> 1. ACPI table is disabled  by "numa=noacpi" command like parameter.
> 2. Xen enable ACPI through acpi_disabled = 0 but ACPI SRAT parser can not
>    parse CPU or Memory NUMA affinities successfully from SRAT table.
> 3. The memmory blocks' sanity check or hash computing in acpi_scan_nodes
>    is failed.
Wei Chen July 14, 2022, 9:55 a.m. UTC | #4
Hi Jan,

> -----Original Message-----
> >>
> >
> > Sorry for replying later, I paid sometime to check the code again,
> > and drew a table like below, I ignore two columns when numa_off=true
> > and acpi_numa=1/-1. Because when numa_off = true, ACPI srat table
> > will not be parsed:
> 
> While I agree with this fact, the problem is that there are two
> independent command line options driving the then single variable.
> IOW numa_off and acpi_numa both turned on may still need a
> representation. In fact I'm not convinced we can eliminate the
> original variables. What we ought to be able to do is consolidate
> their values into the one single new variable you add, before
> ever evaluating anything. _Then_ I think I agree with the table.
> 
> Jan
> 
> > +-----------+---------+---------------+-----------+------------+--------
> --+
> > | original  |  col1   |  col2         |  col3     |  col4      |  col5
> |
> > +-----------+---------+---------------+-----------+------------+--------
> --+
> > |numa_off   |true     |false          |false      |false       |false
> |
> > |acpi_numa  |0        |0              |1          |-1          |x
> |
> > |numa_fake  |x        |x              |x          |x
> |fake_nodes|
> > |enum state |numa_off |numa_fw_nodata |numa_fw_ok |numa_fw_bad |numa_emu
> |
> > +-----------+---------+---------------+-----------+------------+--------
> --+
> >

How about update the table like this:
+------------+----------+----------------+----------------+------------+
|  original  |          |                |                |            |
+------------+----------+----------------+----------------+------------+
| numa_off   | true     | true           | true           | true       |
| acpi_numa  | 0        | 1              | -1             | x          |
| numa_fake  | x        | x              | x              | fake_nodes |
| enum state | numa_off | numa_off       | numa_off       | numa_off   |
+------------+----------+----------------+----------------+------------+

+------------+----------------+------------+-------------+------------+
|  original  |                |            |             |            |
+------------+----------------+------------+-------------+------------+
| numa_off   | false          | false      | false       | false      |
| acpi_numa  | 0              | 1          | -1          | x          |
| numa_fake  | x              | x          | x           | fake_nodes |
| enum state | numa_fw_nodata | numa_fw_ok | numa_fw_bad | numa_emu   |
+------------+----------------+------------+-------------+------------+

Cheers,
Wei Chen

> > Notes:
> > The following scenarios will make acpi_numa=0:
sanity check or hash computing in acpi_scan_nodes
> >    is failed.
Jan Beulich July 14, 2022, 9:57 a.m. UTC | #5
On 14.07.2022 11:55, Wei Chen wrote:
> Hi Jan,
> 
>> -----Original Message-----
>>>>
>>>
>>> Sorry for replying later, I paid sometime to check the code again,
>>> and drew a table like below, I ignore two columns when numa_off=true
>>> and acpi_numa=1/-1. Because when numa_off = true, ACPI srat table
>>> will not be parsed:
>>
>> While I agree with this fact, the problem is that there are two
>> independent command line options driving the then single variable.
>> IOW numa_off and acpi_numa both turned on may still need a
>> representation. In fact I'm not convinced we can eliminate the
>> original variables. What we ought to be able to do is consolidate
>> their values into the one single new variable you add, before
>> ever evaluating anything. _Then_ I think I agree with the table.
>>
>> Jan
>>
>>> +-----------+---------+---------------+-----------+------------+--------
>> --+
>>> | original  |  col1   |  col2         |  col3     |  col4      |  col5
>> |
>>> +-----------+---------+---------------+-----------+------------+--------
>> --+
>>> |numa_off   |true     |false          |false      |false       |false
>> |
>>> |acpi_numa  |0        |0              |1          |-1          |x
>> |
>>> |numa_fake  |x        |x              |x          |x
>> |fake_nodes|
>>> |enum state |numa_off |numa_fw_nodata |numa_fw_ok |numa_fw_bad |numa_emu
>> |
>>> +-----------+---------+---------------+-----------+------------+--------
>> --+
>>>
> 
> How about update the table like this:
> +------------+----------+----------------+----------------+------------+
> |  original  |          |                |                |            |
> +------------+----------+----------------+----------------+------------+
> | numa_off   | true     | true           | true           | true       |
> | acpi_numa  | 0        | 1              | -1             | x          |
> | numa_fake  | x        | x              | x              | fake_nodes |
> | enum state | numa_off | numa_off       | numa_off       | numa_off   |
> +------------+----------+----------------+----------------+------------+
> 
> +------------+----------------+------------+-------------+------------+
> |  original  |                |            |             |            |
> +------------+----------------+------------+-------------+------------+
> | numa_off   | false          | false      | false       | false      |
> | acpi_numa  | 0              | 1          | -1          | x          |
> | numa_fake  | x              | x          | x           | fake_nodes |
> | enum state | numa_fw_nodata | numa_fw_ok | numa_fw_bad | numa_emu   |
> +------------+----------------+------------+-------------+------------+

Well, this makes the table complete, but it doesn't explain how you mean
to fold the settings of the two command line options into one variable.

Jan
Wei Chen July 14, 2022, 10:26 a.m. UTC | #6
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年7月14日 17:58
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v2 2/9] xen/x86: Use enumerations to indicate NUMA
> status
> 
> >
> > How about update the table like this:
> > +------------+----------+----------------+----------------+------------+
> > |  original  |          |                |                |            |
> > +------------+----------+----------------+----------------+------------+
> > | numa_off   | true     | true           | true           | true       |
> > | acpi_numa  | 0        | 1              | -1             | x          |
> > | numa_fake  | x        | x              | x              | fake_nodes |
> > | enum state | numa_off | numa_off       | numa_off       | numa_off   |
> > +------------+----------+----------------+----------------+------------+
> >
> > +------------+----------------+------------+-------------+------------+
> > |  original  |                |            |             |            |
> > +------------+----------------+------------+-------------+------------+
> > | numa_off   | false          | false      | false       | false      |
> > | acpi_numa  | 0              | 1          | -1          | x          |
> > | numa_fake  | x              | x          | x           | fake_nodes |
> > | enum state | numa_fw_nodata | numa_fw_ok | numa_fw_bad | numa_emu   |
> > +------------+----------------+------------+-------------+------------+
> 
> Well, this makes the table complete, but it doesn't explain how you mean
> to fold the settings of the two command line options into one variable.
> 

No matter how many separate "numa=" parameters have been parsed from
Command line, the values of these original variables are determined
after parsing the command line. So the determined status can be mapped
to the new one variable from above table.

Thanks,
Wei Chen

> Jan
Jan Beulich July 14, 2022, 10:37 a.m. UTC | #7
On 14.07.2022 12:26, Wei Chen wrote:
> Hi Jan,
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年7月14日 17:58
>> To: Wei Chen <Wei.Chen@arm.com>
>> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
>> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v2 2/9] xen/x86: Use enumerations to indicate NUMA
>> status
>>
>>>
>>> How about update the table like this:
>>> +------------+----------+----------------+----------------+------------+
>>> |  original  |          |                |                |            |
>>> +------------+----------+----------------+----------------+------------+
>>> | numa_off   | true     | true           | true           | true       |
>>> | acpi_numa  | 0        | 1              | -1             | x          |
>>> | numa_fake  | x        | x              | x              | fake_nodes |
>>> | enum state | numa_off | numa_off       | numa_off       | numa_off   |
>>> +------------+----------+----------------+----------------+------------+
>>>
>>> +------------+----------------+------------+-------------+------------+
>>> |  original  |                |            |             |            |
>>> +------------+----------------+------------+-------------+------------+
>>> | numa_off   | false          | false      | false       | false      |
>>> | acpi_numa  | 0              | 1          | -1          | x          |
>>> | numa_fake  | x              | x          | x           | fake_nodes |
>>> | enum state | numa_fw_nodata | numa_fw_ok | numa_fw_bad | numa_emu   |
>>> +------------+----------------+------------+-------------+------------+
>>
>> Well, this makes the table complete, but it doesn't explain how you mean
>> to fold the settings of the two command line options into one variable.
>>
> 
> No matter how many separate "numa=" parameters have been parsed from
> Command line, the values of these original variables are determined
> after parsing the command line. So the determined status can be mapped
> to the new one variable from above table.

Hmm, I was partly wrong - the initial values of both variables are
indeed set from just the single "numa=" parsing. But later on they
"evolve" independently, and multiple "numa=" on the command line
can also have "interesting" effects. I'm afraid I still can't
convince myself that the new mapping fully represents all originally
possible combinations (nor can I convince myself that in the existing
code everything is working as intended).

Jan
Wei Chen July 14, 2022, 10:49 a.m. UTC | #8
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年7月14日 18:37
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v2 2/9] xen/x86: Use enumerations to indicate NUMA
> status
> >>
> >> Well, this makes the table complete, but it doesn't explain how you
> mean
> >> to fold the settings of the two command line options into one variable.
> >>
> >
> > No matter how many separate "numa=" parameters have been parsed from
> > Command line, the values of these original variables are determined
> > after parsing the command line. So the determined status can be mapped
> > to the new one variable from above table.
> 
> Hmm, I was partly wrong - the initial values of both variables are
> indeed set from just the single "numa=" parsing. But later on they
> "evolve" independently, and multiple "numa=" on the command line
> can also have "interesting" effects. I'm afraid I still can't

Can you provide some examples? This way I can better understand your
concerns.

Cheers,
Wei Chen

> convince myself that the new mapping fully represents all originally
> possible combinations (nor can I convince myself that in the existing
> code everything is working as intended).
> 
> Jan
Jan Beulich July 14, 2022, 10:51 a.m. UTC | #9
On 14.07.2022 12:37, Jan Beulich wrote:
> On 14.07.2022 12:26, Wei Chen wrote:
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 2022年7月14日 17:58
>>> To: Wei Chen <Wei.Chen@arm.com>
>>>>
>>>> How about update the table like this:
>>>> +------------+----------+----------------+----------------+------------+
>>>> |  original  |          |                |                |            |
>>>> +------------+----------+----------------+----------------+------------+
>>>> | numa_off   | true     | true           | true           | true       |
>>>> | acpi_numa  | 0        | 1              | -1             | x          |
>>>> | numa_fake  | x        | x              | x              | fake_nodes |
>>>> | enum state | numa_off | numa_off       | numa_off       | numa_off   |
>>>> +------------+----------+----------------+----------------+------------+
>>>>
>>>> +------------+----------------+------------+-------------+------------+
>>>> |  original  |                |            |             |            |
>>>> +------------+----------------+------------+-------------+------------+
>>>> | numa_off   | false          | false      | false       | false      |
>>>> | acpi_numa  | 0              | 1          | -1          | x          |
>>>> | numa_fake  | x              | x          | x           | fake_nodes |
>>>> | enum state | numa_fw_nodata | numa_fw_ok | numa_fw_bad | numa_emu   |
>>>> +------------+----------------+------------+-------------+------------+
>>>
>>> Well, this makes the table complete, but it doesn't explain how you mean
>>> to fold the settings of the two command line options into one variable.
>>>
>>
>> No matter how many separate "numa=" parameters have been parsed from
>> Command line, the values of these original variables are determined
>> after parsing the command line. So the determined status can be mapped
>> to the new one variable from above table.
> 
> Hmm, I was partly wrong - the initial values of both variables are
> indeed set from just the single "numa=" parsing. But later on they
> "evolve" independently, and multiple "numa=" on the command line
> can also have "interesting" effects. I'm afraid I still can't
> convince myself that the new mapping fully represents all originally
> possible combinations (nor can I convince myself that in the existing
> code everything is working as intended).

Maybe the solution is to make numa_off common but keep acpi_numa
arch-specific? Then e.g. the replacement of srat_disabled() could
be

int numa_disabled(void)
{
    return numa_off || arch_numa_disabled();
}

with arch_numa_disabled() evaluating acpi_numa on x86.

Jan
Jan Beulich July 14, 2022, 10:58 a.m. UTC | #10
On 14.07.2022 12:49, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年7月14日 18:37
>> status
>>>>
>>>> Well, this makes the table complete, but it doesn't explain how you
>> mean
>>>> to fold the settings of the two command line options into one variable.
>>>>
>>>
>>> No matter how many separate "numa=" parameters have been parsed from
>>> Command line, the values of these original variables are determined
>>> after parsing the command line. So the determined status can be mapped
>>> to the new one variable from above table.
>>
>> Hmm, I was partly wrong - the initial values of both variables are
>> indeed set from just the single "numa=" parsing. But later on they
>> "evolve" independently, and multiple "numa=" on the command line
>> can also have "interesting" effects. I'm afraid I still can't
> 
> Can you provide some examples? This way I can better understand your
> concerns.

Take bad_srat(): you convert "acpi_numa = -1" to setting numa_no_acpi.
Yet imo (matching the present model) numa_off shouldn't be affected.
While your change is fine in practice for (current) x86, it is wrong
in the abstract model (which is relevant when making things common).

Jan
Wei Chen July 15, 2022, 7:18 a.m. UTC | #11
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年7月14日 18:58
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v2 2/9] xen/x86: Use enumerations to indicate NUMA
> status
> 
> On 14.07.2022 12:49, Wei Chen wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 2022年7月14日 18:37
> >> status
> >>>>
> >>>> Well, this makes the table complete, but it doesn't explain how you
> >> mean
> >>>> to fold the settings of the two command line options into one
> variable.
> >>>>
> >>>
> >>> No matter how many separate "numa=" parameters have been parsed from
> >>> Command line, the values of these original variables are determined
> >>> after parsing the command line. So the determined status can be mapped
> >>> to the new one variable from above table.
> >>
> >> Hmm, I was partly wrong - the initial values of both variables are
> >> indeed set from just the single "numa=" parsing. But later on they
> >> "evolve" independently, and multiple "numa=" on the command line
> >> can also have "interesting" effects. I'm afraid I still can't
> >
> > Can you provide some examples? This way I can better understand your
> > concerns.
> 
> Take bad_srat(): you convert "acpi_numa = -1" to setting numa_no_acpi.
> Yet imo (matching the present model) numa_off shouldn't be affected.
> While your change is fine in practice for (current) x86, it is wrong
> in the abstract model (which is relevant when making things common).
> 

Thanks, I understand your concerns now. In this case, I agree with your
suggestion in previous e-mail:

> int numa_disabled(void)
> {
>    return numa_off || arch_numa_disabled();
> }
>
> with arch_numa_disabled() evaluating acpi_numa on x86.

I would update this patch like above sample in next version. And in
this way, I think We don't need the new enumerations and mapping table.

Cheers,
Wei Chen

> Jan
Wei Chen July 25, 2022, 2:28 a.m. UTC | #12
Hi Jan,

On 2022/7/14 18:51, Jan Beulich wrote:
>>>
>>> No matter how many separate "numa=" parameters have been parsed from
>>> Command line, the values of these original variables are determined
>>> after parsing the command line. So the determined status can be mapped
>>> to the new one variable from above table.
>>
>> Hmm, I was partly wrong - the initial values of both variables are
>> indeed set from just the single "numa=" parsing. But later on they
>> "evolve" independently, and multiple "numa=" on the command line
>> can also have "interesting" effects. I'm afraid I still can't
>> convince myself that the new mapping fully represents all originally
>> possible combinations (nor can I convince myself that in the existing
>> code everything is working as intended).
> 
> Maybe the solution is to make numa_off common but keep acpi_numa
> arch-specific? Then e.g. the replacement of srat_disabled() could
> be
> 
> int numa_disabled(void)
> {
>      return numa_off || arch_numa_disabled();
> }
> 
> with arch_numa_disabled() evaluating acpi_numa on x86.
> 

While I am working on the new version, I think this may be not enough,
we may need another helper like arch_handle_numa_param to prevent direct
accessing of acpi_numa in numa_setup.

Cheers,
Wei Chen

> Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/acpi.h b/xen/arch/x86/include/asm/acpi.h
index 9a9cc4c240..ab0d56dd70 100644
--- a/xen/arch/x86/include/asm/acpi.h
+++ b/xen/arch/x86/include/asm/acpi.h
@@ -101,7 +101,6 @@  extern unsigned long acpi_wakeup_address;
 
 #define ARCH_HAS_POWER_INIT	1
 
-extern s8 acpi_numa;
 extern int acpi_scan_nodes(u64 start, u64 end);
 #define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
 
diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
index c32ccffde3..ee8262d969 100644
--- a/xen/arch/x86/include/asm/numa.h
+++ b/xen/arch/x86/include/asm/numa.h
@@ -28,12 +28,22 @@  extern nodeid_t pxm_to_node(unsigned int pxm);
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 #define VIRTUAL_BUG_ON(x) 
 
+/* Enumerations for NUMA status. */
+enum numa_mode {
+	numa_on = 0,
+	numa_off,
+	/* NUMA turns on, but ACPI table is bad or disabled. */
+	numa_no_acpi,
+	/* NUMA turns on, and ACPI table works well. */
+	numa_acpi,
+};
+
 extern void numa_add_cpu(int cpu);
 extern void numa_init_array(void);
-extern bool numa_off;
-
+extern bool numa_enabled_with_firmware(void);
+extern enum numa_mode numa_status;
 
-extern int srat_disabled(void);
+extern bool srat_disabled(void);
 extern void numa_set_node(int cpu, nodeid_t node);
 extern nodeid_t setup_node(unsigned int pxm);
 extern void srat_detect_node(int cpu);
diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 627ae8aa95..0777a7518d 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -47,12 +47,16 @@  cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
 
 nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
 
-bool numa_off;
-s8 acpi_numa = 0;
+enum numa_mode numa_status;
 
-int srat_disabled(void)
+bool srat_disabled(void)
 {
-    return numa_off || acpi_numa < 0;
+    return numa_status == numa_off || numa_status == numa_no_acpi;
+}
+
+bool __init numa_enabled_with_firmware(void)
+{
+    return numa_status == numa_acpi;
 }
 
 /*
@@ -254,12 +258,13 @@  void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-    if ( !numa_off && !acpi_scan_nodes(start, end) )
+    if ( numa_status != numa_off && !acpi_scan_nodes(start, end) )
         return;
 #endif
 
     printk(KERN_INFO "%s\n",
-           numa_off ? "NUMA turned off" : "No NUMA configuration found");
+           numa_status == numa_off ? "NUMA turned off"
+                                   : "No NUMA configuration found");
 
     printk(KERN_INFO "Faking a node at %"PRIpaddr"-%"PRIpaddr"\n",
            start, end);
@@ -292,13 +297,13 @@  void numa_set_node(int cpu, nodeid_t node)
 static int __init cf_check numa_setup(const char *opt)
 {
     if ( !strncmp(opt,"off",3) )
-        numa_off = true;
+        numa_status = numa_off;
     else if ( !strncmp(opt,"on",2) )
-        numa_off = false;
+        numa_status = numa_on;
 #ifdef CONFIG_NUMA_EMU
     else if ( !strncmp(opt, "fake=", 5) )
     {
-        numa_off = false;
+        numa_status = numa_on;
         numa_fake = simple_strtoul(opt+5,NULL,0);
         if ( numa_fake >= MAX_NUMNODES )
             numa_fake = MAX_NUMNODES;
@@ -306,10 +311,7 @@  static int __init cf_check numa_setup(const char *opt)
 #endif
 #ifdef CONFIG_ACPI_NUMA
     else if ( !strncmp(opt,"noacpi",6) )
-    {
-        numa_off = false;
-        acpi_numa = -1;
-    }
+        numa_status = numa_no_acpi;
 #endif
     else
         return -EINVAL;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f08b07b8de..4841af5926 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -329,7 +329,8 @@  void srat_detect_node(int cpu)
     node_set_online(node);
     numa_set_node(cpu, node);
 
-    if ( opt_cpu_info && acpi_numa > 0 )
+    /* Print CPU info when NUMA is enabled with ACPI. */
+    if ( opt_cpu_info && numa_enabled_with_firmware() )
         printk("CPU %d APIC %d -> Node %d\n", cpu, apicid, node);
 }
 
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index f53431f5e8..422e4c73e3 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -185,7 +185,7 @@  static __init void bad_srat(void)
 {
 	int i;
 	printk(KERN_ERR "SRAT: SRAT not used.\n");
-	acpi_numa = -1;
+	numa_status = numa_no_acpi;
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		apicid_to_node[i] = NUMA_NO_NODE;
 	for (i = 0; i < ARRAY_SIZE(pxm2node); i++)
@@ -260,7 +260,7 @@  acpi_numa_x2apic_affinity_init(const struct acpi_srat_x2apic_cpu_affinity *pa)
 
 	apicid_to_node[pa->apic_id] = node;
 	node_set(node, processor_nodes_parsed);
-	acpi_numa = 1;
+	numa_status = numa_acpi;
 
 	if (opt_acpi_verbose)
 		printk(KERN_INFO "SRAT: PXM %u -> APIC %08x -> Node %u\n",
@@ -295,7 +295,7 @@  acpi_numa_processor_affinity_init(const struct acpi_srat_cpu_affinity *pa)
 	}
 	apicid_to_node[pa->apic_id] = node;
 	node_set(node, processor_nodes_parsed);
-	acpi_numa = 1;
+	numa_status = numa_acpi;
 
 	if (opt_acpi_verbose)
 		printk(KERN_INFO "SRAT: PXM %u -> APIC %02x -> Node %u\n",
@@ -484,7 +484,7 @@  static int __init cf_check srat_parse_region(
 	    (ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
 		return 0;
 
-	if (numa_off)
+	if (numa_status == numa_off)
 		printk(KERN_INFO "SRAT: %013"PRIx64"-%013"PRIx64"\n",
 		       ma->base_address, ma->base_address + ma->length - 1);
 
@@ -499,7 +499,7 @@  void __init srat_parse_regions(paddr_t addr)
 	u64 mask;
 	unsigned int i;
 
-	if (acpi_disabled || acpi_numa < 0 ||
+	if (acpi_disabled || numa_status == numa_no_acpi ||
 	    acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat))
 		return;
 
@@ -528,7 +528,8 @@  int __init acpi_scan_nodes(paddr_t start, paddr_t end)
 	for (i = 0; i < MAX_NUMNODES; i++)
 		cutoff_node(i, start, end);
 
-	if (acpi_numa <= 0)
+	/* Only when numa_on with good firmware, we can do numa scan nodes. */
+	if (!numa_enabled_with_firmware())
 		return -1;
 
 	if (!nodes_cover_memory()) {