diff mbox

[3/4] acpi: provide _PXM method for CPU devices if QEMU is started numa enabled

Message ID 1467289387-47575-4-git-send-email-imammedo@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Igor Mammedov June 30, 2016, 12:23 p.m. UTC
fixes long standing issue where Linux kernel would assing
hotplugged CPU to 1st numa node as it discards proximity
for hotplugged CPUs after SRAT is parsed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/cpu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Marcel Apfelbaum June 30, 2016, 12:48 p.m. UTC | #1
On 06/30/2016 03:23 PM, Igor Mammedov wrote:
> fixes long standing issue where Linux kernel would assing
> hotplugged CPU to 1st numa node as it discards proximity
> for hotplugged CPUs after SRAT is parsed.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>   hw/acpi/cpu.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index c13b65c..d9cf3ee 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -4,6 +4,7 @@
>   #include "qapi/error.h"
>   #include "qapi-event.h"
>   #include "trace.h"
> +#include "sysemu/numa.h"
>
>   #define ACPI_CPU_HOTPLUG_REG_LEN 12
>   #define ACPI_CPU_SELECTOR_OFFSET_WR 0
> @@ -503,6 +504,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>
>           /* build Processor object for each processor */
>           for (i = 0; i < arch_ids->len; i++) {
> +            int j;
>               Aml *dev;
>               Aml *uid = aml_int(i);
>               GArray *madt_buf = g_array_new(0, 1, 1);
> @@ -546,6 +548,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                             aml_arg(1), aml_arg(2))
>               );
>               aml_append(dev, method);
> +
> +            for (j = 0; j < nb_numa_nodes; j++) {
> +                if (test_bit(i, numa_info[j].node_cpu)) {
> +                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
> +                }
> +            }
> +
>               aml_append(cpus_dev, dev);
>           }
>       }
>

I would add, at least in the commit message, a pointer to the ACPI spec:

ACPI 5.0 (6.2.13)
-----------------
If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically added processor is not
present in the System Resource Affinity Table (SRAT), a _PXM object must exist for the
processor’s device or one of its ancestors in the ACPI Namespace.


I suppose we don't have the APIC id in SRAT for all possible CPUs, so it OK.


Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>

Thanks,
Marcel
Igor Mammedov June 30, 2016, 1:01 p.m. UTC | #2
On Thu, 30 Jun 2016 15:48:54 +0300
Marcel Apfelbaum <marcel@redhat.com> wrote:

> On 06/30/2016 03:23 PM, Igor Mammedov wrote:
> > fixes long standing issue where Linux kernel would assing
> > hotplugged CPU to 1st numa node as it discards proximity
> > for hotplugged CPUs after SRAT is parsed.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >   hw/acpi/cpu.c | 9 +++++++++
> >   1 file changed, 9 insertions(+)
> >
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index c13b65c..d9cf3ee 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -4,6 +4,7 @@
> >   #include "qapi/error.h"
> >   #include "qapi-event.h"
> >   #include "trace.h"
> > +#include "sysemu/numa.h"
> >
> >   #define ACPI_CPU_HOTPLUG_REG_LEN 12
> >   #define ACPI_CPU_SELECTOR_OFFSET_WR 0
> > @@ -503,6 +504,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >
> >           /* build Processor object for each processor */
> >           for (i = 0; i < arch_ids->len; i++) {
> > +            int j;
> >               Aml *dev;
> >               Aml *uid = aml_int(i);
> >               GArray *madt_buf = g_array_new(0, 1, 1);
> > @@ -546,6 +548,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >                             aml_arg(1), aml_arg(2))
> >               );
> >               aml_append(dev, method);
> > +
> > +            for (j = 0; j < nb_numa_nodes; j++) {
> > +                if (test_bit(i, numa_info[j].node_cpu)) {
> > +                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
> > +                }
> > +            }
> > +
> >               aml_append(cpus_dev, dev);
> >           }
> >       }
> >  
> 
> I would add, at least in the commit message, a pointer to the ACPI spec:
> 
> ACPI 5.0 (6.2.13)
> -----------------
> If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically added processor is not
> present in the System Resource Affinity Table (SRAT), a _PXM object must exist for the
> processor’s device or one of its ancestors in the ACPI Namespace.
> 
> 
> I suppose we don't have the APIC id in SRAT for all possible CPUs, so it OK.
we have entries for possible CPUs in SRAT and commit says that Linux discards it,
hence we need to add _PXM to CPU objects. So broken linux handling would put
hotplugged CPUs into corrected nodes.

To fix it on linux side, ACPI part probably would need to be refactored to store
parsed tables info somewhere else, so it would be available past boot time
(not a small undertaking) I'd say.
While fixing it on QEMU side is easy and works well even for currently released
kernels.


> 
> 
> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Thanks!

> 
> Thanks,
> Marcel
Marcel Apfelbaum June 30, 2016, 1:11 p.m. UTC | #3
On 06/30/2016 04:01 PM, Igor Mammedov wrote:
> On Thu, 30 Jun 2016 15:48:54 +0300
> Marcel Apfelbaum <marcel@redhat.com> wrote:
>
>> On 06/30/2016 03:23 PM, Igor Mammedov wrote:
>>> fixes long standing issue where Linux kernel would assing
>>> hotplugged CPU to 1st numa node as it discards proximity
>>> for hotplugged CPUs after SRAT is parsed.
>>>
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>>    hw/acpi/cpu.c | 9 +++++++++
>>>    1 file changed, 9 insertions(+)
>>>
>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>> index c13b65c..d9cf3ee 100644
>>> --- a/hw/acpi/cpu.c
>>> +++ b/hw/acpi/cpu.c
>>> @@ -4,6 +4,7 @@
>>>    #include "qapi/error.h"
>>>    #include "qapi-event.h"
>>>    #include "trace.h"
>>> +#include "sysemu/numa.h"
>>>
>>>    #define ACPI_CPU_HOTPLUG_REG_LEN 12
>>>    #define ACPI_CPU_SELECTOR_OFFSET_WR 0
>>> @@ -503,6 +504,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>
>>>            /* build Processor object for each processor */
>>>            for (i = 0; i < arch_ids->len; i++) {
>>> +            int j;
>>>                Aml *dev;
>>>                Aml *uid = aml_int(i);
>>>                GArray *madt_buf = g_array_new(0, 1, 1);
>>> @@ -546,6 +548,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>                              aml_arg(1), aml_arg(2))
>>>                );
>>>                aml_append(dev, method);
>>> +
>>> +            for (j = 0; j < nb_numa_nodes; j++) {
>>> +                if (test_bit(i, numa_info[j].node_cpu)) {
>>> +                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
>>> +                }
>>> +            }
>>> +
>>>                aml_append(cpus_dev, dev);
>>>            }
>>>        }
>>>
>>
>> I would add, at least in the commit message, a pointer to the ACPI spec:
>>
>> ACPI 5.0 (6.2.13)
>> -----------------
>> If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically added processor is not
>> present in the System Resource Affinity Table (SRAT), a _PXM object must exist for the
>> processor’s device or one of its ancestors in the ACPI Namespace.
>>
>>
>> I suppose we don't have the APIC id in SRAT for all possible CPUs, so it OK.
> we have entries for possible CPUs in SRAT and commit says that Linux discards it,
> hence we need to add _PXM to CPU objects. So broken linux handling would put
> hotplugged CPUs into corrected nodes.
>

OK, so the commit message was misleading: "Fixes" :)
Just flip "Fix" with "Workaround for... "

> To fix it on linux side, ACPI part probably would need to be refactored to store
> parsed tables info somewhere else, so it would be available past boot time
> (not a small undertaking) I'd say.

Maybe we should at least report it to the right mailing list.

> While fixing it on QEMU side is easy and works well even for currently released
> kernels.
>

I have nothing against this approach.

Thanks,
Marcel

>
>>
>>
>> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
> Thanks!
>
>>
>> Thanks,
>> Marcel
>
Michael S. Tsirkin June 30, 2016, 5:47 p.m. UTC | #4
On Thu, Jun 30, 2016 at 02:23:06PM +0200, Igor Mammedov wrote:
> fixes long standing issue where Linux kernel would assing

assign?

> hotplugged CPU to 1st numa node as it discards proximity
> for hotplugged CPUs after SRAT is parsed.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/acpi/cpu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index c13b65c..d9cf3ee 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -4,6 +4,7 @@
>  #include "qapi/error.h"
>  #include "qapi-event.h"
>  #include "trace.h"
> +#include "sysemu/numa.h"
>  
>  #define ACPI_CPU_HOTPLUG_REG_LEN 12
>  #define ACPI_CPU_SELECTOR_OFFSET_WR 0
> @@ -503,6 +504,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>  
>          /* build Processor object for each processor */
>          for (i = 0; i < arch_ids->len; i++) {
> +            int j;
>              Aml *dev;
>              Aml *uid = aml_int(i);
>              GArray *madt_buf = g_array_new(0, 1, 1);
> @@ -546,6 +548,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                            aml_arg(1), aml_arg(2))
>              );
>              aml_append(dev, method);
> +
> +            for (j = 0; j < nb_numa_nodes; j++) {
> +                if (test_bit(i, numa_info[j].node_cpu)) {
> +                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
> +                }
> +            }

this code is duplicated in a couple of other places.
how about a helper to find the numa node for a CPU?


> +
>              aml_append(cpus_dev, dev);
>          }
>      }


I'd like a comment here explaining why it's needed.
E.g. /*
      * Linux guests discard SRAT info for non-present CPUs
      * as a result _PXM is required for all CPUs which might
      * be hot-plugged. For simplicity, add it for all CPUs.
      */


> -- 
> 1.8.3.1
Michael S. Tsirkin June 30, 2016, 5:48 p.m. UTC | #5
On Thu, Jun 30, 2016 at 04:11:44PM +0300, Marcel Apfelbaum wrote:
> On 06/30/2016 04:01 PM, Igor Mammedov wrote:
> > On Thu, 30 Jun 2016 15:48:54 +0300
> > Marcel Apfelbaum <marcel@redhat.com> wrote:
> > 
> > > On 06/30/2016 03:23 PM, Igor Mammedov wrote:
> > > > fixes long standing issue where Linux kernel would assing
> > > > hotplugged CPU to 1st numa node as it discards proximity
> > > > for hotplugged CPUs after SRAT is parsed.
> > > > 
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > ---
> > > >    hw/acpi/cpu.c | 9 +++++++++
> > > >    1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > > > index c13b65c..d9cf3ee 100644
> > > > --- a/hw/acpi/cpu.c
> > > > +++ b/hw/acpi/cpu.c
> > > > @@ -4,6 +4,7 @@
> > > >    #include "qapi/error.h"
> > > >    #include "qapi-event.h"
> > > >    #include "trace.h"
> > > > +#include "sysemu/numa.h"
> > > > 
> > > >    #define ACPI_CPU_HOTPLUG_REG_LEN 12
> > > >    #define ACPI_CPU_SELECTOR_OFFSET_WR 0
> > > > @@ -503,6 +504,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> > > > 
> > > >            /* build Processor object for each processor */
> > > >            for (i = 0; i < arch_ids->len; i++) {
> > > > +            int j;
> > > >                Aml *dev;
> > > >                Aml *uid = aml_int(i);
> > > >                GArray *madt_buf = g_array_new(0, 1, 1);
> > > > @@ -546,6 +548,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> > > >                              aml_arg(1), aml_arg(2))
> > > >                );
> > > >                aml_append(dev, method);
> > > > +
> > > > +            for (j = 0; j < nb_numa_nodes; j++) {
> > > > +                if (test_bit(i, numa_info[j].node_cpu)) {
> > > > +                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
> > > > +                }
> > > > +            }
> > > > +
> > > >                aml_append(cpus_dev, dev);
> > > >            }
> > > >        }
> > > > 
> > > 
> > > I would add, at least in the commit message, a pointer to the ACPI spec:
> > > 
> > > ACPI 5.0 (6.2.13)
> > > -----------------
> > > If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically added processor is not
> > > present in the System Resource Affinity Table (SRAT), a _PXM object must exist for the
> > > processor’s device or one of its ancestors in the ACPI Namespace.
> > > 
> > > 
> > > I suppose we don't have the APIC id in SRAT for all possible CPUs, so it OK.
> > we have entries for possible CPUs in SRAT and commit says that Linux discards it,
> > hence we need to add _PXM to CPU objects. So broken linux handling would put
> > hotplugged CPUs into corrected nodes.
> > 
> 
> OK, so the commit message was misleading: "Fixes" :)
> Just flip "Fix" with "Workaround for... "
> 
> > To fix it on linux side, ACPI part probably would need to be refactored to store
> > parsed tables info somewhere else, so it would be available past boot time
> > (not a small undertaking) I'd say.
> 
> Maybe we should at least report it to the right mailing list.

For sure, this can't hurt, but does not have to block
this patch.

> 
> > While fixing it on QEMU side is easy and works well even for currently released
> > kernels.
> > 
> 
> I have nothing against this approach.
> 
> Thanks,
> Marcel
> 
> > 
> > > 
> > > 
> > > Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
> > Thanks!
> > 
> > > 
> > > Thanks,
> > > Marcel
> >
Igor Mammedov July 1, 2016, 8:12 a.m. UTC | #6
On Thu, 30 Jun 2016 20:47:33 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

Thanks for review,
I'll fix up patch according to your comments and post v3 shortly

> On Thu, Jun 30, 2016 at 02:23:06PM +0200, Igor Mammedov wrote:
> > fixes long standing issue where Linux kernel would assing  
> 
> assign?
> 
> > hotplugged CPU to 1st numa node as it discards proximity
> > for hotplugged CPUs after SRAT is parsed.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/acpi/cpu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index c13b65c..d9cf3ee 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -4,6 +4,7 @@
> >  #include "qapi/error.h"
> >  #include "qapi-event.h"
> >  #include "trace.h"
> > +#include "sysemu/numa.h"
> >  
> >  #define ACPI_CPU_HOTPLUG_REG_LEN 12
> >  #define ACPI_CPU_SELECTOR_OFFSET_WR 0
> > @@ -503,6 +504,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >  
> >          /* build Processor object for each processor */
> >          for (i = 0; i < arch_ids->len; i++) {
> > +            int j;
> >              Aml *dev;
> >              Aml *uid = aml_int(i);
> >              GArray *madt_buf = g_array_new(0, 1, 1);
> > @@ -546,6 +548,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >                            aml_arg(1), aml_arg(2))
> >              );
> >              aml_append(dev, method);
> > +
> > +            for (j = 0; j < nb_numa_nodes; j++) {
> > +                if (test_bit(i, numa_info[j].node_cpu)) {
> > +                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
> > +                }
> > +            }  
> 
> this code is duplicated in a couple of other places.
> how about a helper to find the numa node for a CPU?
> 
> 
> > +
> >              aml_append(cpus_dev, dev);
> >          }
> >      }  
> 
> 
> I'd like a comment here explaining why it's needed.
> E.g. /*
>       * Linux guests discard SRAT info for non-present CPUs
>       * as a result _PXM is required for all CPUs which might
>       * be hot-plugged. For simplicity, add it for all CPUs.
>       */
> 
> 
> > -- 
> > 1.8.3.1  
>
diff mbox

Patch

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index c13b65c..d9cf3ee 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -4,6 +4,7 @@ 
 #include "qapi/error.h"
 #include "qapi-event.h"
 #include "trace.h"
+#include "sysemu/numa.h"
 
 #define ACPI_CPU_HOTPLUG_REG_LEN 12
 #define ACPI_CPU_SELECTOR_OFFSET_WR 0
@@ -503,6 +504,7 @@  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
         /* build Processor object for each processor */
         for (i = 0; i < arch_ids->len; i++) {
+            int j;
             Aml *dev;
             Aml *uid = aml_int(i);
             GArray *madt_buf = g_array_new(0, 1, 1);
@@ -546,6 +548,13 @@  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
                           aml_arg(1), aml_arg(2))
             );
             aml_append(dev, method);
+
+            for (j = 0; j < nb_numa_nodes; j++) {
+                if (test_bit(i, numa_info[j].node_cpu)) {
+                    aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
+                }
+            }
+
             aml_append(cpus_dev, dev);
         }
     }